CN113722150B - Cloud hard disk data compression backup and recovery method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113722150B
Authority
CN
China
Prior art keywords
target
data block
hard disk
data
backup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110838010.2A
Other languages
Chinese (zh)
Other versions
CN113722150A (en)
Inventor
海鑫
亓开元
轩艳东
马翱宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110838010.2A
Publication of CN113722150A
Priority to PCT/CN2022/078491 (WO2023000674A1)
Application granted
Publication of CN113722150B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a cloud hard disk data compression backup and recovery method, a device, electronic equipment and a computer readable storage medium, wherein the method comprises the following steps: splitting a source cloud hard disk to obtain a plurality of initial data blocks, and determining initial offset of each initial data block in the source cloud hard disk; compressing non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block; generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information; writing compressed data blocks into the backup volume; when a recovery request is detected, carrying out data recovery by utilizing target backup information designated by the recovery request and a corresponding target preset sequence; the method can reduce the occupied amount of the storage space on the premise of ensuring the correct recovery of the data.

Description

Cloud hard disk data compression backup and recovery method, device, equipment and storage medium
Technical Field
The present application relates to the field of cloud platforms, and in particular, to a cloud hard disk data compression backup and recovery method, a cloud hard disk data compression backup and recovery device, an electronic device, and a computer readable storage medium.
Background
Cloud computing platforms, also referred to as cloud platforms, are services that provide computing, networking, and storage capabilities based on hardware and software resources. A cloud hard disk is a device that can be mounted on a cloud host and used like a physical hard disk. To make data safer and more reliable, a cloud hard disk generally needs to be backed up, so that when the cloud hard disk fails or its data suffers a logical error (for example, accidental deletion, a hacking attack, or virus damage), the backed-up data can be used to restore the data quickly. In the related art, the data in the source cloud hard disk is generally written directly into the backup volume, so the storage capacity actually occupied by the backup volume equals that occupied by the source cloud hard disk. The backup data therefore occupies a large amount of storage space, which increases the cost of the backup service.
Therefore, how to address the large storage footprint and high service cost of the related art is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Accordingly, the present application is directed to a cloud hard disk data compression backup and recovery method, a cloud hard disk data compression backup and recovery device, an electronic device, and a computer readable storage medium, which reduce the occupation amount of storage space and reduce the service cost on the premise of ensuring correct recovery of data.
In order to solve the technical problems, the application provides a cloud hard disk data compression backup and recovery method, which comprises the following steps:
splitting a source cloud hard disk to obtain a plurality of initial data blocks, and determining initial offset of each initial data block in the source cloud hard disk;
compressing non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block;
generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information;
writing the compressed data blocks into a backup volume;
and when the recovery request is detected, carrying out data recovery by utilizing the target backup information designated by the recovery request, the target backup volume and the corresponding target preset sequence.
Optionally, when the recovery request is detected, performing data recovery by using the target backup information, the target backup volume and the corresponding target preset sequence specified by the recovery request includes:
if a recovery request is detected, determining the target backup information and the target backup volume designated by the recovery request; the target backup information comprises a plurality of preset data volumes and a plurality of corresponding preset initial offsets;
reading corresponding target compressed data blocks from the target backup volume according to the target preset sequence by utilizing the preset data volumes;
decompressing the target compressed data block to obtain candidate non-zero data blocks;
determining a target non-zero data block in the candidate non-zero data blocks according to the target preset sequence, and determining a target preset initial offset corresponding to the target non-zero data block;
and writing the target non-zero data block into the target cloud hard disk based on the matching condition of the target preset initial offset and the current writing position of the target cloud hard disk.
Optionally, the writing the target non-zero data block to the target cloud hard disk based on the matching situation of the target preset starting offset and the current writing position of the target cloud hard disk includes:
if the target preset initial offset is matched with the current writing position, writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset;
if the target preset initial offset is not matched with the current writing position, setting the data between the current writing position and the target preset initial offset to zero, and then writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset.
Optionally, after determining the target backup information and the target backup volume specified by the restore request, the method further includes:
judging whether the target backup information has a compression identifier or not;
if the compression identifier is included, determining to execute the step of reading the corresponding target compressed data blocks from the target backup volume according to the target preset sequence by utilizing the preset data volumes;
and if the compression identifier is not included, reading the corresponding target data blocks from the target backup volume according to the target preset sequence by utilizing the preset data volumes, and splicing the target data blocks to finish the data recovery.
Optionally, the determining the preset sequence corresponding to the backup information includes:
and sequencing the compressed data blocks according to the magnitude relation of the initial offset corresponding to each compressed data block, and determining the sequence of the compressed data blocks as the preset sequence.
Optionally, the compressing the non-zero data block in the initial data block to obtain a compressed data block includes:
zero data block detection is carried out on each initial data block, and a detection result is obtained;
and determining the initial data block with the detection result being non-zero as a non-zero data block, and compressing it to obtain the compressed data block.
Optionally, the detecting the zero data block of each initial data block to obtain a detection result includes:
reading the data content of the initial data block, and comparing the data content with a binary null flag bit;
and if any data content is not the binary null flag bit, determining that the detection result corresponding to the initial data block is non-zero.
Optionally, the splitting the source cloud hard disk to obtain a plurality of initial data blocks includes:
obtaining segmentation granularity; the segmentation granularity evenly divides 1GB;
and carrying out average segmentation on the source cloud hard disk according to the segmentation granularity to obtain the initial data block.
Optionally, the generating corresponding backup information by using the initial offset and the data volume corresponding to the compressed data block includes:
forming a key value pair by using the initial offset corresponding to the compressed data block and the data volume;
ordering the key value pairs according to the size sequence of the initial offset to obtain a key value pair sequence;
and marking the key value pair sequence by using the hard disk mark of the source cloud hard disk and the volume mark of the backup volume to obtain the backup information.
Optionally, the generating corresponding backup information by using the initial offset and the data volume corresponding to the compressed data block includes:
generating initial backup information by utilizing the initial offset and the data volume corresponding to the compressed data block;
and identifying the initial backup information by using a compression identifier to obtain the backup information.
The application also provides a cloud hard disk data compression backup and recovery device, which comprises:
the splitting module is used for splitting the source cloud hard disk to obtain a plurality of initial data blocks and determining initial offset of each initial data block in the source cloud hard disk;
the compression module is used for compressing the non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block;
the information generation module is used for generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information;
the writing module is used for writing the compressed data blocks into a backup volume;
and the recovery module is used for carrying out data recovery by utilizing the target backup information specified by the recovery request, the target backup volume and the corresponding target preset sequence when the recovery request is detected.
According to the cloud hard disk data compression backup and recovery method provided by the application, a source cloud hard disk is segmented to obtain a plurality of initial data blocks, and initial offset of each initial data block in the source cloud hard disk is determined; compressing non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block; generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information; writing compressed data blocks into the backup volume; and when the recovery request is detected, carrying out data recovery by utilizing the target backup information designated by the recovery request and the corresponding target preset sequence.
Therefore, when a cloud hard disk is backed up, the source cloud hard disk is segmented and the non-zero data blocks among the resulting blocks are compressed. A non-zero data block is a data block in which non-zero data is recorded; unlike a zero data block, its content cannot be inferred at recovery time, so it must be compressed and saved for use in data recovery. Because different non-zero data blocks have different volumes after compression and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read out correctly. To characterize the position of each data block in the source cloud hard disk, backup information is generated from the initial offset corresponding to the compressed data block (that is, the initial offset of the corresponding non-zero data block) and the corresponding data volume, and the corresponding preset sequence is determined. The preset sequence indicates the order in which compressed data blocks are selected during data recovery. Writing the compressed data blocks into the backup volume completes the compressed backup of the source cloud hard disk. When a recovery request is detected, a certain source cloud hard disk is to be recovered; the compressed data blocks can then be read out accurately by using the target backup information, the target backup volume and the target preset sequence specified by the recovery request, and decompressed and spliced to complete the data recovery.
Because all-zero data blocks are omitted, non-zero data blocks are compressed before being stored, and corresponding backup information and a preset sequence are generated, the storage space required by the backup can be greatly reduced and the utilization efficiency of storage space is improved, which solves the problems of large storage footprint and high service cost in the related art.
In addition, the application also provides a cloud hard disk data compression backup and recovery device, electronic equipment and a computer readable storage medium, which have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to the provided drawings without inventive effort for those skilled in the art.
FIG. 1 is a flowchart of a method for compressing, backing up and recovering cloud hard disk data according to an embodiment of the present application;
FIG. 2 is a specific cloud hard disk backup flowchart provided in an embodiment of the present application;
FIG. 3 is a comparison chart of backup time consumption provided in an embodiment of the present application;
FIG. 4 is a diagram illustrating a backup volume capacity comparison according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a cloud hard disk data compression backup and recovery device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart of a method for compressing, backing up and recovering cloud hard disk data according to an embodiment of the present application. The method comprises the following steps:
S101: Split the source cloud hard disk to obtain a plurality of initial data blocks, and determine the initial offset of each initial data block in the source cloud hard disk.
When the source cloud hard disk is backed up, it first needs to be segmented, and the segmentation may be even or uneven. To improve the compression effect and reduce storage usage as far as possible, the whole source cloud hard disk may be divided evenly. Since the size of a source cloud hard disk is expressed in GB (gigabytes), the segmentation granularity should evenly divide 1GB. During segmentation, a segmentation granularity that evenly divides 1GB is obtained, and the source cloud hard disk is segmented evenly according to that granularity to obtain the initial data blocks.
After the initial data blocks are obtained, the initial offset of each initial data block in the source cloud hard disk needs to be recorded; the initial offset is the position, within the source cloud hard disk, of the first bit of data in the initial data block.
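The even segmentation of step S101 can be sketched as follows. This is only an illustration under stated assumptions: the disk is modeled as an in-memory `bytes` object, offsets are counted in bytes, 1GB is taken as 1024**3 bytes, and the names `split_disk` and `GIB` are hypothetical rather than taken from the application.

```python
from typing import Iterator, Tuple

GIB = 1024 ** 3  # assumed meaning of "1GB" for the granularity check


def split_disk(disk: bytes, granularity: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (initial_offset, block) pairs for equally sized blocks."""
    # The granularity must evenly divide 1GB so that a disk whose size is a
    # whole number of GB splits into blocks of identical size.
    if granularity <= 0 or GIB % granularity != 0:
        raise ValueError("granularity must evenly divide 1GB")
    for offset in range(0, len(disk), granularity):
        yield offset, disk[offset:offset + granularity]


# Tiny in-memory stand-in for a source cloud hard disk (16 bytes).
disk = bytes(8) + b"data" + bytes(4)
blocks = list(split_disk(disk, 4))
```

Each yielded pair keeps the block together with its initial offset, which is the information the later steps rely on.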
S102: Compress the non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculate the data volume of each compressed data block.
The initial data blocks obtained by splitting the source cloud hard disk may include zero data blocks. To improve the utilization of storage space, the method stores only the non-zero data blocks. A zero data block records no valid data and its content is uniquely determined, so zero data blocks do not need to be backed up.
In order to achieve the above effect, the process of obtaining the compressed data block includes the steps of:
Step 11: Perform zero data block detection on each initial data block to obtain a detection result.
Step 12: Determine each initial data block whose detection result is non-zero to be a non-zero data block, and compress it to obtain the compressed data block.
It will be appreciated that, before the non-zero data blocks are compressed, it must be determined which data blocks are non-zero data blocks and which are zero data blocks. In one embodiment, identity information corresponding to the data blocks may be obtained; specifically, once the splitting manner is determined, the identity information indicates whether each data block is a zero data block. In another embodiment, zero data block detection may be performed, and an initial data block that does not pass zero data block detection is determined to be a non-zero data block. This embodiment does not limit the specific manner of zero data block detection; in one embodiment, the step of performing zero data block detection on each initial data block to obtain a detection result may include:
Step 21: Read the data content of the initial data block, and compare the data content with the binary null flag bit.
Step 22: If any data content is not the binary null flag bit, determine that the detection result corresponding to the initial data block is non-zero.
The binary null flag bit is "\x00". The data content refers to the specific content recorded in the initial data block; by comparing that content with the binary null flag bit, it can be determined whether the content consists entirely of binary null flag bits. If any data content is not the binary null flag bit, the initial data block is not all zero, so it is determined to be a non-zero data block, that is, the detection result corresponding to the initial data block is non-zero.
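Steps 21 and 22 amount to checking whether every byte of a block equals "\x00". A minimal sketch, assuming a block is held in memory as `bytes` (the function name `is_zero_block` is hypothetical):

```python
def is_zero_block(block: bytes) -> bool:
    """Return True when the block consists only of binary null bytes."""
    # Compare the whole content against a run of b"\x00" of the same length;
    # a single differing byte makes the block a non-zero data block.
    return block == b"\x00" * len(block)


zero_result = is_zero_block(bytes(8))               # all-zero block
non_zero_result = is_zero_block(b"\x00\x01\x00")    # one non-null byte
```

For large blocks a real implementation would likely compare in chunks rather than build a second buffer of the same size; the comparison itself is unchanged.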
After the non-zero data block is determined, the non-zero data block can be compressed to obtain a compressed data block, and specifically, a compression algorithm such as gzip, zip or snappy can be adopted to compress the non-zero data block to obtain a compressed data block. In addition, since the volumes of the compressed data blocks are generally different, in order to be able to correctly read the compressed data blocks, the data volumes of the compressed data blocks need to be counted in order to generate backup information later for correct data recovery.
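The compression step and the recording of data volumes might look like the sketch below. It uses gzip from the Python standard library, one of the algorithms named above; the function name `compress_non_zero` and the tuple layout `(offset, compressed_bytes, volume)` are assumptions made for illustration.

```python
import gzip
from typing import Iterable, List, Tuple


def compress_non_zero(
    blocks: Iterable[Tuple[int, bytes]],
) -> List[Tuple[int, bytes, int]]:
    """Compress non-zero blocks; return (initial_offset, compressed, volume)."""
    result = []
    for offset, data in blocks:
        if data == b"\x00" * len(data):
            continue  # zero data block: its content is implied, nothing is stored
        packed = gzip.compress(data)
        # The data volume is the length of the compressed block; it is needed
        # later to cut the block back out of the contiguous backup volume.
        result.append((offset, packed, len(packed)))
    return result


blocks = [(0, bytes(16)), (16, b"A" * 16), (32, bytes(16)), (48, b"hello world 1234")]
compressed = compress_non_zero(blocks)
```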
S103: Generate corresponding backup information by using the initial offset and the data volume corresponding to the compressed data block, and determine a preset sequence corresponding to the backup information.
After the initial offset and the data volume are obtained, backup information can be generated by using the initial offset and the data volume, and the backup information is stored. In addition, the compressed data block is stored in a backup volume, so that the backup of the source cloud hard disk is realized. It should be noted that, the specific execution order of the step S103 and the step S104 is not limited, for example, the step S103 may be executed first, and then the step S104 may be executed; or step S104 may be performed first and then step S103 may be performed; or step S103 and step S104 may be performed simultaneously.
This embodiment does not limit the specific form and content of the backup information. In one implementation, the process of generating the backup information may include the following steps:
Step 31: Form key value pairs by using the starting offset and the data volume corresponding to the compressed data blocks.
Step 32: Sort the key value pairs by the magnitude of the initial offset to obtain a key value pair sequence.
Step 33: Identify the key value pair sequence by using the hard disk mark of the source cloud hard disk and the volume mark of the backup volume to obtain the backup information.
In this embodiment, the correspondence between an initial offset and a data volume is expressed as a key value pair. After the key value pairs are obtained, they are sorted by initial offset to obtain the key value pair sequence. The order of the key value pairs in this sequence can serve as the aforesaid preset sequence; that is, the preset sequence in this embodiment is the order of the initial offsets, which is also the order of the non-zero data blocks' positions in the source cloud hard disk. After the key value pair sequence is obtained, it is identified with the hard disk mark of the source cloud hard disk and the volume mark of the backup volume, which establishes the correspondence among the source cloud hard disk, the backup volume and the backup information and yields the backup information. Note that this embodiment does not limit the specific form of the hard disk mark and the volume mark; they may, for example, take the form of a UUID. UUID stands for Universally Unique Identifier, a standard used in software construction. Its purpose is to give every element in a distributed system unique identification information without requiring a central controller to assign it.
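Steps 31 to 33 can be illustrated with a small sketch. The dictionary layout and the field names `disk_id`, `volume_id` and `blocks` are hypothetical; the application only requires that the sorted key value pairs be associated with the hard disk mark and the volume mark, and UUIDs are used here because the text offers them as one possible form.

```python
import uuid
from typing import Dict, List, Tuple


def build_backup_info(
    entries: List[Tuple[int, int]], disk_id: str, volume_id: str
) -> Dict[str, object]:
    """entries holds (initial_offset, data_volume) pairs in any order."""
    # Step 32: sort the key value pairs by initial offset; this order also
    # serves as the preset sequence in this embodiment.
    pairs = sorted(entries, key=lambda pair: pair[0])
    # Step 33: tag the sequence with the disk mark and the volume mark.
    return {"disk_id": disk_id, "volume_id": volume_id, "blocks": pairs}


info = build_backup_info(
    [(48, 30), (16, 25)],
    disk_id=str(uuid.uuid4()),
    volume_id=str(uuid.uuid4()),
)
```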
In another embodiment, the system may back up different source cloud hard disks with different backup strategies; for example, some source cloud hard disks are backed up in the compressed manner described above, while others are backed up directly without compression. Thus, to indicate the manner of backup, the generation process of the backup information may include:
Step 41: Generate initial backup information by using the initial offset and the data volume corresponding to the compressed data blocks.
Step 42: Identify the initial backup information with the compression identifier to obtain the backup information.
In this embodiment, the initial backup information is generated directly from the starting offset and the data volume. The compression identifier is an identifier that indicates the backup mode, and its specific form is not limited; for example, a status flag bit express may be set for the initial backup information, and if the flag bit is set to true, true serves as the compression identifier. Identifying the initial backup information with the compression identifier yields backup information that indicates the backup mode.
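As an illustration of steps 41 and 42, the flag can be stored alongside the initial backup information and consulted before recovery. The key name `compressed` and both function names are hypothetical stand-ins for the status flag bit described above; the form of the identifier is left open by the application.

```python
from typing import Dict


def mark_compressed(initial_info: Dict[str, object], compressed: bool = True) -> Dict[str, object]:
    """Return a copy of the initial backup information carrying the flag."""
    marked = dict(initial_info)
    marked["compressed"] = compressed  # hypothetical key for the status flag bit
    return marked


def uses_compressed_restore(info: Dict[str, object]) -> bool:
    """Decide which recovery path applies; a missing flag means direct backup."""
    return info.get("compressed", False) is True


info = mark_compressed({"blocks": [(0, 10)]})
```

Treating a missing flag as "not compressed" is a design choice of this sketch, so that backup information written by the direct backup path needs no extra field.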
Because the compressed data blocks in the present application result from compressing non-zero data blocks, the generated backup information likewise covers only the non-zero data blocks. Therefore, to determine accurately where zero data blocks must be inserted so that a correct and complete source cloud hard disk is obtained during data recovery, a judgment rule needs to be preset. The rule is usually related to the write position of the hard disk and the offset of the non-zero data block: when the write position and the offset of the non-zero data block do not match, it is determined that zero data blocks need to be supplemented. When non-zero data blocks are written into the cloud hard disk, a specific non-zero data block must be selected in a certain order and its initial offset matched against the write position of the hard disk; only the result of matching that block's initial offset with the write position indicates whether a zero data block needs to be inserted. This order is necessarily related to the content of the judgment rule. Therefore, once the judgment rule is fixed (that is, the rule for judging whether the initial offset matches the write position), a preset order corresponding to the compressed data blocks needs to be generated during data backup, so that the non-zero data blocks are selected in that preset order during data recovery.
This embodiment does not limit the specific content of the preset sequence, which may be adapted as the matching rule changes. In a specific embodiment, the source cloud hard disk is recovered sequentially from the head of the data to the tail. To reduce the complexity of the matching rule, the rule may be set to judge whether the initial offset is greater than the current write position of the hard disk and adjacent to it; if so, the two match, and otherwise they do not. In this case, the process of generating the preset sequence may include the following step:
Step 51: Sort the compressed data blocks by the magnitude of their corresponding initial offsets, and determine the resulting order of the compressed data blocks as the preset sequence.
The compressed data blocks are sorted by initial offset, and the sorted order is taken as the preset sequence. Following this order, compressed data blocks are selected in order of increasing initial offset, which yields the non-zero data blocks in order of increasing initial offset, and whether zero data blocks need to be supplemented can be judged from the matching rule during writing.
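The interplay between the preset sequence and the matching rule can be sketched as a small planning function. It rests on a simplifying assumption that is one reading of the rule, not the application's definition: an offset "matches" when it coincides with the current write position, and any gap is filled with zeros. The names `plan_writes` and `block_size`, and the action tuples, are hypothetical.

```python
from typing import List, Tuple


def plan_writes(offsets: List[int], block_size: int) -> List[Tuple]:
    """Return ("zero", start, end) and ("data", offset) actions in write order."""
    actions: List[Tuple] = []
    position = 0  # current write position on the target disk
    for offset in sorted(offsets):  # preset sequence: ascending initial offset
        if offset != position:
            # Mismatch: the gap held zero data blocks in the source disk.
            actions.append(("zero", position, offset))
        actions.append(("data", offset))
        position = offset + block_size
    return actions


plan = plan_writes([8, 0, 20], block_size=4)
```

Zero blocks after the last non-zero block are not covered by this sketch; handling them would require knowing the size of the source disk.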
S104: Write the compressed data blocks into the backup volume.
The backup volume refers to a storage volume used to store the compressed data blocks.
S105: When a recovery request is detected, perform data recovery by using the target backup information designated by the recovery request, the target backup volume and the corresponding target preset sequence.
The recovery request refers to a request for indicating to recover the data in the specified cloud hard disk, and the specific form and content of the recovery request are not limited. It will be appreciated that it is necessary to determine which cloud hard disk data needs to be restored based on the restore request, and thus it is possible to further determine the data required for data restoration, including target backup information (which may also be referred to as target information), target preset order, and target backup volumes.
The target backup volume is the data volume designated by the recovery request that stores the backup data generated when the source cloud hard disk (i.e., the backed-up cloud hard disk) was backed up. The target information is the backup information of the backup data designated by the recovery request. It can be appreciated that, since the data generally needs to be divided into blocks when the source cloud hard disk is backed up, the resulting blocks are written contiguously into the target backup volume. Thus, the target backup information should at least indicate the data volume of each data block in the target backup volume, so that the backup data can be read out accurately, and should also indicate the position of each piece of backup data in the source cloud hard disk, so that the data of the source cloud hard disk can be reconstructed correctly.
Specifically, step S105 may further include:
Step 61: if a recovery request is detected, determine the target backup information and the target backup volume designated by the recovery request.
Step 62: read the corresponding target compressed data blocks from the target backup volume in the target preset order by using each preset data volume.
Step 63: decompress the target compressed data blocks to obtain candidate non-zero data blocks.
Step 64: determine the target non-zero data block among the candidate non-zero data blocks according to the target preset order, and determine the target preset initial offset corresponding to the target non-zero data block.
Step 65: write the target non-zero data block into the target cloud hard disk based on whether the target preset initial offset matches the current writing position of the target cloud hard disk.
In this embodiment, the target information includes a plurality of preset data volumes and a corresponding plurality of preset start offsets. The preset data volume refers to the data volume of each compressed data block in the target backup volume. The preset initial offset refers to the position of the non-zero data block corresponding to each compressed data block in the target backup volume in the source cloud hard disk. The two are in one-to-one correspondence and correspond to each compressed data block in the target backup volume respectively.
It can be understood that the backup volumes and the backup information obtained after the backup of different source cloud hard disks are different, so that the relationship between the backup and the source cloud hard disks can be indicated, and the corresponding relationship among the source cloud hard disks, the backup volumes and the backup information can be generated and stored. Therefore, in a possible implementation manner, the recovery request may include source cloud hard disk information, and after obtaining the source cloud hard disk information, the corresponding target backup information and the target backup volume are determined by using the correspondence. In another embodiment, if there is no correspondence between the source cloud hard disk, the backup volume, and the backup information, the target backup information and the target backup volume may be directly specified in the restore request.
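The two ways of identifying the target backup described above can be sketched as follows. This is only an illustration: the `BackupRef` and `RestoreRequest` types, the in-memory `CORRESPONDENCE` table, and all identifiers such as "disk-001" are hypothetical and are not part of the described embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupRef:
    backup_info_id: str
    backup_volume_id: str


@dataclass
class RestoreRequest:
    source_disk_id: Optional[str] = None
    backup_info_id: Optional[str] = None
    backup_volume_id: Optional[str] = None


# Hypothetical stored correspondence: source disk -> (backup info, backup volume).
CORRESPONDENCE = {"disk-001": BackupRef("info-001", "vol-001")}


def resolve(request: RestoreRequest, table=CORRESPONDENCE) -> BackupRef:
    """Determine the target backup information and target backup volume."""
    # Case 1: the request names the backup information and volume directly.
    if request.backup_info_id and request.backup_volume_id:
        return BackupRef(request.backup_info_id, request.backup_volume_id)
    # Case 2: the request carries source disk information; use the correspondence.
    if request.source_disk_id in table:
        return table[request.source_disk_id]
    raise LookupError("recovery request does not identify a backup")
```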
After the target information is obtained, it is parsed to obtain the preset data volumes and the preset initial offsets. In general, there are multiple preset data volumes and an equal number of preset initial offsets, although there may of course be only one of each. In this embodiment, to use the backup storage space as efficiently as possible, each non-zero data block obtained by segmentation is compressed when the source cloud hard disk is backed up, yielding a compressed data block. Thus, a preset data volume is the data volume of a target compressed data block, where a target compressed data block is a compressed data block stored in the target backup volume.
In this embodiment, the target preset sequence may be a storage sequence of the target compressed data blocks in the target backup volume, and in general, the target preset sequence is also a position sequence of the target compressed data blocks in the source cloud hard disk, that is, a size sequence of the corresponding initial offset. According to the target preset sequence, it is possible to determine which target compressed data block is to be read at a certain stage when data recovery is performed, and further determine which preset data volume is to be read.
After each target compressed data block is accurately read out, decompressing is carried out on each target compressed data block, and corresponding candidate non-zero data blocks are obtained. A zero data block refers to a data block comprising only zero data, and a corresponding non-zero data block refers to a data block comprising non-zero data. The decompression method needs to correspond to the compression method of the target compressed data block, and the specific content of the compression method and the decompression method is not limited in this embodiment, and any reversible compression method and the corresponding decompression method may be selected. The reversibility is that the data content is not changed after being compressed and decompressed.
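The reversibility requirement can be checked with a short round trip. The sketch below uses Python's standard `gzip` module purely as one example of a reversible method; the embodiment does not limit the choice, and the sample block content is made up.

```python
import gzip

# A made-up non-zero data block: some zero bytes followed by repeated content.
block = b"\x00" * 100 + b"cloud disk data" * 50

compressed = gzip.compress(block)       # compression at backup time
restored = gzip.decompress(compressed)  # matching decompression at recovery time

# Reversible: the content is unchanged after compression and decompression.
assert restored == block
```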
It will be appreciated that, since only non-zero data blocks are compressed and backed up, zero data may exist between adjacent candidate non-zero data blocks in the source cloud hard disk. Therefore, during data recovery, the candidate non-zero data blocks cannot simply be written into the target cloud hard disk back to back; they must be written one by one, and the target non-zero data block must therefore be determined among the candidate non-zero data blocks. The target non-zero data block is the data block that needs to be written into the target cloud hard disk at the current stage. In this embodiment, the target non-zero data block is determined according to the preset order. After the target non-zero data block is determined, its corresponding preset initial offset is the target preset initial offset, which represents the data position of the target non-zero data block in the source cloud hard disk.
When writing a target non-zero data block, whether zero data exists between the target non-zero data block and a candidate non-zero data block written in the last time is required to be judged, and whether the zero data is required to be supplemented at the same time is further determined, so that the data of the source cloud hard disk can be accurately recovered. That is, the target non-zero data block needs to be written into the target cloud hard disk based on the matching condition of the target preset initial offset and the current writing position of the target cloud hard disk.
Specifically, based on the matching condition of the target preset initial offset and the current writing position of the target cloud hard disk, the process of writing the target non-zero data block into the target cloud hard disk comprises the following steps:
Step 71: if the target preset initial offset matches the current writing position, write the target non-zero data block into the target cloud hard disk according to the target preset initial offset.
Step 72: if the target preset initial offset does not match the current writing position, write the target non-zero data block into the target cloud hard disk according to the target preset initial offset, and, before writing it, clear (zero) the data between the current writing position and the target preset initial offset.
Specifically, the current writing position of the target cloud hard disk refers to the position designated by the data pointer after data was last written to the target cloud hard disk. The position designated by the data pointer changes as data is written, and always indicates where data was last written. If no data has been written to the target cloud hard disk, the data pointer points to the initial offset position of the target cloud hard disk.
If the target preset initial offset matches the current writing position, the last written candidate non-zero data block and the target non-zero data block are joined end to end: they are immediately adjacent, with no blank data between them. In this case, the target non-zero data block may be written directly into the target cloud hard disk in sequence. This embodiment does not limit how the match between the target preset initial offset and the current writing position is detected. For example, it may be judged whether the current writing position is smaller than and adjacent to the target preset initial offset; if so, the two are determined to match. Alternatively, it may be judged whether the current writing position and the target preset initial offset are both the initial offset position, that is, the first storage position of the whole cloud hard disk; if so, the two are likewise determined to match.
If the target preset initial offset does not match the current writing position, this indicates that zero data blocks exist between the last written candidate non-zero data block and the target non-zero data block, or that the first several data blocks of the source cloud hard disk are zero data blocks. In this case, the target non-zero data block needs to be written into the target cloud hard disk according to the target preset initial offset, and zero data also needs to be supplemented. It can be understood that, since writing the target non-zero data block changes the current writing position, the current writing position before the target non-zero data block is written is taken as the start of the interval and the target preset initial offset as the end of the interval, and the data in that interval is cleared to complete the supplementation of zero data.
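Steps 71 and 72 can be sketched as a single write routine. The sketch makes two simplifying assumptions that are not taken from the embodiment: the target cloud hard disk is modeled as an in-memory `bytearray`, and the writing position is represented as the index one past the last written byte, so that "match" reduces to equality with the initial offset. The function name `write_block` is hypothetical.

```python
def write_block(disk: bytearray, write_pos: int, start_offset: int, block: bytes) -> int:
    """Write one decompressed non-zero block and return the new writing position."""
    if start_offset < write_pos:
        raise ValueError("blocks must arrive in ascending initial-offset order")
    if start_offset > write_pos:
        # Step 72: no match, so zero data lay between the two blocks in the
        # source disk; clear the interval [write_pos, start_offset) first.
        disk[write_pos:start_offset] = bytes(start_offset - write_pos)
    # Steps 71 and 72: write the block at its initial offset.
    disk[start_offset:start_offset + len(block)] = block
    return start_offset + len(block)


# Target disk pre-filled with stale 0xFF bytes to make the clearing visible.
disk = bytearray(b"\xff" * 16)
pos = write_block(disk, 0, 0, b"AAAA")    # offset matches: plain write
pos = write_block(disk, pos, 8, b"BBBB")  # offset ahead: bytes 4..7 are zeroed
```

Clearing the gap explicitly matters when the target disk may hold stale data; on a freshly zeroed target the result would be the same without it.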
Based on the above embodiment, since it is possible to use different backup policies for backup, after determining the target backup information and the target backup volume specified by the restore request, the method may further include the following steps:
Step 81: judge whether the target backup information has a compression identifier.
Step 82: if the compression identifier is present, proceed to the step of reading the corresponding target compressed data blocks from the target backup volume in the target preset order by using each preset data volume.
Step 83: if the compression identifier is absent, read the corresponding target data blocks from the target backup volume in the target preset order by using each preset data volume, and splice the target data blocks to complete data recovery.
If the target backup information does not have the compression identifier, the backup was not made with the backup method provided by the present application and the data in the target backup volume is not compressed, so the corresponding target data blocks can be read directly from the target backup volume by using each preset data volume and then spliced. The target data blocks may include all-zero data blocks and non-zero data blocks.
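The branch in steps 81 to 83 can be sketched as below. The dictionary layout of the backup information (a `"compressed"` flag and a `"sizes"` list already in the preset order) and the function name are assumptions made for illustration; gzip stands in for whichever reversible method was used at backup time.

```python
import gzip


def read_target_blocks(backup_info: dict, backup_volume: bytes) -> list:
    """Read blocks in preset order, decompressing only if the flag is set."""
    compressed = backup_info.get("compressed", False)  # step 81
    blocks, read_pos = [], 0
    for size in backup_info["sizes"]:
        raw = backup_volume[read_pos:read_pos + size]
        read_pos += size
        # Step 82: compressed backup, decompress each block.
        # Step 83: uncompressed backup, use the raw block as is.
        blocks.append(gzip.decompress(raw) if compressed else raw)
    return blocks
```

For an uncompressed backup the returned blocks can simply be concatenated (for example with `b"".join(blocks)`), whereas blocks from a compressed backup still have to be placed at their initial offsets.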
By applying the cloud hard disk data compression backup and recovery method provided by the embodiment of the present application, the cloud hard disk is segmented when it is backed up, and the non-zero data blocks in it are compressed. A non-zero data block is a data block in which non-zero data is recorded; unlike a zero data block, its content cannot be inferred at recovery time, so it must be compressed and saved for data recovery to be based on. Because different non-zero data blocks have different data volumes after compression and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read out correctly during recovery. To characterize the position of each data block in the source cloud hard disk, the initial offset corresponding to the compressed data block (that is, the initial offset of the corresponding non-zero data block) and the corresponding data volume are used to generate backup information, and the corresponding preset order is determined. The preset order indicates the order in which compressed data blocks are selected during data recovery. Writing the compressed data blocks into the backup volume completes the compressed backup of the source cloud hard disk. When a recovery request is detected, indicating that a certain source cloud hard disk is to be recovered, the compressed data blocks can be read out accurately by using the target backup information designated by the recovery request, the target backup volume, and the target preset order, and then decompressed and spliced to complete data recovery.
By removing all-zero data blocks and compressing and storing non-zero data blocks, corresponding backup information and a preset sequence are generated, the storage space required by compression backup can be greatly reduced, the utilization efficiency of the storage space is improved, and the problems of more occupied storage space and higher service cost in the related technology are solved.
Based on the foregoing embodiments, please refer to fig. 2, fig. 2 is a specific cloud hard disk backup flowchart provided in an embodiment of the present application. The source cloud hard disk backup flow is described as follows:
1) Divide the source cloud hard disk into n chunks, where n is a positive integer. A chunk is an initial data block.
2) Compare each chunk with the binary empty flag "\x00" to identify empty chunks. In this embodiment, it may be assumed that chunks 1, 3, 4, …, n-1 are non-empty chunks (i.e., non-zero data blocks) and that chunks 2 and n are empty chunks.
3) Compress each non-empty chunk with the gzip tool to obtain a compressed data block, and calculate its capacity (i.e., the data volume of the compressed data block). Specifically, the capacity of chunk1 becomes size1 after compression. Because chunk2 is identified as an empty chunk (i.e., a zero data block), its compression step is skipped and no compressed size is calculated for it. After chunk3 is compressed, the size of the resulting data block is size3, and so on for the subsequent chunks.
4) Write data block 1, obtained by compressing chunk1, into the backup volume, and record a corresponding key-value pair in the database. The key is the initial offset offset1 at which this chunk is read from the source cloud hard disk; the value is size1, the size of the compressed chunk1.
5) Because chunk2 is identified as an empty chunk, the steps of compressing it, writing it into the backup volume, and recording a key-value pair in the database are all skipped. All subsequent empty chunks are processed in the same way.
6) Begin processing data block 3. Because data block 1 has already been written to the backup volume, the current write offset of the backup volume has changed from the initial 0 to size1 (the size of data block 1), so data block 3 is written starting from offset size1 until the compressed chunk3 is completely written. After the write completes, the write offset of the backup volume becomes size1+size3, which is the position at which the next data block will start to be written. After data block 3 is written, a new key-value pair is added to the database: the key is the initial offset at which chunk3 is read from the source cloud hard disk, and the value is size3, the size of compressed data block 3.
7) Process the subsequent data blocks in the same way to complete the data writing and the recording of key-value pairs in the database.
After the compressed backup of the cloud hard disk completes, empty chunks have not been written into the backup volume and non-empty chunks have been compressed, so the capacity occupied by the backup volume is greatly reduced; in addition, the database records, for each backed-up chunk, the initial offset at which it was read from the source volume and the size of the data block obtained by compressing it.
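The backup flow above can be condensed into a short Python sketch. It is a toy model under stated assumptions: the source cloud hard disk is a `bytes` object, the backup volume is an in-memory buffer, the database is a plain dictionary, and the 8-byte chunk size is chosen only to keep the example readable. The function name `backup` is hypothetical.

```python
import gzip
import io


def backup(source: bytes, chunk_size: int):
    """Return (backup volume bytes, {initial offset: compressed size})."""
    volume = io.BytesIO()  # stands in for the backup volume
    pairs = {}             # stands in for the database key-value pairs
    for offset in range(0, len(source), chunk_size):
        chunk = source[offset:offset + chunk_size]
        if not any(chunk):
            # Empty chunk: skip compression, writing, and the key-value pair.
            continue
        data = gzip.compress(chunk)
        volume.write(data)         # appended at the current write offset
        pairs[offset] = len(data)  # key: initial offset, value: compressed size
    return volume.getvalue(), pairs


# Four chunks: non-empty, empty, non-empty, empty.
source = b"A" * 8 + bytes(8) + b"B" * 8 + bytes(8)
volume, pairs = backup(source, chunk_size=8)
```

Here only the chunks at offsets 0 and 16 produce key-value pairs, and the length of the backup volume equals the sum of the recorded sizes. With chunks this small the gzip header makes each compressed block larger than its input, so the sketch shows the bookkeeping rather than the space saving.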
The recovery flow for a compressed cloud hard disk backup is as follows:
1) Read the recorded key-value pairs (i.e., the target information) from the database, and process the data block corresponding to each key-value pair in turn.
2) Taking fig. 2 as an example, offset1:size1 is processed first: a data block of size size1 is read from the backup volume starting at offset 0, and is then decompressed with the gzip tool to obtain a decompressed data block (i.e., a candidate non-zero data block).
3) Position the write offset of the recovery cloud hard disk (i.e., the target cloud hard disk) at offset1, and then write the decompressed data block.
4) Because chunk2 is an empty chunk for which no information was recorded, the recovery of chunk3 starts directly.
5) Process offset3:size3: a data block of size size3 is read from the backup volume starting at read offset size1, and is decompressed to obtain a new data block. offset3 must be greater than the current write offset of the recovery cloud hard disk, which is offset2 (the current writing position advanced from offset1 to offset2 when the decompressed chunk1 was written). Therefore, before the data block (the decompressed chunk3) is written into the recovery cloud hard disk, the space from offset2 to offset3 needs to be cleared, to ensure that the restored data is consistent with the originally backed-up data.
6) The subsequent processing is similar, until all key-value pairs in the database have been processed and the recovery of the cloud hard disk backup is complete.
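The recovery flow can be sketched in the same toy setting: the backup volume is a `bytes` object, the key-value pairs are a dictionary mapping initial offset to compressed size, and the target disk is a `bytearray`. The final clearing of the tail after the last block is an addition of this sketch, not something the flow above describes; it is included so that trailing empty chunks are also restored as zeros. The function name `restore` is hypothetical.

```python
import gzip


def restore(volume: bytes, pairs: dict, target: bytearray) -> None:
    """Rebuild the disk contents in `target` from the backup volume."""
    read_pos = 0   # read offset in the backup volume: 0, size1, size1+size3, ...
    write_pos = 0  # current writing position in the target disk
    for offset in sorted(pairs):  # ascending initial offset
        size = pairs[offset]
        block = gzip.decompress(volume[read_pos:read_pos + size])
        read_pos += size
        # Clear the gap left by skipped empty chunks (no-op when there is none).
        target[write_pos:offset] = bytes(offset - write_pos)
        target[offset:offset + len(block)] = block
        write_pos = offset + len(block)
    # Assumption of this sketch: also clear whatever follows the last block.
    target[write_pos:] = bytes(len(target) - write_pos)


# Hand-built backup of a 32-byte disk: chunks at offsets 0 and 16 are non-empty.
c1, c3 = gzip.compress(b"A" * 8), gzip.compress(b"B" * 8)
target = bytearray(b"\xff" * 32)  # stale content that must not survive
restore(c1 + c3, {0: len(c1), 16: len(c3)}, target)
```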
In actual measurement, five source cloud hard disks are compared, and the conditions of each source cloud hard disk are as follows:
A. imageA: create an empty cloud hard disk with a quota of 10G, mount it to a virtual machine, format it with an ext4 file system, and create an all-zero file of size 2G in the file system with the dd command.
B. imageB: create an empty cloud hard disk with a quota of 10G, mount it to a virtual machine, format it with an ext4 file system, and create an all-zero file of size 5G in the file system with the dd command.
C. imageC: create an image volume with a quota of 10G, i.e., a cloud hard disk containing a system image; the image size is 39MB, and the system is a minimal Linux installation.
D. imageD: create an image volume with a quota of 10G, i.e., a cloud hard disk containing a system image; the image size is 2404MB, and the system is a CentOS 7 installation.
E. imageE: create an image volume with a quota of 10G, i.e., a cloud hard disk containing a system image; the image size is 396MB, and the system is a minimal Windows installation.
Cloud hard disk backup is performed on each of the above 5 cloud hard disks under the following three scenarios, and the time consumed and the actual capacity occupied by the backup volume after the backup completes are recorded. The scenarios are as follows:
1. The community backup driver is used, i.e., neither empty-chunk detection nor backup compression is enabled.
2. The optimized driver of the present application is used with only empty-chunk detection enabled; backup compression is not enabled.
3. The optimized driver of the present application is used with both empty-chunk detection and backup compression enabled.
Referring to fig. 3 and fig. 4, fig. 3 is a comparison chart of backup time consumption provided by an embodiment of the present application, and fig. 4 is a comparison chart of backup volume capacity provided by an embodiment of the present application. As can be seen from fig. 3, the cloud hard disk backup mechanism adopted in the related art takes a relatively long time to back up; with empty-chunk detection (backup acceleration) enabled, the time to back up the same volume is significantly reduced; with backup compression also enabled, the time is generally higher than with empty-chunk detection alone, because of the additional time spent compressing chunks, but is still lower than that of the related-art mechanism.
Note that the ordinate of fig. 4 uses an exponential scale. As can be seen from fig. 4, with the backup logic adopted in the related art (scenario 1), the backup volume occupies 10G, the same as the quota of the source cloud hard disk, which seriously consumes backup storage capacity. After empty-chunk detection is enabled (scenario 2), the capacity occupied by the backup volume is greatly reduced. After backup compression is also enabled (scenario 3), the capacity occupied by the backup volume is reduced further. How much more capacity scenario 3 saves than scenario 2 depends on the sparseness of the data in the source cloud hard disk (for example, the sparseness is high for imageA and imageB, whose all-zero files were generated with the dd command) and on the compression algorithm used. The test results in figs. 3 and 4 were obtained with the gzip compression algorithm.
The following describes the cloud hard disk data compression backup and recovery device provided by the embodiment of the present application, and the cloud hard disk data compression backup and recovery device described below and the cloud hard disk data compression backup and recovery method described above can be referred to correspondingly.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a cloud hard disk data compression backup and recovery device according to an embodiment of the present application, including:
the splitting module 110 is configured to split the source cloud hard disk to obtain a plurality of initial data blocks, and determine an initial offset of each initial data block in the source cloud hard disk;
the compression module 120 is configured to compress a non-zero data block in the initial data block to obtain compressed data blocks, and calculate a data volume of each compressed data block;
the information generating module 130 is configured to generate corresponding backup information by using the initial offset and the data volume corresponding to the compressed data block, and determine a preset sequence corresponding to the backup information;
a writing module 140 for writing compressed data blocks into the backup volume;
and the recovery module 150 is configured to, when a recovery request is detected, perform data recovery by using target backup information specified by the recovery request, a target backup volume, and a corresponding target preset sequence.
Optionally, the recovery module 150 includes:
The determining unit is used for determining target backup information and target backup volumes designated by the recovery request if the recovery request is detected; the target backup information comprises a plurality of preset data volumes and a plurality of corresponding preset initial offsets;
the reading unit is used for reading corresponding target compressed data blocks from the target backup volumes according to the target preset sequence by utilizing each preset data volume;
the decompression unit is used for decompressing the target compressed data block to obtain candidate non-zero data blocks;
the target determining unit is used for determining target non-zero data blocks in the candidate non-zero data blocks according to a target preset sequence and determining target preset initial offset corresponding to the target non-zero data blocks;
and the writing unit is used for writing the target non-zero data block into the target cloud hard disk based on the matching condition of the target preset initial offset and the current writing position of the target cloud hard disk.
Optionally, the writing unit includes:
the first writing subunit is used for writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset if the target preset initial offset is matched with the current writing position;
and the second writing subunit is used for writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset if the target preset initial offset is not matched with the current writing position, and for clearing the data between the current writing position and the target preset initial offset before the target non-zero data block is written into the target cloud hard disk.
Optionally, the device further includes:
the compression judging unit is used for judging whether the target backup information has a compression identifier or not;
the determining execution unit is used for determining to execute the step of reading the corresponding target compressed data blocks from the target backup volumes according to the target preset sequence by utilizing each preset data volume if the compressed identification exists;
and the splicing recovery unit is used for reading the corresponding target data blocks from the target backup volume according to the target preset sequence by utilizing each preset data volume if the compression identifier is not available, and splicing the target data blocks to finish data recovery.
Optionally, the information generating module 130 includes:
the ordering unit is used for ordering the compressed data blocks according to the magnitude relation of the initial offset corresponding to each compressed data block, and determining the sequence of the compressed data blocks as a preset sequence.
Optionally, the compression module 120 includes:
the zero data block detection unit is used for detecting zero data blocks of all initial data blocks to obtain detection results;
and the compression unit is used for determining the initial data block with the detection result being non-zero as the non-zero data block to compress the initial data block to obtain a compressed data block.
Optionally, the zero data block detection unit includes:
the content matching subunit is used for reading the data content of the initial data block and comparing the data content with the binary empty flag bit;
and the non-zero determination subunit is used for determining that the detection result corresponding to the initial data block is non-zero if any data content is not the binary null flag bit.
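The comparison performed by the content matching subunit and the non-zero determination subunit can be sketched as follows. The window-by-window comparison and the 4096-byte window size are illustrative choices of this sketch, and the names `EMPTY_FLAG` and `is_zero_block` are hypothetical; the embodiment only requires that the data content be compared with the binary empty flag.

```python
EMPTY_FLAG = b"\x00"  # the binary empty flag


def is_zero_block(block: bytes, window: int = 4096) -> bool:
    """Return True only if every byte of the block equals the empty flag."""
    zero_window = EMPTY_FLAG * window
    for start in range(0, len(block), window):
        piece = block[start:start + window]
        # Any content that is not the empty flag makes the block non-zero;
        # returning here avoids scanning the rest of the block.
        if piece != zero_window[:len(piece)]:
            return False
    return True
```

Stopping at the first non-zero window keeps detection cheap for blocks whose data starts near the beginning, which is the common case for file system metadata.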
Optionally, the segmentation module 110 includes:
the granularity acquisition unit is used for acquiring the segmentation granularity; the segmentation granularity can be equally divided into 1GB;
and the average segmentation unit is used for carrying out average segmentation on the source cloud hard disk according to the segmentation granularity to obtain an initial data block.
Optionally, the information generating module 130 includes:
a key value pair generating unit, configured to compose a key value pair by using the initial offset and the data volume corresponding to the compressed data block;
a key value pair ordering unit, configured to order each key value pair according to the order of the initial offset, so as to obtain a key value pair sequence;
And the identification unit is used for identifying the key value pair sequence by utilizing the hard disk mark of the source cloud hard disk and the volume mark of the backup volume to obtain backup information.
Optionally, the information generating module 130 includes:
the initial generation unit is used for generating initial backup information by utilizing the initial offset and the data volume corresponding to the compressed data block;
and the compression identification unit is used for identifying the initial backup information by utilizing the compression identification to obtain the backup information.
The electronic device provided by the embodiment of the application is introduced below, and the electronic device described below and the cloud hard disk data compression backup and recovery method described above can be referred to correspondingly.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Wherein the electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
The processor 101 is configured to control overall operation of the electronic device 100, so as to complete all or part of the steps in the cloud hard disk data compression backup and recovery method described above; the memory 102 is used to store various types of data to support operation at the electronic device 100, which may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data. The Memory 102 may be implemented by any type or combination of volatile or non-volatile Memory devices, such as one or more of static random access Memory (Static Random Access Memory, SRAM), electrically erasable programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), erasable programmable Read-Only Memory (Erasable Programmable Read-Only Memory, EPROM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 102 or transmitted through the communication component 105. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC for short), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 105 may include a Wi-Fi part, a Bluetooth part, and an NFC part.
The electronic device 100 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the cloud hard disk data compression backup and recovery method described in the above embodiments.
The following describes a computer readable storage medium provided by an embodiment of the present application. The computer readable storage medium described below and the cloud hard disk data compression backup and recovery method described above may be cross-referenced.
The application also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the cloud hard disk data compression backup and recovery method described above.
The computer readable storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "include", "comprise", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principles and embodiments of the present application have been described herein with reference to specific examples; the description of the embodiments is intended only to assist in understanding the method of the present application and its core ideas. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (8)

1. A cloud hard disk data compression backup and recovery method, characterized by comprising the following steps:
splitting a source cloud hard disk to obtain a plurality of initial data blocks, and determining initial offset of each initial data block in the source cloud hard disk;
zero data block detection is carried out on each initial data block, and a detection result is obtained;
determining the initial data block with the detection result being non-zero as a non-zero data block, compressing the initial data block to obtain compressed data blocks, and calculating the data volume of each compressed data block;
determining the initial data block with the detection result being zero as a zero data block, not compressing the zero data block and not writing the zero data block into a backup volume;
generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information;
writing the compressed data blocks into a backup volume;
if the recovery request is detected, determining target backup information and target backup volumes appointed by the recovery request; the target backup information comprises a plurality of preset data volumes and a plurality of corresponding preset initial offsets;
reading corresponding target compressed data blocks from the target backup volumes according to a target preset sequence by utilizing the preset data volumes;
decompressing the target compressed data block to obtain candidate non-zero data blocks;
determining a target non-zero data block in the candidate non-zero data blocks according to the target preset sequence, and determining a target preset initial offset corresponding to the target non-zero data block;
if the target preset initial offset is matched with the current writing position of the target cloud hard disk, writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset;
if the target preset initial offset is not matched with the current writing position, writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset, and clearing, on the target cloud hard disk, the data located between the current writing position (as it was before writing the target non-zero data block) and the target preset initial offset.
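The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `backup` and `restore`, the use of `zlib` as the compressor, and the modeling of the cloud hard disk and backup volume as in-memory byte strings are all assumptions made for illustration. The final clearing of the tail of the target disk is likewise an assumption; the claim only describes clearing the gap before each written block.

```python
import io
import zlib


def backup(source: bytes, block_size: int):
    """Split the source disk, skip zero blocks, compress the rest.

    Returns (backup_volume_bytes, backup_info), where backup_info is a
    list of (start_offset, compressed_size) pairs in ascending offset
    order. zlib is an assumed compressor; the claim names none.
    """
    volume = io.BytesIO()
    info = []
    for offset in range(0, len(source), block_size):
        block = source[offset:offset + block_size]
        if not any(block):
            # Zero data block: neither compressed nor written.
            continue
        compressed = zlib.compress(block)
        volume.write(compressed)
        info.append((offset, len(compressed)))
    return volume.getvalue(), info


def restore(volume: bytes, info, target: bytearray) -> None:
    """Replay the backup onto the target disk, clearing skipped ranges."""
    reader = io.BytesIO(volume)
    write_pos = 0  # current writing position on the target disk
    for offset, size in info:
        # The recorded compressed size tells us how many bytes to read.
        block = zlib.decompress(reader.read(size))
        if offset != write_pos:
            # Offset mismatch: a zero block was skipped during backup,
            # so clear the stale data in between.
            target[write_pos:offset] = bytes(offset - write_pos)
        target[offset:offset + len(block)] = block
        write_pos = offset + len(block)
    # Assumption beyond the claim: also clear any trailing range.
    target[write_pos:] = bytes(len(target) - write_pos)


# Usage: a 20-byte "disk" with 4-byte blocks, restored over stale data.
disk = b"abcd" + bytes(8) + b"efgh" + bytes(4)
vol, meta = backup(disk, 4)
restored = bytearray(b"\xff" * len(disk))
restore(vol, meta, restored)
```

Because the compressed blocks are written back to back, the recorded data volumes are enough to locate each block in the backup volume, and the recorded offsets are enough to place it on the target disk.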
2. The method for compressing, backing up and recovering cloud hard disk data according to claim 1, wherein determining the preset sequence corresponding to the backup information comprises:
sequencing the compressed data blocks according to the magnitude relation of the initial offset corresponding to each compressed data block, and determining the sequence of the compressed data blocks as the preset sequence.
3. The method for compressing, backing up and recovering cloud hard disk data according to claim 1, wherein said detecting zero data blocks of each initial data block to obtain a detection result comprises:
reading the data content of the initial data block, and comparing the data content with a binary null flag bit;
and if any data content is not the binary null flag bit, determining that the detection result corresponding to the initial data block is non-zero.
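A minimal Python sketch of the zero data block check in claim 3 follows. It reads the "binary null flag bit" as the zero byte, which is an interpretation, and the function name `is_zero_block` is hypothetical.

```python
def is_zero_block(block: bytes) -> bool:
    """Return True only if every byte of the block is zero.

    any() stops at the first non-zero byte, mirroring the claim:
    as soon as any content differs from binary zero, the block is
    treated as non-zero.
    """
    return not any(block)


zero_result = is_zero_block(bytes(4096))
nonzero_result = is_zero_block(bytes(4095) + b"\x01")
```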
4. The method for compressing, backing up and recovering cloud hard disk data according to claim 1, wherein said splitting the source cloud hard disk to obtain a plurality of initial data blocks comprises:
obtaining a segmentation granularity; the segmentation granularity is a size that divides 1 GB evenly;
and carrying out average segmentation on the source cloud hard disk according to the segmentation granularity to obtain the initial data block.
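The splitting step of claim 4 can be sketched in Python as below. The sketch assumes that the granularity condition means 1 GB is an integer multiple of the granularity, that 1 GB is 2^30 bytes, and that the disk size is a whole multiple of the granularity; none of these is stated explicitly in the claim, and the function name `split_offsets` is hypothetical.

```python
GIB = 1 << 30  # assumed meaning of "1GB": 2**30 bytes


def split_offsets(disk_size: int, granularity: int) -> list:
    """Return the starting offset of each equal-sized initial data block."""
    if granularity <= 0 or GIB % granularity != 0:
        raise ValueError("granularity must divide 1 GiB evenly")
    if disk_size % granularity != 0:
        raise ValueError("disk size must be a multiple of the granularity")
    return list(range(0, disk_size, granularity))


# Usage: a 2 GiB disk split into 512 MiB blocks yields four offsets.
offsets = split_offsets(2 * GIB, 512 << 20)
```

Under this reading, any disk whose capacity is a whole number of gigabytes splits into equal blocks with no remainder.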
5. The method of claim 1, wherein generating corresponding backup information using the starting offset and the data volume corresponding to the compressed data block comprises:
forming a key value pair by using the initial offset corresponding to the compressed data block and the data volume;
ordering the key value pairs according to the size sequence of the initial offset to obtain a key value pair sequence;
and marking the key value pair sequence by using the hard disk mark of the source cloud hard disk and the volume mark of the backup volume to obtain the backup information.
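As an illustration of claims 2 and 5, the backup information can be modeled as an offset-sorted sequence of key value pairs tagged with the disk and volume identifiers. The class `BackupInfo`, its field names, and the helper functions are hypothetical; the claims do not prescribe a data layout. The helper `volume_positions` additionally assumes that the compressed blocks are stored back to back in the backup volume.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class BackupInfo:
    disk_id: str    # mark of the source cloud hard disk
    volume_id: str  # mark of the backup volume
    entries: tuple  # ((start_offset, compressed_size), ...) ascending


def build_backup_info(disk_id: str, volume_id: str,
                      sizes_by_offset: dict) -> BackupInfo:
    """Sort the (offset, size) pairs by offset and tag them."""
    entries = tuple(sorted(sizes_by_offset.items()))
    return BackupInfo(disk_id, volume_id, entries)


def volume_positions(info: BackupInfo) -> list:
    """Position of each compressed block inside the backup volume."""
    sizes = [size for _, size in info.entries]
    return list(accumulate([0] + sizes))[:-1]


info = build_backup_info("disk-1", "vol-1", {8: 20, 0: 10, 4: 5})
positions = volume_positions(info)
```

Sorting by offset fixes the preset sequence, so the same order can be used both for writing compressed blocks during backup and for reading them during recovery.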
6. A cloud hard disk data compression backup and recovery device, characterized by comprising:
the splitting module is used for splitting the source cloud hard disk to obtain a plurality of initial data blocks and determining initial offset of each initial data block in the source cloud hard disk;
the compression module is used for compressing the non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block;
the information generation module is used for generating corresponding backup information by utilizing the initial offset corresponding to the compressed data block and the data volume, and determining a preset sequence corresponding to the backup information;
a writing module for writing the compressed data blocks into a backup volume;
the recovery module is used for carrying out data recovery by utilizing the target backup information specified by the recovery request, the target backup volume and the corresponding target preset sequence when the recovery request is detected;
the compression module includes:
the zero data block detection unit is used for detecting zero data blocks of all initial data blocks to obtain detection results;
the compression unit is used for determining an initial data block with a detection result being non-zero as a non-zero data block and compressing the initial data block to obtain a compressed data block; the initial data block with the detection result being zero is a zero data block, the zero data block is not compressed, and the zero data block is not written into the backup volume;
the recovery module includes:
the determining unit is used for determining target backup information and target backup volumes designated by the recovery request if the recovery request is detected; the target backup information comprises a plurality of preset data volumes and a plurality of corresponding preset initial offsets;
the reading unit is used for reading corresponding target compressed data blocks from the target backup volumes according to the target preset sequence by utilizing each preset data volume;
the decompression unit is used for decompressing the target compressed data block to obtain candidate non-zero data blocks;
the target determining unit is used for determining target non-zero data blocks in the candidate non-zero data blocks according to a target preset sequence and determining target preset initial offset corresponding to the target non-zero data blocks;
the writing unit is used for writing the target non-zero data block into the target cloud hard disk based on the matching condition of the target preset initial offset and the current writing position of the target cloud hard disk;
the writing unit includes:
the first writing subunit is used for writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset if the target preset initial offset is matched with the current writing position;
and the second writing subunit is used for writing the target non-zero data block into the target cloud hard disk according to the target preset initial offset if the target preset initial offset is not matched with the current writing position, and clearing, on the target cloud hard disk, the data located between the current writing position (as it was before writing the target non-zero data block) and the target preset initial offset.
7. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the cloud hard disk data compression backup and restore method according to any one of claims 1 to 5.
8. A computer readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the cloud hard disk data compression backup and restore method according to any one of claims 1 to 5.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110838010.2A CN113722150B (en) 2021-07-23 2021-07-23 Cloud hard disk data compression backup and recovery method, device, equipment and storage medium
PCT/CN2022/078491 WO2023000674A1 (en) 2021-07-23 2022-02-28 Method and apparatus for data compression, backup and recovery of cloud hard disk, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838010.2A CN113722150B (en) 2021-07-23 2021-07-23 Cloud hard disk data compression backup and recovery method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113722150A CN113722150A (en) 2021-11-30
CN113722150B true CN113722150B (en) 2023-08-22

Family

ID=78673874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838010.2A Active CN113722150B (en) 2021-07-23 2021-07-23 Cloud hard disk data compression backup and recovery method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113722150B (en)
WO (1) WO2023000674A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722150B (en) * 2021-07-23 2023-08-22 苏州浪潮智能科技有限公司 Cloud hard disk data compression backup and recovery method, device, equipment and storage medium
CN115865097B (en) * 2023-02-17 2023-05-23 浪潮电子信息产业股份有限公司 Data compression method, system, equipment and computer readable storage medium
CN115982398B (en) * 2023-03-13 2023-05-16 苏州浪潮智能科技有限公司 Graph structure data processing method, system, computer device and storage medium
CN117971612B (en) * 2024-03-29 2024-06-04 苏州元脑智能科技有限公司 Hard disk monitoring method, device, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597717A (en) * 2018-12-07 2019-04-09 北京金山云网络技术有限公司 A kind of data backup, restoration methods, device, electronic equipment and storage medium
CN109614268A (en) * 2018-12-10 2019-04-12 浪潮(北京)电子信息产业有限公司 A kind of restoration methods of cloud Backup Data, apparatus and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542397B1 (en) * 2013-03-14 2017-01-10 EMC IP Holding Company LLC File block addressing for backups
CN109582653B (en) * 2018-11-14 2020-12-08 网易(杭州)网络有限公司 Method and device for compressing and decompressing files
CN111104063A (en) * 2019-12-06 2020-05-05 浪潮电子信息产业股份有限公司 Data storage method and device, electronic equipment and storage medium
CN111104258A (en) * 2019-12-23 2020-05-05 北京金山云网络技术有限公司 MongoDB database backup method and device and electronic equipment
CN111723053A (en) * 2020-06-24 2020-09-29 北京航天数据股份有限公司 Data compression method and device and data decompression method and device
CN112214359A (en) * 2020-10-30 2021-01-12 上海爱数信息技术股份有限公司 Backup and recovery system and method for Oracle database
CN113064760B (en) * 2021-04-06 2022-02-15 广州鼎甲计算机科技有限公司 Database synthesis backup method and device, computer equipment and storage medium
CN113722150B (en) * 2021-07-23 2023-08-22 苏州浪潮智能科技有限公司 Cloud hard disk data compression backup and recovery method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2023000674A1 (en) 2023-01-26
CN113722150A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant