WO2019080717A1 - Data processing method and apparatus for disk snapshots - Google Patents

Data processing method and apparatus for disk snapshots

Info

Publication number
WO2019080717A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
block identifier
disk
snapshot
disk snapshot
Prior art date
Application number
PCT/CN2018/109933
Other languages
English (en)
French (fr)
Inventor
廖武钧
鲁振伟
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019080717A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Definitions

  • The present invention relates to data processing technologies, and in particular to a data processing method and apparatus for disk snapshots.
  • A disk snapshot is a complete record of the contents of a disk at a point in time. Using a disk snapshot, the disk can be restored to the data content recorded by that snapshot, that is, to the state the disk was in when the snapshot was generated.
  • The disk snapshots created for a disk at different points in time can form a snapshot chain. Disk snapshots are mainly used for backup and disaster recovery. To recover disk data, the disk can be rolled back along the snapshot chain, restoring the data on the disk to the content recorded by any disk snapshot on the chain. When disk snapshots are used for disk data recovery and migration, improving the efficiency of disk snapshot processing is a problem that needs to be solved.
  • An aspect of the present application provides a method, including: determining, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there is a duplicate data block identifier; if there is a duplicate data block identifier, deleting the duplicate to obtain the deduplicated data block identifiers; and performing at least one of the following according to the deduplicated data block identifiers: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
  • FIG. 1 is a schematic diagram of an implementation of the present application.
  • FIG. 2 is a schematic diagram of deduplication of disk snapshots.
  • FIG. 3 is a schematic diagram of a first embodiment of the present application.
  • FIG. 4 is a schematic diagram of a second embodiment of the present application.
  • FIG. 5 is a schematic diagram of a third embodiment of the present application.
  • FIG. 6 is a schematic diagram of a fourth embodiment of the present application.
  • FIG. 7 is a schematic diagram of a fifth embodiment of the present application.
  • FIG. 8 is a schematic diagram of a sixth embodiment of the present application.
  • FIG. 9 is a diagram showing an example of a system provided by the present application.
  • "A, B, or C" means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C).
  • "A/B" means "A or B".
  • "A and/or B" means (A), (B), or (A and B).
  • Embodiments of the present application may be implemented in hardware, firmware, software, or a combination thereof. Embodiments of the present application can also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine readable media (e.g., computer readable media), which can be read and executed by one or more processors.
  • A machine readable medium can be implemented by any storage device, mechanism, or other physical structure for storing or transmitting information in a machine readable manner (e.g., volatile or non-volatile memory, a media disc, or another media device).
  • Computer readable media include both permanent and non-permanent, removable and non-removable storage media.
  • A storage medium can store information by any method or technology.
  • The information can be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD), magnetic tape cartridges, disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.
  • As used here, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • FIG. 1 is a schematic diagram of an implementation of the present application.
  • The storage system may include at least one client computing device (e.g., client computing devices 10a through 10n), at least one disk (e.g., disks 14a through 14n), and at least one server (e.g., servers 12a through 12n) connected to the disks.
  • Each server can be connected to one or more disks.
  • A client computing device can read data from and write data to a disk through a server.
  • The present application does not limit the type of client computing device, which may be a desktop computer or any of various portable computers or electronic devices, such as a personal computer, a notebook computer, a smart phone, or another electronic device.
  • Server 12a can include a processor 120 and system memory 122.
  • Processor 120 and system memory 122 can be connected by a system bus.
  • The system bus can include at least one of the following types of bus structures: a memory bus or memory controller, a peripheral bus, and a local bus using any of various bus architectures.
  • System memory 122 can include volatile memory (such as RAM), non-volatile memory (such as ROM), flash memory, or a combination thereof.
  • System memory 122 can include an operating system 124 and a snapshot module 126. Operating system 124 is used to control the operation of server 12a, for example in conjunction with other operating systems or applications.
  • The snapshot module 126 can include a deduplication unit 1264, a snapshot integrity check unit 1266, and a snapshot migration unit 1268.
  • The processor 120 can implement various snapshot-related operations and tasks by executing the snapshot module 126 stored in the system memory 122.
  • The processor 120 can perform a snapshot integrity check on the disk 14a, in response to a snapshot integrity check request from the client computing device 10a, by executing the deduplication unit 1264 and the snapshot integrity check unit 1266; or the processor 120 can, in response to a snapshot migration request, migrate a snapshot of the disk 14a to the disk 14n by executing the deduplication unit 1264 and the snapshot migration unit 1268.
  • A snapshot integrity check refers to checking whether every data block included in the snapshot is readable.
  • The storage space of a disk can be divided into multiple intervals according to address offset, for example with 2 MB as one interval, and the data of each interval can be stored as one data block (also called a slice) of the disk snapshot.
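  • As a minimal sketch of the interval division described above, the following Python maps a byte offset to its interval and lists the intervals of a disk. The function names `block_index` and `split_disk` are hypothetical and do not come from the application; only the 2 MB interval size is taken from the text.

```python
from typing import List, Tuple

BLOCK_SIZE = 2 * 1024 * 1024  # 2 MB per interval, the example size from the text


def block_index(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the index of the interval (data block) that contains a byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // block_size


def split_disk(disk_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return the [start, end) byte range of every interval on the disk."""
    return [
        (start, min(start + block_size, disk_size))
        for start in range(0, disk_size, block_size)
    ]


# A 5 MB disk yields two full 2 MB intervals and one 1 MB tail interval.
intervals = split_disk(5 * 1024 * 1024)
print(len(intervals))                # 3
print(block_index(3 * 1024 * 1024))  # 1
```

  • Each `(start, end)` range would correspond to one data block of a snapshot; how a real system names or stores those blocks is not specified here.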
  • Snapshot migration refers to migrating disk snapshots across storage domains.
  • A storage domain is a storage area with independent access rights, for example a computer room or a computer cluster.
  • A user of a storage domain (such as a computer or a virtual machine) can directly access storage resources (such as cloud disks and disk snapshots) in that storage domain, but cannot access storage resources of other storage domains.
  • To read data from another storage domain, the data must first be migrated to the user's own storage domain, after which the user can read the migrated data there.
  • It will also lead to data migration across storage domains.
  • The snapshot module 126 can also be implemented in the operating system 124.
  • For the structure of each of the other servers, refer to the server 12a; it is not described again.
  • The application can also be applied to a cloud server.
  • One or more disk instances can be established on the cloud server and read and written like a computer disk, while the actual data storage is performed by one or more physical disks in the background.
  • The data content recorded by different disk snapshots in a snapshot chain often differs only slightly.
  • Disk snapshot A corresponds to four data blocks. When disk snapshot B is created, assuming that only the data of the disk intervals corresponding to data block 1-A and data block 3-A has been modified, only data block 1-B and data block 3-B are newly created, and disk snapshot B can continue to use data block 2-A and data block 4-A of disk snapshot A.
  • When there are duplicate data blocks, only one copy is stored. As shown in FIG. 2, although disk snapshots A and B both correspond to data blocks 2-A and 4-A, only one data block 2-A and one data block 4-A are stored.
  • Each disk snapshot has corresponding metadata. The metadata includes a data block identifier list for the disk snapshot, which records the identifiers (for example, the data block names) of the data blocks used by the disk snapshot, so that the data blocks indicated by the identifiers can be read. For example, the metadata of disk snapshot A records the following names: data block 1-A, data block 2-A, data block 3-A, data block 4-A; the metadata of disk snapshot B records the following names: data block 1-B, data block 2-A, data block 3-B, data block 4-A.
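  • The relationship between snapshot metadata and shared data blocks can be illustrated with a small in-memory model. This is only a sketch: `block_store`, `snapshot_metadata`, and `create_snapshot` are hypothetical names, and the byte contents are placeholders. The block names follow the snapshot A and snapshot B example from the text.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the physical block store and snapshot metadata.
block_store: Dict[str, bytes] = {}            # data block name -> block contents
snapshot_metadata: Dict[str, List[str]] = {}  # snapshot name -> data block identifier list


def create_snapshot(name: str, block_ids: List[str], new_blocks: Dict[str, bytes]) -> None:
    """Store only the blocks that are new, then record the snapshot's identifier list."""
    for block_id, data in new_blocks.items():
        block_store.setdefault(block_id, data)  # an existing block is never stored twice
    missing = [b for b in block_ids if b not in block_store]
    if missing:
        raise KeyError(f"snapshot {name} references unknown blocks: {missing}")
    snapshot_metadata[name] = list(block_ids)


# Snapshot A writes four blocks; snapshot B rewrites intervals 1 and 3 and reuses 2-A and 4-A.
create_snapshot("A", ["1-A", "2-A", "3-A", "4-A"],
                {"1-A": b"a1", "2-A": b"a2", "3-A": b"a3", "4-A": b"a4"})
create_snapshot("B", ["1-B", "2-A", "3-B", "4-A"],
                {"1-B": b"b1", "3-B": b"b3"})

print(sorted(block_store))  # six unique blocks back the eight references in the metadata
```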
  • The deduplication unit 1264 can be configured to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there is a duplicate data block identifier and, if there is, to delete the duplicate to obtain the deduplicated data block identifiers. The snapshot integrity check unit 1266 can be configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers. The snapshot migration unit 1268 can be configured to migrate disk snapshots across storage domains according to the deduplicated data block identifiers.
  • The deduplication unit 1264 can determine whether there is a duplicate data block identifier in the following manner: according to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
  • The data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list.
  • The deduplication unit 1264 can obtain the deduplicated data block identifiers in the following manner:
  • Determining, from the data block identifier set, the deduplicated data block identifiers corresponding to the multiple disk snapshots.
  • The snapshot integrity check unit 1266 can verify the data integrity of the disk snapshot according to the deduplicated data block identifiers in the following manner:
  • Reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
  • The snapshot migration unit 1268 can also be configured to determine a migration order for multiple disk snapshots. The snapshot migration unit 1268 can migrate the disk snapshots across storage domains according to the deduplicated data block identifiers in the following manner: copying the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the determined migration order; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot is rebuilt in the destination storage domain based on its metadata and its corresponding data blocks.
  • Each deduplicated data block identifier may have a migration label.
  • The snapshot migration unit 1268 can copy the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the migration order in the following manner: traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the data block indicated by the identifier from the source storage domain to the destination storage domain and updating the migration label to indicate "migrated"; if its migration label indicates "migrated", the data block indicated by the identifier does not need to be copied.
  • Figure 3 is a schematic illustration of a first embodiment of the present application.
  • According to the data block identifiers of the data blocks corresponding to the disk snapshot, it is determined whether there is a duplicate data block identifier; if there is, the duplicate is deleted to obtain the deduplicated data block identifiers. In block 302, at least one of the following is performed according to the deduplicated data block identifiers: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
  • The data block identifiers of the data blocks corresponding to a disk snapshot can be read from the metadata of the disk snapshot.
  • The metadata of each disk snapshot can include a data block identifier list containing the identifiers of all data blocks used by the disk snapshot.
  • A data block identifier may be a data block name or an ID; this application does not limit it.
  • The data blocks stored on the physical disk can be read according to their data block identifiers.
  • A data block identifier set that is initially empty may be created. The data block identifier lists of multiple disk snapshots of the disk (such as a disk snapshot chain) are traversed, and each data block identifier not yet included in the set is added to it, yielding the deduplicated data block identifiers. This ensures that the data block identifier set contains no duplicates, that is, each data block identifier appears in the set exactly once.
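  • The set-based deduplication just described can be sketched in a few lines of Python. The function name `deduplicate_block_ids` is hypothetical; the sketch also keeps first-seen order in a list, which is an assumption made here for readability and is not required by the text.

```python
from typing import Iterable, List


def deduplicate_block_ids(snapshot_block_lists: Iterable[List[str]]) -> List[str]:
    """Traverse every snapshot's identifier list and keep each identifier once."""
    seen = set()                 # the data block identifier set, initially empty
    ordered: List[str] = []      # same identifiers, kept in first-seen order
    for block_list in snapshot_block_lists:
        for block_id in block_list:
            if block_id not in seen:  # only identifiers not yet in the set are added
                seen.add(block_id)
                ordered.append(block_id)
    return ordered


chain = [
    ["1-A", "2-A", "3-A", "4-A"],  # data block identifier list of snapshot A
    ["1-B", "2-A", "3-B", "4-A"],  # data block identifier list of snapshot B
]
unique_ids = deduplicate_block_ids(chain)
print(unique_ids)  # ['1-A', '2-A', '3-A', '4-A', '1-B', '3-B']
```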
  • This embodiment describes a data integrity check process for the snapshot chain of a disk.
  • The snapshot chain of the disk includes multiple disk snapshots; each disk snapshot has a data block identifier list that includes all data block names used by that disk snapshot.
  • A data block identifier set is created and is initially empty. At block 402, the data block identifier lists of the disk snapshots in the snapshot chain are traversed, and each data block name not yet included in the data block identifier set is added to it. If the data block identifier lists of all disk snapshots have not yet been traversed, block 402 is repeated; once they have all been traversed, block 403 is performed. After the data block identifier lists of all disk snapshots of the disk have been traversed, the resulting data block identifier set is deduplicated: it contains no duplicate data block names and covers every data block name used by any disk snapshot in the snapshot chain of the disk.
  • For each data block name in the data block identifier set, the data block indicated by that name is read. The data block name may indicate the storage location of the data block on the physical disk, so the corresponding data block can be read from the physical disk according to its name. If the read fails, the data block is unavailable; if the read succeeds, the data block is available.
  • Because each data block is read only once when checking readability, many unnecessary read requests are avoided and the efficiency of the disk snapshot data integrity check is improved.
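  • A sketch of this integrity check follows, assuming a caller-supplied `read_block` function that raises `OSError` when a block cannot be read. That interface, the name `check_snapshot_chain`, and the in-memory `store` are all assumptions for illustration. Each unique block is read once, and a snapshot is reported complete only if every block it references was readable.

```python
from typing import Callable, Dict, List


def check_snapshot_chain(chain: Dict[str, List[str]],
                         read_block: Callable[[str], bytes]) -> Dict[str, bool]:
    """Return snapshot name -> True if every block the snapshot references is readable."""
    readable: Dict[str, bool] = {}  # one entry per unique data block name
    for block_list in chain.values():
        for block_id in block_list:
            if block_id in readable:
                continue  # already read once; shared blocks are not read again
            try:
                read_block(block_id)
                readable[block_id] = True
            except OSError:
                readable[block_id] = False
    return {name: all(readable[b] for b in blocks) for name, blocks in chain.items()}


# Demo store in which block 3-B is missing, so reading it fails.
store = {name: b"data" for name in ["1-A", "2-A", "3-A", "4-A", "1-B"]}
reads: List[str] = []


def read_block(block_id: str) -> bytes:
    reads.append(block_id)
    if block_id not in store:
        raise OSError(f"cannot read {block_id}")
    return store[block_id]


chain = {"A": ["1-A", "2-A", "3-A", "4-A"], "B": ["1-B", "2-A", "3-B", "4-A"]}
result = check_snapshot_chain(chain, read_block)
print(result, len(reads))  # {'A': True, 'B': False} 6
```

  • Without deduplication the same check would issue eight reads for this chain; here it issues six.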
  • This embodiment describes a process of migrating multiple disk snapshots in the snapshot chain of one disk from a source storage domain to a destination storage domain, where the migration order of the disk snapshots can be set.
  • The snapshot chain of the disk includes multiple disk snapshots; the metadata of each disk snapshot includes a data block identifier list, which includes all data block names used by that disk snapshot.
  • A data block identifier set is created and is initially empty. At block 502, the data block identifier lists of the disk's multiple disk snapshots are traversed, and each data block name not yet included in the data block identifier set is added to it. If the data block identifier lists of all disk snapshots have not yet been traversed, block 502 is repeated.
  • The resulting data block identifier set is deduplicated: it contains no duplicate data block names and covers every data block name used by the disk's multiple disk snapshots.
  • Each data block identifier in the final set has a migration label, which indicates whether the data block corresponding to the data block name has been migrated. Initially (that is, before migration starts), the migration label of every data block identifier in the set indicates "not migrated".
  • The migration label may be represented by 0 or 1: for example, 1 indicates "migrated" and 0 indicates "not migrated". This application is not limited thereto.
  • The migration order of the disk's multiple disk snapshots is determined.
  • The migration order may be specified according to service requirements, for example so that important snapshots are migrated first; or, by default, the migration may follow the chronological order of the disk snapshots. This application is not limited thereto.
  • The data block identifier lists of the disk's multiple disk snapshots are traversed in the migration order. For each data block name in the data block identifier list of each disk snapshot: if the migration label of that name in the data block identifier set indicates "not migrated", the data block indicated by the name is copied from the source storage domain to the destination storage domain, and the migration label of the name in the set is updated to indicate "migrated"; if the migration label indicates "migrated" (that is, the data block indicated by the name has already been copied to the destination storage domain), the data block does not need to be copied again.
  • After the data block identifier list of a disk snapshot has been traversed, all data blocks of that disk snapshot are confirmed to have been copied to the destination storage domain. At this point, at block 505, the metadata of the disk snapshot (including the data block identifier list) is copied to the destination storage domain. The disk snapshot can then be rebuilt in the destination storage domain based on its metadata and its corresponding data blocks. That is, a successful disk snapshot migration comprises copying both the metadata and all data blocks. At block 506, it is determined whether all disk snapshots of the disk have been migrated successfully. If so, the disk snapshot migration for the disk is confirmed complete; if not, the data block identifier list of the next disk snapshot is traversed to determine whether the next disk snapshot has been migrated.
  • In this embodiment, a deduplication step is added so that each data block that is referenced repeatedly is copied only once, which can greatly reduce the amount of data copied during disk snapshot migration. The copy order of the snapshot chain can also be specified, so that important disk snapshots are backed up first.
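  • The ordered migration with migration labels can be sketched as follows. Plain dictionaries stand in for the source and destination storage domains, and the name `migrate_snapshots` is hypothetical. The 0/1 label encoding follows the example in the text.

```python
from typing import Dict, List

NOT_MIGRATED, MIGRATED = 0, 1  # label encoding used as an example in the text


def migrate_snapshots(order: List[str],
                      metadata: Dict[str, List[str]],
                      source_blocks: Dict[str, bytes],
                      dest_blocks: Dict[str, bytes],
                      dest_metadata: Dict[str, List[str]]) -> int:
    """Copy each unique block once, snapshot by snapshot, and return the copy count."""
    # Build the deduplicated identifier set; every label starts as "not migrated".
    labels: Dict[str, int] = {}
    for name in order:
        for block_id in metadata[name]:
            labels.setdefault(block_id, NOT_MIGRATED)

    copied = 0
    for name in order:  # traverse snapshots in the chosen migration order
        for block_id in metadata[name]:
            if labels[block_id] == NOT_MIGRATED:
                dest_blocks[block_id] = source_blocks[block_id]
                labels[block_id] = MIGRATED
                copied += 1
        # All blocks of this snapshot are now present, so its metadata can follow.
        dest_metadata[name] = list(metadata[name])
    return copied


metadata = {"A": ["1-A", "2-A", "3-A", "4-A"], "B": ["1-B", "2-A", "3-B", "4-A"]}
source_blocks = {b: b.encode() for ids in metadata.values() for b in ids}
dest_blocks: Dict[str, bytes] = {}
dest_metadata: Dict[str, List[str]] = {}

# Snapshot B is treated as the more important one and is migrated first.
copied = migrate_snapshots(["B", "A"], metadata, source_blocks, dest_blocks, dest_metadata)
print(copied, list(dest_metadata))  # 6 ['B', 'A']
```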
  • This embodiment describes a process of migrating multiple disk snapshots in the snapshot chain of one disk from a source storage domain to a destination storage domain, where the migration order of the disk snapshots can be set.
  • The difference between this embodiment and the embodiment shown in FIG. 5 is that in this embodiment data blocks can be copied in batches according to the migration order.
  • For blocks 601, 602, and 603, refer to the descriptions of blocks 501, 502, and 503 in FIG. 5; they are not described again here.
  • The data blocks of the disk snapshots are copied in the migration order.
  • Each disk snapshot can be traversed, and the data block identifier list of each disk snapshot is obtained by reading its metadata. Combining the migration order with the data block identifier list of each disk snapshot, it can be determined which data block names' data blocks need to be copied first, and those data blocks are copied in batches.
  • After a data block has been copied, the migration label of its data block name in the data block identifier set is updated to indicate "migrated", for example from 0 to 1 (1 indicating "migrated" and 0 indicating "not migrated").
  • It is determined whether the migration labels of all data block identifiers in the set indicate "migrated". If they all do, data block copying is complete; if any data block name still has a migration label indicating "not migrated", some data blocks have not been copied, and the process proceeds to block 604.
  • The data block identifier lists of the multiple disk snapshots are traversed in the migration order, and whether the data blocks named in a disk snapshot's data block identifier list have all been migrated is determined from the migration labels of the corresponding data block identifiers in the data block identifier set.
  • For each data block name, the name is looked up in the data block identifier set and its migration label is read. If the label indicates "not migrated", the data block indicated by the name is copied to the destination storage domain and the label is changed to indicate "migrated"; if the label indicates "migrated", no further migration processing is performed for that data block.
  • The metadata of the disk snapshot (including the data block identifier list) is copied to the destination storage domain. The snapshot can then be rebuilt in the destination storage domain based on the metadata of the disk snapshot and its corresponding data blocks.
  • Block 604 and block 606 may begin execution at the same time, or block 606 may begin after block 604 has been executing for a period of time.
  • In this embodiment, a deduplication step is added so that each data block that is referenced repeatedly is copied only once, which can greatly reduce the amount of data copied during snapshot migration. The copy order of the snapshot chain can also be specified, so that important snapshots are backed up first. Disk snapshots need not be migrated one after another; multiple disk snapshots can be migrated at the same time. In this embodiment, copying data blocks and detecting whether all data blocks of a disk snapshot have been copied are performed as separate processes, which allows data blocks to be copied in batches and improves data migration efficiency.
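  • The separation between batch copying and completion detection might look like the sketch below. It is illustrative only: `plan_batches`, `copy_batch`, and `completed_snapshots` are hypothetical names, one batch per snapshot is an assumed batching policy, and the detection step runs after each batch rather than as a truly concurrent process. Python's standard `ThreadPoolExecutor` is used to copy the blocks of a batch concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

NOT_MIGRATED, MIGRATED = 0, 1


def plan_batches(order: List[str], metadata: Dict[str, List[str]]) -> List[List[str]]:
    """Group not-yet-planned block names into one batch per snapshot, in migration order."""
    planned = set()
    batches: List[List[str]] = []
    for name in order:
        batch: List[str] = []
        for block_id in metadata[name]:
            if block_id not in planned:
                planned.add(block_id)
                batch.append(block_id)
        if batch:
            batches.append(batch)
    return batches


def copy_batch(batch: List[str], source: Dict[str, bytes],
               dest: Dict[str, bytes], labels: Dict[str, int]) -> None:
    """Copy all blocks of one batch concurrently and mark each as migrated."""
    def copy_one(block_id: str) -> None:
        dest[block_id] = source[block_id]
        labels[block_id] = MIGRATED

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(copy_one, batch))  # consume the iterator so worker errors surface


def completed_snapshots(order: List[str], metadata: Dict[str, List[str]],
                        labels: Dict[str, int]) -> List[str]:
    """Detection step: snapshots whose blocks are all labelled as migrated."""
    return [n for n in order if all(labels[b] == MIGRATED for b in metadata[n])]


metadata = {"A": ["1-A", "2-A", "3-A", "4-A"], "B": ["1-B", "2-A", "3-B", "4-A"]}
order = ["B", "A"]
source = {b: b.encode() for ids in metadata.values() for b in ids}
dest: Dict[str, bytes] = {}
labels = {b: NOT_MIGRATED for b in source}

batches = plan_batches(order, metadata)
progress = []
for batch in batches:
    copy_batch(batch, source, dest, labels)
    progress.append(completed_snapshots(order, metadata, labels))
print(progress)  # [['B'], ['B', 'A']]
```

  • Once a snapshot appears in the completed list, its metadata could be copied to the destination, as in the previous embodiment.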
  • This embodiment describes a process of migrating multiple disk snapshots of one disk from a source storage domain to a destination storage domain, where no preferred migration order is specified for the disk snapshots.
  • The snapshot chain of the disk includes multiple disk snapshots; the metadata of each disk snapshot includes a data block identifier list, which includes all data block names used by that disk snapshot.
  • For blocks 701 and 702, refer to the descriptions of blocks 501 and 502 in FIG. 5; they are not described again here.
  • The data blocks indicated by the data block names in the data block identifier set are copied to the destination storage domain in batches. In this embodiment, batch copying increases the concurrency of data block copying.
  • The metadata of the multiple disk snapshots is copied to the destination storage domain, so that the disk's multiple disk snapshots can be rebuilt in the destination storage domain based on the metadata and the copied data blocks.
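  • When no migration order is required, the flow reduces to copying the deduplicated blocks in fixed-size batches and then copying all metadata. The sketch below assumes a batch size of 4 and uses hypothetical names (`chunked`, `migrate_unordered`); each batch is copied with a single dictionary update as a stand-in for one bulk transfer.

```python
from typing import Dict, List


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def migrate_unordered(metadata: Dict[str, List[str]],
                      source: Dict[str, bytes],
                      dest_blocks: Dict[str, bytes],
                      dest_metadata: Dict[str, List[str]],
                      batch_size: int = 4) -> int:
    """Copy all unique blocks in batches, then all metadata; return the batch count."""
    # dict.fromkeys keeps each block name once, in first-seen order.
    unique_ids = list(dict.fromkeys(b for ids in metadata.values() for b in ids))
    batches = chunked(unique_ids, batch_size)
    for batch in batches:
        dest_blocks.update({b: source[b] for b in batch})  # one bulk copy per batch
    for name, ids in metadata.items():
        dest_metadata[name] = list(ids)
    return len(batches)


metadata = {"A": ["1-A", "2-A", "3-A", "4-A"], "B": ["1-B", "2-A", "3-B", "4-A"]}
source = {b: b.encode() for ids in metadata.values() for b in ids}
dest_blocks: Dict[str, bytes] = {}
dest_metadata: Dict[str, List[str]] = {}

batch_count = migrate_unordered(metadata, source, dest_blocks, dest_metadata)
print(batch_count, len(dest_blocks))  # 2 6
```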
  • This embodiment describes a process of migrating the snapshot chains of multiple disks from a source storage domain to a destination storage domain.
  • The snapshot chain of each disk includes multiple disk snapshots; the metadata of each disk snapshot includes a data block identifier list, which includes all data block names used by that disk snapshot.
  • A mapping set of disks to disk snapshots is created and is initially empty. At block 802, the list of disk snapshots to be migrated in the source storage domain is traversed; each disk name not yet in the mapping set is added to it, and the disk snapshots corresponding to each disk name are recorded in the mapping set. The mapping set thus shows which one or more disk snapshots of each disk are to be migrated; that is, each disk name in the mapping set corresponds to a list of disk snapshots to be migrated.
  • For example, the disk snapshots to be migrated in the current source storage domain are the following three: disk snapshot A, disk snapshot B, and disk snapshot C, where disk snapshot A belongs to disk x, disk snapshot B belongs to disk x, and disk snapshot C belongs to disk y.
  • At block 803, the mapping set is traversed and a disk snapshot migration process is performed on each disk in the mapping set.
  • An embodiment of the present application further provides a method, including: obtaining the data block identifiers of the data blocks corresponding to a disk snapshot and deduplicating them; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
  • For the deduplication of the data block identifiers of the data blocks corresponding to a disk snapshot, refer to the description of the deduplication unit 1264 in the above embodiments.
  • For the processing based on the deduplicated data block identifiers, refer to the descriptions of the snapshot integrity check unit 1266 and the snapshot migration unit 1268 in the above embodiments; they are not repeated here.
  • FIG. 9 is an illustration of a system 900 in accordance with various embodiments.
  • System 900 can include one or more processors 904, system control logic 908 coupled to at least one of the processors 904, system memory 912 coupled to system control logic 908, non-volatile memory (NVM)/storage 916 coupled to system control logic 908, a network interface 920 coupled to system control logic 908, and input/output (I/O) devices 932 coupled to system control logic 908.
  • Processor 904 can include one or more single or multi-core processors.
  • Processor 904 can include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.).
  • System control logic 908 can include any suitable interface controllers to provide any suitable interface to at least one of the processors 904 and/or to any suitable device or component in communication with system control logic 908.
  • system control logic 908 may include one or more memory controllers for providing an interface to system memory 912.
  • System memory 912 can be used to load and store data and/or instructions for system 900, such as instructions 924.
  • System memory 912 may comprise any suitable volatile memory, such as suitable dynamic random access memory (DRAM).
  • NVM/storage 916 may include one or more tangible, non-transitory computer readable media for storing data and/or instructions, such as instructions 924.
  • NVM/storage 916 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as one or more hard disk drives (HDDs), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives.
  • NVM/storage 916 may include storage resources that are physically part of the device on which system 900 is installed, or it may be accessed by the device without necessarily being part of the device.
  • For example, NVM/storage 916 can be accessed over a network via network interface 920 and/or through input/output devices 932.
  • When executed by one or more of the processors 904, instructions 924 can cause system 900 to implement the method described in any of the embodiments of FIGS. 3 to 8.
  • In various embodiments, instructions 924, or hardware, firmware, and/or software portions thereof, may be located in additional or alternative elements of system 900.
  • Network interface 920 can have a transceiver to provide a radio interface to system 900 for communicating over one or more networks and/or with any other suitable device.
  • the transceiver can be integrated with other components of system 900.
  • the transceiver can include a processor of processor 904, a memory of system memory 912, and an NVM/storage of NVM/storage 916.
  • Network interface 920 can include any suitable hardware and/or firmware.
  • Network interface 920 can include multiple antennas to provide a multiple-input, multiple-output radio interface.
  • network interface 920 can include: a wired network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
  • In an embodiment, at least one of the processors 904 can be packaged together with logic for one or more controllers of system control logic 908.
  • In an embodiment, at least one of the processors 904 can be packaged together with logic for one or more controllers of system control logic 908 to form a system in package (SiP).
  • In an embodiment, at least one of the processors 904 can be integrated on the same chip as logic for one or more controllers of system control logic 908.
  • In an embodiment, at least one of the processors 904 can be integrated on the same chip as logic for one or more controllers of system control logic 908 to form a system on a chip (SoC).
  • Input/output devices 932 can include a user interface designed to enable user interaction with system 900, a peripheral component interface designed to enable peripheral components to interact with system 900, and/or sensors designed to determine environmental conditions and/or location information related to system 900.
  • The user interface can include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more image capture devices (e.g., a still camera and/or a video camera), a flash (e.g., an LED flash), and a keyboard.
  • the peripheral component interfaces can include, but are not limited to, a non-volatile memory port, a universal serial bus (USB) port, an audio jack, and a power supply interface.
  • the sensors may include, but are not limited to, a gyro sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit.
  • The positioning unit can also be part of, or interact with, network interface 920 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
  • system 900 can be a mobile computing device. In various embodiments, system 900 can have more or fewer components, and/or different architectures.
  • A method includes: determining, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers; if there are, deleting the duplicate data block identifiers to obtain deduplicated data block identifiers; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
  • Determining whether there are duplicate data block identifiers according to the data block identifiers of the data blocks corresponding to the disk snapshot may include:
  • According to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
  • Performing at least one of the following according to the deduplicated data block identifiers may include:
  • According to the deduplicated data block identifiers corresponding to the disk snapshot chain and one or more disk snapshots in the chain, performing at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
  • The data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list.
  • Deleting the duplicate data block identifiers to obtain the deduplicated data block identifiers may include: creating a data block identifier set that is initially empty; traversing the data block identifier lists of multiple disk snapshots and adding to the set each data block identifier not already in it; and determining the deduplicated data block identifiers corresponding to the multiple disk snapshots from the set obtained after the traversal.
  • Verifying the data integrity of the disk snapshot according to the deduplicated data block identifiers may include: reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
  • Before the disk snapshot is migrated across storage domains according to the deduplicated data block identifiers, the method may further include: determining a migration order for multiple disk snapshots of a disk.
  • Migrating the disk snapshot across storage domains according to the deduplicated data block identifiers then includes: copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
  • Each deduplicated data block identifier may correspond to a migration label.
  • Copying the data blocks in the migration order may include: traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the indicated data block from the source storage domain to the destination storage domain and updating the label to indicate "migrated"; if its migration label indicates "migrated", the indicated data block does not need to be copied.
  • Migrating the disk snapshot across storage domains according to the deduplicated data block identifiers may include: copying the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
  • The method may further include: before migrating the disk snapshots of multiple disks across storage domains, determining the mapping between each disk and its disk snapshots.
  • An apparatus includes a deduplication unit and at least one of a snapshot integrity check unit and a snapshot migration unit. The deduplication unit is configured to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers and, if there are, to delete the duplicates to obtain deduplicated data block identifiers; the snapshot integrity check unit is configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers; and the snapshot migration unit is configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers.
  • The deduplication unit is configured to determine whether there are duplicate data block identifiers, according to the data block identifiers of the data blocks corresponding to the disk snapshot, in the following manner:
  • According to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
  • In the apparatus according to example embodiment ten, the data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list.
  • The deduplication unit is configured to obtain the deduplicated data block identifiers by: creating a data block identifier set that is initially empty; traversing the data block identifier lists of multiple disk snapshots and adding to the set each data block identifier not already in it; and determining the deduplicated data block identifiers corresponding to the multiple disk snapshots from the set obtained after the traversal.
  • The snapshot integrity check unit may be configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers by: reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
  • The snapshot migration unit is further configured to determine a migration order for multiple disk snapshots.
  • The snapshot migration unit may be configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers in the following manner:
  • Copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
  • Each deduplicated data block identifier may correspond to a migration label.
  • The snapshot migration unit may be configured to copy, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the following manner:
  • Traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the indicated data block from the source storage domain to the destination storage domain and updating the label to indicate "migrated"; if its migration label indicates "migrated", the indicated data block does not need to be copied.
  • A system includes: one or more processors; and one or more machine-readable media storing a plurality of instructions that, when executed by the one or more processors, cause the system to implement the method of any one of example embodiments one to nine.
  • A machine-readable medium stores a plurality of instructions that, when executed by one or more processors, implement the method of any one of example embodiments one to nine.
  • A method includes: obtaining the data block identifiers of the data blocks corresponding to a disk snapshot and deduplicating them; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

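Disclosed herein are a data processing method and apparatus for disk snapshots. The method includes: determining, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers; if there are, deleting the duplicate data block identifiers to obtain deduplicated data block identifiers; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.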

Description

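Data processing method and apparatus for disk snapshots
This application claims priority to Chinese patent application No. 201710994265.1, filed on October 23, 2017 and entitled "Data processing method and apparatus for disk snapshots", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to data processing technology, and in particular to a data processing method and apparatus for disk snapshots.
Background
A disk snapshot is a complete record of the contents stored on a disk at a point in time. With disk snapshots, a disk can be restored to the data content recorded by any one of them, that is, to its state at the moment that snapshot was created. The disk snapshots created for one disk at different points in time can form a snapshot chain. Disk snapshots are mainly used for backup and disaster recovery. To restore disk data, the disk can be rolled back according to the snapshot chain, restoring the data on the disk to the content recorded by any disk snapshot in the chain. When disk snapshots are used for disk data recovery and migration, how to improve the efficiency of processing them is a problem that needs to be solved.
Summary
One aspect of the present application provides a method, including: determining, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers; if there are, deleting the duplicate data block identifiers to obtain deduplicated data block identifiers; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.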
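Brief Description of the Drawings
The drawings are described by way of example and not limitation. For simplicity and clarity of illustration, elements shown in the drawings are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals are repeated among the drawings to indicate corresponding or analogous elements.
FIG. 1 is a schematic diagram of an implementation of the present application;
FIG. 2 is a schematic diagram of deduplication of disk snapshots;
FIG. 3 is a schematic diagram of a first embodiment of the present application;
FIG. 4 is a schematic diagram of a second embodiment of the present application;
FIG. 5 is a schematic diagram of a third embodiment of the present application;
FIG. 6 is a schematic diagram of a fourth embodiment of the present application;
FIG. 7 is a schematic diagram of a fifth embodiment of the present application;
FIG. 8 is a schematic diagram of a sixth embodiment of the present application;
FIG. 9 is an example diagram of a system provided by the present application.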
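Detailed Description
Embodiments of the present application are described in detail below with reference to the drawings. It should be understood that the embodiments described below serve only to illustrate and explain the present application and do not limit it.
Although the scope of the present application is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are described in detail herein. It should be understood that the scope of the present application is not limited to the disclosed embodiments; on the contrary, the present application is intended to cover all modifications, equivalents, and alternatives consistent with its spirit and the claims.
References in the specification to "one embodiment", "an embodiment", "an example embodiment", and the like indicate that the described embodiment may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes it. Moreover, when a particular feature, structure, or characteristic is described in connection with one embodiment, it is submitted that implementing it in connection with other embodiments, whether or not described in detail, is within the knowledge of those skilled in the art. In addition, "at least one of A, B, and C" means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). Similarly, "at least one of A, B, or C" means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). "A/B" means "A or B". "A and/or B" means (A), (B), or (A and B).
Embodiments of the present application may be implemented in hardware, firmware, software, or a combination thereof. They may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable media (for example, computer-readable media), which may be read and executed by one or more processors. A machine-readable medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a machine-readable form (for example, volatile or non-volatile memory, a media disc, or another media device).
Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure does not imply that the feature is required in all embodiments; in some embodiments it may be omitted or combined with other features.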
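FIG. 1 is a schematic diagram of an implementation of the present application. As shown in FIG. 1, a storage system may include at least one client computing device (for example, client computing devices 10a to 10n), at least one disk (for example, disks 14a to 14n), and at least one server connected to the disks (for example, servers 12a to 12n). Each server may be connected to one or more disks. A client computing device can read and write data on a disk through a server. The present application does not limit the type of client computing device, which may be a desktop computer or any of various portable computers or electronic devices, such as a personal computer, a notebook computer, a smartphone, or another electronic device.
As shown in FIG. 1, server 12a may include a processor 120 and system memory 122, which may be connected by a system bus. The system bus may include at least one of the following types of bus structure: a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. System memory 122 may include volatile memory (for example, RAM), non-volatile memory (for example, ROM), flash memory, or a combination thereof. System memory 122 may include an operating system 124 and a snapshot module 126. Operating system 124 controls the operation of server 12a, for example by cooperating with other operating systems or applications. Snapshot module 126 may include a deduplication unit 1264, a snapshot integrity check unit 1266, and a snapshot migration unit 1268. Processor 120 can carry out various snapshot-related operations and tasks by executing the snapshot module 126 stored in system memory 122. For example, in response to a snapshot integrity check request from client computing device 10a, processor 120 can verify the snapshot integrity of disk 14a by executing deduplication unit 1264 and snapshot integrity check unit 1266; or, in response to a snapshot migration request, processor 120 can migrate the snapshots of disk 14a to disk 14n by executing deduplication unit 1264 and snapshot migration unit 1268.
Here, snapshot integrity verification means verifying whether every data block included in a snapshot is readable. The storage space of a disk can be divided into multiple intervals by address offset, for example 2 MB per interval, and the data of each interval can be stored as one data block (also called a slice or shard) of a disk snapshot. When disk data is restored after a disk failure or power loss, the data integrity of the disk snapshot must be checked to keep the data safe and reliable, that is, to check whether all data blocks in the disk snapshot can be read and used normally; if any data block cannot be read normally, the data of the disk snapshot is incomplete.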
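The division of a disk into fixed-size intervals by address offset can be illustrated with a small sketch. The 2 MB size comes from the example above; the function names `block_index` and `block_range` are hypothetical and only serve this illustration.

```python
# Hypothetical helpers: map a byte offset on the disk to the interval
# (data block) that contains it, assuming 2 MB intervals as in the example.
BLOCK_SIZE = 2 * 1024 * 1024  # bytes per interval


def block_index(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the zero-based index of the interval containing `offset`."""
    return offset // block_size


def block_range(index: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return the half-open byte range [start, end) covered by interval `index`."""
    start = index * block_size
    return start, start + block_size
```

Under this assumption, a write at any offset touches exactly one interval, so only the data block for that interval needs to be recreated in the next snapshot.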
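Snapshot migration means migrating disk snapshots across storage domains. A storage domain is a storage area with independent access rights, such as a machine room or a computer cluster. A user of one storage domain (such as a computer or a virtual machine) can directly access the storage resources in that domain (such as cloud disks and disk snapshots) but cannot access storage resources in other storage domains. When a user of one storage domain needs data that resides in another storage domain, the data must first be migrated into the user's own storage domain; only then can the user read the migrated data locally. In addition, some situations, such as decommissioning a machine room or consolidating storage resources, also lead to data migration across storage domains.
It should be noted that, in other implementations, snapshot module 126 may also be integrated into operating system 124. When there are multiple servers, the structure of each server can follow that shown for server 12a and is not described again here.
It should be noted that the present application can also be applied to cloud servers. One or more disk instances can be created on a cloud server to be read and written as computer disks, while the actual data is stored on one or more physical disks in the back end.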
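In this embodiment, because some of the data on a disk changes only rarely, the data recorded by different disk snapshots in one snapshot chain usually differs only slightly. To save storage space, the data blocks of disk snapshots can be stored in a deduplicated manner. That is, when a disk snapshot is created for a disk at a point in time, every interval of the disk is checked. Taking the disk snapshot created at the current point in time as the new disk snapshot and the one created at the previous point in time as the old disk snapshot: if the data of an interval at the current point in time has changed relative to the data for that interval in the old disk snapshot, a new data block is created from the latest data of the interval, and the new disk snapshot uses the newly created data block; otherwise, the new disk snapshot continues to use the old disk snapshot's data block for that interval.
As shown in FIG. 2, assume that a disk is divided into four intervals by address offset. When disk snapshot A is created, it corresponds to four data blocks. When disk snapshot B is created, assume that only the disk intervals corresponding to data block 1-A and data block 3-A have been modified; then only data block 1-B and data block 3-B are newly created, and disk snapshot B continues to use data block 2-A and data block 4-A of disk snapshot A. To save storage space, duplicate data blocks are stored only once: in FIG. 2, although disk snapshots A and B both correspond to data blocks 2-A and 4-A, only one data block 2-A and one data block 4-A are stored. In addition, each disk snapshot has corresponding metadata, which includes the snapshot's data block identifier list. The list records the identifiers (for example, the names) of the data blocks the snapshot uses, so that the data blocks they indicate can be read. For example, the metadata of disk snapshot A records the following names: data block 1-A, data block 2-A, data block 3-A, data block 4-A; the metadata of disk snapshot B records the following names: data block 1-B, data block 2-A, data block 3-B, data block 4-A.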
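The FIG. 2 scenario can be reproduced with a minimal in-memory sketch. The function `create_snapshot`, the dictionary used as a block store, and the naming scheme `"<interval>-<label>"` are assumptions made for illustration; the source does not specify how changed intervals are detected or how blocks are named beyond the example names.

```python
# Hypothetical sketch of deduplicated snapshot creation: a new data block is
# created only for intervals whose data differs from the previous snapshot.
def create_snapshot(intervals, label, store, previous=None):
    """Return the data block identifier list (metadata) of a new snapshot.

    intervals: list of bytes objects, one per disk interval
    label:     snapshot label used to name newly created blocks, e.g. "A"
    store:     dict mapping block name -> block data (the stored data blocks)
    previous:  identifier list of the previous snapshot, or None
    """
    names = []
    for i, data in enumerate(intervals, start=1):
        old_name = previous[i - 1] if previous else None
        if old_name is not None and store[old_name] == data:
            names.append(old_name)        # unchanged: reuse the old block
        else:
            new_name = f"{i}-{label}"     # changed or first snapshot: new block
            store[new_name] = data
            names.append(new_name)
    return names
```

With four intervals of which the first and third change between snapshots, the second call returns `["1-B", "2-A", "3-B", "4-A"]` and the store holds six blocks rather than eight.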
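In this embodiment, deduplication unit 1264 can be used to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers and, if there are, to delete the duplicates to obtain deduplicated data block identifiers; snapshot integrity check unit 1266 can be used to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers; and snapshot migration unit 1268 can be used to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers.
In an exemplary implementation, deduplication unit 1264 can determine whether there are duplicate data block identifiers in the following manner: according to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
In an exemplary implementation, the data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list;
deduplication unit 1264 can obtain the deduplicated data block identifiers in the following manner:
creating a data block identifier set that is initially empty; traversing the data block identifier lists of multiple disk snapshots and adding to the set each data block identifier not already in it; and determining the deduplicated data block identifiers corresponding to the multiple disk snapshots from the set obtained after the traversal.
In an exemplary implementation, snapshot integrity check unit 1266 can verify the data integrity of a disk snapshot according to the deduplicated data block identifiers in the following manner:
reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
In an exemplary implementation, snapshot migration unit 1268 can also determine a migration order for multiple disk snapshots, and can migrate the disk snapshots across storage domains according to the deduplicated data block identifiers in the following manner: copying, in the determined migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of each disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
Each deduplicated data block identifier may correspond to a migration label;
snapshot migration unit 1268 can copy, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the following manner: traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the indicated data block from the source storage domain to the destination storage domain and updating the label to indicate "migrated"; if its migration label indicates "migrated", the indicated data block does not need to be copied.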
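FIG. 3 is a schematic diagram of a first embodiment of the present application. As shown in FIG. 3, in block 301, whether there are duplicate data block identifiers is determined according to the data block identifiers of the data blocks corresponding to a disk snapshot; if there are, the duplicates are deleted to obtain deduplicated data block identifiers. In block 302, at least one of the following is performed according to the deduplicated data block identifiers: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
The data block identifiers of the data blocks corresponding to a disk snapshot can be read from the snapshot's metadata. The metadata of each disk snapshot may include a data block identifier list containing the identifiers of all data blocks the snapshot uses. A data block identifier may be a data block name or a number (ID); the present application does not limit this. A data block stored on a physical disk can be read according to its identifier.
In block 301, a data block identifier set that is initially empty can be created; the data block identifier lists of multiple disk snapshots of a disk (for example, one disk snapshot chain) are then traversed, and each data block identifier not already in the set is added to it, yielding the deduplicated data block identifiers. This ensures that the set contains no duplicate data block identifiers, that is, each distinct data block identifier appears in the set exactly once.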
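The set-building step of block 301 can be sketched as follows. The function name `dedupe_block_ids` is hypothetical, and returning the identifiers in first-seen order is a choice made here for readability; the text only requires that each identifier appear once.

```python
# Minimal sketch of block 301: start from an empty set, traverse every
# snapshot's identifier list, and keep each identifier the first time it is seen.
def dedupe_block_ids(snapshot_lists):
    """Return the deduplicated identifiers from several identifier lists."""
    seen = set()      # the data block identifier set, initially empty
    ordered = []      # same identifiers, in first-seen order
    for block_ids in snapshot_lists:
        for block_id in block_ids:
            if block_id not in seen:
                seen.add(block_id)
                ordered.append(block_id)
    return ordered
```

For the two snapshots of FIG. 2, eight list entries reduce to six distinct identifiers.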
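The present application is described below through several embodiments with reference to FIGS. 4 to 8.
As shown in FIG. 4, this embodiment describes the data integrity verification process for one snapshot chain of one disk. In this embodiment, the snapshot chain of the disk includes multiple disk snapshots, each disk snapshot has a data block identifier list, and the list includes all data block names used by that disk snapshot.
As shown in FIG. 4, in block 401, a data block identifier set is created, initially empty. In block 402, the data block identifier lists of the disk snapshots in the disk's snapshot chain are traversed, and each data block name not already in the set is added to it; if the lists of all disk snapshots have not yet been traversed, block 402 is repeated, and once they have all been traversed, block 403 is executed. After the identifier lists of all disk snapshots of the disk have been traversed, the resulting data block identifier set is a deduplicated set: the data block names it contains are not repeated, and together they cover all data block names used by all disk snapshots in the disk's snapshot chain.
In block 403, each data block name in the data block identifier set is traversed, and the data block it indicates is read. A data block name can indicate where the data block is stored on the physical disk, so the corresponding data block can be read from the physical disk according to its name. If reading a data block fails, the data block is unusable; if it is read successfully, the data block is usable.
Through block 403, after all data block names in the set have been traversed, it is known which names indicate usable data blocks and which indicate unusable ones. In block 404, it is determined whether all data blocks indicated by the data block identifier set were read successfully, that is, whether they are all usable. If they are all usable, the data of all disk snapshots in the disk's snapshot chain is complete. If some are unusable, the data block identifier lists that contain an unusable data block name can be determined from the unusable names and the identifier list of each disk snapshot, which identifies the disk snapshots whose data is incomplete.
In this embodiment, because the data block identifiers are deduplicated, each data block is read only once when its readability is verified. This avoids many unnecessary read requests and improves the efficiency of verifying the data integrity of disk snapshots.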
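Blocks 401 to 404 can be sketched as a single function. This is an illustration under stated assumptions: `check_chain_integrity` and the `read_block` callback are hypothetical names, and a failed read is modelled here as an `OSError`, which the source does not specify.

```python
# Hypothetical sketch of the FIG. 4 flow: deduplicate the block names of a
# snapshot chain, read each block once, then map failures back to snapshots.
def check_chain_integrity(chain, read_block):
    """Return (unreadable block names, sorted names of incomplete snapshots).

    chain:      dict mapping snapshot name -> data block identifier list
    read_block: callable taking a block name; assumed to raise OSError on failure
    """
    unique = set()                        # blocks 401-402: deduplicated set
    for names in chain.values():
        unique.update(names)

    unreadable = set()                    # block 403: one read per block
    for name in unique:
        try:
            read_block(name)
        except OSError:
            unreadable.add(name)

    # block 404: a snapshot is incomplete if its list contains a bad block
    incomplete = sorted(
        snap for snap, names in chain.items() if unreadable.intersection(names)
    )
    return unreadable, incomplete
```

With the FIG. 2 chain, the function issues six reads instead of eight; if block 3-B cannot be read, only snapshot B is reported as incomplete.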
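As shown in FIG. 5, this embodiment describes migrating multiple disk snapshots in the snapshot chain of one disk from a source storage domain to a destination storage domain, where the migration order of the disk snapshots can be set. In this embodiment, the snapshot chain of the disk includes multiple disk snapshots, the metadata corresponding to each disk snapshot includes a data block identifier list, and the list includes all data block names used by that disk snapshot.
As shown in FIG. 5, in block 501, a data block identifier set is created, initially empty. In block 502, the data block identifier lists of the disk's snapshots are traversed, and each data block name not already in the set is added to it; if the lists of all disk snapshots have not yet been traversed, block 502 is repeated. After all lists have been traversed, the resulting data block identifier set is a deduplicated set: the data block names it contains are not repeated, and together they cover all data block names used by the disk's snapshots. Each data block identifier in the final set has a corresponding migration label indicating whether the data block with that name has been migrated; initially (before migration starts), every label indicates "not migrated". In some implementations the migration label can be represented by 0 or 1, for example 1 for "migrated" and 0 for "not migrated". However, the present application does not limit this.
In block 503, the migration order of the disk's snapshots is determined. The order can be specified according to business needs, for example migrating important snapshots first, or it can default to the chronological order of the disk snapshots. The present application does not limit this.
In block 504, the data block identifier lists of the disk's snapshots are traversed in the migration order. For each data block name in each snapshot's list: if the migration label of that name in the data block identifier set indicates "not migrated", the data block it indicates is copied from the source storage domain to the destination storage domain, and the label in the set is updated to indicate "migrated"; if the label indicates "migrated" (meaning the indicated data block has already been copied to the destination storage domain), the data block does not need to be copied.
After the identifier list of one disk snapshot has been fully traversed, all data blocks of that snapshot are confirmed to have been copied to the destination storage domain. At that point, in block 505, the snapshot's metadata (including the data block identifier list) is copied to the destination storage domain. The disk snapshot can then be rebuilt in the destination storage domain from its metadata and its data blocks. In other words, a disk snapshot has been migrated successfully once its metadata and all of its data blocks have been copied. In block 506, it is determined whether all disk snapshots of the disk have been migrated successfully; if so, the migration of the disk's snapshots is complete; if not, the identifier list of the next disk snapshot is traversed to determine whether that snapshot has been fully migrated.
In this embodiment, deduplication is added so that each data block referenced by more than one snapshot is copied only once. This greatly reduces the amount of data copied during disk snapshot migration and prevents duplicate data blocks from being copied repeatedly. In addition, the copy order of the snapshot chain can be specified so that important disk snapshots are backed up first.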
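The ordered migration of blocks 501 to 506 can be sketched with dictionaries standing in for the two storage domains. The function `migrate_chain`, its parameters, and the use of a boolean as the migration label are assumptions for illustration only.

```python
# Hypothetical sketch of the FIG. 5 flow. `src` and `dst_blocks` map block
# names to data; `dst_meta` receives each snapshot's identifier list.
def migrate_chain(chain, order, src, dst_blocks, dst_meta):
    """Migrate snapshots in `order`; return the number of blocks copied."""
    # Blocks 501-502: deduplicated identifier set, each with a migration
    # label that starts as False ("not migrated").
    migrated = {}
    for names in chain.values():
        for name in names:
            migrated.setdefault(name, False)

    copies = 0
    for snap in order:                            # block 503: chosen order
        for name in chain[snap]:                  # block 504
            if not migrated[name]:
                dst_blocks[name] = src[name]      # copy the block once
                migrated[name] = True             # update the label
                copies += 1
        dst_meta[snap] = list(chain[snap])        # block 505: copy metadata
    return copies
```

Migrating snapshot B before snapshot A in the FIG. 2 example copies four blocks for B and then only the two remaining blocks for A, so B can be rebuilt at the destination first.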
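As shown in FIG. 6, this embodiment describes migrating multiple disk snapshots in the snapshot chain of one disk from a source storage domain to a destination storage domain, where the migration order of the disk snapshots can be set. This embodiment differs from the one shown in FIG. 5 in that data blocks can be copied in batches according to the migration order.
As shown in FIG. 6, blocks 601, 602, and 603 follow the description of blocks 501, 502, and 503 in FIG. 5 and are not described again here.
In block 604, the data blocks of each disk snapshot are copied in the migration order. According to the migration order determined in block 603, each disk snapshot can be traversed and its data block identifier list obtained by reading its metadata. Combining the migration order with each snapshot's identifier list shows which data block names correspond to data blocks that need to be copied first, and those data blocks are copied in batches.
Note that after a data block has been copied to the destination storage domain, the migration label of the data block name indicating it must be updated in the data block identifier set to indicate "migrated", for example by changing the label from 0 to 1 (1 indicates "migrated"; 0 indicates "not migrated").
In block 605, it is determined whether the migration labels of all data block names in the data block identifier set indicate "migrated". If they all do, data block copying is complete; if any label still indicates "not migrated", copying is not complete and block 604 continues.
In block 606, the data block identifier lists of the disk snapshots are traversed in the migration order, and the migration labels in the data block identifier set are used to determine whether all data blocks named in a snapshot's identifier list have been migrated. For each data block name in each identifier list, the name is looked up in the data block identifier set and its migration label is read: if the label indicates "not migrated", the indicated data block is copied to the destination storage domain and the label is changed to indicate "migrated"; if the label indicates "migrated", the indicated data block is not migrated again. After one identifier list has been fully traversed, that is, after all data blocks of one disk snapshot are confirmed to have been copied to the destination storage domain, the snapshot's metadata (including the data block identifier list) is copied to the destination storage domain in block 607. The snapshot can then be rebuilt in the destination storage domain from its metadata and its data blocks.
In block 608, it is determined whether all disk snapshots of the disk have been migrated successfully; if so, the migration of the disk's snapshots is complete; if not, the identifier list of the next disk snapshot is traversed to determine whether that snapshot has been fully migrated.
It should be noted that blocks 604 and 606 may start executing at the same time, or block 606 may start after block 604 has been running for some time.
In this embodiment, deduplication is added so that each data block referenced by more than one snapshot is copied only once. This greatly reduces the amount of data copied during snapshot migration and prevents duplicate data blocks from being copied repeatedly. In addition, the copy order of the snapshot chain can be specified so that important snapshots are backed up first. Disk snapshots have no sequential dependency on one another, so multiple disk snapshots can be migrated at the same time. In this embodiment, copying data blocks and checking whether all data blocks of a disk snapshot have been copied are performed separately, which makes batch copying of data blocks possible and thereby improves migration efficiency.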
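The separation between batch copying (block 604) and the per-snapshot completion check (block 606) can be sketched as two small helpers. Both function names and the fixed `batch_size` are hypothetical; the source does not say how batches are formed, so the first-needed-first ordering below is only one plausible reading.

```python
# Hypothetical sketch for FIG. 6: derive copy batches from the migration
# order, and check separately whether a snapshot's blocks have all arrived.
def prioritized_batches(chain, order, batch_size):
    """Return batches of block names, earliest-needed blocks first."""
    queue, seen = [], set()
    for snap in order:
        for name in chain[snap]:
            if name not in seen:          # deduplicate across snapshots
                seen.add(name)
                queue.append(name)
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]


def snapshot_done(block_names, labels):
    """Return True once every block of a snapshot is labelled migrated."""
    return all(labels[name] for name in block_names)
```

A copier can work through the batches and set each label to `True`, while a checker polls `snapshot_done` for each snapshot in migration order and copies its metadata as soon as the check passes.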
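In the embodiments shown in FIGS. 5 and 6, the deduplicated data block identifier set records explicitly whether each data block has been copied; as soon as all data blocks used by a disk snapshot have been copied to the destination storage domain, that snapshot can be rebuilt there. In other words, once the data block names of the snapshot chain have been deduplicated, there is no constraint on the order in which the data block identifier set is copied. The migration order of disk snapshots can therefore be specified, and the data blocks and metadata of important disk snapshots can be copied to the destination storage domain first, so that complete copies of the important snapshots are restored there first. This makes disk snapshot migration more flexible.
As shown in FIG. 7, this embodiment describes migrating multiple disk snapshots of one disk from a source storage domain to a destination storage domain, where no preferred migration order is specified. In this embodiment, the snapshot chain of the disk includes multiple disk snapshots, the metadata corresponding to each disk snapshot includes a data block identifier list, and the list includes all data block names used by that disk snapshot.
As shown in FIG. 7, blocks 701 and 702 follow the description of blocks 501 and 502 in FIG. 5 and are not described again here.
In block 703, the data blocks indicated by the data block names in the data block identifier set are copied to the destination storage domain in batches; in this embodiment, batch copying improves the concurrency of data block copying.
In block 704, after all data blocks have been copied, the metadata of the disk snapshots is copied to the destination storage domain, so that the disk's snapshots can be rebuilt there from the metadata and the copied data blocks.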
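Blocks 701 to 704 can be sketched with a thread pool standing in for concurrent batch copying. The function `migrate_unordered`, the `workers` parameter, and the dictionary-based storage domains are assumptions for illustration; a real copy across storage domains would involve network or disk I/O rather than a dictionary assignment.

```python
# Hypothetical sketch of the FIG. 7 flow: copy all deduplicated blocks
# concurrently, then copy every snapshot's metadata.
from concurrent.futures import ThreadPoolExecutor


def migrate_unordered(chain, src, dst_blocks, dst_meta, workers=4):
    """Copy all unique blocks, then all metadata; return the copied names."""
    # Blocks 701-702: deduplicated set of block names.
    unique = sorted({name for names in chain.values() for name in names})

    def copy_one(name):
        dst_blocks[name] = src[name]      # stands in for a cross-domain copy
        return name

    # Block 703: concurrent copy. list() forces every result, so an exception
    # in any copy propagates here and the metadata step below is skipped.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = list(pool.map(copy_one, unique))

    # Block 704: metadata is copied only after all blocks have arrived.
    for snap, names in chain.items():
        dst_meta[snap] = list(names)
    return copied
```

Copying metadata last means no snapshot appears at the destination before all of the blocks it references are present.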
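As shown in FIG. 8, this embodiment describes migrating the snapshot chains of multiple disks from a source storage domain to a destination storage domain. The snapshot chain of each disk includes multiple disk snapshots, the metadata corresponding to each disk snapshot includes a data block identifier list, and the list includes all data block names used by that disk snapshot.
As shown in FIG. 8, in block 801, a mapping set of disks to disk snapshots is created, initially empty. In block 802, the list of disk snapshots to be migrated on the source storage domain is traversed, each disk name not yet in the mapping set is added to it, and the disk snapshots corresponding to each disk name are recorded in the mapping set. The mapping set thus shows which disk snapshots of each disk are to be migrated; that is, each disk name in the mapping set corresponds to a list of disk snapshots to be migrated.
For example, suppose the disk snapshots to be migrated in the source storage domain are the following three: disk snapshot A, disk snapshot B, and disk snapshot C. Assuming disk snapshots A and B belong to disk x and disk snapshot C belongs to disk y, the mapping set yields the disk set {x, y}, where disk x corresponds to disk snapshots A and B, and disk y corresponds to disk snapshot C.
In block 803, the mapping set is traversed and the disk snapshot migration process is performed for each disk in it. The migration process for each disk can follow the description of any of the embodiments of FIGS. 5 to 7 and is not described again here.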
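Blocks 801 and 802 amount to grouping the pending snapshots by disk. The sketch below assumes the pending list is available as (snapshot name, disk name) pairs; that representation and the name `build_disk_mapping` are hypothetical.

```python
# Hypothetical sketch of blocks 801-802: build the disk -> snapshots mapping.
def build_disk_mapping(pending):
    """Group (snapshot name, disk name) pairs by disk, preserving order."""
    mapping = {}                           # block 801: initially empty
    for snapshot, disk in pending:         # block 802: traverse pending list
        if disk not in mapping:
            mapping[disk] = []             # first snapshot seen for this disk
        mapping[disk].append(snapshot)
    return mapping
```

For the example above, the result is `{"x": ["A", "B"], "y": ["C"]}`; block 803 then runs a per-disk migration over each entry.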
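In addition, an embodiment of the present application further provides a method, including: obtaining the data block identifiers of the data blocks corresponding to a disk snapshot and deduplicating them; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
For the deduplication of the data block identifiers of the data blocks corresponding to a disk snapshot, refer to the description of deduplication unit 1264 in the above embodiments; for the processing based on the deduplicated data block identifiers, refer to the descriptions of snapshot integrity check unit 1266 and snapshot migration unit 1268 in the above embodiments. They are not repeated here.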
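FIG. 9 is an example diagram of a system 900 in accordance with various embodiments. System 900 may include one or more processors 904, system control logic 908 coupled to at least one of the processors 904, system memory 912 coupled to system control logic 908, non-volatile memory (NVM)/storage 916 coupled to system control logic 908, a network interface 920 coupled to system control logic 908, and input/output (I/O) devices 932 coupled to system control logic 908.
Processor 904 may include one or more single-core or multi-core processors. Processor 904 may include any combination of general-purpose processors and special-purpose processors (for example, graphics processors, application processors, baseband processors, etc.).
In an embodiment, system control logic 908 may include any suitable interface controllers to provide any suitable interface to at least one of the processors 904 and/or to any suitable device or component in communication with system control logic 908.
In an embodiment, system control logic 908 may include one or more memory controllers to provide an interface to system memory 912. System memory 912 may be used to load and store data and/or instructions for system 900, such as instructions 924. In an embodiment, system memory 912 may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM).
NVM/storage 916 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions, such as instructions 924. NVM/storage 916 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as one or more hard disk drives (HDDs), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives.
NVM/storage 916 may include a storage resource that is physically part of the device on which system 900 is installed, or it may be accessible by the device without necessarily being part of it. For example, NVM/storage 916 may be accessed over a network via network interface 920 and/or through input/output devices 932.
When executed by one or more of the processors 904, instructions 924 may cause system 900 to implement the method described in any of the embodiments of FIGS. 3 to 8. In various embodiments, instructions 924, or hardware, firmware, and/or software portions thereof, may be located in additional or alternative elements of system 900.
Network interface 920 may have a transceiver to provide a radio interface for system 900 to communicate over one or more networks and/or with any other suitable device. In various embodiments, the transceiver may be integrated with other components of system 900. For example, the transceiver may include a processor of processors 904, memory of system memory 912, and NVM/storage of NVM/storage 916. Network interface 920 may include any suitable hardware and/or firmware. Network interface 920 may include multiple antennas to provide a multiple-input, multiple-output radio interface. In an embodiment, network interface 920 may include a wired network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In an embodiment, at least one of the processors 904 may be packaged together with logic for one or more controllers of system control logic 908. In an embodiment, at least one of the processors 904 may be packaged together with logic for one or more controllers of system control logic 908 to form a system in package (SiP). In an embodiment, at least one of the processors 904 may be integrated on the same chip as logic for one or more controllers of system control logic 908. In an embodiment, at least one of the processors 904 may be integrated on the same chip as logic for one or more controllers of system control logic 908 to form a system on a chip (SoC).
In various embodiments, input/output devices 932 may include a user interface designed to enable user interaction with system 900, a peripheral component interface designed to enable peripheral components to interact with system 900, and/or sensors designed to determine environmental conditions and/or location information related to system 900.
In various embodiments, the user interface may include, but is not limited to, a display (for example, a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more image capture devices (for example, a still camera and/or a video camera), a flash (for example, a light-emitting diode flash), and a keyboard.
In various embodiments, the peripheral component interface may include, but is not limited to, a non-volatile memory port, a universal serial bus (USB) port, an audio jack, and a power supply interface.
In various embodiments, the sensors may include, but are not limited to, a gyro sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of, or interact with, network interface 920 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
In various embodiments, system 900 may be a mobile computing device. In various embodiments, system 900 may have more or fewer components and/or a different architecture.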
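Several example embodiments are described below.
In example embodiment one, a method includes: determining, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers; if there are, deleting the duplicate data block identifiers to obtain deduplicated data block identifiers; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
In example embodiment two, in the method according to example embodiment one, determining whether there are duplicate data block identifiers according to the data block identifiers of the data blocks corresponding to the disk snapshot may include:
according to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
In example embodiment three, in the method according to example embodiment two, performing at least one of the following according to the deduplicated data block identifiers may include:
according to the deduplicated data block identifiers corresponding to the disk snapshot chain and one or more disk snapshots in the chain, performing at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
In example embodiment four, in the method according to example embodiment one, the data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list;
determining whether there are duplicate data block identifiers according to the data block identifiers of the data blocks corresponding to the disk snapshot and, if there are, deleting the duplicates to obtain the deduplicated data block identifiers may include:
creating a data block identifier set that is initially empty;
traversing the data block identifier lists of multiple disk snapshots and adding to the set each data block identifier not already in it; and determining the deduplicated data block identifiers corresponding to the multiple disk snapshots from the set obtained after the traversal.
In example embodiment five, in the method according to example embodiment one, verifying the data integrity of the disk snapshot according to the deduplicated data block identifiers may include: reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
In example embodiment six, in the method according to example embodiment one, before the disk snapshot is migrated across storage domains according to the deduplicated data block identifiers, the method may further include: determining a migration order for multiple disk snapshots of a disk;
migrating the disk snapshot across storage domains according to the deduplicated data block identifiers includes:
copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
In example embodiment seven, in the method according to example embodiment six, each deduplicated data block identifier may correspond to a migration label;
copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain may include:
traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the indicated data block from the source storage domain to the destination storage domain and updating the label to indicate "migrated"; if its migration label indicates "migrated", the indicated data block does not need to be copied.
In example embodiment eight, in the method according to example embodiment one, migrating the disk snapshot across storage domains according to the deduplicated data block identifiers includes: copying the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
In example embodiment nine, the method according to any one of example embodiments one to eight may further include: before migrating the disk snapshots of multiple disks across storage domains, determining the mapping between each disk and its disk snapshots.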
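In example embodiment ten, an apparatus includes a deduplication unit and at least one of a snapshot integrity check unit and a snapshot migration unit. The deduplication unit is configured to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers and, if there are, to delete the duplicates to obtain deduplicated data block identifiers; the snapshot integrity check unit is configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers; and the snapshot migration unit is configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers.
In example embodiment eleven, in the apparatus according to example embodiment ten, the deduplication unit is configured to determine whether there are duplicate data block identifiers, according to the data block identifiers of the data blocks corresponding to the disk snapshot, in the following manner:
according to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, determining whether the data blocks corresponding to different disk snapshots in the chain have duplicate data block identifiers.
In example embodiment twelve, in the apparatus according to example embodiment ten, the data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list;
the deduplication unit is configured to obtain the deduplicated data block identifiers in the following manner:
creating a data block identifier set that is initially empty;
traversing the data block identifier lists of multiple disk snapshots and adding to the set each data block identifier not already in it; and determining the deduplicated data block identifiers corresponding to the multiple disk snapshots from the set obtained after the traversal.
In example embodiment thirteen, in the apparatus according to example embodiment ten, the snapshot integrity check unit may be configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers in the following manner: reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to that data block is incomplete.
In example embodiment fourteen, in the apparatus according to example embodiment ten, the snapshot migration unit may be further configured to determine a migration order for multiple disk snapshots;
the snapshot migration unit may be configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers in the following manner:
copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from its metadata and its data blocks.
In example embodiment fifteen, in the apparatus according to example embodiment fourteen, each deduplicated data block identifier may correspond to a migration label;
the snapshot migration unit may be configured to copy, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the following manner:
traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if its migration label indicates "not migrated", copying the indicated data block from the source storage domain to the destination storage domain and updating the label to indicate "migrated"; if its migration label indicates "migrated", the indicated data block does not need to be copied.
In example embodiment sixteen, a system includes: one or more processors; and one or more machine-readable media storing a plurality of instructions that, when executed by the one or more processors, cause the system to implement the method of any one of example embodiments one to nine.
In example embodiment seventeen, a machine-readable medium stores a plurality of instructions that, when executed by one or more processors, implement the method of any one of example embodiments one to nine.
In example embodiment eighteen, a method includes: obtaining the data block identifiers of the data blocks corresponding to a disk snapshot and deduplicating them; and performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.
The basic principles, main features, and advantages of the present application have been shown and described above. The present application is not limited by the above embodiments; the embodiments and the specification merely illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the present application, and all such changes and improvements fall within the scope of the present application as claimed.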

Claims (18)

  1. 一种方法,包括:
    根据磁盘快照对应的数据块的数据块标识,判断是否有重复的数据块标识;如果有重复的数据块标识,删除所重复的数据块标识,以获得去重后的数据块标识;
    根据去重后的数据块标识,进行以下至少一项处理:检验所述磁盘快照的数据完整性、跨存储域迁移所述磁盘快照。
  2. 根据权利要求1所述的方法,其特征在于,所述根据磁盘快照对应的数据块的数据块标识,判断是否有重复的数据块标识,包括:
    根据磁盘快照链内的磁盘快照对应的数据块的数据块标识,判断所述磁盘快照链内不同磁盘快照对应的数据块是否有重复的数据块标识。
  3. 根据权利要求2所述的方法,其特征在于,所述根据去重后的数据块标识,进行以下至少一项处理,包括:
    根据所述磁盘快照链对应的去重后的数据块标识以及所述磁盘快照链内的一个或多个磁盘快照,进行以下至少一项处理:检验所述磁盘快照的数据完整性、跨存储域迁移所述磁盘快照。
  4. 根据权利要求1所述的方法,其特征在于,任一磁盘快照对应的数据块的数据块标识存储在一个数据块标识列表中;
    所述根据磁盘快照对应的数据块的数据块标识,判断是否有重复的数据块标识,如果有重复的数据块标识,删除所重复的数据块标识,以获得去重后的数据块标识,包括:
    创建初始为空集的数据块标识集合;遍历多个磁盘快照的数据块标识列表,将所述数据块标识集合中没有的数据块标识添加到所述数据块标识集合中,根据遍历所述多个磁盘快照后得到的所述数据块标识集合,确定所述多个磁盘快照对应的去重后的数据块标识。
  5. 根据权利要求1所述的方法,其特征在于,所述根据去重后的数据块标识,检验所述磁盘快照的数据完整性,包括:
    读取去重后的数据块标识指示的数据块;若成功读取去重后的每个数据块标识指示的数据块,则确定所述磁盘快照的数据完整;若读取去重后的至少一个数据块标识指示的数据块失败,则确定对应所述数据块标识指示的数据块的磁盘快照的数据不完整。
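  6. The method according to claim 1, wherein before the migrating the disk snapshot across storage domains according to the deduplicated data block identifiers, the method further comprises: determining a migration order of multiple disk snapshots;
    the migrating the disk snapshot across storage domains according to the deduplicated data block identifiers comprises:
    copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from the metadata of the disk snapshot and the data blocks corresponding to the disk snapshot.
  7. The method according to claim 6, wherein each deduplicated data block identifier has a corresponding migration tag;
    the copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain comprises:
    traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if the migration tag corresponding to the data block identifier indicates not migrated, copying the data block indicated by the data block identifier from the source storage domain to the destination storage domain and updating the migration tag to indicate migrated; if the migration tag indicates migrated, the data block indicated by the data block identifier need not be copied.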
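  8. The method according to claim 1, wherein the migrating the disk snapshot across storage domains according to the deduplicated data block identifiers comprises:
    copying the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and
    copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from the metadata of the disk snapshot and the data blocks corresponding to the disk snapshot.
  9. The method according to any one of claims 1 to 8, further comprising: before migrating the disk snapshots of multiple disks across storage domains, determining the mapping relationship between any one of the disks and its disk snapshots.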
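  10. An apparatus, comprising: a deduplication unit, and at least one of a snapshot integrity verification unit and a snapshot migration unit;
    the deduplication unit is configured to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers, and, if there are duplicate data block identifiers, to delete the duplicates to obtain deduplicated data block identifiers;
    the snapshot integrity verification unit is configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers;
    the snapshot migration unit is configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers.
  11. The apparatus according to claim 10, wherein the deduplication unit is configured to determine, according to the data block identifiers of the data blocks corresponding to a disk snapshot, whether there are duplicate data block identifiers in the following manner: determining, according to the data block identifiers of the data blocks corresponding to the disk snapshots in a disk snapshot chain, whether the data blocks corresponding to different disk snapshots in the disk snapshot chain have duplicate data block identifiers.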
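  12. The apparatus according to claim 10, wherein the data block identifiers of the data blocks corresponding to any disk snapshot are stored in a data block identifier list;
    the deduplication unit is configured to obtain the deduplicated data block identifiers in the following manner:
    creating a data block identifier set that is initially empty; traversing the data block identifier lists of multiple disk snapshots, adding to the data block identifier set each data block identifier that the set does not yet contain, and determining, from the data block identifier set obtained after traversing the multiple disk snapshots, the deduplicated data block identifiers corresponding to the multiple disk snapshots.
  13. The apparatus according to claim 10, wherein the snapshot integrity verification unit is configured to verify the data integrity of the disk snapshot according to the deduplicated data block identifiers in the following manner:
    reading the data blocks indicated by the deduplicated data block identifiers; if the data block indicated by every deduplicated data block identifier is read successfully, determining that the data of the disk snapshot is complete; if reading the data block indicated by at least one deduplicated data block identifier fails, determining that the data of the disk snapshot corresponding to the data block indicated by that data block identifier is incomplete.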
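  14. The apparatus according to claim 10, wherein the snapshot migration unit is further configured to determine a migration order of multiple disk snapshots;
    the snapshot migration unit is configured to migrate the disk snapshot across storage domains according to the deduplicated data block identifiers in the following manner: copying, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain; and copying the metadata of the disk snapshot from the source storage domain to the destination storage domain, so that the disk snapshot can be rebuilt in the destination storage domain from the metadata of the disk snapshot and the data blocks corresponding to the disk snapshot.
  15. The apparatus according to claim 14, wherein each deduplicated data block identifier has a corresponding migration tag;
    the snapshot migration unit is configured to copy, in the migration order, the data blocks indicated by the deduplicated data block identifiers from the source storage domain to the destination storage domain in the following manner:
    traversing the deduplicated data block identifiers in the migration order; for each data block identifier, if the migration tag corresponding to the data block identifier indicates not migrated, copying the data block indicated by the data block identifier from the source storage domain to the destination storage domain and updating the migration tag to indicate migrated; if the migration tag indicates migrated, the data block indicated by the data block identifier need not be copied.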
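  16. A system, comprising:
    one or more processors; and
    one or more machine-readable media storing a plurality of instructions that, when executed by the one or more processors, cause the system to implement the method according to any one of claims 1 to 9.
  17. A machine-readable medium storing a plurality of instructions that, when executed by one or more processors, implement the method according to any one of claims 1 to 9.
  18. A method, comprising:
    obtaining the data block identifiers of the data blocks corresponding to a disk snapshot, and deduplicating the data block identifiers; and
    performing, according to the deduplicated data block identifiers, at least one of the following: verifying the data integrity of the disk snapshot, and migrating the disk snapshot across storage domains.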
PCT/CN2018/109933 2017-10-23 2018-10-12 Data processing method and apparatus for disk snapshots WO2019080717A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710994265.1 2017-10-23
CN201710994265.1A CN109697021A (zh) 2017-10-23 2017-10-23 Data processing method and apparatus for disk snapshots

Publications (1)

Publication Number Publication Date
WO2019080717A1 (zh) 2019-05-02

Family

ID=66226809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109933 WO2019080717A1 (zh) 2017-10-23 2018-10-12 Data processing method and apparatus for disk snapshots

Country Status (2)

Country Link
CN (1) CN109697021A (zh)
WO (1) WO2019080717A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031851A (zh) * 2019-12-25 2021-06-25 阿里巴巴集团控股有限公司 Data snapshot method, apparatus, and device
CN114077569B (zh) * 2020-08-18 2023-07-18 富泰华工业(深圳)有限公司 Method and device for compressing data, and method and device for decompressing data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650679A (zh) * 2009-07-27 2010-02-17 浪潮电子信息产业股份有限公司 Efficient snapshot technique based on disk I/O read/write changes
CN102081552A (zh) * 2009-12-01 2011-06-01 华为技术有限公司 Method, apparatus, and system for online migration from a physical machine to a virtual machine
US20110258239A1 (en) * 2010-04-19 2011-10-20 Greenbytes, Inc. Method of minimizing the amount of network bandwidth needed to copy data between data deduplication storage systems
CN102378969A (zh) * 2009-03-30 2012-03-14 惠普开发有限公司 Deduplication of data stored in a copy volume
CN105095016A (zh) * 2014-05-16 2015-11-25 北京云巢动脉科技有限公司 Disk snapshot rollback method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100504800C (zh) * 2006-12-15 2009-06-24 英业达股份有限公司 Disk snapshot method
US20120159098A1 (en) * 2010-12-17 2012-06-21 Microsoft Corporation Garbage collection and hotspots relief for a data deduplication chunk store
US9633033B2 (en) * 2013-01-11 2017-04-25 Commvault Systems, Inc. High availability distributed deduplicated storage system
CN104484480B (zh) * 2014-12-31 2018-06-05 华为技术有限公司 Remote replication method and apparatus based on data deduplication

Also Published As

Publication number Publication date
CN109697021A (zh) 2019-04-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18870522

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18870522

Country of ref document: EP

Kind code of ref document: A1