CN110457163B - Data recovery method and device for distributed block storage and storage medium - Google Patents

Info

Publication number
CN110457163B
CN110457163B CN201910605476.0A CN201910605476A CN110457163B CN 110457163 B CN110457163 B CN 110457163B CN 201910605476 A CN201910605476 A CN 201910605476A CN 110457163 B CN110457163 B CN 110457163B
Authority
CN
China
Prior art keywords
storage unit
file
storage
files
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910605476.0A
Other languages
Chinese (zh)
Other versions
CN110457163A (en)
Inventor
王刚
张天炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Yuanhen Cloud Technology Co ltd
Original Assignee
Suzhou Yuanhen Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Yuanhen Cloud Technology Co ltd
Priority to CN201910605476.0A
Publication of CN110457163A
Application granted
Publication of CN110457163B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data recovery method and device for distributed block storage, and a storage medium. The data recovery method for distributed block storage comprises the following steps: S1, reading a disk of a storage server, acquiring the original storage unit file set of a block storage device image, and removing redundant duplicate files from the storage unit file set; S2, zero-padding each storage unit file in the set that has not reached the maximum storage unit file size; S3, renaming the storage unit files with sequence numbers in logical address order, then reading the storage unit files according to the stripe unit size and the stripe width and writing the data into a new image file; and S4, restoring the new image file to the block storage device. By reading the storage unit file set and deduplicating, zero-padding, and reassembling the storage unit files, the method recovers distributed block storage data under disaster conditions and keeps the data safe.

Description

Data recovery method and device for distributed block storage and storage medium
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data recovery method, an apparatus, and a storage medium for distributed block storage.
Background
Distributed storage is a data storage technology that uses the disk space of each machine in an enterprise over a network and combines these scattered storage resources into one virtual storage device, so that data is stored in a distributed manner across the enterprise.
The key technologies of distributed storage include metadata management, elastic scaling of the storage system, fault self-healing, and storage performance optimization, which together meet the requirements of high availability, high reliability, and high performance. Classified by storage protocol, distributed storage divides into distributed file systems, distributed block storage, and distributed object storage. A distributed block storage device consists of a logically contiguous image file, which is scattered across different physical disks according to the storage unit size and the number of replicas. To improve read and write parallelism, a distributed block storage image generally organizes its data in stripes.
In the data age, distributed block storage is basic infrastructure for cloud storage. A block device serves applications over IP-SAN or FC-SAN, and its main use case is providing virtualized storage for cloud hosts. This requires that distributed storage keep data safe and that data be recoverable under various faults and disasters. Under common faults, a distributed storage system can recover data automatically through mechanisms such as consistency, multiple replicas, and fault self-healing. In extreme or unusual disasters, however, such as loss of metadata, a crash of the operating system or the storage software, or partial data loss on a hard disk, the data on the hard disks is difficult to recover.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To that end, an object of the present invention is to provide a data recovery method, apparatus, and storage medium for distributed block storage that solve the problem of recovering distributed block storage data in a disaster and ensure the safety of the data.
Accordingly, a first object of the present invention is to provide a data recovery method for distributed block storage.
The technical solution adopted by the invention is as follows:
In a first aspect, the present invention provides a data recovery method for distributed block storage, comprising the following steps:
S1, reading a disk of a storage server, acquiring the original storage unit file set of a block storage device image, and removing redundant duplicate files from the storage unit file set;
S2, zero-padding each storage unit file in the storage unit file set that has not reached the maximum storage unit file size;
S3, renaming the storage unit files with sequence numbers in logical address order, then reading the storage unit files according to the stripe unit size and the stripe width and writing the data into a new image file;
and S4, restoring the new image file to the block storage device.
Further, S1 comprises the following steps:
S11, reading the disk of the storage server and acquiring the original storage unit file set of the block storage device image;
S12, grouping the storage unit files in the storage unit file set so that files of the same size fall into the same group;
S13, comparing the logical addresses of the storage unit files within each group and, where a logical address is repeated, deleting the storage unit files that repeat it.
Further, S3 comprises the following steps:
S31, calculating the number of times M that the storage unit file set is read from the size relationship between the storage unit file set and a storage unit file, and calculating the number of times Q that a storage unit file is read from the size relationship between a storage unit file and the stripe unit;
S32, reading data from the storage unit files in sequence according to the stripe width K and the stripe unit size, writing the data into the new image file, and repeating this process Q times;
and S33, reading the remaining storage unit files as in step S32, repeating M times, and writing their data into the new image file.
Further, S4 comprises the following steps:
S41, mapping the new image file to a local loopback device and mounting it to a local directory;
and S42, accessing the data in the new image file through the file system.
Alternatively, S4 comprises the following steps:
S41, mapping the new image file to a local loopback device;
and S42, exporting the loopback device over the access protocol of the block storage device to access the data of the new image file.
In a second aspect, the present invention provides a data recovery apparatus for distributed block storage, comprising:
an acquisition and deduplication module, configured to acquire the storage unit file set of the block storage device, delete duplicate storage unit files from the acquired set, and keep only one complete storage unit file set;
a preprocessing module, configured to zero-pad the storage unit files that have not reached the maximum storage unit file size and to rename and number the storage unit files in logical address order;
an assembly module, configured to reassemble the storage unit files processed by the preprocessing module into a new image file;
and a recovery module, configured to restore the new image file to the block storage device.
Further, the acquisition and deduplication module comprises:
an acquisition unit, configured to read a disk of the storage server, acquire the storage unit file set of the block storage device, and group the storage unit files in the set by file size;
and a deduplication unit, configured to compare the logical addresses of the storage unit files within the same group and delete the storage unit files that have the same logical address.
Further, the assembly module comprises:
a reading calculation unit, configured to calculate the number of times M that the storage unit file set is read from the size of the storage unit file set and the maximum storage unit file size, and to calculate the number of times Q that a storage unit file is read from the storage unit file size and the stripe unit size;
and a writing unit, configured to read data from the storage unit files according to the stripe unit size and the stripe width K and write it into the new image file, thereby forming the new image file.
Further, the recovery module comprises:
a mapping unit, configured to map the new image file to a local loopback device, the local loopback device being mounted to a local directory;
and an access unit, configured to access the data through the file system.
A computer-readable storage medium stores computer-executable instructions for causing a computer to perform the method of any of claims 1 to 5.
The beneficial effects of the invention are as follows:
The invention obtains the storage unit file data of the block storage device from the disks and preprocesses the collected storage unit file set; the preprocessing comprises zero-padding, deduplicating, and renumbering the storage unit files. The storage unit files are then reassembled according to the stripe unit and the stripe width, and the original data is finally recovered. The invention thus provides a method for recovering distributed block storage data efficiently and accurately after disasters such as a crash of the storage system software.
Drawings
FIG. 1 is a block diagram of the first and second embodiments of the present invention;
FIG. 2 is a flow chart of a first embodiment and a second embodiment of the present invention;
FIG. 3 is a flowchart of step S3 in the first embodiment and the second embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The first embodiment is as follows: referring to FIG. 1, the present invention discloses a data recovery apparatus for distributed block storage, comprising an acquisition and deduplication module 1, a preprocessing module 2, an assembly module 3, and a recovery module 4. The acquisition and deduplication module 1 is configured to acquire the storage unit file set of the block storage device, delete duplicate storage unit files from the acquired set, and keep only one complete storage unit file set. The preprocessing module 2 is configured to zero-pad the storage unit files that have not reached the maximum storage unit file size and to rename and number the storage unit files in logical address order. The assembly module 3 is configured to reassemble the storage unit files processed by the preprocessing module 2 into a new image file, and the recovery module 4 is configured to restore the new image file to the block storage device.
The acquisition and deduplication module 1 reads the disks of the storage server, obtains the storage unit files that the block storage device stored in a distributed manner, and deletes the redundant storage unit files from the set to obtain one complete storage unit file set. Storage unit files that resemble sparse (hole) files are then zero-padded, which prevents holes in those files from making the data unrecoverable. After zero-padding, the storage unit files are renamed and numbered in logical order. The assembly module 3 reassembles the zero-padded and renumbered storage unit file set into a new image file, and the new image file is restored to the original block storage device so that the user can access the data directly. The apparatus thus recovers data efficiently and accurately when distributed block storage suffers a disaster such as a crash of the storage system software.
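The zero-padding step can be illustrated with a minimal Python sketch. The function name `zero_pad`, the file name, and the 8-byte maximum used in the demonstration are assumptions chosen for illustration; the source only states that files below the maximum storage unit file size are filled with zeros.

```python
import os
import tempfile


def zero_pad(path, max_size):
    """Append zero bytes to the file at `path` until it is `max_size` bytes long.

    Returns the number of zero bytes written (0 if the file was already full).
    """
    size = os.path.getsize(path)
    if size > max_size:
        raise ValueError(f"{path} is larger than the maximum size {max_size}")
    missing = max_size - size
    if missing:
        # Write real zero bytes rather than leaving a hole in the file.
        with open(path, "ab") as unit_file:
            unit_file.write(b"\x00" * missing)
    return missing


# Demonstration on a throwaway file; 8 bytes stands in for the real maximum.
with tempfile.TemporaryDirectory() as workdir:
    short_file = os.path.join(workdir, "unit.00000003")  # hypothetical name
    with open(short_file, "wb") as handle:
        handle.write(b"abc")
    written = zero_pad(short_file, max_size=8)
    with open(short_file, "rb") as handle:
        padded = handle.read()
```

After padding, every storage unit file has the same length, which is what allows the later stripe arithmetic to use a single file size.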
The acquisition and deduplication module 1 comprises an acquisition unit and a deduplication unit. The acquisition unit is configured to read a disk of the storage server, acquire the storage unit file set of the block storage device, and group the storage unit files in the set by file size. The deduplication unit is configured to compare the logical addresses of the storage unit files within the same group and delete the storage unit files that have the same logical address. The deduplication unit processes the data as follows: initialize HashMap<md5, count>, where md5 is the md5 value of a file and count is the number of files with that md5 value; traverse the files of the subset and compute each file's md5 value; if the count for that md5 value in the HashMap is greater than 1, the current file is redundant and is deleted; if the HashMap does not contain the md5 value, add the md5 value to the HashMap with count equal to 1.
The assembly module 3 comprises a reading calculation unit and a writing unit. The reading calculation unit is configured to calculate the number of times M that the storage unit file set is read from the size of the storage unit file set and the maximum storage unit file size, and to calculate the number of times Q that a storage unit file is read from the storage unit file size and the stripe unit size. The writing unit is configured to read data from the storage unit files one stripe unit at a time and write it into a new image file.
The reading calculation unit divides the size of the storage unit file set by the maximum storage unit file size to obtain the number of times M that the storage unit file set is read, and divides the storage unit file size by the stripe unit size to obtain the number of times Q that a storage unit file is read.
Striping is a method of dividing continuous data into data blocks of equal size and writing each data block into a different storage unit file on a different disk in the distributed cluster.
Stripe unit: the size of one stripe block, also called the stripe size. This parameter is the size of the striped data block written to each disk. Data block sizes are typically between 16 KB and 512 KB (or larger) and are powers of 2, for example 16 KB, 32 KB, 64 KB, and 128 KB.
Stripe width: the number of stripe units that can be read or written simultaneously. This number equals the number of storage units operating concurrently. For example, a striped distributed block device image with a stripe unit of 256 KB and a stripe width of 8 reads 8 storage units concurrently during a read operation, 256 KB from each. Increasing the stripe width can increase the read and write performance of the block device, and adding more hard disks increases the number of stripe units that can be read or written concurrently.
The writing unit reads data from the storage unit files in sequence according to the stripe unit size and the stripe width. In each pass it reads one stripe unit from each of K storage units, K being the stripe width, and writes the data into the new image file. Because a storage unit file is larger than a stripe unit, the pass is repeated Q times until those storage unit files have been read completely. The remaining N-K storage unit files are then read in the same way, and the reading and writing is repeated M times to obtain the new image file.
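The two divisions described above can be checked with a short calculation. This is a sketch only: the 256 KB stripe unit and stripe width of 8 come from the example in the text, while the 4 MiB storage unit file size and the 64 MiB set size are assumed values picked to make the arithmetic concrete.

```python
KIB = 1024
MIB = 1024 * KIB


def read_counts(set_size, unit_file_size, stripe_unit):
    """Return (M, Q) computed by the two divisions the text describes.

    M = set size / storage unit file size
    Q = storage unit file size / stripe unit size
    """
    if unit_file_size % stripe_unit != 0:
        raise ValueError("storage unit file size must be a multiple of the stripe unit")
    if set_size % unit_file_size != 0:
        raise ValueError("set size must be a multiple of the storage unit file size")
    return set_size // unit_file_size, unit_file_size // stripe_unit


# Assumed sizes: 16 padded files of 4 MiB each, striped in 256 KiB units.
m, q = read_counts(set_size=64 * MIB, unit_file_size=4 * MIB, stripe_unit=256 * KIB)

# With stripe width K = 8, one pass over K files moves K stripe units.
stripe_width = 8
bytes_per_pass = stripe_width * 256 * KIB

print(m, q, bytes_per_pass)
```

With these assumed sizes each storage unit file is drained in 16 reads of 256 KiB, and one pass across 8 files transfers 2 MiB into the image.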
The recovery module 4 comprises a mapping unit and an access unit. The mapping unit is configured to map the new image file to a local loopback device, which is mounted to a local directory; mounting is the process by which the operating system makes the files and directories on a storage device (such as a hard disk, CD-ROM, or shared resource) available to the user through the computer's file system. The access unit is configured to access the data through the file system. The mapping unit maps the new image file to the loopback device, which is inexpensive to do, and then mounts it to the local directory; the access unit then accesses the new image file through the file system, so data recovery is efficient.
The second embodiment: the difference from the first embodiment is that the recovery module 4 comprises an export unit, configured to map the new image file to the loopback device and export it over the access protocol of the original block storage device, and an access unit, configured to access the exported data. The export unit exports the mapped device over the access protocol of the block storage device, and the access unit then accesses the data, so the data recovery operation is simple and efficient.
Referring to FIG. 2, the invention discloses a data recovery method for distributed block storage, comprising the following steps:
S1, reading a disk of a storage server, acquiring the original storage unit file set of a block storage device image, and removing redundant duplicate files from the storage unit file set;
S2, zero-padding each storage unit file in the storage unit file set that has not reached the maximum storage unit file size;
S3, renaming the storage unit files with sequence numbers in logical address order, then reading the storage unit files according to the stripe unit size and the stripe width and writing the data into a new image file;
and S4, restoring the new image file to the block storage device.
The storage unit file set of the block storage device is obtained from the disks of all the storage servers, and the collected storage unit files are preprocessed. The preprocessing comprises zero-padding the storage unit files that resemble sparse (hole) files, deleting redundant storage unit files, and renumbering the storage unit files in logical order. The data in the storage unit files is then reassembled by stripe unit and stripe width according to the striping parameters of the block storage device, which restores the original distributed block storage data.
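The renumbering part of the preprocessing can be sketched as a sort by logical address followed by a rename. How the logical address of a storage unit file is obtained is not stated in the source, so this sketch assumes the caller already has `(path, logical_address)` pairs; the `unit.<8-digit number>` naming scheme and the function name `renumber` are also hypothetical.

```python
import tempfile
from pathlib import Path


def renumber(files_with_addresses, prefix="unit"):
    """Rename files to `<prefix>.<sequence number>` in logical address order.

    `files_with_addresses` is an iterable of (Path, logical_address) pairs.
    Assumes the new names do not collide with any existing file name.
    """
    ordered = sorted(files_with_addresses, key=lambda pair: pair[1])
    renamed = []
    for index, (path, _address) in enumerate(ordered):
        target = path.with_name(f"{prefix}.{index:08d}")
        path.rename(target)
        renamed.append(target)
    return renamed


# Demonstration: three files listed out of order, each storing its own address.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    inputs = [
        (root / "obj_c", 0x800000),
        (root / "obj_a", 0x0),
        (root / "obj_b", 0x400000),
    ]
    for path, address in inputs:
        path.write_bytes(hex(address).encode())
    new_names = [p.name for p in renumber(inputs)]
    first_content = (root / "unit.00000000").read_bytes()
```

Zero-padded sequence numbers keep lexical order and numeric order identical, so a later stage can simply sort the file names.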
S1 comprises the following steps: S11, reading a disk of the storage server and acquiring the original storage unit file set of the block storage device image;
S12, grouping the storage unit files in the storage unit file set so that files of the same size fall into the same group;
S13, comparing the logical addresses of the storage unit files within each group and, where a logical address is repeated, deleting the storage unit files that repeat it. Duplicate storage unit files are processed as follows: initialize HashMap<md5, count>, where md5 is the md5 value of a file and count is the number of files with that md5 value; traverse the files of the subset and compute each file's md5 value; if the count for that md5 value in the HashMap is greater than 1, the current file is redundant and is deleted; if the HashMap does not contain the md5 value, add the md5 value to the HashMap with count equal to 1. The original storage unit files are thus grouped by size, and storage unit files with the same logical address within a group are deleted until only one remains. This prevents leftover duplicate storage unit files from producing repeated data and recovery errors, and improves the accuracy of data recovery.
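A minimal Python sketch of the grouping and HashMap<md5, count> procedure follows. It is one reading of the description rather than the patented implementation: a file whose md5 value is already in the map is counted and deleted, and the first file with a given md5 value is kept. The function names and the demonstration files are hypothetical.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths):
    """Group files by size, then delete repeats of the same md5 within a group."""
    by_size = defaultdict(list)
    for path in map(Path, paths):
        by_size[path.stat().st_size].append(path)

    kept, removed = [], []
    for group in by_size.values():
        counts = {}  # md5 -> number of files seen with that digest
        for path in group:
            key = file_md5(path)
            if key in counts:
                counts[key] += 1  # redundant copy: count it and delete it
                path.unlink()
                removed.append(path)
            else:
                counts[key] = 1
                kept.append(path)
    return kept, removed


# Demonstration: "b" repeats "a"; "d" has the same size but different content.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    for name, data in [("a", b"xyz"), ("b", b"xyz"), ("c", b"abcd"), ("d", b"xyw")]:
        (root / name).write_bytes(data)
    kept, removed = deduplicate(sorted(root.iterdir()))
    kept_names = sorted(p.name for p in kept)
    removed_names = [p.name for p in removed]
```

Grouping by size first means md5 values are only compared among files that could possibly be identical.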
As shown in FIG. 3, S3 comprises the following steps:
S31, calculating the number of times M that the storage unit file set is read from the size relationship between the storage unit file set and a storage unit file, and calculating the number of times Q that a storage unit file is read from the size relationship between a storage unit file and the stripe unit;
S32, reading data from the storage unit files in sequence according to the stripe width K and the stripe unit size, writing the data into the new image file, and repeating this process Q times;
and S33, reading the remaining storage unit files as in step S32, repeating M times, and writing their data into the new image file.
The size of the storage unit file set is divided by the size of a storage unit file to obtain the number of times M that the storage unit file set is read, and the size of a storage unit file is divided by the stripe unit size to obtain the number of times Q that each storage unit file is read. Data is read from the storage unit files in sequence, one stripe unit at a time, and written into the new image file; each storage unit file must be read Q times before it has been written completely into the new image file. The remaining N-K storage unit files are then read according to the stripe unit size and the stripe width and written into the new image file. In this embodiment the stripe width is K, so K storage units are read concurrently in each pass, and each stripe unit that is read is written into the new image file. Because a storage unit file is larger than a stripe unit, the reading and writing is repeated M times to form the complete new image file. In this embodiment each read is 256 KB, and repeating the read Q times brings a storage unit file completely into the new image file, so recovery of the storage unit file data is fast and complete.
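The reassembly loop can be sketched in Python as follows. The sketch assumes a round-robin layout in which consecutive stripe units of the image rotate across a group of K storage unit files, and it walks the files in groups of K; both the layout and the grouping are interpretations of the description, not details confirmed by the source. The function name `reassemble` and the tiny 4-byte files are illustrative only.

```python
import os
import tempfile
from pathlib import Path


def reassemble(unit_files, image_path, stripe_unit, stripe_width):
    """Write an image from padded, equally sized files given in logical order.

    For each group of `stripe_width` files, read one stripe unit from every
    file in turn and append it to the image, Q times per group.
    """
    unit_size = os.path.getsize(unit_files[0])
    q = unit_size // stripe_unit  # reads needed to drain one storage unit file
    with open(image_path, "wb") as image:
        for start in range(0, len(unit_files), stripe_width):
            group = unit_files[start:start + stripe_width]
            handles = [open(path, "rb") for path in group]
            try:
                for _ in range(q):
                    for handle in handles:
                        image.write(handle.read(stripe_unit))
            finally:
                for handle in handles:
                    handle.close()


# Demonstration: 4 files of 4 bytes, stripe unit 2 bytes, stripe width K = 2.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    units = []
    for index, data in enumerate([b"AAaa", b"BBbb", b"CCcc", b"DDdd"]):
        path = root / f"unit.{index:08d}"
        path.write_bytes(data)
        units.append(path)
    image = root / "recovered.img"
    reassemble(units, image, stripe_unit=2, stripe_width=2)
    result = image.read_bytes()
```

In the demonstration the first group contributes `AABBaabb` and the second `CCDDccdd`, which shows how the stripe units of each file interleave with those of its neighbours in the group.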
S4 comprises the following steps: S41, mapping the new image file to a local loopback device and mounting it to a local directory;
and S42, accessing the data in the new image file through the file system, so that restoring the data to the local storage device is simple and efficient.
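On a Linux host, steps S41 and S42 are commonly carried out with the `losetup` and `mount` utilities. The Python sketch below only builds and prints the two command lines; it does not run them, since they need root privileges and a real image. The image path, mount point, loop device name, and the read-only mount option are all assumptions made for illustration, and the source does not name any particular tool.

```python
import shlex


def loopback_commands(image_path, mount_point, loop_device="/dev/loop0"):
    """Return the command lines that attach an image to a loop device and mount it.

    The commands are returned as argument lists and are not executed here.
    """
    attach = ["losetup", loop_device, image_path]
    # Mounting read-only is an assumption, chosen to avoid changing recovered data.
    mount = ["mount", "-o", "ro", loop_device, mount_point]
    return [attach, mount]


commands = loopback_commands("recovered.img", "/mnt/recovered")
for command in commands:
    print(shlex.join(command))
```

Once the image is mounted, the recovered files can be read from the mount point like any other directory.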
The second embodiment: the difference from the first embodiment is that S4 comprises the following steps: S41, mapping the new image file to a local loopback device;
and S42, exporting the loopback device over the access protocol of the block storage device and accessing the data of the new image file.
The recovered image file is mapped to the local loopback device and its data is then accessed directly over the access protocol of the original block storage device, so restoring the data to the local device is simple.
A computer-readable storage medium storing computer-executable instructions for causing a computer to perform steps S1-S4.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A data recovery method for distributed block storage, characterized by comprising the following steps:
S1, reading a disk of a storage server, acquiring the original storage unit file set of a block storage device image, and removing redundant duplicate files from the storage unit file set;
S2, zero-padding each storage unit file in the storage unit file set that has not reached the maximum storage unit file size;
S3, renaming the storage unit files with sequence numbers in logical address order, then reading the storage unit files according to the stripe unit size and the stripe width and writing the data into a new image file;
S31, calculating the number of times M that the storage unit file set is read from the size relationship between the storage unit file set and a storage unit file, and calculating the number of times Q that a storage unit file is read from the size relationship between a storage unit file and the stripe unit;
S32, reading data from the storage unit files in sequence according to the stripe width K and the stripe unit size, writing the data into the new image file, and repeating this process Q times;
S33, reading the remaining storage unit files as in step S32, repeating M times, and writing their data into the new image file;
and S4, restoring the new image file to the block storage device.
2. The data recovery method for distributed block storage according to claim 1, wherein S1 comprises the following steps:
S11, reading a disk of the storage server and acquiring the original storage unit file set of the block storage device image;
S12, grouping the storage unit files in the storage unit file set so that files of the same size fall into the same group;
S13, comparing the logical addresses of the storage unit files within each group and, where a logical address is repeated, deleting the storage unit files that repeat it.
3. The data recovery method for distributed block storage according to claim 1, wherein S4 comprises the following steps:
S41, mapping the new image file to a local loopback device and mounting it to a local directory;
and S42, accessing the data in the new image file through the file system.
4. The data recovery method for distributed block storage according to claim 1, wherein S4 comprises the following steps:
S41, mapping the new image file to a local loopback device;
and S42, exporting the loopback device over the access protocol of the block storage device to access the data of the new image file.
5. A data recovery apparatus for distributed block storage, comprising:
an acquisition and deduplication module, used for acquiring a storage unit file set in the block storage device, deleting repeated storage unit files from the acquired storage unit file set, and keeping only a complete storage unit file set;
a preprocessing module, used for performing a zero-writing operation on the storage unit files that do not reach the maximum storage unit file size, and for renaming and numbering the storage unit files in sequence according to logical address;
an assembling module, used for reassembling the storage unit files from the preprocessing module into a new image file;
a reading calculation unit, used for calculating the number of times M that the storage unit file set is read according to the size of the storage unit file set and the maximum size of the storage unit file, and for calculating the number of times Q that a storage unit file is read according to the size of the storage unit file and the size of the stripe unit;
a writing unit, used for reading data from the storage unit files according to the stripe unit size and the stripe width K and writing the data to form the new image file;
and a recovery module, used for recovering the new image file to the block storage device.
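The preprocessing module's zero-writing and renumbering can be sketched in a few lines of Python. This is an illustration under stated assumptions: units are held in memory and keyed by logical address, and the function name `pad_and_number` and the `unit.NNNNNNNN` naming scheme are hypothetical, not taken from the patent.

```python
def pad_and_number(units, max_size):
    """Zero-pad short units and number them in logical-address order.

    units    -- dict mapping logical address -> bytes content
    max_size -- maximum storage unit file size in bytes
    Returns a list of (new_name, padded_bytes) tuples.
    """
    result = []
    for number, address in enumerate(sorted(units)):
        data = units[address]
        if len(data) > max_size:
            raise ValueError("storage unit larger than the maximum size")
        padded = data.ljust(max_size, b"\x00")         # zero-writing for short units
        result.append((f"unit.{number:08d}", padded))  # hypothetical naming scheme
    return result
```

The sketch does not create placeholder units for logical addresses that are missing entirely; whether such gaps must be filled is not addressed here.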
6. The apparatus of claim 5, wherein the acquisition and deduplication module comprises:
an acquisition unit, used for reading a disk of the storage server, acquiring the storage unit file set in the block storage device, and grouping the storage unit files in the storage unit file set by file size;
and a deduplication unit, used for comparing the logical addresses of the storage unit files in the same group and deleting the storage unit files with repeated logical addresses.
7. The apparatus of claim 6, wherein the recovery module comprises:
a mapping unit, used for mapping the new image file to a local loopback device, the local loopback device being mounted to a local directory;
and an access unit, used for accessing the data through the file system.
8. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 4.
CN201910605476.0A 2019-07-05 2019-07-05 Data recovery method and device for distributed block storage and storage medium Active CN110457163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910605476.0A CN110457163B (en) 2019-07-05 2019-07-05 Data recovery method and device for distributed block storage and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910605476.0A CN110457163B (en) 2019-07-05 2019-07-05 Data recovery method and device for distributed block storage and storage medium

Publications (2)

Publication Number Publication Date
CN110457163A CN110457163A (en) 2019-11-15
CN110457163B true CN110457163B (en) 2022-05-03

Family

ID=68482298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910605476.0A Active CN110457163B (en) 2019-07-05 2019-07-05 Data recovery method and device for distributed block storage and storage medium

Country Status (1)

Country Link
CN (1) CN110457163B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564456B (en) * 2022-03-03 2022-09-23 北京天融信网络安全技术有限公司 Distributed storage file recovery method and device
CN117971774B (en) * 2024-03-29 2024-06-07 苏州元脑智能科技有限公司 File set recovery method, apparatus, computer device, medium and program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102736961A (en) * 2011-03-11 2012-10-17 微软公司 Backup and restore strategies for data deduplication
CN103955530A (en) * 2014-05-12 2014-07-30 暨南大学 Data reconstruction and optimization method of on-line repeating data deletion system
CN109933278A (en) * 2017-12-19 2019-06-25 中国电信股份有限公司 For realizing the method and apparatus of block device carry access

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1563411B1 (en) * 2002-11-14 2013-06-19 EMC Corporation Systems and methods for restriping files in a distributed file system

Also Published As

Publication number Publication date
CN110457163A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
JP4972158B2 (en) System and method for eliminating duplicate data using sampling
US11003547B2 (en) Method, apparatus and computer program product for managing data storage
US11663236B2 (en) Search and analytics for storage systems
US10936228B2 (en) Providing data deduplication in a data storage system with parallelized computation of crypto-digests for blocks of host I/O data
CN106547641B (en) CDP backup method based on volume
US9959049B1 (en) Aggregated background processing in a data storage system to improve system resource utilization
US10628298B1 (en) Resumable garbage collection
US8572338B1 (en) Systems and methods for creating space-saving snapshots
US11199990B2 (en) Data reduction reporting in storage systems
US10409497B2 (en) Systems and methods for increasing restore speeds of backups stored in deduplicated storage systems
CN110457163B (en) Data recovery method and device for distributed block storage and storage medium
US11327844B1 (en) Automated cloud recovery to provide a full usable application image
WO2015096847A1 (en) Method and apparatus for context aware based data de-duplication
CN114416665B (en) Method, device and medium for detecting and repairing data consistency
US11481319B2 (en) Using data mirroring across multiple regions to reduce the likelihood of losing objects maintained in cloud object storage
CN113885809B (en) Data management system and method
US10896152B2 (en) Method, apparatus and computer program product for managing data storage
US20200042617A1 (en) Method, apparatus and computer program product for managing data storage
CN106933707B (en) Data recovery method and system of data storage device based on raid technology
US11513702B2 (en) Placement of metadata on data storage drives in a first storage enclosure of a data storage system
US10078553B2 (en) Point in time copy technique using a block level of granularity
CN111625186B (en) Data processing method, device, electronic equipment and storage medium
US11645333B1 (en) Garbage collection integrated with physical file verification
CN111399774B (en) Data processing method and device based on snapshot under distributed storage system
US10769020B2 (en) Sharing private space among data storage system data rebuild and data deduplication components to minimize private space overhead

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant