CN111177092A - Deduplication method and device based on erasure codes - Google Patents
Deduplication method and device based on erasure codes
- Publication number
- CN111177092A
- Authority
- CN
- China
- Prior art keywords
- data
- data block
- stored
- storage table
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an erasure-code-based method and device for deleting duplicate data. The method comprises: performing security processing on a data block to be stored by using a nonlinear Hash function to obtain a secure data block; processing the secure data block with an erasure code to obtain a storage value of the data block; determining, from the storage value of the data block and a pre-stored data storage table, whether the data block is a duplicate data block; and processing the data block to be stored according to the determination result.
Description
Technical Field
The invention relates to the technical field of data storage, in particular to a data de-duplication method and device based on erasure codes.
Background
In the 21st century, with the arrival of the information age, industries around the world have adopted MIS (Management Information System) to strengthen information management: enterprise data is collected, collated, and processed using computers and network communication, and decision-makers analyze the resulting information resources to improve the management level and profitability of the enterprise. The data volume of modern enterprises grows exponentially, and the required storage capacity has risen from tens of GB or tens of TB to several PB. The big data era is no longer merely theoretical; it has arrived. Research indicates that nearly 60% of stored data is duplicated. Duplicate data not only wastes storage space but also reduces data processing speed and calculation accuracy. Reducing the number of copies of duplicate data blocks has therefore become an effective way to lower the required storage capacity and save storage space.
Deduplication is a data reduction technique that can effectively optimize storage capacity. IDC (International Data Corporation) defines deduplication as a technique that normalizes duplicate data into a single shared data object to improve storage capacity efficiency. The purpose of deduplication is to remove redundant data globally across a storage system, including redundancy both within files and between files, whereas conventional data compression can only remove redundant information inside a file. By comparison, deduplication achieves a more pronounced reduction: for specific application data the deduplication ratio can reach 300:1 or even higher, whereas conventional data compression achieves only about 2:1.
The key to deduplication is detecting duplicate data, that is, determining whether a file, a data block, or even a byte in the storage system is duplicated; the deduplication efficiency depends on how files are divided. There are currently two main types of deduplication. File-level deduplication detects identical files, including files at different locations that have different names but the same content, and thereby avoids storing the same file more than once. Block-level deduplication detects identical data blocks within files and ensures that each data block is stored only once.
Deduplication exploits identity and similarity both between files and within files; the finer the processing granularity, the more redundant data is removed. Today, duplicate data is generally identified with a hash algorithm, and MD5 and SHA-1 are hash algorithms in wide use at present. Hash-based duplicate detection generally takes one of two forms: whole-file hashing and file-block hashing.
Whole-file hashing finds duplicate data at file granularity. Because a storage system generally treats a file as the unit of an information set, the first deduplication approaches compared files as a whole. For the files already in the storage system, the hash value of each is first calculated (usually with MD5 or SHA-1), and the values are collected into a hash value library that is stored separately. Deduplication pays off only when the application holds a large amount of duplicate data; otherwise, storing the file hash values actually wastes storage space. When a new file arrives at the storage system, its hash value is calculated and compared with the values already in the hash value library. If a matching hash value is found, the two files are judged to be identical, and a pointer to the already stored file replaces the new file. If no match is found, the file is judged not to be in the storage system; the file is stored, and its hash value is added to the hash value library.
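The whole-file flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `FileStore` and its methods are hypothetical, the hash value library is an in-memory dictionary, and SHA-256 from the standard library is used where the text mentions MD5 or SHA-1.

```python
import hashlib


class FileStore:
    """Toy whole-file deduplicating store (hypothetical, in-memory)."""

    def __init__(self):
        self.blobs = {}   # hash value library: digest -> file content, stored once
        self.names = {}   # file name -> digest, i.e. the "pointer" to stored content

    def put(self, name, content):
        """Store a file; return True only if new content had to be written."""
        digest = hashlib.sha256(content).hexdigest()  # assumption: SHA-256, not MD5/SHA-1
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content              # unseen file: store it and its hash
        self.names[name] = digest                     # seen or not, record the pointer
        return is_new

    def get(self, name):
        """Follow the pointer from a file name to the single stored copy."""
        return self.blobs[self.names[name]]
```

Two files with different names and identical content end up sharing one stored copy, which is the file-level case described above; identical data inside otherwise different files is not detected.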
File-block hashing resembles data compression techniques, in particular dictionary-based compression algorithms. It first divides a file into data blocks and then calculates a hash value for each block. The simplest way to divide a file is to fix the size of the data blocks. Alternatively, variable-size data blocks can be delimited with a sliding window: a block boundary is created when the hash value of the window contents matches a reference value. The window hash is typically computed as a Rabin fingerprint, and the variation in block size can be limited by setting a lower and an upper bound, so that every block size falls within a specified range. Data blocks are stored in much the same way as in whole-file hashing, with identical blocks identified by linear block numbers. Fixed-size blocks reduce the demands on the partitioning algorithm, but they also reduce the ability to detect identical blocks.
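A minimal sketch of sliding-window partitioning follows. It is an assumption-laden illustration rather than a Rabin fingerprint implementation: the window hash is a simple polynomial rolling hash, the "reference value" test is that the low bits of the hash equal `mask`, and the function name and all parameter defaults are hypothetical.

```python
def chunk_boundaries(data, window=16, mask=0x3F, min_size=32, max_size=256):
    """Return the end offsets of variable-size chunks of `data`."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)   # weight of the byte that leaves the window
    cuts, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:        # window is full: drop its oldest byte
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod    # slide the window forward by one byte
        size = i - start + 1
        # Cut when the window hash matches the reference pattern (and the chunk
        # is long enough), or when the upper size bound is reached.
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            cuts.append(i + 1)
            start, h = i + 1, 0
    if start < len(data):              # trailing bytes form the last chunk
        cuts.append(len(data))
    return cuts
```

Because a boundary depends only on the bytes inside the window, an insertion near the start of a file tends to shift only nearby boundaries, whereas fixed-size blocks would all shift. The `min_size` and `max_size` bounds play the role of the lower and upper block size limits mentioned above.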
Whole-file hashing has the advantage of fast calculation in ordinary environments, but it cannot detect identical data shared between different files, so that redundancy is not eliminated. File-block hashing can detect and delete identical data shared between different files, but the hash index of the blocks must be saved, which consumes additional storage space. Both approaches share a common disadvantage: the hash algorithm alone cannot guarantee the security of the data.
Disclosure of Invention
The technical problem addressed by the scheme provided by the embodiments of the invention is that data security is low in existing deduplication techniques.
The deduplication method based on the erasure code provided by the embodiment of the invention comprises the following steps:
performing security processing on a data block to be stored by using a nonlinear Hash function to obtain a secure data block;
carrying out operation processing on the secure data block by using an erasure code to obtain a storage value of the data block;
judging whether the data block is a repeated data block or not according to the stored value of the data block and a pre-stored data storage table;
and correspondingly processing the data block needing to be stored according to the judgment result.
Preferably, the method further comprises the following steps:
reading data to be stored;
and segmenting the data to be stored according to a preset size to obtain N data blocks with the same size.
Preferably, the data storage table comprises index locations, data blocks and storage values.
Preferably, the determining whether the data block is a duplicate data block according to the storage value of the data block and a pre-stored data storage table includes:
traversing the stored values in the pre-stored data storage table, and determining whether the stored values of the data blocks are contained in the data storage table;
when the data storage table is determined to contain the storage value of the data block, judging that the data block is a repeated data block;
and when the data storage table is determined not to contain the storage value of the data block, judging that the data block is a non-repeated data block.
Preferably, the performing, according to the determination result, the corresponding processing on the data block to be stored includes:
when the data block is judged to be a repeated data block, discarding the data block, and recording the index position of the data block in the data storage table;
and when the data block is judged to be a non-repeated data block, storing the data block and a storage value thereof, and recording the index position of the data block in the data storage table.
According to an embodiment of the present invention, a de-duplication apparatus based on erasure codes includes:
the security processing module is used for carrying out security processing on the data block needing to be stored by utilizing a nonlinear Hash function to obtain a secure data block;
the operation processing module is used for performing operation processing on the secure data block by using the erasure code to obtain a storage value of the data block;
the judging module is used for judging whether the data block is a repeated data block or not according to the stored value of the data block and a pre-stored data storage table;
and the processing module is used for correspondingly processing the data block needing to be stored according to the judgment result.
Preferably, the apparatus further comprises:
the reading module is used for reading data needing to be stored;
and the segmentation module is used for segmenting the data to be stored according to a preset size to obtain N data blocks with the same size.
Preferably, the data storage table comprises index locations, data blocks and storage values.
Preferably, the judging module includes:
the determining unit is used for traversing the stored values in the pre-stored data storage table and determining whether the stored values of the data blocks are contained in the data storage table;
and the judging unit is used for judging that the data block is a repeated data block when the data storage table is determined to contain the stored value of the data block, and judging that the data block is a non-repeated data block when the data storage table is determined not to contain the stored value of the data block.
Preferably, the processing module is specifically configured to discard the data block and record an index position of the data block in the data storage table when the data block is determined to be a duplicate data block, and store the data block and a storage value thereof and record an index position of the data block in the data storage table when the data block is determined to be a non-duplicate data block.
According to the scheme provided by the embodiment of the invention, the erasure code technology is utilized to prevent the data from being deleted by mistake, thereby ensuring the safety of the data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method for erasure code based deduplication provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an erasure code based de-duplication apparatus according to an embodiment of the present invention;
FIG. 3 is a flowchart of an erasure code based deduplication method provided by an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it should be understood that the preferred embodiments described below are only for the purpose of illustrating and explaining the present invention, and are not to be construed as limiting the present invention.
Fig. 1 is a flowchart of an erasure code-based data de-duplication method according to an embodiment of the present invention, as shown in fig. 1, including:
step S100: performing security processing on a data block to be stored by using a nonlinear Hash function to obtain a secure data block;
step S110: carrying out operation processing on the secure data block by using an erasure code to obtain a storage value of the data block;
step S120: judging whether the data block is a repeated data block or not according to the stored value of the data block and a pre-stored data storage table;
step S130: and correspondingly processing the data block needing to be stored according to the judgment result.
The method further includes: reading the data to be stored; and segmenting the data to be stored according to a preset size to obtain N data blocks of the same size.
Wherein the data storage table includes an index position, a data block, and a storage value.
Wherein the step S120 includes: traversing the stored values in the pre-stored data storage table, and determining whether the stored values of the data blocks are contained in the data storage table; when the data storage table is determined to contain the storage value of the data block, judging that the data block is a repeated data block; and when the data storage table is determined not to contain the storage value of the data block, judging that the data block is a non-repeated data block. Specifically, the performing, according to the determination result, the corresponding processing on the data block to be stored includes: when the data block is judged to be a repeated data block, discarding the data block, and recording the index position of the data block in the data storage table; and when the data block is judged to be a non-repeated data block, storing the data block and a storage value thereof, and recording the index position of the data block in the data storage table.
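Steps S100 to S130 can be sketched as follows. The sketch rests on loudly labeled stand-ins, because the text does not specify the nonlinear Hash function and a real erasure code is beyond a short example: SHA-256 stands in for the nonlinear Hash function, and a single XOR parity byte (the simplest single-parity-check code) stands in for the erasure code. The function names and the dictionary layout of the table rows are hypothetical; the three fields per row follow the data storage table described above.

```python
import hashlib
from functools import reduce


def secure_block(block):
    """S100 stand-in: SHA-256 plays the role of the nonlinear Hash function."""
    return hashlib.sha256(block).digest()


def storage_value(secured):
    """S110 stand-in: append one XOR parity byte (single-parity-check code)."""
    parity = reduce(lambda a, b: a ^ b, secured, 0)
    return secured + bytes([parity])


def process_block(table, block):
    """S120 and S130: return (index position, is_duplicate) for one block."""
    value = storage_value(secure_block(block))
    for row in table:                      # S120: traverse the stored values
        if row["value"] == value:
            return row["index"], True      # S130: duplicate, discard the block
    row = {"index": len(table), "block": block, "value": value}
    table.append(row)                      # S130: new block, store block and value
    return row["index"], False
```

The linear traversal mirrors the wording of step S120. A practical system would more likely look the storage value up in a hash map or an on-disk index, which is a design choice outside the text.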
Fig. 2 is a schematic diagram of an erasure code-based data de-duplication apparatus according to an embodiment of the present invention. As shown in fig. 2, the apparatus includes a security processing module, an operation processing module, a judging module, and a processing module.
The security processing module performs security processing on the data block to be stored by using a nonlinear Hash function to obtain a secure data block; the operation processing module processes the secure data block with an erasure code to obtain a storage value of the data block; the judging module judges whether the data block is a duplicate data block according to the storage value of the data block and a pre-stored data storage table; and the processing module processes the data block to be stored according to the judgment result.
The apparatus further includes: a reading module for reading the data to be stored; and a segmentation module for segmenting the data to be stored according to a preset size to obtain N data blocks of the same size.
Wherein the data storage table includes an index position, a data block, and a storage value.
Wherein, the judging module comprises: the determining unit is used for traversing the stored values in the pre-stored data storage table and determining whether the stored values of the data blocks are contained in the data storage table; and the judging unit is used for judging that the data block is a repeated data block when the data storage table is determined to contain the stored value of the data block, and judging that the data block is a non-repeated data block when the data storage table is determined not to contain the stored value of the data block. Specifically, the processing module is configured to discard the data block and record an index position of the data block in the data storage table when the data block is determined to be a duplicate data block, and store the data block and a storage value thereof and record an index position of the data block in the data storage table when the data block is determined to be a non-duplicate data block.
The method builds on deduplication technology. The data is divided into n data blocks of fixed size, each data block is processed with a nonlinear hash function, and the processed block is operated on with a binary Goppa code to obtain a key. Each key is compared in turn with the data already in the database. If the data block already exists in the database, only the index position of the data block is saved and the block itself is not stored again; otherwise, the block is stored in the database and its index position is saved. When a file needs to be read, its data block index file is retrieved from the database according to the search content, the corresponding data blocks are looked up by the index positions recorded in that index file, and the retrieved data blocks are reassembled into the original data file.
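The write and read paths described above can be sketched together to show how the index file makes reconstruction possible. Everything here is hypothetical scaffolding: the "database" is a Python list of blocks plus a dictionary from key to position, the index file is a list of positions, and a plain SHA-256 digest stands in for the key that the text derives with a nonlinear hash function and a binary Goppa code.

```python
import hashlib


def store_file(data, block_size, blocks, key_to_pos):
    """Split `data` into fixed-size blocks; return the file's index file."""
    index_file = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        key = hashlib.sha256(block).digest()   # stand-in for the Goppa-code key
        if key not in key_to_pos:              # new block: store it once
            key_to_pos[key] = len(blocks)
            blocks.append(block)
        index_file.append(key_to_pos[key])     # always record the index position
    return index_file


def restore_file(index_file, blocks):
    """Rebuild the original data by following the recorded index positions."""
    return b"".join(blocks[pos] for pos in index_file)
```

For example, storing `b"ABCD" * 3 + b"XY"` with a block size of 4 keeps two blocks and an index file of four positions, and `restore_file` returns the original bytes. The index file, not the block store, carries the order and multiplicity of the blocks, which is why it must be saved even for duplicates.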
It should be noted that, the data and the values related to the embodiments of the present invention may be determined according to actual needs, and are not limited herein.
Fig. 3 is a flowchart of a deduplication method based on erasure codes according to an embodiment of the present invention. As shown in fig. 3, the example takes a file named test with a size of about 32 MB, with the subscript i ranging from 1 to 1024. The method includes:
Step 101: Read the data to be stored, here the file test.
Step 102: Partition the file test by a fixed size to obtain n data blocks.
Specifically, the file test read in step 101 is divided into 1024 blocks of 32 KB each, named n1, n2, n3 … n1024. The size of each data block may be varied as required.
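The block count in this example follows from the sizes: 32 MB divided by 32 KB gives 1024, taking both units as binary (an assumption, since the text says only "about 32M"). A short check, with the hypothetical helper `split_fixed`:

```python
BLOCK_SIZE = 32 * 1024            # 32 KB per block, assuming binary units
FILE_SIZE = 32 * 1024 * 1024      # a file of exactly 32 MB, for illustration


def split_fixed(data, block_size=BLOCK_SIZE):
    """Cut `data` into consecutive blocks of `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


blocks = split_fixed(bytes(FILE_SIZE))
block_count = len(blocks)         # one block per 32 KB of input
```

When the file size is not an exact multiple of the block size, the last block comes out shorter. The text asks for blocks of the same size, so an implementation would have to pad the final block or treat it specially; the text does not say which.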
Step 103: the 1024 data blocks in step 102 are processed separately using a nonlinear Hash function.
Specifically, the 1024 data blocks in the step 102 are respectively processed according to the set nonlinear Hash function, so that the attack can be prevented, and meanwhile, the subsequent steps can be better realized. The nonlinear Hash function is here set to H (x) and the data blocks after processing are named m1, m2, m3 … m1024, respectively.
Step 104: Calculate on the data blocks from step 103 according to the duplicate data calculation rule.
Specifically, m1, m2, m3 … m1024 from step 103 are operated on with a binary Goppa code to obtain a value for each data block; the values are named key1, key2, key3 … key1024.
Step 105: the data block is processed according to the deduplication rule using the value in step 104.
Specifically, the value obtained in step 104 is compared with a value stored in the system to determine whether this value is present in the system; when the value of a data block is the same as the value in the system (through calculation, the value is the same as the value in the system, if the value is the same, the data block already exists in the original system, the data block is not stored, and repeated storage is avoided), recording the index position corresponding to the value in the system (the index position records what the data block after the data is split is specific, and subsequent data reconstruction is needed), and discarding the data block; when the value of a data block does not exist in the system, the data block and the value are stored, and the index position of the data block is recorded.
That is, the key1, key2, key3 … key1024 obtained in step 104 is compared with the key value stored in the system to determine whether this value exists in the system; when the keyi values are the same, recording the index positions of the data blocks corresponding to the values in the system, and discarding the data blocks; when the key i value does not exist in the system, storing the data block ni corresponding to the key value and the key value, and recording the index position of ni.
According to the scheme provided by the embodiments of the invention, the data is divided into n data blocks of fixed size, a unique key value is calculated for each data block according to a defined rule, and the key is compared with the key values of the data blocks already in the database. If the key value already exists in the database, the data block is discarded; if it does not exist, the data block is stored in the database. In this way the security of the data is ensured.
Although the present invention has been described in detail hereinabove, the present invention is not limited thereto, and various modifications can be made by those skilled in the art in light of the principle of the present invention. Thus, modifications made in accordance with the principles of the present invention should be understood to fall within the scope of the present invention.
Claims (10)
1. A deduplication method based on erasure codes is characterized by comprising the following steps:
performing security processing on a data block to be stored by using a nonlinear Hash function to obtain a secure data block;
carrying out operation processing on the secure data block by using an erasure code to obtain a storage value of the data block;
judging whether the data block is a repeated data block or not according to the stored value of the data block and a pre-stored data storage table;
and correspondingly processing the data block needing to be stored according to the judgment result.
2. The method of claim 1, further comprising:
reading data to be stored;
and segmenting the data to be stored according to a preset size to obtain N data blocks with the same size.
3. The method of claim 1, wherein the data storage table comprises index locations, data blocks, and storage values.
4. The method according to claim 3, wherein the determining whether the data block is a duplicate data block according to the stored value of the data block and a pre-stored data storage table comprises:
traversing the stored values in the pre-stored data storage table, and determining whether the stored values of the data blocks are contained in the data storage table;
when the data storage table is determined to contain the storage value of the data block, judging that the data block is a repeated data block;
and when the data storage table is determined not to contain the storage value of the data block, judging that the data block is a non-repeated data block.
5. The method according to claim 4, wherein the performing the corresponding processing on the data block to be stored according to the determination result comprises:
when the data block is judged to be a repeated data block, discarding the data block, and recording the index position of the data block in the data storage table;
and when the data block is judged to be a non-repeated data block, storing the data block and a storage value thereof, and recording the index position of the data block in the data storage table.
6. An erasure code based de-duplication apparatus, comprising:
the security processing module is used for carrying out security processing on the data block needing to be stored by utilizing a nonlinear Hash function to obtain a secure data block;
the operation processing module is used for performing operation processing on the secure data block by using the erasure code to obtain a storage value of the data block;
the judging module is used for judging whether the data block is a repeated data block or not according to the stored value of the data block and a pre-stored data storage table;
and the processing module is used for correspondingly processing the data block needing to be stored according to the judgment result.
7. The apparatus of claim 6, further comprising:
the reading module is used for reading data needing to be stored;
and the segmentation module is used for segmenting the data to be stored according to a preset size to obtain N data blocks with the same size.
8. The apparatus of claim 6, wherein the data storage table comprises index locations, data blocks, and storage values.
9. The apparatus of claim 8, wherein the determining module comprises:
the determining unit is used for traversing the stored values in the pre-stored data storage table and determining whether the stored values of the data blocks are contained in the data storage table;
and the judging unit is used for judging that the data block is a repeated data block when the data storage table is determined to contain the stored value of the data block, and judging that the data block is a non-repeated data block when the data storage table is determined not to contain the stored value of the data block.
10. The apparatus according to claim 9, wherein the processing module is specifically configured to discard the data chunk and record an index position of the data chunk in the data storage table when the data chunk is determined to be a duplicate data chunk, and store the data chunk and a storage value thereof and record an index position of the data chunk in the data storage table when the data chunk is determined to be a non-duplicate data chunk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911251209.4A CN111177092A (en) | 2019-12-09 | 2019-12-09 | Deduplication method and device based on erasure codes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911251209.4A CN111177092A (en) | 2019-12-09 | 2019-12-09 | Deduplication method and device based on erasure codes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111177092A true CN111177092A (en) | 2020-05-19 |
Family
ID=70653835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911251209.4A Pending CN111177092A (en) | 2019-12-09 | 2019-12-09 | Deduplication method and device based on erasure codes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111177092A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115993939A (en) * | 2023-03-22 | 2023-04-21 | 陕西中安数联信息技术有限公司 | Method and device for deleting repeated data of storage system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177111A (en) * | 2013-03-29 | 2013-06-26 | 西安理工大学 | System and method for deleting repeating data |
CN103561057A (en) * | 2013-10-15 | 2014-02-05 | 深圳清华大学研究院 | Data storage method based on distributed hash table and erasure codes |
CN105824720A (en) * | 2016-03-10 | 2016-08-03 | 中国人民解放军国防科学技术大学 | Continuous data reading oriented data placement method of deduplication and erasure correcting combined system |
CN109522283A (en) * | 2018-10-30 | 2019-03-26 | 深圳先进技术研究院 | A kind of data de-duplication method and system |
CN110149198A (en) * | 2019-04-29 | 2019-08-20 | 成都信息工程大学 | A kind of autonomous system and method that safeguard protection and storage controllably are carried out to data |
Non-Patent Citations (5)
Title |
---|
DAN TANG, YA-QIANG WANG, HAO-PENG YANG: ""Array Erasure Codes with Preset Fault Tolerance Capability"" * |
DI FAN, FENG XIAO, AND DAN TANG: ""A New Erasure Code Decoding Algorithm"" * |
唐聃;舒红平;: "一类多容错的阵列纠删码" * |
朱江;冀鸣;杨志成;张嘉贤;曹雄;: "基于重复数据删除技术的存储系统分析" * |
罗象宏 舒继武: ""存储系统中的纠删码研究综述"" * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115993939A (en) * | 2023-03-22 | 2023-04-21 | 陕西中安数联信息技术有限公司 | Method and device for deleting repeated data of storage system |
CN115993939B (en) * | 2023-03-22 | 2023-06-09 | 陕西中安数联信息技术有限公司 | Method and device for deleting repeated data of storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |