US9244623B1 - Parallel de-duplication of data chunks of a shared data object using a log-structured file system - Google Patents
Parallel de-duplication of data chunks of a shared data object using a log-structured file system
- Publication number
- US9244623B1 (application US13/799,325)
- Authority
- US
- United States
- Prior art keywords
- data chunk
- duplication
- log
- node
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
-
- G06F17/30197—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0685—Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
Definitions
- the present invention relates to parallel storage in high performance computing environments.
- Parallel storage systems are widely used in many computing environments. Parallel storage systems provide high degrees of concurrency in which many distributed processes within a parallel application simultaneously access a shared file namespace.
- Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations.
- the Department of Energy uses a large number of distributed compute nodes tightly coupled into a supercomputer to model physics experiments.
- parallel computing techniques are often used for computing geological models that help predict the location of natural resources.
- each parallel process generates a portion, referred to as a data chunk, of a shared data object.
- De-duplication is a common technique that reduces redundant data by eliminating duplicate copies of repeating data. De-duplication improves storage utilization and also reduces the number of bytes that must be sent for network data transfers.
- unique chunks of data are identified during an analysis process and their “fingerprints” are stored. As the analysis progresses, other chunks are compared against the stored fingerprints, and when a match is detected, the redundant chunk is replaced with a reference that points to the stored chunk.
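The fingerprint-and-reference scheme described above can be sketched in a few lines. The use of SHA-256 as the fingerprint function and the dictionary layout are illustrative choices for this sketch, not details mandated by the patent:

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    # SHA-256 digest serves as the de-duplication fingerprint here
    # (an illustrative choice; any collision-resistant hash would do).
    return hashlib.sha256(chunk).hexdigest()

def deduplicate(chunks):
    """Return (stored, layout): stored maps fingerprint -> original chunk,
    and layout records, per input chunk, the fingerprint it references."""
    stored = {}   # fingerprint -> first (original) copy of the chunk
    layout = []   # per-chunk reference into `stored`
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp not in stored:
            stored[fp] = chunk    # keep only the original copy
        layout.append(fp)         # duplicates become references
    return stored, layout

chunks = [b"aaaa", b"bbbb", b"aaaa"]
stored, layout = deduplicate(chunks)
# Only two unique chunks are stored; the third entry is a reference
# to the same stored chunk as the first.
```

The third chunk never needs to be stored a second time; its layout entry simply points at the fingerprint of the first chunk.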
- Embodiments of the present invention provide improved techniques for parallel de-duplication of data chunks being written to a shared object.
- a method is provided for a client executing on one or more of a compute node and a burst buffer node in a parallel computing system to store a data chunk generated by the parallel computing system to a shared data object on a storage node in the parallel computing system by processing the data chunk to obtain a de-duplication fingerprint; comparing the de-duplication fingerprint to de-duplication fingerprints of other data chunks; and providing original data chunks to the storage node that stores the shared object.
- a reference to an original data chunk can be stored when the de-duplication fingerprint matches a de-duplication fingerprint of another data chunk.
- the client may be embodied, for example, as a Log-Structured File System (LSFS) client, and the storage node may be embodied, for example, as a Log-Structured File server.
- LSFS Log-Structured File System
- a storage node in a parallel computing system stores a data chunk as part of a shared object by receiving only an original version of the data chunk from a compute node in the parallel computing system, and storing the original version of the data chunk to the shared data object on the storage node.
- the storage node can provide the original version of the data chunk to a compute node when the data chunk is read from the storage node.
- illustrative embodiments of the invention provide techniques for parallel de-duplication of data chunks being written to a shared object.
- FIG. 1 illustrates an exemplary conventional technique for de-duplicating data being stored to a shared object by a plurality of processes in a storage system
- FIG. 2 illustrates an exemplary distributed technique for de-duplication of data being stored to a shared object by a plurality of processes in a storage system in accordance with aspects of the present invention
- FIG. 3 illustrates an exemplary alternate distributed technique for de-duplication of data being stored to a shared object by a plurality of processes in a storage system in accordance with an alternate embodiment of the present invention
- FIG. 4 is a flow chart describing an exemplary LSFS de-duplication process incorporating aspects of the present invention.
- the present invention provides improved techniques for cooperative parallel writing of data to a shared object.
- one aspect of the present invention leverages the parallelism of concurrent writes to a shared object and the high interconnect speed of parallel supercomputer networks to de-duplicate the data in parallel as it is written.
- Embodiments of the present invention will be described herein with reference to exemplary computing systems and data storage systems and associated servers, computers, storage units and devices and other processing devices. It is to be appreciated, however, that embodiments of the invention are not restricted to use with the particular illustrative system and device configurations shown. Moreover, the phrases “computing system” and “data storage system” as used herein are intended to be broadly construed, so as to encompass, for example, private or public cloud computing or storage systems, as well as other types of systems comprising distributed virtual infrastructure. However, a given embodiment may more generally comprise any arrangement of one or more processing devices. As used herein, the term “files” shall include complete files and portions of files, such as sub-files or shards.
- FIG. 1 illustrates an exemplary conventional storage system 100 that employs a conventional technique for de-duplication of data being stored to a shared object 150 by a plurality of processes.
- the exemplary storage system 100 may be implemented, for example, as a Parallel Log-Structured File System (PLFS) to make placement decisions automatically, as described in U.S. patent application Ser. No. 13/536,331, filed Jun. 28, 2012, entitled “Storing Files in a Parallel Computing System Using List-Based Index to Identify Replica Files,” (now U.S. Pat. No. 9,087,075), incorporated by reference herein, or it can be explicitly controlled by the application and administered by a storage daemon.
- PLFS Parallel Log-Structured File System
- the exemplary storage system 100 comprises a plurality of compute nodes 110 - 1 through 110 -N (collectively, compute nodes 110 ) where a distributed application process generates a corresponding portion 120 - 1 through 120 -N of a distributed shared data structure 150 or other information to store.
- the compute nodes 110 optionally store the portions 120 of the distributed data structure 150 in one or more nodes of the exemplary storage system 100 , such as an exemplary flash based storage node 140 .
- the exemplary hierarchical storage tiering system 100 optionally comprises one or more hard disk drives (not shown).
- the compute nodes 110 send their distributed data chunks 120 into a single file 150 .
- the single file 150 is striped into file system defined blocks, and then a de-duplication fingerprint 160 - 1 through 160 - i is generated for each block.
- existing de-duplication approaches process the shared data structure 150 only after it has been sent to the storage node 140 of the storage system 100 .
- the de-duplication is applied to offset ranges on the data in sizes that are pre-defined by the file system 100 .
- the offset size of the de-duplication does not typically align with the size of the data portions 120 (i.e., the file system defined blocks will typically not match the original memory layout).
- FIG. 2 illustrates an exemplary storage system 200 that de-duplicates data chunks 220 being stored to a shared object 250 by a plurality of processes in accordance with aspects of the present invention.
- the exemplary storage system 200 may be implemented, for example, as a Parallel Log-Structured File System.
- the exemplary storage system 200 comprises a plurality of compute nodes 210 - 1 through 210 -N (collectively, compute nodes 210 ) where a distributed application process generates a corresponding data chunk portion 220 - 1 through 220 -N (collectively, data chunks 220 ) of a distributed shared data object 250 to store.
- the distributed application executing on given compute node 210 in the parallel computing system 200 writes and reads the data chunks 220 that are part of the shared data object 250 using a log-structured file system (LSFS) client 205 - 1 through 205 -N executing on the given compute node 210 .
- each LSFS client 205 applies a corresponding de-duplication function 260 - 1 through 260 -N to each data chunk 220 - 1 through 220 -N to generate a corresponding fingerprint 265 - 1 through 265 -N that is compared to other fingerprints.
- the redundant chunk 220 is replaced with a reference that points to the stored chunk.
- chunk 220 - 3 is a duplicate of chunk 220 - 2 so only chunk 220 - 2 is stored and a reference pointing to the stored chunk 220 - 2 is stored for chunk 220 - 3 , in a known manner.
- Each original data chunk 220 is then stored by the corresponding LSFS client 205 on the compute nodes 210 on one or more storage nodes of the exemplary storage system 200 , such as an exemplary LSFS server 240 .
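The client-side write path just described can be sketched with a mock server standing in for the LSFS server 240. All class and method names here are hypothetical, invented for illustration; only original chunks cross to the server, while duplicates are recorded as log references:

```python
import hashlib

class MockLSFSServer:
    """Hypothetical stand-in for the LSFS server 240: keeps original
    chunks keyed by fingerprint plus an append-only log of references."""
    def __init__(self):
        self.chunks = {}   # fingerprint -> original chunk bytes
        self.log = []      # (offset, fingerprint) log entries

    def has(self, fp):
        return fp in self.chunks

    def write_original(self, fp, data):
        self.chunks[fp] = data

    def write_reference(self, offset, fp):
        self.log.append((offset, fp))

class LSFSClient:
    """Sketch of a de-duplicating client write (names are illustrative)."""
    def __init__(self, server):
        self.server = server

    def write(self, offset, data):
        fp = hashlib.sha256(data).hexdigest()
        if not self.server.has(fp):
            # Only original data chunks are sent over the network.
            self.server.write_original(fp, data)
        # Every write, duplicate or not, appends a reference to the log.
        self.server.write_reference(offset, fp)

server = MockLSFSServer()
clients = [LSFSClient(server) for _ in range(3)]
clients[0].write(0, b"chunk-A")
clients[1].write(8, b"chunk-B")
clients[2].write(16, b"chunk-A")   # duplicate of client 0's chunk
```

Three writes occur, but only two chunk payloads reach the server; the third write contributes only a log reference.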
- the LSFS server 240 may be implemented, for example, as a flash based storage node.
- the exemplary hierarchical storage tiering system 200 optionally comprises one or more hard disk drives (not shown).
- the parallelism of the compute nodes 210 can also be leveraged to build a parallel key server to help find the de-duplicated fingerprints 265 .
- the keys can be cached across the compute server network 200 .
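One way to realize the parallel key server is to partition the fingerprint index deterministically across nodes, so every node agrees on which partition owns a given key. This is a sketch under assumed details; the partition count, the modulo-based placement, and all names are illustrative, as the patent does not fix a scheme:

```python
import hashlib

NUM_KEY_SERVERS = 4  # illustrative: one index slice per participating node

# Each "key server" holds only the slice of the fingerprint index it owns.
partitions = [set() for _ in range(NUM_KEY_SERVERS)]

def owner(fp: str) -> int:
    # Deterministic placement: every node independently computes the
    # same owner for a fingerprint, so no coordination is needed.
    return int(fp, 16) % NUM_KEY_SERVERS

def seen_before(fp: str) -> bool:
    """Query-and-insert against the owning partition; True for duplicates."""
    part = partitions[owner(fp)]
    if fp in part:
        return True
    part.add(fp)
    return False

fp1 = hashlib.sha256(b"chunk-A").hexdigest()
first = seen_before(fp1)    # first sighting: not a duplicate
second = seen_before(fp1)   # second sighting: duplicate detected
```

Because placement is a pure function of the fingerprint, lookups scale with the number of nodes, and results can additionally be cached locally as the text suggests.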
- FIG. 3 illustrates an exemplary storage system 300 that de-duplicates data chunks 220 being stored to a shared object 250 by a plurality of processes in accordance with an alternate embodiment of the present invention.
- the exemplary storage system 300 may be implemented, for example, as a Parallel Log-Structured File System.
- the exemplary storage system 300 comprises a plurality of compute nodes 210 - 1 through 210 -N (collectively, compute nodes 210 ) where a distributed application process generates a corresponding data chunk portion 220 - 1 through 220 -N (collectively, data chunks 220 ) of a distributed shared data object 250 to store, in a similar manner to FIG. 2 .
- the distributed application executing on given compute node 210 in the parallel computing system 200 writes and reads the data chunks 220 that are part of the shared data object 250 using a log-structured file system (LSFS) client 205 - 1 through 205 -N executing on the given compute node 210 , in a similar manner to FIG. 2 .
- each original data chunk 220 from the distributed data structure 250 is stored in one or more storage nodes of the exemplary storage system 200 , such as an exemplary LSFS server 240 .
- the LSFS server 240 may be implemented, for example, as a flash based storage node.
- the exemplary hierarchical storage tiering system 200 optionally comprises one or more hard disk drives (not shown).
- the exemplary storage system 300 also comprises one or more flash-based burst buffer nodes 310 - 1 through 310 - k that process the data chunks 220 that are written by the LSFS clients 205 to the LSFS server 240 , and are read by the LSFS clients 205 from the LSFS server 240 .
- the exemplary flash-based burst buffer nodes 310 comprise LSFS clients 305 in a similar manner to the LSFS clients 205 of FIG. 2 .
- each burst buffer node 310 applies a de-duplication function 360 - 1 through 360 - k to each data chunk 220 - 1 through 220 -N to generate a corresponding fingerprint 365 - 1 through 365 -N.
- Each original data chunk 220 is then stored on the LSFS server 240 , in a similar manner to FIG. 2 .
- FIGS. 2 and 3 can be combined such that a first level de-duplication is performed by the LSFS clients 205 executing on the compute nodes 210 and additional more computationally intensive de-duplication is performed by the burst buffer nodes 310 .
- FIG. 4 is a flow chart describing an exemplary LSFS de-duplication process 400 incorporating aspects of the present invention.
- the exemplary LSFS de-duplication process 400 is implemented by the LSFS clients 205 executing on the compute nodes 210 in the embodiment of FIG. 2 and by the flash-based burst buffer nodes 310 in the embodiment of FIG. 3 .
- the exemplary LSFS de-duplication process 400 initially obtains the data chunk from the application during step 420 .
- the exemplary LSFS de-duplication process 400 then de-duplicates the data chunk during step 430 on the compute nodes 210 or the burst buffer nodes 310 .
- the original data chunks are stored on the LSFS server 240 as part of the shared object 250 during step 440 .
- in HPC systems, the number of compute servers 210 as shown in FIG. 2 is typically at least an order of magnitude greater than the number of storage servers 240 ; it is therefore much faster to perform the de-duplication on the compute servers 210 .
- the de-duplication is performed on the data chunks 220 as they are being written by the LSFS client 205 as opposed to when they have been placed into the file 250 by the server 240 .
- the advantage is that in a conventional approach, the data chunks 120 on the compute node 110 may be completely reorganized when the server 140 puts them into the shared file 150 , as shown in FIG. 1 .
- the data chunks 120 may be split into many smaller sub-chunks and interspersed with small sub-chunks from other compute nodes 110 .
- the original chunking of the data is most likely to have commonality with other chunks.
- this reorganization with the conventional approach may reduce the de-duplicability of the data.
- the chunks 220 in a log-structured file system retain their original data organization, whereas in existing approaches the data in the chunks will almost always be reorganized into file system defined blocks. This can introduce additional latency, as the file system will either wait for the blocks to be filled or perform the de-duplication multiple times, once each time a block is partially filled.
- aspects of the present invention leverage the parallelism of concurrent writes to a shared object and the high interconnect speed of parallel supercomputer networks to improve data de-duplication during a write operation.
- aspects of the present invention thus recognize that the log-structured file system eliminates the need for artificial file system boundaries because all block sizes perform equally well in a log-structured file system.
- because PLFS files can be shared across many locations, the data processing required to implement these functions can be performed more efficiently when there are multiple nodes cooperating on the data processing operations. Therefore, when this is run on a parallel system with a parallel language, such as the Message Passing Interface (MPI), PLFS can provide MPI versions of these functions, which allow it to exploit parallelism for more efficient data processing.
- MPI Message Passing Interface
- Such components can communicate with other elements over any type of network, such as a wide area network (WAN), a local area network (LAN), a satellite network, a telephone or cable network, or various portions or combinations of these and other types of networks.
- WAN wide area network
- LAN local area network
- a tangible machine-readable recordable storage medium stores one or more software programs, which when executed by one or more processing devices, implement the data deduplication techniques described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/799,325 US9244623B1 (en) | 2013-03-13 | 2013-03-13 | Parallel de-duplication of data chunks of a shared data object using a log-structured file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/799,325 US9244623B1 (en) | 2013-03-13 | 2013-03-13 | Parallel de-duplication of data chunks of a shared data object using a log-structured file system |
Publications (1)
Publication Number | Publication Date |
---|---|
US9244623B1 true US9244623B1 (en) | 2016-01-26 |
Family
ID=55086120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/799,325 Active 2034-01-28 US9244623B1 (en) | 2013-03-13 | 2013-03-13 | Parallel de-duplication of data chunks of a shared data object using a log-structured file system |
Country Status (1)
Country | Link |
---|---|
US (1) | US9244623B1 (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160224595A1 (en) * | 2015-01-29 | 2016-08-04 | HGST Netherlands B.V. | Hardware Efficient Fingerprinting |
US10078643B1 (en) | 2017-03-23 | 2018-09-18 | International Business Machines Corporation | Parallel deduplication using automatic chunk sizing |
US10108659B2 (en) | 2015-01-29 | 2018-10-23 | Western Digital Technologies, Inc. | Hardware efficient rabin fingerprints |
US10459633B1 (en) | 2017-07-21 | 2019-10-29 | EMC IP Holding Company LLC | Method for efficient load balancing in virtual storage systems |
US10481813B1 (en) | 2017-07-28 | 2019-11-19 | EMC IP Holding Company LLC | Device and method for extending cache operational lifetime |
US10795859B1 (en) | 2017-04-13 | 2020-10-06 | EMC IP Holding Company LLC | Micro-service based deduplication |
US10795860B1 (en) | 2017-04-13 | 2020-10-06 | EMC IP Holding Company LLC | WAN optimized micro-service based deduplication |
US10860212B1 (en) | 2017-07-21 | 2020-12-08 | EMC IP Holding Company LLC | Method or an apparatus to move perfect de-duplicated unique data from a source to destination storage tier |
US10929382B1 (en) | 2017-07-31 | 2021-02-23 | EMC IP Holding Company LLC | Method and system to verify integrity of a portion of replicated data |
US10936543B1 (en) | 2017-07-21 | 2021-03-02 | EMC IP Holding Company LLC | Metadata protected sparse block set for SSD cache space management |
US10949088B1 (en) * | 2017-07-21 | 2021-03-16 | EMC IP Holding Company LLC | Method or an apparatus for having perfect deduplication, adapted for saving space in a deduplication file system |
CN112527186A (en) * | 2019-09-18 | 2021-03-19 | 华为技术有限公司 | Storage system, storage node and data storage method |
US11093176B2 (en) * | 2019-04-26 | 2021-08-17 | EMC IP Holding Company LLC | FaaS-based global object compression |
US11093453B1 (en) | 2017-08-31 | 2021-08-17 | EMC IP Holding Company LLC | System and method for asynchronous cleaning of data objects on cloud partition in a file system with deduplication |
US11100051B1 (en) * | 2013-03-15 | 2021-08-24 | Comcast Cable Communications, Llc | Management of content |
US11113153B2 (en) | 2017-07-27 | 2021-09-07 | EMC IP Holding Company LLC | Method and system for sharing pre-calculated fingerprints and data chunks amongst storage systems on a cloud local area network |
US11461269B2 (en) | 2017-07-21 | 2022-10-04 | EMC IP Holding Company | Metadata separated container format |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110258245A1 (en) * | 2010-04-14 | 2011-10-20 | International Business Machines Corporation | Performing A Local Reduction Operation On A Parallel Computer |
US20140136789A1 (en) * | 2011-09-20 | 2014-05-15 | Netapp Inc. | Host side deduplication |
- 2013-03-13: US application US13/799,325 filed (patent US9244623B1, status: active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110258245A1 (en) * | 2010-04-14 | 2011-10-20 | International Business Machines Corporation | Performing A Local Reduction Operation On A Parallel Computer |
US20140136789A1 (en) * | 2011-09-20 | 2014-05-15 | Netapp Inc. | Host side deduplication |
Non-Patent Citations (1)
Title |
---|
Rosenblum, Mendel, "The Design and Implementation of a Log-Structured File System", Feb. 1992, pp. 26-52. * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11100051B1 (en) * | 2013-03-15 | 2021-08-24 | Comcast Cable Communications, Llc | Management of content |
US20160224595A1 (en) * | 2015-01-29 | 2016-08-04 | HGST Netherlands B.V. | Hardware Efficient Fingerprinting |
US10078646B2 (en) * | 2015-01-29 | 2018-09-18 | HGST Netherlands B.V. | Hardware efficient fingerprinting |
US10108659B2 (en) | 2015-01-29 | 2018-10-23 | Western Digital Technologies, Inc. | Hardware efficient rabin fingerprints |
US10078643B1 (en) | 2017-03-23 | 2018-09-18 | International Business Machines Corporation | Parallel deduplication using automatic chunk sizing |
US11157453B2 (en) | 2017-03-23 | 2021-10-26 | International Business Machines Corporation | Parallel deduplication using automatic chunk sizing |
US10621144B2 (en) | 2017-03-23 | 2020-04-14 | International Business Machines Corporation | Parallel deduplication using automatic chunk sizing |
US10795859B1 (en) | 2017-04-13 | 2020-10-06 | EMC IP Holding Company LLC | Micro-service based deduplication |
US10795860B1 (en) | 2017-04-13 | 2020-10-06 | EMC IP Holding Company LLC | WAN optimized micro-service based deduplication |
US10459633B1 (en) | 2017-07-21 | 2019-10-29 | EMC IP Holding Company LLC | Method for efficient load balancing in virtual storage systems |
US10860212B1 (en) | 2017-07-21 | 2020-12-08 | EMC IP Holding Company LLC | Method or an apparatus to move perfect de-duplicated unique data from a source to destination storage tier |
US10936543B1 (en) | 2017-07-21 | 2021-03-02 | EMC IP Holding Company LLC | Metadata protected sparse block set for SSD cache space management |
US10949088B1 (en) * | 2017-07-21 | 2021-03-16 | EMC IP Holding Company LLC | Method or an apparatus for having perfect deduplication, adapted for saving space in a deduplication file system |
US11461269B2 (en) | 2017-07-21 | 2022-10-04 | EMC IP Holding Company | Metadata separated container format |
US11113153B2 (en) | 2017-07-27 | 2021-09-07 | EMC IP Holding Company LLC | Method and system for sharing pre-calculated fingerprints and data chunks amongst storage systems on a cloud local area network |
US10481813B1 (en) | 2017-07-28 | 2019-11-19 | EMC IP Holding Company LLC | Device and method for extending cache operational lifetime |
US10929382B1 (en) | 2017-07-31 | 2021-02-23 | EMC IP Holding Company LLC | Method and system to verify integrity of a portion of replicated data |
US11093453B1 (en) | 2017-08-31 | 2021-08-17 | EMC IP Holding Company LLC | System and method for asynchronous cleaning of data objects on cloud partition in a file system with deduplication |
US11093176B2 (en) * | 2019-04-26 | 2021-08-17 | EMC IP Holding Company LLC | FaaS-based global object compression |
CN112527186A (en) * | 2019-09-18 | 2021-03-19 | 华为技术有限公司 | Storage system, storage node and data storage method |
CN112527186B (en) * | 2019-09-18 | 2023-09-08 | 华为技术有限公司 | Storage system, storage node and data storage method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9244623B1 (en) | Parallel de-duplication of data chunks of a shared data object using a log-structured file system | |
US9477682B1 (en) | Parallel compression of data chunks of a shared data object using a log-structured file system | |
US11416452B2 (en) | Determining chunk boundaries for deduplication of storage objects | |
CN104871155B (en) | Optimizing data block size for deduplication | |
US9251160B1 (en) | Data transfer between dissimilar deduplication systems | |
CN116431072A (en) | Accessible fast durable storage integrated into mass storage device | |
EP3376393B1 (en) | Data storage method and apparatus | |
US9501488B1 (en) | Data migration using parallel log-structured file system middleware to overcome archive file system limitations | |
Manogar et al. | A study on data deduplication techniques for optimized storage | |
US10261946B2 (en) | Rebalancing distributed metadata | |
US9471582B2 (en) | Optimized pre-fetch ordering using de-duplication information to enhance network performance | |
US10242021B2 (en) | Storing data deduplication metadata in a grid of processors | |
CN116601596A (en) | Selecting segments for garbage collection using data similarity | |
US9965487B2 (en) | Conversion of forms of user data segment IDs in a deduplication system | |
US10255288B2 (en) | Distributed data deduplication in a grid of processors | |
US20160371295A1 (en) | Removal of reference information for storage blocks in a deduplication system | |
US10963177B2 (en) | Deduplication using fingerprint tries | |
US11314432B2 (en) | Managing data reduction in storage systems using machine learning | |
Nicolae | Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead | |
Kumar et al. | Differential Evolution based bucket indexed data deduplication for big data storage | |
Shieh et al. | De-duplication approaches in cloud computing environment: a survey | |
US10521400B1 (en) | Data reduction reporting in storage systems | |
US20200249862A1 (en) | System and method for optimal order migration into a cache based deduplicated storage array | |
Karthika et al. | Perlustration on techno level classification of deduplication techniques in cloud for big data storage | |
US9836475B2 (en) | Streamlined padding of deduplication repository file systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BENT, JOHN M.;FAIBISH, SORIN;SIGNING DATES FROM 20130422 TO 20130521;REEL/FRAME:030575/0860 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA
Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040134/0001
Effective date: 20160907

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS
Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040136/0001
Effective date: 20160907
|
AS | Assignment |
Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EMC CORPORATION;REEL/FRAME:040203/0001
Effective date: 20160906
|
CC | Certificate of correction
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS
Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES, INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:049452/0223
Effective date: 20190320
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 4
|
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS
Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:053546/0001
Effective date: 20200409
|
AS | Assignment
Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001
Effective date: 20211101
Owners:
- WYSE TECHNOLOGY L.L.C., CALIFORNIA
- SCALEIO LLC, MASSACHUSETTS
- MOZY, INC., WASHINGTON
- MAGINATICS LLC, CALIFORNIA
- FORCE10 NETWORKS, INC., CALIFORNIA
- EMC IP HOLDING COMPANY LLC, TEXAS
- EMC CORPORATION, MASSACHUSETTS
- DELL SYSTEMS CORPORATION, TEXAS
- DELL SOFTWARE INC., CALIFORNIA
- DELL PRODUCTS L.P., TEXAS
- DELL MARKETING L.P., TEXAS
- DELL INTERNATIONAL, L.L.C., TEXAS
- DELL USA L.P., TEXAS
- CREDANT TECHNOLOGIES, INC., TEXAS
- AVENTAIL LLC, CALIFORNIA
- ASAP SOFTWARE EXPRESS, INC., ILLINOIS
|
AS | Assignment
Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001
Effective date: 20220329
Owners:
- SCALEIO LLC, MASSACHUSETTS
- EMC IP HOLDING COMPANY LLC (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MOZY, INC.), TEXAS
- EMC CORPORATION (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MAGINATICS LLC), MASSACHUSETTS
- DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO FORCE10 NETWORKS, INC. AND WYSE TECHNOLOGY L.L.C.), TEXAS
- DELL PRODUCTS L.P., TEXAS
- DELL INTERNATIONAL L.L.C., TEXAS
- DELL USA L.P., TEXAS
- DELL MARKETING L.P. (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO CREDANT TECHNOLOGIES, INC.), TEXAS
- DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO ASAP SOFTWARE EXPRESS, INC.), TEXAS
|
AS | Assignment
Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001
Effective date: 20220329
Owners:
- SCALEIO LLC, MASSACHUSETTS
- EMC IP HOLDING COMPANY LLC (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MOZY, INC.), TEXAS
- EMC CORPORATION (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MAGINATICS LLC), MASSACHUSETTS
- DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO FORCE10 NETWORKS, INC. AND WYSE TECHNOLOGY L.L.C.), TEXAS
- DELL PRODUCTS L.P., TEXAS
- DELL INTERNATIONAL L.L.C., TEXAS
- DELL USA L.P., TEXAS
- DELL MARKETING L.P. (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO CREDANT TECHNOLOGIES, INC.), TEXAS
- DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO ASAP SOFTWARE EXPRESS, INC.), TEXAS
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 8