US7293035B2 - System and method for performing compression/encryption on data such that the number of duplicate blocks in the transformed data is increased - Google Patents
- Publication number
- US7293035B2 (Application US10/880,843)
- Authority
- US
- United States
- Prior art keywords
- data
- chunk
- marker
- offset
- working
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
Definitions
- the present invention relates to data processing systems. More particularly, the present invention relates to a system and a method for applying desired transformations to data such that the number of duplicate chunks in the transformed data is increased and the chunks are predominantly of a predetermined size. Additionally, the present invention provides a technique for determining the unique and duplicate chunks of transformed data.
- transformations include compression for reducing the overall data size, encryption for preventing unauthorized access to data, and various forms of encoding for supporting different character sets (e.g., uuencode).
- Many transformations are stateful, meaning that the transformed data depends not only on the data being transformed, but also on some state that typically depends on previous transformed data. With stateful transformations, any change in the data trickles down beyond the point of change in the transformed data. Accordingly, the transformed data of an updated object after the point of change tends to be different from the corresponding transformed data of the original object. Consequently, the number of duplicate portions would be greatly reduced after a stateful transformation even though a significant amount of the data may be duplicative.
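The trickle-down effect of a stateful transformation is easy to observe with a stream compressor. In this sketch, zlib stands in for the stateful transformation and the input data is arbitrary; inserting a single byte near the start changes the transformed output all the way to the end, destroying block-level duplicates:

```python
import zlib

# Two inputs that differ only by a single byte inserted near the beginning.
original = (b"alpha " * 50) + (b"beta " * 50)
updated = original[:10] + b"X" + original[10:]

a = zlib.compress(original)
b = zlib.compress(updated)

# Both round-trip losslessly, yet the transformed streams differ all the
# way to their final bytes: a one-byte change trickles down through the
# rest of the transformed data, so duplicate blocks are destroyed.
assert zlib.decompress(a) == original
assert zlib.decompress(b) == updated
assert a != b and a[-8:] != b[-8:]
```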
- a block is a chunk of data having a fixed size for a given data processing system.
- the chunks can then be transformed individually and duplicate blocks are detected in the transformed data.
- Such an approach is expensive because the data is processed twice and two layers of mapping are required for the data. Further, the effectiveness of such an approach is limited because the transformed chunks are likely to straddle block boundaries and markers tend not to appear consistently in real data.
- the present invention provides a technique of applying desired transformations to data such that the number of duplicate chunks in the transformed data is increased and the chunks are predominantly of a fixed size. Additionally, the present invention provides a technique for determining the duplicate chunks of transformed data.
- the present invention provides a method for determining unique chunks of data and duplicate chunks of data of a transformed set of data that has been transformed, such as by compression, encryption and/or encoding.
- a group of data is selected from the set of data, such that the selected group of data has a beginning and an end and is continuous between the beginning and the end of the selected group of data, and such that a working chunk of transformed data generated from the selected group of data is of size equal to a predetermined size. Then, it is determined whether the working chunk is a duplicate chunk of data.
- the process repeats by selecting a next group of data, generating a next working chunk of data from the next selected group of data, and evaluating whether the next working chunk of data is a duplicate chunk of data.
- the beginning of the next group of data is immediately after the end of a preceding selected group of data that generated a working chunk of data that was a duplicate chunk of data.
- the beginning of the next group of data is a predetermined number of data units after the beginning of a preceding selected group of data that generated a working chunk of data that was not a duplicate chunk of data.
- the data units can be, for example, a bit, a byte or a word.
- the data of the data set between the end of the last selected group of data that generated a working chunk that was a duplicate chunk of data and the beginning of the next selected group of data that generated a working chunk that was a duplicate chunk of data is processed as follows.
- a group of data is selected from this data such that the selected group of data has a beginning and an end and is continuous between the beginning and the end of the selected group of data, and such that a working chunk of transformed data generated from the selected group of data is of size equal to or less than a predetermined size.
- This working chunk is classified as a unique chunk of data.
- the current process then repeats by selecting a next group of data such that the beginning of the next group of data is immediately after the end of a preceding selected group of data.
- at most one unique chunk of data can have a size that is less than the predetermined size.
- the working chunk of data is considered a duplicate chunk of data in a probabilistic sense.
- determining whether the working chunk of data is a duplicate chunk of data includes computing a mathematical value based on the working chunk of data and comparing the mathematical value to contents of a data structure such as a hash table.
- the mathematical value for a chunk of data classified as a unique chunk of data is stored in the data structure.
- An alternative exemplary embodiment provides that the mathematical value is based on a cryptographic hash.
- the mathematical value is stored for a predetermined period of time.
- the data structure has a maximum predetermined size, and the oldest value is removed from the data structure when a mathematical value for the working chunk is stored in the data structure and causes the data structure to exceed the maximum predetermined size.
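A minimal sketch of such a bounded structure, assuming a SHA-1 digest as the mathematical value and an insertion-ordered table whose oldest entry is evicted on overflow; the class and method names are illustrative, not the patent's terminology:

```python
import hashlib
from collections import OrderedDict

class ChunkIndex:
    """Remembers the hashes of unique chunks, evicting the oldest entry
    whenever a fixed capacity would be exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.table = OrderedDict()  # digest -> chunk address

    def is_duplicate(self, chunk: bytes) -> bool:
        return hashlib.sha1(chunk).digest() in self.table

    def remember(self, chunk: bytes, address: int) -> None:
        self.table[hashlib.sha1(chunk).digest()] = address
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # drop the oldest value
```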
- Another exemplary embodiment of the present invention provides that the determination of whether the working chunk of data is a duplicate chunk of data is based on a checksum generated from the working chunk of data.
- a further alternative exemplary embodiment provides that the determination of whether the working chunk of data is a duplicate chunk of data is based on a comparison of the working chunk of data to previously seen chunks of data.
- One exemplary embodiment of the present invention provides that when a group of data is selected, a marker is located in the working chunk of data and a mathematical function of the data around the marker is computed. When the computed value has been previously seen, it is determined whether the remembered offset (the offset of this marker from an edge of the working chunk the last time it was seen) is greater than or equal to the current offset of the marker. If so, a number y is set equal to the predetermined size minus the quantity (remembered offset minus current offset of the marker). Otherwise, y is set equal to the current offset of the marker minus the remembered offset.
- the number y is set to be equal to the offset of the marker when the computed mathematical function of data around the marker has not been previously seen.
- the number y is set to be equal to the predetermined size when the working chunk does not contain the marker.
- a number x is set to be equal to an offset in the untransformed set of data corresponding to the offset of the number y in the working chunk of data.
- the location of the beginning of the next group of data is shifted by x data units.
- the marker is located within the selected group of data rather than the corresponding working chunk.
- FIG. 1A depicts an exemplary data stream R that is transformed, such as by compression and/or by encryption and/or encoding, into transformed data stream Rt for illustrating the present invention;
- FIGS. 1B-1I depict a sequence of steps according to the present invention of transforming an exemplary data stream R′ and identifying duplicate chunks and unique chunks resulting from the transformation;
- FIG. 2 shows a flow chart of a process according to the present invention for transforming a data stream R such that the transformed data stream Rt has many duplicative chunks and such that the unique chunks resulting from the transformation are identified;
- FIG. 3 shows a flow chart of a process according to the present invention for using data markers for matching and aligning data chunks.
- the present invention provides a system and a method that applies transformations, such as compression and/or encryption and/or encoding, to data so that the transformed data contains or is likely to contain many duplicate chunks of data of a preferred size. Additionally, the present invention provides a technique for identifying the chunks of data that are duplicates or are likely to be duplicates of chunks of transformed data that have been previously seen.
- transformations such as compression and/or encryption and/or encoding
- FIG. 1A depicts an exemplary data stream, or set of data, R that is transformed, such as by compression and/or by encryption and/or encoding, into a transformed data stream, or transformed set of data, Rt for illustrating the present invention.
- Data stream R is shown as having variable-sized groups of data that have been transformed into chunks of data that are each of size k.
- block means, in particular, a chunk of data of size k.
- data groups 1 - 5 each have different sizes
- data groups 1 - 5 are respectively transformed into blocks 1 - 5 , which are each of size k.
- FIG. 1B depicts exemplary data stream R after undergoing a change that has caused some data to be inserted between data groups 1 and 2 to form a second data stream R′.
- the present invention identifies duplicate chunks after data stream R′ has been transformed by using a window that slides, or moves, across data stream R′ until a match is found with previously seen data.
- the window size varies dynamically and is selected so that as the window moves across data stream R′, chunks of transformed data are created that are predominantly of the fixed size k.
- in some cases the transformed data cannot be exactly of size k, in which case the largest window is selected so that the size of the data within the window after transformation is as close to k as possible while remaining smaller than k.
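The window-sizing rule can be sketched as follows, with zlib compression standing in for the transformation. The byte-at-a-time growth and recomputation from scratch keep the sketch simple and are illustrative choices, not the only possible implementation; k should exceed the transform's fixed overhead:

```python
import zlib

def largest_window(data: bytes, start: int, k: int):
    """Return (end, chunk): the largest window starting at `start` whose
    transformed size is at most k -- as close to k as the data allows."""
    end = start
    while end < len(data):
        if len(zlib.compress(data[start:end + 1])) > k:
            break  # growing the window one more byte would overshoot k
        end += 1
    return end, zlib.compress(data[start:end])
```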
- data d has been inserted near the beginning of data stream R to form data stream R′. It should be understood that the technique of the present invention is applicable when a data stream, or set of data, undergoes any change, including a change that causes new data to be inserted or data to be removed from any portion of a data stream, or set of data, R.
- FIG. 2 shows a flow chart 200 of a process according to the present invention for transforming a data stream R′ and for identifying duplicative chunks (i.e., chunks of data that have been previously seen) and unique chunks of data resulting from the transformation.
- a pointer, or cursor, is set to the beginning of data stream R′, indicated as X 0 .
- a transformation of data stream R′ is computed beginning at the cursor and ending at the point where the size of the transformed data equals k, which is indicated as “X 1 ” for this example.
- the window used by the present invention extends between points X 0 and X 1 .
- the transformed chunk is referred to herein as block A.
- block A is compared to previously remembered chunks, such as the chunks that were formed when data stream R was originally transformed ( FIG. 1A ). If, at step 203 , it is determined that block A has been previously encountered, flow continues to step 204 where block A is designated as a duplicate chunk, or block.
- block A in transformed data stream R′t ( FIG. 1B ) is identical to block 1 in transformed data stream Rt. Flow returns to step 201 for processing the rest of data stream R′.
- Steps 201 through 204 are repeated for the next portion of data stream R′, which is shown in FIG. 1C .
- the cursor is set to the beginning of unprocessed data stream R′, indicated as X 0 .
- transformation of data stream R′ is computed beginning at the cursor and ending at the point where the size of the transformed data equals k, indicated as X 1 . Note that the window created between X 0 and X 1 is larger for this particular group of data stream R′ than the window that was used to form chunk 1 .
- the transformed chunk, i.e., “new” block A, is compared to previously remembered chunks, such as the chunks that were formed when data stream R was originally transformed and the chunks that were formed from the portion of data stream R′ that has already been processed.
- If, at step 203 , it is determined that “new” block A does not match any previously remembered chunks, which is the case for this example, flow continues to step 205 .
- At step 205 , the window is shifted forward through data stream R′ by x bytes, for example, one byte.
- At step 206 , the window size is readjusted so that the data within the window after transformation will be of size k. That is, the size of the chunk of the transformed data that is computed from the beginning of the new location (X 0 ) of the cursor in data stream R′ and ending at another selected location (X 1 ) in R′ equals k.
- FIG. 1D depicts the shifting of the window forward through data stream R′ and its resizing to form a new block A of size k.
- a transform of the residual data is computed (not shown in FIG. 1D ).
- the residual data is the data that is between the point where the cursor started when flow went from step 203 to step 204 (i.e., X 2 ) to the current starting point of the cursor (i.e., X 0 ).
- a new transform is started whenever the chunk size of the transformed data equals k. For the situation depicted in FIG. 1D , the size of the transformed residual data would be less than size k.
- the resulting chunks of the transformed residual data are referred to herein as B 1 , B 2 , . . . , etc.
- At step 208 , it is determined whether block A has been seen before. If, at step 208 , it is determined that block A has not been seen before, flow continues to step 210 where it is determined whether the size of chunk B 1 is k. If, at step 210 , the size of chunk B 1 is not k, flow returns to step 205 .
- FIG. 1E depicts the shifting of the window (step 205 ), and the resizing of the window and the transformation of the group of data within the window to form a new block A (step 206 ).
- FIGS. 1F-1H depict a sequence of continued shifting of the window and of the resizing of the window, and the transformation of the group of data within the window to form another new block A as steps 205 - 208 and 210 are performed.
- If, at step 208 , it is determined that block A has been seen before ( FIG. 1H ), flow continues to step 209 where block A is designated as a duplicate block. Flow continues to step 211 where block B 1 (B 2 , etc., as the case may be) is designated as a unique chunk. Block B 1 (B 2 , etc.) is remembered at step 212 . Flow continues to step 201 to process the rest of data stream R′. Note that step 211 could be performed before step 209 in order to preserve the original sequence of data in the transformed chunks. If, at step 210 , the size of chunk B 1 is k, flow continues to step 211 where block B 1 (B 2 , etc.) is designated as a unique chunk.
- At step 212 , block B 1 (B 2 , etc.) is remembered and flow continues to step 201 for processing the rest of data stream R′.
- only chunks that are of size k are remembered unless the chunk is the last in a data stream Rt.
- FIG. 1I depicts data groups 2 - 5 being transformed into chunks 2 - 5 , each of size k. Chunks B 1 and B 2 are identified as new unique chunks. For this example, block B 1 is of size k, while block B 2 is of size less than k.
- the present invention continues shifting a window and adjusting the size of the window until a match is found for the group of data within the window after transformation, or until the residue or data over which the window has already passed has a transformed size that is of size k or larger.
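The overall loop of flow chart 200 can be sketched as follows, with zlib compression standing in for the transformation and a set of previously seen transformed chunks as the memory. The function names, the one-byte shift, and the greedy window sizing are illustrative choices, not the patent's required implementation:

```python
import zlib

def _window(data: bytes, start: int, k: int):
    """Largest window from `start` whose transformed size is at most k.
    Recomputed from scratch at each shift, since a stateful transform
    cannot be incrementally 'rolled' the way a checksum can."""
    end = start
    while end < len(data) and len(zlib.compress(data[start:end + 1])) <= k:
        end += 1
    end = max(end, start + 1)  # always make progress, even for tiny k
    return end, zlib.compress(data[start:end])

def _residue_chunks(data: bytes, k: int):
    """Split residue into consecutive groups whose transforms are each at
    most k; at most the last one is smaller than a full-size chunk."""
    out, start = [], 0
    while start < len(data):
        end, chunk = _window(data, start, k)
        out.append(chunk)
        start = end
    return out

def dedup_stream(data: bytes, k: int, seen: set):
    """Slide a variable-sized window over `data`, emitting
    ("duplicate", chunk) when the transformed window matches a remembered
    chunk, and ("unique", chunk) for the transformed residue between
    matches (steps 201-212 of flow chart 200)."""
    out, cursor, anchor = [], 0, 0      # anchor: start of unmatched residue
    while cursor < len(data):
        end, chunk = _window(data, cursor, k)       # steps 201-202 / 206
        if chunk in seen:                           # steps 203 / 208
            for c in _residue_chunks(data[anchor:cursor], k):
                seen.add(c)                         # steps 211-212
                out.append(("unique", c))
            out.append(("duplicate", chunk))        # steps 204 / 209
            cursor = anchor = end
        else:
            cursor += 1                             # step 205: shift window
    for c in _residue_chunks(data[anchor:], k):     # trailing residue
        seen.add(c)
        out.append(("unique", c))
    return out
```

Running the sketch over a stream, then over the same stream with data inserted near the front, shows the realignment: chunks before and after the insertion point are recognized as duplicates, while the residue around the insertion comes out as a few unique chunks.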
- the present invention provides several alternative embodiments for determining whether a chunk of transformed data matches existing data.
- the determination at steps 203 and 208 of whether a chunk of data has been previously seen, or is likely to have been previously seen, is performed by computing a mathematical function of the data and examining a data structure, such as a hash table, for the computed value.
- when a chunk of data is designated as unique, the corresponding computed value of the data chunk is entered into the data structure at step 212 .
- the previously remembered chunk or chunks to which a particular chunk of data corresponds can optionally be remembered. To accomplish this, the addresses of the chunks corresponding to each computed value in the data structure are tracked.
- a series of mathematical functions, or tests could be used for accelerating the determination of which chunks are identical to previously remembered chunks and to which particular previously remembered chunks.
- the series of tests has increasing levels of accuracy. The least accurate test is performed first and, when that test is positive, the next, more accurate, test is performed, and so on. As the accuracy increases, the probability of false positives in identifying duplicate data decreases, but the cost of performing the test rises accordingly.
- the least accurate test could use as the mathematical function a rolling checksum, such as disclosed by A. Tridgell et al., “The rsync algorithm,” Technical Report TR-CS-96-05, Australian National University, 1996.
- the next, more accurate, test could use a cryptographic hash, such as SHA1, for the mathematical function. See, for example, National Institute of Standards and Technology, FIPS 180-1, Secure Hash Standard, US Department of Commerce, April 1995.
- the most accurate test could be an actual comparison of the data in the chunks.
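The accuracy/cost cascade can be sketched as follows. Here `zlib.adler32` stands in for the rolling checksum (the rsync checksum is additionally rollable as the window shifts, which plain `adler32` is not), SHA-1 for the cryptographic hash, and a byte comparison as the final, most accurate test; the index layout and names are illustrative:

```python
import hashlib
import zlib

def remember(chunk: bytes, by_checksum: dict) -> None:
    """Index a unique chunk under its cheap checksum, keeping the more
    expensive hash and the raw bytes for the later, more accurate tests."""
    entry = (hashlib.sha1(chunk).digest(), chunk)
    by_checksum.setdefault(zlib.adler32(chunk), []).append(entry)

def is_duplicate(chunk: bytes, by_checksum: dict) -> bool:
    """Three tests of increasing accuracy and cost: checksum, then
    cryptographic hash, then an actual comparison of the chunk bytes."""
    candidates = by_checksum.get(zlib.adler32(chunk))  # cheapest test
    if not candidates:
        return False
    digest = hashlib.sha1(chunk).digest()              # more accurate
    for sha, stored in candidates:
        if sha == digest and stored == chunk:          # most accurate
            return True
    return False
```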
- the present invention attempts to divide the data into consecutive chunks of transformed data such that each of the chunks after transformation is of a preferred size k.
- the present invention tries to shift the chunking in case portions of previously seen data are offset differently with respect to each other. To determine the shift amount, all possible shift positions are tested.
- steps 206 and 207 can be performed by extending the results from a prior iteration of the respective steps. For example, instead of computing the transform of the residual data from scratch on each iteration of step 207 , we can instead maintain the transform of the residual data seen so far and only transform the additional x bytes on a new iteration.
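The incremental computation of step 207 can be sketched with a streaming compressor that keeps its state across window shifts, so that only the newly shifted-in x bytes are transformed on each iteration. zlib's streaming API is a stand-in for the transformation and the class name is illustrative:

```python
import zlib

class ResidueTransform:
    """Maintains the transform of the residual data seen so far; each
    window shift only feeds the additional x bytes to the compressor,
    instead of recompressing the residue from scratch."""

    def __init__(self):
        self._comp = zlib.compressobj()
        self._out = b""

    def extend(self, new_bytes: bytes) -> None:
        # Only the newly shifted-over bytes are transformed.
        self._out += self._comp.compress(new_bytes)

    def current(self) -> bytes:
        # Flush a copy so the running stream can keep growing.
        return self._out + self._comp.copy().flush()
```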
- step 206 , by contrast, generally cannot be performed by undoing the contribution of the data that has been shifted out and adding the contribution of the data that has been shifted in. In other words, extending the results from a previous iteration of step 206 is unlikely to be efficient.
- an exemplary alternative embodiment of the present invention provides an improvement that fundamentally speeds up the determination of the amount the window should shift.
- the alternate embodiment of the present invention utilizes marker offsets from a chunk boundary for shifting the window.
- deterministic positions, i.e., data-dependent positions
- the computed values can be matched up efficiently and the shift amount can then be calculated without testing all possibilities.
- One or more specific patterns or markers in the data are looked for to find the deterministic positions in the data.
- a marker may be a sequence of bytes in which some mathematical function of the sequence of bytes results in a certain bit pattern.
- a marker may be as simple as a full stop (period). The only requirement is that a marker should appear reasonably consistently throughout the data.
- Rabin's fingerprint such as disclosed by M. O. Rabin, “Fingerprinting by random polynomials,” Technical Report TR-15-81, Harvard Aiken Computation Laboratory, 1981, which is incorporated by reference herein, is computed looking for the positions in the data in which the last few (n) bits of the computed fingerprint are zeros.
- the expected separation of the deterministic positions can be controlled to be close to or less than k.
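Marker detection can be sketched with a rolling hash whose low n bits must be zero. A simple polynomial rolling hash stands in here for Rabin's fingerprint, and the window size, base, and modulus are arbitrary illustrative choices; on random data a marker appears roughly every 2**n positions, so n can be chosen so the expected separation of markers is close to, or less than, the preferred chunk size k:

```python
def find_markers(data: bytes, n: int) -> list:
    """Positions where the fingerprint of the preceding 16-byte window
    has its last n bits equal to zero. Because the positions depend only
    on the data, the same content yields the same markers even when it is
    shifted to a different offset."""
    WINDOW, BASE, MOD = 16, 257, (1 << 61) - 1
    mask = (1 << n) - 1
    top = pow(BASE, WINDOW - 1, MOD)
    h, positions = 0, []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # roll oldest byte out
        h = (h * BASE + byte) % MOD                 # roll new byte in
        if i >= WINDOW - 1 and (h & mask) == 0:
            positions.append(i)
    return positions
```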
- the neighborhood and offset of the markers in the chunks are remembered, and the information is used for controlling how the window slides over the data stream.
- FIG. 3 shows a flow chart 300 of a process according to the present invention for using data markers for matching and aligning data chunks.
- the steps shown in FIG. 3 would replace step 205 in FIG. 2 .
- the steps shown in FIG. 3 may be performed immediately after step 202 in FIG. 2 instead of replacing step 205 in FIG. 2 .
- step 212 further includes remembering the first marker in each chunk and the offset of that marker from the chunk boundary.
- the data of block A is searched for the next marker.
- a marker is associated with a mathematical function of the data around the marker for identifying the marker.
- the mathematical function that is used is a cryptographic hash.
- At step 304 , it is determined whether the computed value for the marker has been seen before.
- whether the computed value has been seen before is determined by examining a data structure such as a hash table for the computed value. When a computed value has not been seen before, it is entered into the data structure.
- At step 305 , it is determined whether the remembered offset from the chunk boundary the last time this particular marker was seen is greater than or equal to the current offset of this particular marker. If so, flow continues to step 306 where y is set equal to k minus the quantity (remembered offset minus current offset of the marker). Flow continues to step 307 where x is set equal to the offset in the untransformed data corresponding to the offset y in block A. Flow continues to step 308 where the cursor is shifted by x bytes.
- If, at step 305 , it is determined that the remembered offset is less than the current offset of this particular marker, flow continues to step 309 where y is set equal to the current offset of the marker minus the remembered offset. Flow continues to step 307 .
- If, at step 304 , it is determined that the computed value for the marker has not been seen before, flow continues to step 310 where y is set equal to the offset of the marker. Flow continues to step 307 .
- If, at step 302 , it is determined that block A does not contain the next marker, flow continues to step 311 where y is set equal to k. Flow continues to step 307 .
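The computation of y in flow chart 300 can be sketched as follows, with `None` standing for "no marker in block A" and "marker value not previously seen". The final branch subtracts in the order that keeps the shift nonnegative, which is an assumption about the intended sign; the names are illustrative. The offset y would then be mapped back to an offset x in the untransformed data (step 307) and the cursor shifted by x bytes:

```python
def shift_amount(k: int, marker_offset, remembered_offset):
    """Steps 302-311: compute y, the shift distance in transformed-data
    units that realigns the marker with the offset it had the last time
    its value was seen."""
    if marker_offset is None:                 # step 311: no marker found
        return k
    if remembered_offset is None:             # step 310: not seen before
        return marker_offset
    if remembered_offset >= marker_offset:    # step 306
        return k - (remembered_offset - marker_offset)
    return marker_offset - remembered_offset  # step 309
```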
- markers can be used in the untransformed data, in which case step 307 is omitted and the steps shown in FIG. 3 are performed on data stream R′ before being transformed to R′t.
- the untransformed data may not be as randomly distributed so that the specific pattern being used as the marker may not occur as consistently throughout the data.
- looking for markers in the untransformed data would entail processing more data.
- the present invention can use the untransformed version of the data to determine whether a chunk of data has been seen before, but when the transformation is many to one (e.g., lossy compression), doing so would miss some duplicate chunks in the transformed data.
- the remembered information is forgotten with the passage of time so that only the information pertaining to data processed during a preceding period of time is remembered.
- the data structures have a maximum size and the oldest information in the data structures is removed whenever the structures exceed a maximum size.
- the oldest information is not forgotten, but is archived, for example on low cost storage, and brought back when necessary, such as to recover from error.
- the set of data that is to be processed by the present invention may be incrementally increased over time.
- the remembered information, e.g., hash tables, may be stored in persistent storage, such as disks. New data may be added to the remembered information as it is processed. Additionally, the data to be processed may be geographically distributed, and the remembered information may be moved to a different location for efficient processing and storage.
- while the present invention has been described in terms of a technique of applying desired transformations to data such that the number of duplicate chunks in the transformed data is increased and the chunks are predominantly of a fixed size, and for determining the duplicate chunks of transformed data, it should be understood that the present invention can be embodied as program steps that are executed by a computer and/or a state machine.
- the present invention can be embodied as a service for applying desired transformations to data such that the number of duplicate chunks in the transformed data is increased and the chunks are predominantly of a fixed size and for determining the duplicate chunks of transformed data.
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/880,843 US7293035B2 (en) | 2004-06-30 | 2004-06-30 | System and method for performing compression/encryption on data such that the number of duplicate blocks in the transformed data is increased |
Publications (2)
Publication Number | Publication Date |
---|---|
US20060004808A1 | 2006-01-05 |
US7293035B2 | 2007-11-06 |
Family
ID=35515276
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6799258B1 (en) * | 2001-01-10 | 2004-09-28 | Datacore Software Corporation | Methods and apparatus for point-in-time volumes |
US6895415B1 (en) * | 1999-08-18 | 2005-05-17 | International Business Machines Corporation | System and method for concurrent distributed snapshot management |
- 2004-06-30: US application US 10/880,843 filed (granted as US7293035B2); legal status: not active, Expired - Fee Related
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070242745A1 (en) * | 2006-03-03 | 2007-10-18 | Samsung Electronics Co., Ltd. | Method and apparatus to transmit data on PLC network by aggregating data |
US8406298B2 (en) * | 2006-03-03 | 2013-03-26 | Samsung Electronics Co., Ltd. | Method and apparatus to transmit data on PLC network by aggregating data |
US9552162B2 (en) * | 2014-12-08 | 2017-01-24 | Sap Se | Splitting-based approach to control data and storage growth in a computer system |
Also Published As
Publication number | Publication date |
---|---|
US20060004808A1 (en) | 2006-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7293035B2 (en) | System and method for performing compression/encryption on data such that the number of duplicate blocks in the transformed data is increased |
US9690802B2 (en) | Stream locality delta compression | |
US7478113B1 (en) | Boundaries | |
US20050091234A1 (en) | System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified | |
US9727573B1 (en) | Out-of core similarity matching | |
US10380073B2 (en) | Use of solid state storage devices and the like in data deduplication | |
EP2256934B1 (en) | Method and apparatus for content-aware and adaptive deduplication | |
US8165221B2 (en) | System and method for sampling based elimination of duplicate data | |
US9262280B1 (en) | Age-out selection in hash caches | |
US7587401B2 (en) | Methods and apparatus to compress datasets using proxies | |
US8751462B2 (en) | Delta compression after identity deduplication | |
EP1259883B1 (en) | Method and system for updating an archive of a computer file | |
CN108255647B (en) | High-speed data backup method under samba server cluster | |
US6810398B2 (en) | System and method for unorchestrated determination of data sequences using sticky byte factoring to determine breakpoints in digital sequences | |
US7814149B1 (en) | Client side data deduplication | |
US20120303595A1 (en) | Data restoration method for data de-duplication | |
US10366072B2 (en) | De-duplication data bank | |
US11157188B2 (en) | Detecting data deduplication opportunities using entropy-based distance | |
US11314598B2 (en) | Method for approximating similarity between objects | |
CN112506877B (en) | Data deduplication method, device and system based on deduplication domain and storage equipment | |
TWI442223B (en) | The data recovery method of the data de-duplication | |
Majed et al. | Cloud based industrial file handling and duplication removal using source based deduplication technique | |
Udayashankar et al. | The Impact of Low-Entropy on Chunking Techniques for Data Deduplication | |
CN117813591A (en) | Deduplication of strong and weak hashes using cache evictions | |
WO2006098720A1 (en) | Methods and apparatus to compress datasets using proxies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; free format text: ASSIGNMENT OF ASSIGNORS INTEREST; assignors: HSU, WINDSOR WEE SUN; ONG, SHAUCHI; reel/frame: 015538/0800; effective date: 20040628 |
| FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); entity status of patent owner: LARGE ENTITY |
| FPAY | Fee payment | Year of fee payment: 4 |
| AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA; free format text: ASSIGNMENT OF ASSIGNORS INTEREST; assignor: INTERNATIONAL BUSINESS MACHINES CORPORATION; reel/frame: 026664/0866; effective date: 20110503 |
| REMI | Maintenance fee reminder mailed | |
| LAPS | Lapse for failure to pay maintenance fees | |
| STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| FP | Lapsed due to failure to pay maintenance fee | Effective date: 20151106 |
| AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA; free format text: CHANGE OF NAME; assignor: GOOGLE INC.; reel/frame: 044142/0357; effective date: 20170929 |