US20170090786A1 - Distributed and Deduplicating Data Storage System and Methods of Use - Google Patents
Distributed and Deduplicating Data Storage System and Methods of Use
- Publication number
- US20170090786A1 (application US14/864,850)
- Authority
- US
- United States
- Prior art keywords
- chunks
- signature
- data stream
- input data
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1471—Saving, restoring, recovering or retrying involving logging of persistent data for recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1453—Management of the data involved in backup or backup restore using de-duplication of the data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/128—Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/84—Using snapshots, i.e. a logical point-in-time copy of the data
Definitions
- the present technology may be generally described as providing systems and methods for distributing and deduplicating data storage.
- Creating large backup data stores that are efficient in terms of data storage and data retrieval is a complex process, especially for systems that store petabytes of data or more. Additional complexities are introduced when these large backup data stores use deduplication, such as when only unique data blocks are stored. Additionally, backup data stores that use deduplication are not currently suitable for storing data using, for example, distributed hash tables (“DHT”), as the DHT may destroy the locality of the data and the index used to track the data as it is distributed to the data store.
- the present technology may be directed to methods that comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
- the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) segmenting the input data stream into chunks; (c) creating a signature for each of the chunks; (d) distributing each chunk to one of a plurality of containers, each container comprising a container identifier; and (e) creating a locality index that includes a mapping of a chunk signature and a container identifier.
- the present technology may be directed to systems that comprise: (a) a processor; (b) logic encoded in one or more tangible media for execution by the processor and when executed operable to perform operations comprising: (i) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (ii) comparing the signature to signatures of data included in a deduplicated backup data store; (iii) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (iv) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (v) distributing the unique data to the deduplicated backup data store.
- the present technology may be directed to a non-transitory machine-readable storage medium having embodied thereon a program.
- the program may be executed by a machine to perform a method.
- the method may comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
- the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) separating the input data stream into chunks; (c) performing one or more of an exact and an approximate matching of the chunks of the input data stream to chunks stored in a deduplicated backup data store to determine unique chunks; (d) determining one or more locations in the deduplicated backup data store for the unique chunks; (e) updating an index to include the unique chunks with their locations; and (f) distributing the unique chunks to the deduplicated backup data store according to the index.
- FIG. 1 is a block diagram of an exemplary architecture in which embodiments of the present technology may be practiced.
- FIG. 2 is a flowchart of an exemplary method of exact matching of chunks of data to determine unique chunks.
- FIG. 3 is a flowchart of an exemplary method for providing a distributed and deduplicated data store.
- FIG. 4 is a flowchart of an example method of the present technology.
- FIG. 5 is another example method of the present technology for storing input streams from two separate file modification operations of a client.
- FIG. 6 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.
- some storage systems may deduplicate block storage, where only unique data blocks are stored. This allows the system to reduce the overall amount of data blocks stored compared to systems that store complete data sets.
- each backup taken of a physical system must be stored in order to allow the physical system to be restored back to a given point in time in the past, as described above.
- chunks may be distributed into a data storage cloud.
- each block of data may be hashed to form the index key for a DHT and the data itself is stored as the value of the key.
- the combination of data blocks and hash values is used to create a DHT. While the effectiveness of the methods and systems described herein may be advantageously leveraged within systems or processes that use DHTs, the present technology is not limited to these types of systems and processes. Thus, descriptions of DHTs included herein are merely provided as an exemplary use of the present technology.
- While storing data using DHTs can be effective in balancing IO load across distributed nodes, when a DHT is used the temporal locality of the data is not maintained spatially on the disk. This is, in part, because DHTs use the hash of the data to determine the location of the data, and cryptographic hashes are by design random. For example, when multiple snapshots of a physical system are taken over time and DHTs are used, random operations are performed on the snapshots. These random operations are inefficient when compared to sequential operations. In short, DHTs are less than optimal for building deduplicated storage systems, because deduplicated storage systems rely on maintaining the temporal locality of the data spatially on the disk.
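The contrast between hash-based placement and sequential placement can be sketched as follows. This is only an illustration: the eight-node cluster, the "digest modulo node count" placement rule, and the container size of four blocks are assumptions made for the example and are not taken from the disclosure.

```python
import hashlib

NUM_NODES = 8  # hypothetical cluster size, chosen only for illustration


def dht_node(block: bytes, num_nodes: int = NUM_NODES) -> int:
    """Pick a node from the SHA-1 digest of the block (assumed placement rule)."""
    digest = hashlib.sha1(block).digest()
    return int.from_bytes(digest, "big") % num_nodes


# Sixteen blocks that arrive back to back in one input stream.
stream = [f"block-{i}".encode() for i in range(16)]

# Hash-based placement: the node depends only on content, not on arrival order.
dht_placement = [dht_node(b) for b in stream]

# Sequential placement: neighbours in the stream share a container of four blocks.
sequential_placement = [i // 4 for i in range(len(stream))]
```

Printing `dht_placement` next to `sequential_placement` shows that the hash-derived node numbers bear no relation to arrival order, while the sequential placement groups blocks that arrived together.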
- locality can be described in terms of temporality or space. For example, if a user modifies multiple files at the same time, it will be understood or assumed that the modification of these files is related to one another. By way of example, the user could be updating multiple spreadsheets within a given period of time. These spreadsheets may all be related to the same project or task that the user is working on. These file changes can be transmitted over the network efficiently in an input stream. The present technology will store these changes spatially together on the backup store, but their spatial proximity to one another on the backup store is due to their temporal adjacency relating to how they are used.
- a DHT may randomly distribute the changes to the files anywhere in the backup store, which increases data fragmentation and slows down retrieval.
- when one file is requested from the backup store, the backup store will automatically pre-fetch the files that were determined to have changed at the same time as the requested file. Again, this benefit is possible because temporal locality (context) is determined and maintained. Even if the user does not utilize the additional files, the likelihood that they may be utilized is sufficient to justify pre-fetching the files in anticipation of use.
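A minimal sketch of this pre-fetch behavior, assuming a hypothetical bookkeeping layout in which each container remembers the files whose changes arrived together. The names `record_change` and `fetch_with_prefetch` are invented for the example.

```python
from collections import defaultdict

# container id -> file names whose changes arrived together (hypothetical layout)
container_files = defaultdict(list)
# file name -> container id that holds its most recent changes
file_container = {}


def record_change(container_id: int, filename: str) -> None:
    """Remember that a file's changes were written into the given container."""
    container_files[container_id].append(filename)
    file_container[filename] = container_id


def fetch_with_prefetch(filename: str) -> list:
    """Return the requested file first, followed by its container neighbours."""
    neighbours = container_files[file_container[filename]]
    return [filename] + [f for f in neighbours if f != filename]
```

For example, after recording `budget.xlsx` and `forecast.xlsx` in the same container, requesting either one also returns the other as a pre-fetch candidate.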
- these processes greatly improve file retrieval and replication methods of backup stores.
- the index created for the blocks of the changed files also maintains context and locality due to the manner in which it is created.
- the updates to the index occur temporally when changes are transferred to the backup store.
- Architecture 100 may include a deduplicated backup data store 105, hereinafter “data store 105.”
- the data store 105 may be implemented within a cloud-based computing environment.
- a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or the storage capacity of a large grouping of computer memories or storage devices.
- systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- the cloud may be formed, for example, by a network of servers, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
- the data store 105 may include a block store 115 that stores unique blocks of data for one or more objects, such as a file, a group of files, or an entire disk.
- the block store 115 may comprise a plurality of containers 120 a - n , which are utilized to store data chunks that are separated from the input data stream, as will be described in greater detail below.
- the term “container” may also be referred to as an “extent.”
- objects written to the block store 115 are immutable.
- a new object identifier may be generated and provided back to the object owner.
- the responsibility of implementing a traditional interface where object identifiers do not change on update is facilitated by the application/client.
- the data store 105 may provide ‘mutable’ metadata storage where the client/application can manage immutable objects which are mapped to mutable object identifiers and other application specific metadata.
- the block store 115 may include immutable object addressable block storage.
- the block store 115 may form an underlying storage foundation that allows for the storing of blocks of objects.
- the identifiers of the blocks are a unique representation of the object, generated for example by using an SHA1 hash function.
- the present technology may also use other cryptographic hash functions that would be known to one of ordinary skill in the art with the present disclosure before them.
- the architecture 100 may include a deduplication system, hereinafter referred to as system 125, that provides distributed and deduplicated data storage.
- the system 125 receives input data streams from a client device 130 .
- an input data stream may include a snapshot or an incremental file for the client device 130 .
- the client device may include an end user computing system, an appliance, such as a backup appliance, a server, or any other computing device that may include objects such as files, directories, disks, and so forth.
- the API may encapsulate messages and their respective operations, allowing for efficient writing of objects over a network, such as network 135 .
- the network 135 may comprise a local area network (“LAN”), a wide area network (“WAN”), or any other private or public network, such as the Internet.
- the system 125 may divide or separate an input data stream into a plurality of chunks, also referred to as blocks, segments, pieces, and so forth. Any method for separating the input data stream into chunks that would be known to one of ordinary skill in the art may also likewise be utilized in accordance with the present technology.
- containers 120a-n may also be referred to as blobs.
- Containers 120a-n may be filled with chunks that are received sequentially around the same time, so that temporal locality is also maintained as spatial locality within the same container.
- each of the chunks may be encrypted or otherwise hashed so as to create a unique identifier for the chunk of data.
- a chunk may be hashed using SHA1 to produce a SHA1 key value for the chunk.
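Splitting a stream and keying each chunk with SHA1 can be sketched as below. Fixed-size chunking and the 4096-byte chunk size are assumptions made for the example; the disclosure allows any chunking method.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size in bytes


def split_into_chunks(stream: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size pieces of the stream; the last may be shorter."""
    for offset in range(0, len(stream), chunk_size):
        yield stream[offset:offset + chunk_size]


def chunk_key(chunk: bytes) -> str:
    """Return the SHA1 key value for a chunk as a hexadecimal string."""
    return hashlib.sha1(chunk).hexdigest()


def keyed_chunks(stream: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Pair every chunk with its key, preserving the order of the input stream."""
    return [(chunk_key(c), c) for c in split_into_chunks(stream, chunk_size)]
```

Two chunks with identical bytes receive the same key, which is the property deduplication relies on.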
- the input data stream may arrive at the system 125 in an already-chunked manner.
- each of the hashed chunk values may be incorporated by the system 125 into Merkle nodes and the Merkle nodes may be arranged into a Merkle tree at the data store 105.
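One way to arrange chunk hashes into a tree is sketched here. The binary fan-out and the rule of pairing an unpaired last hash with itself are assumptions made for the example; the disclosure does not specify the tree layout.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 digest of the data."""
    return hashlib.sha1(data).digest()


def merkle_root(chunk_hashes: list) -> bytes:
    """Fold a list of chunk hashes pairwise until a single root hash remains."""
    if not chunk_hashes:
        raise ValueError("at least one chunk hash is required")
    level = list(chunk_hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Odd count: pair the last hash with itself (assumed convention).
            level.append(level[-1])
        level = [sha1(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because every parent hash depends on its children, a change to any single chunk changes the root, so two trees can be compared starting from their roots.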
- the system 125 may generate a signature for each extent using other technologies than cryptographic hashing functions.
- the signature is a representation of the data included in the extent.
- the system 125 may apply an algorithm that is similar to an algorithm used for facial recognition. For example, in facial recognition, a signature for a face of an individual included in an image file may be generated. This signature may be compared to facial signatures in other image files to determine if facial signatures included in these additional image files correspond to the facial signature of the individual.
- the “signature” is a mathematical representation of the unique facial features of the individual. These unique facial features convert into unique mathematical values that may be used to locate the individual in other image files.
- extents include data chunks that can be distinguished from other chunks on the basis of unique data features.
- a signature for an extent would include mathematical representations of these unique features such that comparing a signature for the extent to other signatures of other extents may allow for the system 125 to determine similar or dissimilar extents.
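The disclosure does not name a specific signature algorithm. As a stand-in, the sketch below uses MinHash over byte shingles, a well-known similarity signature; the signature length, the shingle width, and the helper names are all assumptions made for the example.

```python
import hashlib

NUM_HASHES = 32  # hypothetical signature length
SHINGLE = 8      # hypothetical shingle width in bytes


def _seeded_hash(seed: int, data: bytes) -> int:
    """Hash data under a seed so that each seed acts as a separate hash function."""
    digest = hashlib.sha1(seed.to_bytes(4, "big") + data).digest()
    return int.from_bytes(digest[:8], "big")


def extent_signature(extent: bytes) -> tuple:
    """MinHash signature: the minimum seeded hash over all shingles, per seed."""
    last_start = max(1, len(extent) - SHINGLE + 1)
    shingles = {extent[i:i + SHINGLE] for i in range(last_start)}
    return tuple(min(_seeded_hash(seed, s) for s in shingles)
                 for seed in range(NUM_HASHES))


def similarity(sig_a: tuple, sig_b: tuple) -> float:
    """Fraction of matching positions, an estimate of how much content is shared."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Extents with mostly shared content score close to 1.0, while unrelated extents score close to 0.0, so a threshold on `similarity` could serve as the "similar or dissimilar" decision.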
- because chunks are placed sequentially (in the order received relative to the input stream) into containers 120a-n and each chunk is provided with a unique identifier, such as a hash value, locality of the chunks may be maintained.
- a locality index may be managed by the system 125 that maps each chunk to its corresponding container based upon the chunk identifier.
- locality of data chunks is a function of the order in which the chunks are received, as well as the chunk identifiers used to distinguish chunks from one another.
- the locality index may comprise a sparse index when the locality index becomes too large and cumbersome to maintain in memory.
- the sparse index may map only the chunk signature with a container identifier.
- the system 125 may split the locality index into chunks and these chunks may also be stored in the containers, along with the chunks created from the input stream.
- system 125 may also manage a container index for each container that provides an exact or approximate location for each chunk within the container.
- the index may specify the offset and length of each chunk within the container.
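The two indexes can be sketched together: a locality index that maps a chunk identifier to a container, and a per-container index that records offset and length. The class names, the in-memory `bytearray` storage, and the 1 MiB capacity are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field

CONTAINER_CAPACITY = 1 << 20  # hypothetical container size in bytes


@dataclass
class Container:
    container_id: int
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # chunk id -> (offset, length)

    def append(self, chunk_id: str, chunk: bytes) -> None:
        """Write the chunk at the end of the container and record where it sits."""
        self.index[chunk_id] = (len(self.data), len(chunk))
        self.data += chunk

    def read(self, chunk_id: str) -> bytes:
        offset, length = self.index[chunk_id]
        return bytes(self.data[offset:offset + length])


class BlockStore:
    def __init__(self, capacity: int = CONTAINER_CAPACITY):
        self.capacity = capacity
        self.containers = [Container(0)]
        self.locality_index = {}  # chunk id -> container id

    def put(self, chunk: bytes) -> str:
        chunk_id = hashlib.sha1(chunk).hexdigest()
        if chunk_id in self.locality_index:  # already stored, so skip the write
            return chunk_id
        current = self.containers[-1]
        if current.data and len(current.data) + len(chunk) > self.capacity:
            # The open container is full: start the next one in sequence.
            current = Container(len(self.containers))
            self.containers.append(current)
        current.append(chunk_id, chunk)
        self.locality_index[chunk_id] = current.container_id
        return chunk_id

    def get(self, chunk_id: str) -> bytes:
        return self.containers[self.locality_index[chunk_id]].read(chunk_id)
```

A lookup takes two steps: the locality index names the container, and the container index gives the byte range to read within it.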
- the system 125 may also separate the subsequent input streams into chunks and generate signatures for these chunks.
- when signatures for chunks of a subsequent input data stream are compared to signatures for chunks of a previous input data stream, differences detected by the system 125 in these signatures may indicate that data in a particular chunk has changed.
- the system 125 may then obtain these changed chunks and store data from these changed chunks in the data store 105 .
- the ability for the system 125 to recognize changed data allows the system 125 to store only unique data in the data store 105 (e.g., deduplicated data).
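Detecting changed chunks by comparing signatures across two input streams can be sketched as follows, assuming SHA1 hex digests as the signatures. The function names are invented for the example.

```python
import hashlib


def chunk_signatures(chunks) -> list:
    """Return one SHA1 hex digest per chunk, in stream order."""
    return [hashlib.sha1(c).hexdigest() for c in chunks]


def changed_chunks(previous_chunks, current_chunks) -> list:
    """Return (position, chunk) pairs whose signature was not seen previously."""
    known = set(chunk_signatures(previous_chunks))
    return [(i, c) for i, c in enumerate(current_chunks)
            if hashlib.sha1(c).hexdigest() not in known]
```

Only the chunks returned by `changed_chunks` would need to be written to the data store; every other chunk is already present.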
- the system 125 may employ either exact or approximated deduplication methods. In some instances, the system 125 may also use approximated deduplication methods initially, followed by a more robust exact matching deduplication method at a later time, as a means of verification.
- the system 125 may compare the signature of an extent to signatures of similar extents stored in the data store 105. Any difference in signatures between similar extents for the same object, such as a file, indicates that the data of the object has changed.
- the system 125 may establish rules that allow the system 125 to quickly process input data streams to determine if unique data blocks exist in the input data stream. If the comparison between signatures indicates that the input data stream is not likely to include unique data, the system 125 may ignore the input data stream. Conversely, if the comparison between signatures indicates that the input data stream is likely to include unique data, the system 125 may further examine the input data stream to determine which chunks of data have changed.
- the system 125 may also process the input data stream using the exact deduplication method described below.
- the system 125 may compare signatures of chunks of an input data stream to node signatures of similar chunks stored in the data store 105 .
- the system 125 may check matches at the chunk or extent level using hash values associated with chunks. That is, each block or chunk of data included in an extent may be associated with its own signature or identifier.
- the chunk may include a unique hash value of the data included in a particular chunk of data. Any change in data of a chunk will change the hash value of the chunk.
- the system 125 can use the comparison of the signatures of the chunks to determine if data has changed in a chunk.
- the system 125 may load the input data stream and selected data from the data store 105 into cache memory. Processing the input data stream and selected data from the data store 105 may allow for faster and more efficient data analysis by the system 125 .
- the system 125 may utilize information indicative of the client device or object stored on the client device to “warm up” the data loaded into the cache. That is, instead of examining an entire input data stream, the system 125 may understand that the input data stream came from a particular customer or client device. Additionally, the system 125 may know that the input data stream refers to a particular object. Thus, the system 125 may not need to compare signatures for each block (e.g., chunk) of a client device to determine unique blocks. The system 125 , in effect, narrows the comparison down to the most likely candidate chunks or extents stored in the data store 105 .
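The narrowing step can be sketched with a hypothetical catalog that records which containers already hold data for a given client device and object. The catalog layout and the names `catalog`, `containers`, and `warm_cache` are assumptions; the disclosure does not describe how this information is stored.

```python
from collections import defaultdict

# (client id, object id) -> container ids that already hold data for that object
# (hypothetical catalog, invented for this example)
catalog = defaultdict(list)
# container id -> {chunk signature: chunk bytes}
containers = {}


def warm_cache(client_id: str, object_id: str) -> dict:
    """Load only the chunks of containers tied to this client and object."""
    cache = {}
    for container_id in catalog.get((client_id, object_id), []):
        cache.update(containers[container_id])
    return cache
```

Signature comparisons for an incoming stream would then run against this smaller cache first, rather than against every chunk in the data store.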
- the system 125 may select extents by comparing root (or head) signatures for a chunk of an input data stream to root (or head) signatures of extents stored in the data store 105. Extents that have matching signatures may be ignored, as the blocks corresponding thereto are already present. This process is known as deduplication. That is, only unique data need be transmitted and stored after its identification.
- the system 125 may determine an appropriate location for the unique block(s) in the data store 105 and update an index to include metadata indicative of a location of the unique block(s). The unique block(s) may then be distributed by the system 125 to the data store 105 according to the locations recorded in the index.
- the system 125 may store links to multiple containers into a single index.
- This single index may be referred to as a locality sensitive index.
- the locality sensitive index is an index that allows various local indices to be tied together into a single index, thus preserving the locality of the individual indices while allowing for interrelation of the same.
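One possible reading of this single index is a table of links to the per-container indices, so that each local index stays intact while lookups can span all of them. The class below is a sketch under that assumption, with invented names, and is not presented as the structure used by the system 125.

```python
class LocalitySensitiveIndex:
    """Ties per-container indices together without merging their entries."""

    def __init__(self):
        self.links = {}  # container id -> that container's local index

    def link(self, container_id: int, local_index: dict) -> None:
        # Store a reference, not a copy, so later local updates stay visible.
        self.links[container_id] = local_index

    def locate(self, chunk_id: str):
        """Return (container id, local entry) for a chunk, or None if unknown."""
        for container_id, local_index in self.links.items():
            if chunk_id in local_index:
                return container_id, local_index[chunk_id]
        return None
```

Because the local indices are linked rather than flattened, the entries for chunks that arrived together remain grouped under their container.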
- the system 125 allows for the use of chunks while preserving the index and locality required for the deduplicated backup data store, as described in greater detail above.
- FIG. 2 illustrates an exemplary method for maintaining locality of an input stream of data.
- the method may comprise an initial step 205 of receiving an input stream, such as a backup of a local machine.
- the method may comprise a step 210 of splitting the input stream into a plurality of chunks, according to any desired process.
- the method may comprise an optional step 215 of creating an identifier for each chunk. As mentioned above, this identifier may comprise a signature or a cryptographic hash value.
- the method may comprise a step 220 of placing each of the chunks into a container in a sequential manner.
- Each container may be assigned a size and when the container is full, additional chunks may be placed into an open container.
- containers may be filled sequentially.
- the method may include a step 225 of generating a locality index that maps each chunk to the container in which it is placed. Again, this locality is based on the temporal adjacency of the chunks in the input stream due to their association with a particular file modification process occurring on the client.
- chunk “locality” within the system is a function of both the order in which the chunk is received relative to the input stream, as well as a container location of the chunk after placement into a container. Locality preservation is enhanced by tracking chunks using their calculated, created, or assigned identifier. For example, a SHA1 key value for a chunk may be linked to the container in which the chunk has been placed.
- the method may comprise a step 230 of generating a container index that includes a location of the chunks within their respective containers.
- the container index may include an offset and a length for each chunk in the container.
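- The steps of FIG. 2 described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the names `Container`, `store_stream`, and `read_chunk` are hypothetical, fixed-size chunking stands in for "any desired process," the chunk size is assumed not to exceed the container capacity, and a repeated chunk is assumed to be stored only once.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Container:
    """Fixed-capacity container (extent) that stores chunks back to back."""
    container_id: int
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    # Container index (step 230): chunk signature -> (offset, length).
    index: dict = field(default_factory=dict)

    def has_room(self, size: int) -> bool:
        return len(self.data) + size <= self.capacity

    def append(self, signature: str, chunk: bytes) -> None:
        self.index[signature] = (len(self.data), len(chunk))
        self.data.extend(chunk)


def store_stream(stream: bytes, chunk_size: int = 4, capacity: int = 8):
    """Chunk a stream, fill containers sequentially, and build both indexes."""
    containers = [Container(0, capacity)]
    locality_index = {}  # step 225: chunk signature -> container id
    for start in range(0, len(stream), chunk_size):   # step 210
        chunk = stream[start:start + chunk_size]
        signature = hashlib.sha1(chunk).hexdigest()   # step 215
        if signature in locality_index:
            continue  # assumption: a repeated chunk is stored only once
        if not containers[-1].has_room(len(chunk)):   # step 220
            containers.append(Container(len(containers), capacity))
        containers[-1].append(signature, chunk)
        locality_index[signature] = containers[-1].container_id
    return containers, locality_index


def read_chunk(containers, locality_index, signature: str) -> bytes:
    """Resolve a signature to its container, then to its offset and length."""
    container = containers[locality_index[signature]]
    offset, length = container.index[signature]
    return bytes(container.data[offset:offset + length])
```

- In this sketch a lookup takes two hops: the locality index names the container, and the container index supplies the offset and length within it. Chunks that arrive together land in the same or adjacent containers, which is the property the locality index is meant to preserve.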
- FIG. 3 is a flowchart of an exemplary method for managing a deduplicated backup data store.
- the method may comprise a step 305 of storing an initial backup of a client device such as an end user computing system.
- the initial backup may comprise not only blocks of data but also associated Merkle nodes, which when combined with the blocks of data comprise a distributed hash table.
- the Merkle node is a representation or hash value of the names of the individual data blocks that comprise the files of the client.
- the method may then comprise a step 310 of receiving an input data stream from the client device.
- the method may separate the input data stream into chunks in step 315 .
- the method may then include a step 320 of hashing the chunks to create a key to index the data block.
- the index may include not only the hashes of data blocks, but also hashes of Merkle nodes. As mentioned previously, sequential chunks may be combined into an extent to maintain their temporal relatedness (which enables and enhances pre-fetching as needed). The extent itself may also be hashed.
- the method may include a step 325 of approximating deduplication of the chunks (or extent) by generating a signature for the input data stream.
- This signature may be compared against the signatures of other extents stored in the deduplicated backup data store. Again, the comparison of signatures may be performed at the chunk level or alternatively at the extent level.
- the method may comprise a step 330 of selecting a signature based upon the step of comparing the signature to signatures of extents.
- the method may comprise a step 335 of comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique blocks included in the at least a portion of the input data stream. This delineation between unique and non-unique data chunks is used in deduplicating the input data stream to ensure that only unique chunks (e.g., changed data) are stored in the deduplicated backup data store.
- the method may comprise a step 340 of updating an index to reflect the inclusion of the new unique chunks in the deduplicated backup data store.
- the index provides a location of the unique blocks, which have been distributed to the deduplicated backup data store in a step 345 .
- step 345 may also include a plurality of DHTs which are linked together using a locality sensitive index that preserves locality and index of each DHT.
- the input data stream is created when a user performs a file modification process to one or more files.
- the user may edit several spreadsheets at the same time (or in close temporal proximity, such as within a few seconds or minutes of one another).
- the plurality of files need not be the same type.
- the user can edit a spreadsheet and word processing document together.
- the changes to these files would be assembled and streamed as an input data stream.
- the input data stream can be checked against the stored signature for the client to determine what parts of the input data stream need be stored in the backup store.
- the input data stream can be transmitted as the file modifications occur or only after a signature comparison has been completed. For example, a prior signature of a backup for the client may have been taken at an earlier point in time. A comparison of a new signature for the client against the old signature stored on the file replication store (e.g., backup store) would indicate that the files were modified. The changed data would then be transmitted over the network to the file replication store.
- the method includes a step of generating 405 an input signature for at least a portion of an input data stream from a client.
- the input signature is a representation of data included in the input data stream.
- the method also includes a step of comparing 410 the input signature to stored signatures of data included in a deduplicated backup data store. This process allows the system to find the signature of the client that was previously stored on the backup store.
- the method includes the system selecting 415 a stored signature based upon the step of comparing the input signature to the stored signatures of data included in a deduplicated backup data store.
- the method includes comparing 420 data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream.
- the method includes distributing the unique data to the deduplicated backup data store.
- the unique data that has not been stored previously is transmitted over the network to the backup data store.
- This method provides a network optimization technique, ensuring that only new, unique data is transmitted over the network for any given backup or replication procedure.
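- The network optimization described above can be illustrated with a simplified sketch. It assumes exact matching on per-chunk SHA1 signatures, which is narrower than the signature selection and data comparison of steps 410 through 420; the function names and the fixed chunk size are hypothetical.

```python
import hashlib


def signature(data: bytes) -> str:
    """Exact-match signature for a chunk (SHA1, as one possible choice)."""
    return hashlib.sha1(data).hexdigest()


def split(stream: bytes, size: int) -> list:
    """Split the input data stream into fixed-size chunks."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def unique_chunks(stream: bytes, stored_signatures: set, size: int = 4) -> list:
    """Return only the chunks whose signatures the backup store lacks."""
    seen = set(stored_signatures)  # copy so the caller's set is not mutated
    unique = []
    for piece in split(stream, size):
        sig = signature(piece)
        if sig not in seen:
            unique.append(piece)   # new data: must be sent over the network
            seen.add(sig)          # avoid sending the same chunk twice
    return unique
```

- Only the list returned by `unique_chunks` would need to cross the network in this sketch; chunks already represented in the store are skipped.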
- input data streams are transmitted to the backup data store only upon the occurrence of a file modification process occurring on the client.
- a new input data stream is created and transmitted for storage.
- FIG. 5 illustrates an example method for storing input data streams of multiple file modification operations that occur on a client.
- a first file modification process occurs at a first point in time. This first file modification process occurs for a first set of files.
- a second file modification process occurs for a second set of files.
- Temporal context and locality can be maintained for each of these file modification processes by storing the data in the input data streams in their own extents (e.g., containers).
- the method can begin with a step of receiving 505 a first input data stream at a first point in time.
- the first point in time is associated with a first file modification operation for a first set of files occurring on a client.
- the method includes segmenting 510 the first input data stream into chunks, as well as creating 515 a signature for each of the chunks. For example, this could include creating a SHA1 hash value.
- the method includes distributing 520 each chunk to one of a first plurality of containers.
- Each container comprises a container identifier and the first plurality of containers is proximate to one another on a backup data store.
- the temporal locality of the chunks in the input data stream is represented as spatial locality on the backup data store.
- the method includes creating 525 a locality index that includes a mapping of a chunk signature and a container identifier. To be sure, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the first input data stream.
- the method includes receiving 530 a second input data stream at a second point in time.
- the second and first points in time are different from one another because they are associated with different file modification processes.
- the second point in time is associated with a second file modification operation for a second set of files occurring on a client.
- the method includes segmenting 535 the second input data stream into chunks, and creating 540 a signature for each of the chunks.
- the method comprises distributing 545 each chunk to one of a second plurality of containers.
- each container comprises a container identifier.
- the second plurality of containers is proximate to one another on a backup data store for ease of retrieval and pre-fetching as described above.
- the method also includes creating 550 a locality index that includes a mapping of a chunk signature and a container identifier. Again, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the second input data stream.
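- A minimal sketch of the FIG. 5 idea follows, assuming each input data stream is written into its own run of freshly allocated containers so that chunks from one file modification operation stay together. The class `BackupStore`, its methods, and the `per_container` limit are hypothetical names chosen for illustration, and the input chunks are assumed to be already segmented.

```python
import hashlib
from collections import defaultdict


class BackupStore:
    """Toy store that keeps each input stream in its own containers."""

    def __init__(self):
        self.containers = defaultdict(list)  # container id -> chunks
        self.locality_index = {}             # chunk signature -> container id
        self._next_container = 0

    def ingest(self, chunks, per_container=2):
        """Store one input data stream; return the container ids it used."""
        used = []
        for i, chunk in enumerate(chunks):
            if i % per_container == 0:       # open a fresh container
                cid = self._next_container
                self._next_container += 1
                used.append(cid)
            self.containers[cid].append(chunk)
            self.locality_index[hashlib.sha1(chunk).hexdigest()] = cid
        return used

    def fetch_with_neighbours(self, chunk):
        """Return the requested chunk plus everything stored alongside it."""
        cid = self.locality_index[hashlib.sha1(chunk).hexdigest()]
        return list(self.containers[cid])
```

- Because the two streams never share a container in this sketch, fetching one chunk also yields the chunks that were modified at about the same time, which is the pre-fetching benefit the description attributes to preserved locality.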
- FIG. 6 illustrates an exemplary computing system 600 that may be used to implement an embodiment of the present technology.
- the computing system 600 of FIG. 6 includes one or more processors 610 and memory 620 .
- Main memory 620 stores, in part, instructions and data for execution by processor 610 .
- Main memory 620 can store the executable code when the system 600 is in operation.
- the system 600 of FIG. 6 may further include a mass storage device 630 , portable storage medium drive(s) 640 , output devices 650 , user input devices 660 , a graphics display 670 , and other peripheral devices 680 .
- the system 600 may also comprise network storage 645 .
- The components shown in FIG. 6 are depicted as being connected via a single bus 690.
- the components may be connected through one or more data transport means.
- Processor unit 610 and main memory 620 may be connected via a local microprocessor bus, and the mass storage device 630 , peripheral device(s) 680 , portable storage device 640 , and graphics display 670 may be connected via one or more input/output (I/O) buses.
- Mass storage device 630, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610. Mass storage device 630 can store the system software for implementing embodiments of the present technology for purposes of loading that software into main memory 620.
- Portable storage device 640 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computing system 600 of FIG. 6 .
- the system software for implementing embodiments of the present technology may be stored on such a portable medium and input to the computing system 600 via the portable storage device 640 .
- Input devices 660 provide a portion of a user interface.
- Input devices 660 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys.
- the system 600 as shown in FIG. 6 includes output devices 650 . Suitable output devices include speakers, printers, network interfaces, and monitors.
- Graphics display 670 may include a liquid crystal display (LCD) or other suitable display device. Graphics display 670 receives textual and graphical information, and processes the information for output to the display device.
- Peripherals 680 may include any type of computer support device to add additional functionality to the computing system.
- Peripheral device(s) 680 may include a modem or a router.
- the components contained in the computing system 600 of FIG. 6 are those typically found in computing systems that may be suitable for use with embodiments of the present technology and are intended to represent a broad category of such computer components that are well known in the art.
- the computing system 600 of FIG. 6 can be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system.
- the computer can also include different bus configurations, networked platforms, multi-processor platforms, etc.
- Various operating systems can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
- Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium).
- the instructions may be retrieved and executed by the processor.
- Some examples of storage media are memory devices, tapes, disks, and the like.
- the instructions are operational when executed by the processor to direct the processor to operate in accord with the technology. Those skilled in the art are familiar with instructions, processor(s), and storage media.
- Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
- Volatile media include dynamic memory, such as system RAM.
- Transmission media include coaxial cables, copper wire and fiber optics, among others, including the wires that comprise one embodiment of a bus.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASHEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
- a bus carries the data to system RAM, from which a CPU retrieves and executes the instructions.
- the instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
- Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Abstract
Description
- This non-provisional U.S. patent application is related to non-provisional U.S. patent application Ser. No. 13/889,164, filed on May 7, 2013, entitled “Cloud Storage Using Merkle Trees,” which is hereby incorporated by reference herein in its entirety.
- The present technology may be generally described as providing systems and methods for distributing and deduplicating data storage.
- Creating large backup data stores that are efficient in terms of data storage and data retrieval is a complex process, especially for systems that store petabytes of data or greater. Additional complexities are introduced when these large backup data stores use deduplication, such as when only unique data blocks are stored. Additionally, backup data stores that use deduplication are not currently suitable for storing data using, for example, distributed hash tables (“DHT”) as the DHT may destroy the locality of the data and the index used to track the data as it is distributed to the data store.
- According to some embodiments, the present technology may be directed to methods that comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
- According to some embodiments, the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) segmenting the input data stream into chunks; (c) creating a signature for each of the chunks; (d) distributing each chunk to one of a plurality of containers, each container comprising a container identifier; and (e) creating a locality index that includes a mapping of a chunk signature and a container identifier.
- According to some embodiments, the present technology may be directed to systems that comprise: (a) a processor; (b) logic encoded in one or more tangible media for execution by the processor and when executed operable to perform operations comprising: (i) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (ii) comparing the signature to signatures of data included in a deduplicated backup data store; (iii) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (iv) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (v) distributing the unique data to the deduplicated backup data store.
- According to some embodiments, the present technology may be directed to a non-transitory machine-readable storage medium having embodied thereon a program. In some embodiments the program may be executed by a machine to perform a method. The method may comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
- According to some embodiments, the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) separating the input data stream into chunks; (c) performing one or more of an exact and an approximate matching of the chunks of the input data stream to chunks stored in a deduplicated backup data store to determine unique chunks; (d) determining one or more locations in the deduplicated backup data store for the unique chunks; (e) updating an index to include the unique chunks with their locations; and (f) distributing the unique chunks to the deduplicated backup data store according to the index.
- Certain embodiments of the present technology are illustrated by the accompanying figures. It will be understood that the figures are not necessarily to scale and that details not necessary for an understanding of the technology or that render other details difficult to perceive may be omitted. It will be understood that the technology is not necessarily limited to the particular embodiments illustrated herein.
- FIG. 1 is a block diagram of an exemplary architecture in which embodiments of the present technology may be practiced;
- FIG. 2 is a flowchart of an exemplary method of exact matching of chunks of data to determine unique chunks;
- FIG. 3 is a flowchart of an exemplary method for providing a distributed and deduplicated data store;
- FIG. 4 is a flowchart of an example method of the present technology;
- FIG. 5 is another example method of the present technology for storing input streams from two separate file modification operations of a client; and
- FIG. 6 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.
- While this technology is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail several specific embodiments with the understanding that the present disclosure is to be considered as an exemplification of the principles of the technology and is not intended to limit the technology to the embodiments illustrated.
- It will be understood that like or analogous elements and/or components, referred to herein, may be identified throughout the drawings with like reference characters. It will be further understood that several of the figures are merely schematic representations of the present technology. As such, some of the components may have been distorted from their actual scale for pictorial clarity.
- Generally speaking, building large data storage systems that allow for efficient storage and retrieval of data is a complex process. In general, when data is received, it may be separated into chunks and the chunks may then be transmitted to a storage system. In some systems these data storage systems create an index for all chunks that are received and distributed. A metadata server may maintain the indexes and perform operations on the chunks. Thus, a malfunction of the metadata server may result in a loss of the chunks stored in the storage system, either actual loss of the data or a loss in the ability to track the location of the data in the storage system.
- Additionally, some storage systems may deduplicate block storage, where only unique data blocks are stored. This allows the system to reduce the overall amount of data blocks stored compared to systems that store complete data sets. When deduplication is not utilized, each backup (e.g., snapshot or mirror) taken of a physical system must be stored in order to allow the physical system to be restored back to a given point in time in the past, as described above.
- While the use of distributed hash tables (“DHT”) to store data is known, the use of DHTs is currently incompatible with systems that deduplicate data blocks. Advantageously, DHTs allow load balancing within storage systems, where chunks may be distributed into a data storage cloud. In one embodiment, each block of data may be hashed to form the index key for a DHT and the data itself is stored as the value of the key. The combination of data blocks and hash values are used to create a DHT. While the effectiveness of the methods and systems described herein may be advantageously leveraged within systems or processes that use DHTs, the present technology is not limited to these types of systems and processes. Thus, descriptions of DHTs included herein are merely provided as an exemplary use of the present technology.
- While storage of data using DHTs can be effective in load balancing IO load across distributed nodes, unfortunately, when a DHT is used the temporal locality of the data is not maintained spatially on the disk. This is, in part, due to the fact that DHTs use the hash of the data to determine the location of the data and cryptographic hashes are by design random. For example, when multiple snapshots of a physical system are taken over time, random operations are performed on the snapshots when DHTs are used. These random operations are inefficient when compared to sequential operations. In short, DHTs are less than optimal for building deduplicated storage systems. That is, deduplicated storage systems rely on maintaining the temporal locality of the data spatially on the disk.
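- The DHT arrangement discussed above, in which the hash of a block is the key and the block itself is the value, can be sketched as a toy, single-process model. This is an assumption-laden illustration only: `ToyDHT` is a hypothetical name, and placing a key on a node by taking the hash modulo the node count is a stand-in for whatever placement scheme a real DHT would use.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT: hash of the block is the key."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # Placement depends only on the hash, not on arrival order.
        return int(key, 16) % len(self.nodes)

    def put(self, block: bytes) -> str:
        key = hashlib.sha1(block).hexdigest()
        self.nodes[self._node_for(key)][key] = block
        return key

    def get(self, key: str) -> bytes:
        return self.nodes[self._node_for(key)][key]
```

- Writing the same block twice produces the same key and a single stored copy, which is what makes a DHT attractive for deduplication. The node chosen for a block, however, is determined by its hash alone, so blocks that arrived one after another can be scattered across nodes, which is the loss of locality described above.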
- To be sure, as described herein, locality can be described in terms of temporality or space. For example, if a user modifies multiple files at the same time, it will be understood or assumed that the modification of these files is related to one another. By way of example, the user could be updating multiple spreadsheets within a given period of time. These spreadsheets may all be related to the same project or task that the user is working on. These file changes can be transmitted over the network efficiently in an input stream. The present technology will store these changes spatially together on the backup store, but their spatial proximity to one another on the backup store is due to their temporal adjacency relating to how they are used.
- If these changes are stored in close spatial proximity on the backup store, context (the fact that they were modified together) is maintained. When the user requests this data from the backup store, the replication or retrieval process can be executed efficiently because all changes to the files were stored in close proximity to one another on the backup store. In contrast, a DHT may randomly distribute the changes to the files anywhere in the backup store, which increases data fragmentation and slows down retrieval.
- In some embodiments, when one file is requested from the backup store, the backup store will automatically pre-fetch the files that were determined to be changed at the same time as the requested file. Again, this benefit is possible because temporal locality (context) is determined and maintained. Even if the user does not utilize the additional files, the likelihood that they may be utilized is sufficient to justify pre-fetching the files in anticipation of use. Advantageously, these processes greatly improve file retrieval and replication methods of backup stores.
- The index created for the blocks of the changed files also maintains context and locality due to the manner in which it is created. The updates to the index occur temporally when changes are transferred to the backup store.
- These and other advantages of the present technology will be discussed in greater detail herein.
- Referring now to the drawings, and more particularly to FIG. 1, which includes a schematic diagram of an exemplary architecture 100 for practicing the present invention. Architecture 100 may include a deduplicated backup data store 105; hereinafter “data store 105.” In some instances, the data store 105 may be implemented within a cloud-based computing environment. In general, a cloud-based computing environment is a resource that typically combines the computational power of a large model of processors and/or that combines the storage capacity of a large model of computer memories or storage devices. For example, systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- The cloud may be formed, for example, by a network of servers, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
- In some instances the data store 105 may include a block store 115 that stores unique blocks of data for one or more objects, such as a file, a group of files, or an entire disk. For example, the block store 115 may comprise a plurality of containers 120 a-n, which are utilized to store data chunks that are separated from the input data stream, as will be described in greater detail below. The term “container” may also be referred to as an “extent.”
- In some instances, objects written to the block store 115 are immutable. When the present technology updates an existing object to generate a new object, a new object identifier may be generated and provided back to the object owner.
- In some instances, the responsibility of implementing a traditional interface where object identifiers do not change on update is facilitated by the application/client. In other embodiments, the data store 105 may provide ‘mutable’ metadata storage where the client/application can manage immutable objects which are mapped to mutable object identifiers and other application specific metadata.
- According to some embodiments, the block store 115 may include immutable object addressable block storage. The block store 115 may form an underlying storage foundation that allows for the storing of blocks of objects. The identifiers of the blocks are a unique representation of the object, generated for example by using an SHA1 hash function. The present technology may also use other cryptographic hash functions that would be known to one of ordinary skill in the art with the present disclosure before them.
- The
architecture 100 may include a deduplication system, hereinafter referred to assystem 125 that provides distributed and deduplicated data storage. - In some instances, the
system 125 receives input data streams from aclient device 130. For example, an input data stream may include a snapshot or an incremental file for theclient device 130. The client device may include an end user computing system, an appliance, such as a backup appliance, a server, or any other computing device that may include objects such as files, directories, disks, and so forth. - In some instances the API may encapsulate messages and their respective operations, allowing for efficient writing of objects over a network, such as
network 135. In some instances, the network 135 may comprise a local area network (“LAN”), a wide area network (“WAN”), or any other private or public network, such as the Internet. - The
system 125 may divide or separate an input data stream into a plurality of chunks, also referred to as blocks, segments, pieces, and so forth. Any method for separating the input data stream into chunks that would be known to one of ordinary skill in the art may likewise be utilized in accordance with the present technology. As each chunk is received (or created), it is passed to containers 120 a-n, which may also be referred to as blobs. Containers 120 a-n may be filled with chunks that are received sequentially around the same time, thus maintaining temporal locality as well as spatial locality within the same container. Additionally, each of the chunks may be encrypted or otherwise hashed so as to create a unique identifier for the chunk of data. For example, a chunk may be hashed using SHA1 to produce a SHA1 key value for the chunk. In some instances, the input data stream may arrive at the system 125 in an already-chunked manner. Optionally, each of the hashed chunk values may be incorporated by the system 125 into Merkle nodes and the Merkle nodes may be arranged into a Merkle tree at the data store 105. - Additional details regarding the creation of Merkle trees and the transmission of data over a network using such Merkle trees can be found in co-pending non-provisional U.S. patent application Ser. No. 13/889,164, filed on May 7, 2013, entitled “Cloud Storage Using Merkle Trees,” which is hereby incorporated by reference herein in its entirety.
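As a rough illustration of the chunking, hashing, and Merkle tree steps described above, the following Python sketch splits a stream into fixed-size chunks, computes a SHA1 key for each, and folds the keys into a single root. Fixed-size chunking, the 4-byte chunk size, and the rule of duplicating the last hash on odd levels are assumptions made for this example; the disclosure permits any chunking method and does not specify a tree layout.

```python
import hashlib


def split_into_chunks(stream: bytes, chunk_size: int = 4):
    """Fixed-size chunking; chunk_size is an arbitrary illustrative value."""
    return [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]


def sha1_key(data: bytes) -> bytes:
    """SHA1 key value for a chunk or for a pair of child hashes."""
    return hashlib.sha1(data).digest()


def merkle_root(leaf_hashes):
    """Combine hashes pairwise, level by level, until one root remains."""
    if not leaf_hashes:
        return sha1_key(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Odd count: pair the last hash with itself (an assumed convention).
            level.append(level[-1])
        level = [sha1_key(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


chunks = split_into_chunks(b"abcdefghij")
keys = [sha1_key(c) for c in chunks]
root = merkle_root(keys)
```

Because every parent hash depends on its children, a change to any single chunk changes the root, so two trees can be compared by their roots before any chunk-level comparison is attempted.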
- According to some embodiments, the
system 125 may generate a signature for each extent using technologies other than cryptographic hashing functions. The signature is a representation of the data included in the extent. In some instances, to generate the signature, the system 125 may apply an algorithm that is similar to an algorithm used for facial recognition. For example, in facial recognition, a signature for a face of an individual included in an image file may be generated. This signature may be compared to facial signatures in other image files to determine if facial signatures included in these additional image files correspond to the facial signature of the individual. Thus, the “signature” is a mathematical representation of the unique facial features of the individual. These unique facial features convert into unique mathematical values that may be used to locate the individual in other image files. - Similarly, extents include data chunks that can be distinguished from other chunks on the basis of unique data features. A signature for an extent would include mathematical representations of these unique features such that comparing a signature for the extent to other signatures of other extents may allow for the
system 125 to determine similar or dissimilar extents. - Because chunks are placed sequentially (in order received relative to the input stream) into containers 120 a-n and each chunk is provided with a unique identifier, such as a hash value, locality of the chunks may be maintained. A locality index may be managed by the
system 125 that maps each chunk to its corresponding container based upon the chunk identifier. Thus, locality of data chunks is a function of the order in which the chunks are received, as well as the chunk identifiers used to distinguish chunks from one another. - According to some embodiments, the locality index may comprise a sparse index when the locality index becomes too large and cumbersome to maintain in memory. For example, the sparse index may map only the chunk signature to a container identifier. Also, in some instances, the
system 125 may split the locality index into chunks and these chunks may also be stored in the containers, along with the chunks created from the input stream. - In addition to the locality index, the
system 125 may also manage a container index for each container that provides an exact or approximate location for each chunk within the container. For example, the index may specify the offset and length of each chunk within the container. - In some instances, when the
system 125 receives subsequent input data streams (e.g., subsequent snapshots) for the client device 130, the system may also separate the subsequent input streams into chunks and generate signatures for these chunks. When signatures for chunks of a subsequent input data stream are compared to signatures for chunks of a previous input data stream, differences deduced by the system 125 in these signatures may indicate that data in a particular chunk has changed. Thus, the system 125 may then obtain these changed chunks and store data from these changed chunks in the data store 105. The ability of the system 125 to recognize changed data allows the system 125 to store only unique data in the data store 105 (e.g., deduplicated data). - When comparing signatures and/or data between an input data stream and deduplicated data that is stored in the
data store 105, the system 125 may employ either exact or approximated deduplication methods. In some instances, the system 125 may also use approximated deduplication methods initially, followed by a more robust exact matching deduplication method at a later time, as a means of verification. - With regard to approximate deduplication methods, the
system 125 may compare the signature of an extent to signatures for similar extents stored in the data store 105. Any difference in signatures between similar extents for the same object, such as a file, indicates that the data of the object has changed. - In some instances, the
system 125 may establish rules that allow the system 125 to quickly process input data streams to determine if unique data blocks exist in the input data stream. If the comparison between signatures indicates that the input data stream is not likely to include unique data, the system 125 may ignore the input data stream. Conversely, if the comparison between signatures indicates that the input data stream is likely to include unique data, the system 125 may further examine the input data stream to determine which chunks of data have changed. - For example, if the signature of an input data stream is determined by the
system 125 to be sufficiently different from a signature of an extent for the same object stored in the data store 105, the system 125 may also process the input data stream using the exact deduplication method described below. - With regard to exact match deduplication methods, the
system 125 may compare signatures of chunks of an input data stream to node signatures of similar chunks stored in the data store 105. The system 125 may check matches at the chunk or extent level using hash values associated with chunks. That is, each block or chunk of data included in an extent may be associated with its own signature or identifier. The chunk may include a unique hash value of the data included in a particular chunk of data. Any change in data of a chunk will change the hash value of the chunk. The system 125 can use the comparison of the signatures of the chunks to determine if data has changed in a chunk. - It will be understood that examining and comparing data streams at the block level via signature comparison allows exact matching, not simply because the comparison is being performed at a more granular level but also because any change in data for the same data block will produce different chunks having different hash values relative to one another.
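The exact match comparison described above can be sketched in Python as a set-membership test on chunk hashes. The function names, the 4-byte fixed chunk size, and the sample data are assumptions for illustration only; the sketch treats a chunk as unique when its SHA1 value is absent from the set of hashes already stored.

```python
import hashlib


def chunk_hashes(stream: bytes, chunk_size: int = 4):
    """Return (sha1_hex, chunk) pairs in stream order."""
    return [
        (hashlib.sha1(stream[i:i + chunk_size]).hexdigest(),
         stream[i:i + chunk_size])
        for i in range(0, len(stream), chunk_size)
    ]


def unique_chunks(stream: bytes, stored_hashes: set, chunk_size: int = 4):
    """Chunks of `stream` whose hash is not already in the store."""
    fresh = []
    seen = set(stored_hashes)  # copy so the caller's set is not modified
    for digest, chunk in chunk_hashes(stream, chunk_size):
        if digest not in seen:
            seen.add(digest)   # also skip repeats within the same stream
            fresh.append((digest, chunk))
    return fresh


previous = b"AAAABBBBCCCC"   # earlier snapshot
current = b"AAAAXXXXCCCC"    # later snapshot with one modified chunk
stored = {digest for digest, _ in chunk_hashes(previous)}
changed = unique_chunks(current, stored)
```

Only the modified chunk appears in `changed`; the two unchanged chunks hash to values already in `stored` and would not be stored again.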
- According to some embodiments, the
system 125 may load the input data stream and selected data from the data store 105 into cache memory. Processing the input data stream and selected data from the data store 105 in cache memory may allow for faster and more efficient data analysis by the system 125. - In some embodiments, the
system 125 may utilize information indicative of the client device or object stored on the client device to “warm up” the data loaded into the cache. That is, instead of examining an entire input data stream, the system 125 may understand that the input data stream came from a particular customer or client device. Additionally, the system 125 may know that the input data stream refers to a particular object. Thus, the system 125 may not need to compare signatures for each block (e.g., chunk) of a client device to determine unique blocks. The system 125, in effect, narrows the comparison down to the most likely candidate chunks or extents stored in the data store 105. In some instances, the system 125 may select extents by comparing root (or head) signatures for a chunk of an input data stream to root (or head) signatures of extents stored in the data store 105. Extents that have matching signatures may be ignored as the blocks corresponding thereto are already present. This process is known as deduplication. That is, only unique data need be transmitted and stored after its identification. - After unique blocks have been determined from the input data stream, the
system 125 may determine an appropriate location for the unique block(s) in the data store 105 and update an index to include metadata indicative of a location of the unique block(s). The unique block(s) may then be distributed by the system 125 to the data store 105 according to the locations recorded in the index. - In some instances, the
system 125 may store links to multiple containers into a single index. This single index may be referred to as a locality sensitive index. The locality sensitive index is an index that allows various local indices to be tied together into a single index, thus preserving the locality of the individual indices while allowing for interrelation of the same. Thus, the system 125 allows for the use of chunks while preserving the index and locality required for the deduplicated backup data store, as described in greater detail above. -
FIG. 2 illustrates an exemplary method for maintaining locality of an input stream of data. The method may comprise an initial step 205 of receiving an input stream, such as a backup of a local machine. The method may comprise a step 210 of splitting the input stream into a plurality of chunks, according to any desired process. The method may comprise an optional step 215 of creating an identifier for each chunk. As mentioned above, this identifier may comprise a signature or a cryptographic hash value. As the input stream is chunked, the method may comprise a step 220 of placing each of the chunks into a container in a sequential manner. - Each container may be assigned a size and when the container is full, additional chunks may be placed into an open container. Thus, containers may be filled sequentially. As chunks are placed into containers, the method may include a
step 225 of generating a locality index that maps each chunk to the container in which it is placed. Again, this locality is based on the temporal adjacency of the chunks in the input stream due to their association with a particular file modification process occurring on the client. In sum, chunk “locality” within the system is a function of both the order in which the chunk is received relative to the input stream, as well as a container location of the chunk after placement into a container. Locality preservation is enhanced by tracking chunks using their calculated, created, or assigned identifier. For example, a SHA1 key value for a chunk may be linked to the container in which the chunk has been placed. - Additionally, the method may comprise a
step 230 of generating a container index that includes a location of the chunks within their respective containers. As mentioned previously, the container index may include an offset and a length for each chunk in the container. -
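The steps of FIG. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: containers are in-memory byte arrays, `CONTAINER_SIZE` is an arbitrary 8-byte limit, chunk identifiers are SHA1 hex digests, and the names `place_chunks` and `read_chunk` are hypothetical. The sketch does not deduplicate; it only shows sequential placement and the two indices.

```python
import hashlib

CONTAINER_SIZE = 8  # bytes per container; an arbitrary illustrative value


def place_chunks(chunks, container_size=CONTAINER_SIZE):
    """Fill containers sequentially and build both indices."""
    containers = [bytearray()]
    locality_index = {}    # chunk id -> container number
    container_index = {}   # container number -> {chunk id: (offset, length)}
    for chunk in chunks:
        # Open a new container when the current one cannot hold the chunk.
        if containers[-1] and len(containers[-1]) + len(chunk) > container_size:
            containers.append(bytearray())
        cid = len(containers) - 1
        chunk_id = hashlib.sha1(chunk).hexdigest()
        offset = len(containers[cid])
        containers[cid].extend(chunk)
        locality_index[chunk_id] = cid
        container_index.setdefault(cid, {})[chunk_id] = (offset, len(chunk))
    return containers, locality_index, container_index


def read_chunk(chunk_id, containers, locality_index, container_index):
    """Locate a chunk via the locality index, then slice it out by offset."""
    cid = locality_index[chunk_id]
    offset, length = container_index[cid][chunk_id]
    return bytes(containers[cid][offset:offset + length])


chunks = [b"aaaa", b"bbbb", b"cccc"]
containers, locality_index, container_index = place_chunks(chunks)
```

Chunks that arrive together end up adjacent in the same container, so a lookup of one chunk identifies a container whose other contents are likely to be needed next.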
FIG. 3 is a flowchart of an exemplary method for managing a deduplicated backup data store. The method may comprise a step 305 of storing an initial backup of a client device such as an end user computing system. The initial backup may comprise not only blocks of data but also associated Merkle nodes, which when combined with the blocks of data comprise a distributed hash table. Again, the Merkle node is a representation or hash value of the names of the individual data blocks that comprise the files of the client. - The method may then comprise a
step 310 of receiving an input data stream from the client device. In some embodiments, the method may separate the input data stream into chunks in step 315. Once separated into chunks, the method may then include a step 320 of hashing the chunks to create a key to index the data block. According to some embodiments, the index may include not only the hashes of data blocks, but also hashes of Merkle nodes. As mentioned previously, sequential chunks may be combined into an extent to maintain their temporal relatedness (which enables and enhances pre-fetching as needed). The extent itself may also be hashed. - In some instances, the method may include a
step 325 of approximating deduplication of the chunks (or extent) by generating a signature for the input data stream. This signature may be compared against the signatures of other extents stored in the deduplicated backup data store. Again, the comparison of signatures may be performed at the chunk level or alternatively at the extent level. - Next, the method may comprise a
step 330 of selecting a signature based upon the step of comparing the signature to signatures of extents. After selection of a signature, the method may comprise a step 335 of comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique blocks included in the at least a portion of the input data stream. This delineation between unique and non-unique data chunks is used in deduplicating the input data stream to ensure that only unique chunks (e.g., changed data) are stored in the deduplicated backup data store. - In some instances, the method may comprise a step 340 of updating an index to reflect the inclusion of the new unique chunks in the deduplicated backup data store. The index provides a location of the unique blocks, which have been distributed to the deduplicated backup data store in a
step 345. According to some embodiments, step 345 may also include a plurality of DHTs which are linked together using a locality sensitive index that preserves locality and index of each DHT. - Referring now to
FIG. 4, an example method for storing an input data stream in a de-duplicated manner is illustrated. For context, the input data stream is created when a user performs a file modification process to one or more files. For example, the user may edit several spreadsheets at the same time (or in close temporal proximity, such as within a few seconds or minutes of one another). To be sure, the plurality of files need not be the same type. For example, the user can edit a spreadsheet and word processing document together. The changes to these files would be assembled and streamed as an input data stream. In other embodiments, as illustrated in FIG. 4, the input data stream can be checked against the stored signature for the client to determine what parts of the input data stream need be stored in the backup store. - The input data stream can be transmitted as the file modifications occur or only after a signature comparison has been completed. For example, a prior signature of a backup for the client may have been taken at an earlier point in time. A comparison of a new signature for the client against the old signature stored on the file replication store (e.g., backup store) would indicate that the files were modified. The changed data would then be transmitted over the network to the file replication store.
- Once the input data stream is received, the method of
FIG. 4 is executed. - The method includes a step of generating 405 an input signature for at least a portion of an input data stream from a client. To be sure, the input signature is a representation of data included in the input data stream.
- The method also includes a step of comparing 410 the input signature to stored signatures of data included in a deduplicated backup data store. This process allows the system to find the signature of the client that was previously stored on the backup store.
- The method includes the system selecting 415 a stored signature based upon the step of comparing the input signature to the stored signatures of data included in a deduplicated backup data store.
- To ensure that only changed data that has not already been stored on the backup data store is transmitted to the backup data store, the method includes comparing 420 data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream.
- Next, the method includes distributing the unique data to the deduplicated backup data store. Advantageously, only the unique data that has not been stored previously is transmitted over the network to the backup data store. This method provides a network optimization technique, ensuring that only new, unique data is transmitted over the network for any given backup or replication procedure.
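The flow of FIG. 4 (generating 405, comparing 410, selecting 415, and comparing 420) can be sketched in Python as an approximate match followed by an exact match. The disclosure does not specify the signature algorithm, so this sketch assumes a bottom-k sketch (the k smallest chunk hashes) compared by set overlap; the function names, the client names `laptop` and `server`, and the sample data are all hypothetical.

```python
import hashlib


def chunk_ids(data: bytes, chunk_size: int = 4):
    """SHA1 hex digest of each fixed-size chunk, in stream order."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def signature(data: bytes, k: int = 4):
    """Step 405 (assumed form): the k smallest distinct chunk hashes."""
    return frozenset(sorted(set(chunk_ids(data)))[:k])


def similarity(sig_a, sig_b):
    """Overlap of two signatures, between 0.0 and 1.0."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 1.0


def select_stored(input_sig, stored_sigs):
    """Steps 410 and 415: pick the most similar stored signature."""
    return max(stored_sigs,
               key=lambda name: similarity(input_sig, stored_sigs[name]))


def unique_data(data: bytes, stored_ids: set, chunk_size: int = 4):
    """Step 420: chunks not present in the data behind the selected signature."""
    return [data[i:i + chunk_size]
            for i in range(0, len(data), chunk_size)
            if hashlib.sha1(data[i:i + chunk_size]).hexdigest() not in stored_ids]


backups = {"laptop": b"AAAABBBBCCCCDDDD", "server": b"1111222233334444"}
stored_sigs = {name: signature(blob) for name, blob in backups.items()}

incoming = b"AAAABBBBCCCCZZZZ"
best = select_stored(signature(incoming), stored_sigs)
to_send = unique_data(incoming, set(chunk_ids(backups[best])))
```

In this example the incoming stream is matched to the `laptop` backup, and only the single changed chunk would be distributed to the backup data store.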
- As mentioned above, input data streams are transmitted to the backup data store only upon the occurrence of a file modification process occurring on the client. Thus, as each file modification process occurs at the client, a new input data stream is created and transmitted for storage.
-
FIG. 5 illustrates an example method for storing input data streams of multiple file modification operations that occur on a client. For purposes of this example, a first file modification process occurs at a first point in time. This first file modification process occurs for a first set of files. At a second point in time, a second file modification process occurs for a second set of files. Temporal context and locality can be maintained for each of these file modification processes by storing the data in the input data streams in their own extents (e.g., containers). - Thus, the method can begin with a step of receiving 505 a first input data stream at a first point in time. The first point in time is associated with a first file modification operation for a first set of files occurring on a client. Next, the method includes segmenting 510 the first input data stream into chunks, as well as creating 515 a signature for each of the chunks. Indeed, this could include creating a SHA1 hash value, as an example.
- Next, the method includes distributing 520 each chunk to one of a first plurality of containers. Each container comprises a container identifier and the first plurality of containers is proximate to one another on a backup data store. Thus, the temporal locality of the chunks in the input data stream is represented as spatial locality on the backup data store.
- Next, the method includes creating 525 a locality index that includes a mapping of a chunk signature and a container identifier. To be sure, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the first input data stream.
- After this process is complete, a second file modification process occurs on the client. Thus, a second de-duplicating replication process for this new file modification process ensues.
- The method includes receiving 530 a second input data stream at a second point in time. The second and first points in time are different from one another because they are associated with different file modification processes.
- To be sure, the second point in time is associated with a second file modification operation for a second set of files occurring on a client. Next, the method includes segmenting 535 the second input data stream into chunks, and creating 540 a signature for each of the chunks.
- Next, the method comprises distributing 545 each chunk to one of a second plurality of containers. As mentioned above, each container comprises a container identifier. The second plurality of containers is proximate to one another on a backup data store for ease of retrieval and pre-fetching as described above.
- The method also includes creating 550 a locality index that includes a mapping of a chunk signature and a container identifier. Again, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the second input data stream.
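The method of FIG. 5 can be sketched in Python as two ingest calls against one store, each stream receiving its own run of consecutively numbered containers. The class `BackupStore`, its methods, the limit of two chunks per container, and the choice to skip chunks whose signature is already indexed are assumptions for illustration; consecutive container identifiers stand in for physical proximity on the backup data store.

```python
import hashlib


class BackupStore:
    """Toy store that keeps each input stream in its own containers."""

    def __init__(self, container_size=2):
        self.container_size = container_size  # chunks per container (illustrative)
        self.containers = {}                  # container id -> list of chunks
        self.locality_index = {}              # chunk signature -> container id
        self._next_id = 0

    def ingest(self, stream: bytes, chunk_size: int = 4):
        """Segment a stream, sign each chunk, and distribute to new containers."""
        used = []
        current = None  # each stream starts in a fresh container
        for i in range(0, len(stream), chunk_size):
            chunk = stream[i:i + chunk_size]
            sig = hashlib.sha1(chunk).hexdigest()
            if sig in self.locality_index:
                continue  # assumed behavior: already stored, skip the duplicate
            if current is None or len(self.containers[current]) >= self.container_size:
                current = self._next_id
                self._next_id += 1
                self.containers[current] = []
                used.append(current)
            self.containers[current].append(chunk)
            self.locality_index[sig] = current
        return used

    def prefetch(self, chunk: bytes):
        """Return every chunk that shares a container with `chunk`."""
        cid = self.locality_index[hashlib.sha1(chunk).hexdigest()]
        return list(self.containers[cid])


store = BackupStore()
first = store.ingest(b"AAAABBBBCCCC")    # first file modification process
second = store.ingest(b"AAAADDDDEEEE")   # second process; AAAA is a duplicate
```

Because the second stream never shares a container with the first, looking up one chunk from the second modification pulls in only its temporal neighbors, which is the property that supports pre-fetching.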
-
FIG. 6 illustrates an exemplary computing system 600 that may be used to implement an embodiment of the present technology. The computing system 600 of FIG. 6 includes one or more processors 610 and memory 620. Main memory 620 stores, in part, instructions and data for execution by processor 610. Main memory 620 can store the executable code when the system 600 is in operation. The system 600 of FIG. 6 may further include a mass storage device 630, portable storage medium drive(s) 640, output devices 650, user input devices 660, a graphics display 670, and other peripheral devices 680. The system 600 may also comprise network storage 645. - The components shown in
FIG. 6 are depicted as being connected via a single bus 690. The components may be connected through one or more data transport means. Processor unit 610 and main memory 620 may be connected via a local microprocessor bus, and the mass storage device 630, peripheral device(s) 680, portable storage device 640, and graphics display 670 may be connected via one or more input/output (I/O) buses. -
Mass storage device 630, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610. Mass storage device 630 can store the system software for implementing embodiments of the present technology for purposes of loading that software into main memory 620. -
Portable storage device 640 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computing system 600 of FIG. 6. The system software for implementing embodiments of the present technology may be stored on such a portable medium and input to the computing system 600 via the portable storage device 640. -
Input devices 660 provide a portion of a user interface. Input devices 660 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additionally, the system 600 as shown in FIG. 6 includes output devices 650. Suitable output devices include speakers, printers, network interfaces, and monitors. - Graphics display 670 may include a liquid crystal display (LCD) or other suitable display device. Graphics display 670 receives textual and graphical information, and processes the information for output to the display device.
-
Peripherals 680 may include any type of computer support device to add additional functionality to the computing system. Peripheral device(s) 680 may include a modem or a router. - The components contained in the
computing system 600 of FIG. 6 are those typically found in computing systems that may be suitable for use with embodiments of the present technology and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computing system 600 of FIG. 6 can be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system. The computer can also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems. - Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium). The instructions may be retrieved and executed by the processor. Some examples of storage media are memory devices, tapes, disks, and the like. The instructions are operational when executed by the processor to direct the processor to operate in accord with the technology. Those skilled in the art are familiar with instructions, processor(s), and storage media.
- It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the technology. The terms “computer-readable storage medium” and “computer-readable storage media” as used herein refer to any medium or media that participate in providing instructions to a CPU for execution. Such media can take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk. Volatile media include dynamic memory, such as system RAM. Transmission media include coaxial cables, copper wire and fiber optics, among others, including the wires that comprise one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASHEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
- Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution. A bus carries the data to system RAM, from which a CPU retrieves and executes the instructions. The instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
- Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Exemplary embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
- Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. The descriptions are not intended to limit the scope of the technology to the particular forms set forth herein. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments. It should be understood that the above description is illustrative and not restrictive. To the contrary, the present descriptions are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the technology as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art. The scope of the technology should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.
Claims (18)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/977,614 US20190108103A9 (en) | 2013-05-07 | 2015-12-21 | Computing device replication using file system change detection methods and systems |
US15/360,836 US10284437B2 (en) | 2010-09-30 | 2016-11-23 | Cloud-based virtual machines and offices |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/889,164 US9705730B1 (en) | 2013-05-07 | 2013-05-07 | Cloud storage using Merkle trees |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/889,164 Continuation-In-Part US9705730B1 (en) | 2010-09-30 | 2013-05-07 | Cloud storage using Merkle trees |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/977,614 Continuation-In-Part US20190108103A9 (en) | 2013-05-07 | 2015-12-21 | Computing device replication using file system change detection methods and systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170090786A1 (en) | 2017-03-30 |
Family ID: 58409284
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/889,164 Active 2034-11-09 US9705730B1 (en) | 2010-09-30 | 2013-05-07 | Cloud storage using Merkle trees |
US14/864,850 Abandoned US20170090786A1 (en) | 2010-09-30 | 2015-09-24 | Distributed and Deduplicating Data Storage System and Methods of Use |
US14/977,614 Abandoned US20190108103A9 (en) | 2013-05-07 | 2015-12-21 | Computing device replication using file system change detection methods and systems |
US15/599,417 Active US10599533B2 (en) | 2013-05-07 | 2017-05-18 | Cloud storage using merkle trees |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/889,164 Active 2034-11-09 US9705730B1 (en) | 2010-09-30 | 2013-05-07 | Cloud storage using Merkle trees |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/977,614 Abandoned US20190108103A9 (en) | 2013-05-07 | 2015-12-21 | Computing device replication using file system change detection methods and systems |
US15/599,417 Active US10599533B2 (en) | 2013-05-07 | 2017-05-18 | Cloud storage using merkle trees |
Country Status (1)
Country | Link |
---|---|
US (4) | US9705730B1 (en) |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150205815A1 (en) * | 2010-12-14 | 2015-07-23 | Commvault Systems, Inc. | Distributed deduplicated storage system |
US9705730B1 (en) | 2013-05-07 | 2017-07-11 | Axcient, Inc. | Cloud storage using Merkle trees |
US9785647B1 (en) | 2012-10-02 | 2017-10-10 | Axcient, Inc. | File system virtualization |
US9852140B1 (en) | 2012-11-07 | 2017-12-26 | Axcient, Inc. | Efficient file replication |
US9858156B2 (en) | 2012-06-13 | 2018-01-02 | Commvault Systems, Inc. | Dedicated client-side signature generator in a networked storage system |
US9898225B2 (en) | 2010-09-30 | 2018-02-20 | Commvault Systems, Inc. | Content aligned block-based deduplication |
US9934238B2 (en) | 2014-10-29 | 2018-04-03 | Commvault Systems, Inc. | Accessing a file system using tiered deduplication |
US20180109501A1 (en) * | 2016-10-17 | 2018-04-19 | Microsoft Technology Licensing, Llc | Migration containers |
US9998344B2 (en) | 2013-03-07 | 2018-06-12 | Efolder, Inc. | Protection status determinations for computing devices |
US10061663B2 (en) | 2015-12-30 | 2018-08-28 | Commvault Systems, Inc. | Rebuilding deduplication data in a distributed deduplication data storage system |
US10126973B2 (en) | 2010-09-30 | 2018-11-13 | Commvault Systems, Inc. | Systems and methods for retaining and using data block signatures in data protection operations |
US10191816B2 (en) | 2010-12-14 | 2019-01-29 | Commvault Systems, Inc. | Client-side repository in a networked deduplicated storage system |
CN109408279A (en) * | 2017-08-16 | 2019-03-01 | 北京京东尚科信息技术有限公司 | Data back up method and device |
US10229133B2 (en) | 2013-01-11 | 2019-03-12 | Commvault Systems, Inc. | High availability distributed deduplicated storage system |
CN109614036A (en) * | 2018-11-16 | 2019-04-12 | 新华三技术有限公司成都分公司 | The dispositions method and device of memory space |
US10282129B1 (en) | 2017-10-24 | 2019-05-07 | Bottomline Technologies (De), Inc. | Tenant aware, variable length, deduplication of stored data |
US10284437B2 (en) | 2010-09-30 | 2019-05-07 | Efolder, Inc. | Cloud-based virtual machines and offices |
US10339106B2 (en) | 2015-04-09 | 2019-07-02 | Commvault Systems, Inc. | Highly reusable deduplication database after disaster recovery |
US10380072B2 (en) | 2014-03-17 | 2019-08-13 | Commvault Systems, Inc. | Managing deletions from a deduplication database |
US10481825B2 (en) | 2015-05-26 | 2019-11-19 | Commvault Systems, Inc. | Replication using deduplicated secondary copy data |
US10540327B2 (en) | 2009-07-08 | 2020-01-21 | Commvault Systems, Inc. | Synchronized data deduplication |
US10574751B2 (en) * | 2016-03-22 | 2020-02-25 | International Business Machines Corporation | Identifying data for deduplication in a network storage environment |
US10671370B2 (en) * | 2018-05-30 | 2020-06-02 | Red Hat, Inc. | Distributing file system states |
US20200409796A1 (en) * | 2019-06-28 | 2020-12-31 | Rubrik, Inc. | Data management system with limited control of external compute and storage resources |
US10921987B1 (en) * | 2019-07-31 | 2021-02-16 | EMC IP Holding Company LLC | Deduplication of large block aggregates using representative block digests |
US11010485B1 (en) * | 2017-03-02 | 2021-05-18 | Apple Inc. | Cloud messaging system |
US11010258B2 (en) | 2018-11-27 | 2021-05-18 | Commvault Systems, Inc. | Generating backup copies through interoperability between components of a data storage management system and appliances for data storage and deduplication |
US11016859B2 (en) | 2008-06-24 | 2021-05-25 | Commvault Systems, Inc. | De-duplication systems and methods for application-specific data |
US11392553B1 (en) | 2018-04-24 | 2022-07-19 | Pure Storage, Inc. | Remote data management |
US11436344B1 (en) | 2018-04-24 | 2022-09-06 | Pure Storage, Inc. | Secure encryption in deduplication cluster |
US11442896B2 (en) | 2019-12-04 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods for optimizing restoration of deduplicated data stored in cloud-based storage resources |
US11463264B2 (en) | 2019-05-08 | 2022-10-04 | Commvault Systems, Inc. | Use of data block signatures for monitoring in an information management system |
US11604583B2 (en) | 2017-11-28 | 2023-03-14 | Pure Storage, Inc. | Policy based data tiering |
US11675741B2 (en) | 2019-06-28 | 2023-06-13 | Rubrik, Inc. | Adaptable multi-layered storage for deduplicating electronic messages |
US11687424B2 (en) | 2020-05-28 | 2023-06-27 | Commvault Systems, Inc. | Automated media agent state management |
US11698727B2 (en) | 2018-12-14 | 2023-07-11 | Commvault Systems, Inc. | Performing secondary copy operations based on deduplication performance |
US11829251B2 (en) | 2019-04-10 | 2023-11-28 | Commvault Systems, Inc. | Restore using deduplicated secondary copy data |
US11868214B1 (en) * | 2020-02-02 | 2024-01-09 | Veritas Technologies Llc | Methods and systems for affinity aware container prefetching |
Families Citing this family (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10325032B2 (en) * | 2014-02-19 | 2019-06-18 | Snowflake Inc. | Resource provisioning systems and methods |
US10223394B1 (en) * | 2015-03-24 | 2019-03-05 | Amazon Technologies, Inc. | Data reconciliation |
KR101977109B1 (en) * | 2015-11-17 | 2019-08-28 | (주)마크애니 | Large simultaneous digital signature service system based on hash function and method thereof |
US10242065B1 (en) * | 2016-06-30 | 2019-03-26 | EMC IP Holding Company LLC | Combining merkle trees in graph databases |
US10867040B2 (en) * | 2016-10-17 | 2020-12-15 | Datto, Inc. | Systems and methods for detecting ransomware infection |
US10909105B2 (en) * | 2016-11-28 | 2021-02-02 | Sap Se | Logical logging for in-memory metadata store |
US10291408B2 (en) | 2016-12-23 | 2019-05-14 | Amazon Technologies, Inc. | Generation of Merkle trees as proof-of-work |
US20180181310A1 (en) * | 2016-12-23 | 2018-06-28 | Cloudendure Ltd. | System and method for disk identification in a cloud based computing environment |
US10511445B1 (en) | 2017-01-05 | 2019-12-17 | Amazon Technologies, Inc. | Signature compression for hash-based signature schemes |
US10608824B1 (en) | 2017-01-09 | 2020-03-31 | Amazon Technologies, Inc. | Merkle signature scheme tree expansion |
US10652330B2 (en) | 2017-01-15 | 2020-05-12 | Google Llc | Object storage in cloud with reference counting using versions |
US11163721B1 (en) * | 2017-04-25 | 2021-11-02 | EMC IP Holding Company LLC | Snapshot change list and file system indexing |
US10387271B2 (en) * | 2017-05-10 | 2019-08-20 | Elastifile Ltd. | File system storage in cloud using data and metadata merkle trees |
US10649852B1 (en) * | 2017-07-14 | 2020-05-12 | EMC IP Holding Company LLC | Index metadata for inode based backups |
US10545696B2 (en) * | 2017-11-14 | 2020-01-28 | Samsung Electronics Co., Ltd. | Data deduplication using KVSSD |
US11177961B2 (en) * | 2017-12-07 | 2021-11-16 | Nec Corporation | Method and system for securely sharing validation information using blockchain technology |
CN108228767B (en) * | 2017-12-27 | 2022-03-15 | 中国地质大学(武汉) | Method and device for directionally deleting files by smart phone and storage device |
US10754737B2 (en) * | 2018-06-12 | 2020-08-25 | Dell Products, L.P. | Boot assist metadata tables for persistent memory device updates during a hardware fault |
US11163750B2 (en) | 2018-09-27 | 2021-11-02 | International Business Machines Corporation | Dynamic, transparent manipulation of content and/or namespaces within data storage systems |
US12093316B2 (en) | 2019-01-31 | 2024-09-17 | Hewlett Packard Enterprise Development Lp | Partial file system instances |
US11474912B2 (en) * | 2019-01-31 | 2022-10-18 | Rubrik, Inc. | Backup and restore of files with multiple hard links |
US11392541B2 (en) | 2019-03-22 | 2022-07-19 | Hewlett Packard Enterprise Development Lp | Data transfer using snapshot differencing from edge system to core system |
US10990675B2 (en) | 2019-06-04 | 2021-04-27 | Datto, Inc. | Methods and systems for detecting a ransomware attack using entropy analysis and file update patterns |
US11347881B2 (en) | 2020-04-06 | 2022-05-31 | Datto, Inc. | Methods and systems for detecting ransomware attack in incremental backup |
US11616810B2 (en) | 2019-06-04 | 2023-03-28 | Datto, Inc. | Methods and systems for ransomware detection, isolation and remediation |
US11048693B2 (en) | 2019-06-05 | 2021-06-29 | International Business Machines Corporation | Resolution of ordering inversions |
CN112887421B (en) * | 2019-07-31 | 2023-07-18 | 创新先进技术有限公司 | Block chain state data synchronization method and device and electronic equipment |
US11467775B2 (en) | 2019-10-15 | 2022-10-11 | Hewlett Packard Enterprise Development Lp | Virtual persistent volumes for containerized applications |
US11392458B2 (en) * | 2019-10-25 | 2022-07-19 | EMC IP Holding Company LLC | Reconstructing lost data objects by generating virtual user files from available nodes within a cluster |
US11461362B2 (en) * | 2020-01-29 | 2022-10-04 | EMC IP Holding Company LLC | Merkle super tree for synchronizing data buckets of unlimited size in object storage systems |
US11455319B2 (en) * | 2020-01-29 | 2022-09-27 | EMC IP Holding Company LLC | Merkle tree forest for synchronizing data buckets of unlimited size in object storage systems |
US11645161B2 (en) | 2020-03-26 | 2023-05-09 | Hewlett Packard Enterprise Development Lp | Catalog of files associated with snapshots |
US11687267B2 (en) | 2020-04-14 | 2023-06-27 | Hewlett Packard Enterprise Development Lp | Containerized application manifests and virtual persistent volumes |
US11693573B2 (en) | 2020-06-18 | 2023-07-04 | Hewlett Packard Enterprise Development Lp | Relaying storage operation requests to storage systems using underlying volume identifiers |
US11755229B2 (en) * | 2020-06-25 | 2023-09-12 | EMC IP Holding Company LLC | Archival task processing in a data storage system |
US10860717B1 (en) * | 2020-07-01 | 2020-12-08 | Morgan Stanley Services Group Inc. | Distributed system for file analysis and malware detection |
US10990676B1 (en) * | 2020-07-01 | 2021-04-27 | Morgan Stanley Services Group Inc. | File collection method for subsequent malware detection |
US11481371B2 (en) | 2020-07-27 | 2022-10-25 | Hewlett Packard Enterprise Development Lp | Storage system capacity usage estimation |
US11960773B2 (en) | 2020-07-31 | 2024-04-16 | Hewlett Packard Enterprise Development Lp | Modifying virtual persistent volumes based on analysis of performance metrics |
US20220197944A1 (en) * | 2020-12-22 | 2022-06-23 | Netapp Inc. | File metadata service |
CN112699084B (en) * | 2021-01-06 | 2022-10-28 | 青岛海尔科技有限公司 | File cleaning method and device, storage medium and electronic device |
US11599506B1 (en) * | 2021-10-28 | 2023-03-07 | EMC IP Holding Company LLC | Source namespace and file copying |
US11481372B1 (en) * | 2022-04-13 | 2022-10-25 | Aleksei Neganov | Systems and methods for indexing multi-versioned data |
US12079177B2 (en) * | 2022-05-09 | 2024-09-03 | Netapp, Inc. | Object versioning support for a file system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120330904A1 (en) * | 2011-06-27 | 2012-12-27 | International Business Machines Corporation | Efficient file system object-based deduplication |
US8639917B1 (en) * | 2009-10-20 | 2014-01-28 | Vmware, Inc. | Streaming a desktop image over wide area networks in which the desktop image is segmented into a prefetch set of files, streaming set of files and leave-behind set of files |
US20140101113A1 (en) * | 2012-10-08 | 2014-04-10 | Symantec Corporation | Locality Aware, Two-Level Fingerprint Caching |
US8745003B1 (en) * | 2011-05-13 | 2014-06-03 | Emc Corporation | Synchronization of storage using comparisons of fingerprints of blocks |
US20140244599A1 (en) * | 2013-02-22 | 2014-08-28 | Symantec Corporation | Deduplication storage system with efficient reference updating and space reclamation |
US20150112939A1 (en) * | 2013-10-18 | 2015-04-23 | Solidfire, Inc. | Incremental block level backup |
Family Cites Families (208)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5379412A (en) | 1992-04-20 | 1995-01-03 | International Business Machines Corporation | Method and system for dynamic allocation of buffer storage space during backup copying |
JP3497886B2 (en) | 1994-05-10 | 2004-02-16 | 富士通株式会社 | Server data linking device |
US5574905A (en) | 1994-05-26 | 1996-11-12 | International Business Machines Corporation | Method and apparatus for multimedia editing and data recovery |
US5860107A (en) | 1996-10-07 | 1999-01-12 | International Business Machines Corporation | Processor and method for store gathering through merged store operations |
US6272492B1 (en) | 1997-11-21 | 2001-08-07 | IBM Corporation | Front-end proxy for transparently increasing web server functionality |
US9292111B2 (en) | 1998-01-26 | 2016-03-22 | Apple Inc. | Gesturing with a multipoint sensing device |
US6205527B1 (en) | 1998-02-24 | 2001-03-20 | Adaptec, Inc. | Intelligent backup and restoring system and method for implementing the same |
US6122629A (en) | 1998-04-30 | 2000-09-19 | Compaq Computer Corporation | Filesystem data integrity in a single system image environment |
US6604236B1 (en) | 1998-06-30 | 2003-08-05 | Iora, Ltd. | System and method for generating file updates for files stored on read-only media |
US6233589B1 (en) | 1998-07-31 | 2001-05-15 | Novell, Inc. | Method and system for reflecting differences between two files |
GB2343768A (en) | 1998-08-17 | 2000-05-17 | Connected Place Limited | Merging a sequence of delta files |
WO2001006374A2 (en) | 1999-07-16 | 2001-01-25 | Intertrust Technologies Corp. | System and method for securing an untrusted storage |
AU2001229332A1 (en) | 2000-01-10 | 2001-07-24 | Connected Corporation | Administration of a differential backup system in a client-server environment |
US6651075B1 (en) | 2000-02-16 | 2003-11-18 | Microsoft Corporation | Support for multiple temporal snapshots of same volume |
WO2001082098A1 (en) | 2000-04-27 | 2001-11-01 | Fortress Technologies, Inc. | Network interface device having primary and backup interfaces for automatic dial backup upon loss of a primary connection and method of using same |
US6971018B1 (en) | 2000-04-28 | 2005-11-29 | Microsoft Corporation | File protection service for a computer system |
EP1168174A1 (en) | 2000-06-19 | 2002-01-02 | Hewlett-Packard Company, A Delaware Corporation | Automatic backup/recovery process |
US6950871B1 (en) | 2000-06-29 | 2005-09-27 | Hitachi, Ltd. | Computer system having a storage area network and method of handling data in the computer system |
US6918091B2 (en) | 2000-11-09 | 2005-07-12 | Change Tools, Inc. | User definable interface system, method and computer program product |
EP1374093B1 (en) | 2001-03-27 | 2013-07-03 | BRITISH TELECOMMUNICATIONS public limited company | File synchronisation |
US20030011638A1 (en) | 2001-07-10 | 2003-01-16 | Sun-Woo Chung | Pop-up menu system |
US7216135B2 (en) | 2002-02-15 | 2007-05-08 | International Business Machines Corporation | File system for providing access to a snapshot dataset where disk address in the inode is equal to a ditto address for indicating that the disk address is invalid disk address |
US6877048B2 (en) | 2002-03-12 | 2005-04-05 | International Business Machines Corporation | Dynamic memory allocation between inbound and outbound buffers in a protocol handler |
US7165154B2 (en) | 2002-03-18 | 2007-01-16 | Net Integration Technologies Inc. | System and method for data backup |
US7051050B2 (en) | 2002-03-19 | 2006-05-23 | Network Appliance, Inc. | System and method for restoring a single file from a snapshot |
US7058656B2 (en) | 2002-04-11 | 2006-06-06 | Sun Microsystems, Inc. | System and method of using extensions in a data structure without interfering with applications unaware of the extensions |
US7058902B2 (en) | 2002-07-30 | 2006-06-06 | Microsoft Corporation | Enhanced on-object context menus |
US7024581B1 (en) | 2002-10-09 | 2006-04-04 | Xpoint Technologies, Inc. | Data processing recovery system and method spanning multiple operating system |
US7055010B2 (en) | 2002-11-06 | 2006-05-30 | Synology Inc. | Snapshot facility allowing preservation of chronological views on block drives |
JP2004171249A (en) | 2002-11-20 | 2004-06-17 | Hitachi Ltd | Backup execution decision method for database |
US7624143B2 (en) | 2002-12-12 | 2009-11-24 | Xerox Corporation | Methods, apparatus, and program products for utilizing contextual property metadata in networked computing environments |
US7809693B2 (en) | 2003-02-10 | 2010-10-05 | Netapp, Inc. | System and method for restoring data on demand for instant volume restoration |
US7320009B1 (en) | 2003-03-28 | 2008-01-15 | Novell, Inc. | Methods and systems for file replication utilizing differences between versions of files |
US7558927B2 (en) | 2003-05-06 | 2009-07-07 | Aptare, Inc. | System to capture, transmit and persist backup and recovery meta data |
US7657509B2 (en) | 2003-05-06 | 2010-02-02 | Aptare, Inc. | System to manage and store backup and recovery meta data |
US7328366B2 (en) | 2003-06-06 | 2008-02-05 | Cascade Basic Research Corp. | Method and system for reciprocal data backup |
US20050010835A1 (en) | 2003-07-11 | 2005-01-13 | International Business Machines Corporation | Autonomic non-invasive backup and storage appliance |
US7398285B2 (en) | 2003-07-30 | 2008-07-08 | International Business Machines Corporation | Apparatus and system for asynchronous replication of a hierarchically-indexed data store |
US20050193235A1 (en) | 2003-08-05 | 2005-09-01 | Miklos Sandorfi | Emulated storage system |
US7225208B2 (en) | 2003-09-30 | 2007-05-29 | Iron Mountain Incorporated | Systems and methods for backing up data files |
JP4267420B2 (en) | 2003-10-20 | 2009-05-27 | 株式会社日立製作所 | Storage apparatus and backup acquisition method |
US7188118B2 (en) | 2003-11-26 | 2007-03-06 | Veritas Operating Corporation | System and method for detecting file content similarity within a file system |
JP4319017B2 (en) | 2003-12-02 | 2009-08-26 | 株式会社日立製作所 | Storage system control method, storage system, and storage device |
EP1538536A1 (en) | 2003-12-05 | 2005-06-08 | Sony International (Europe) GmbH | Visualization and control techniques for multimedia digital content |
US20050152192A1 (en) | 2003-12-22 | 2005-07-14 | Manfred Boldy | Reducing occupancy of digital storage devices |
US7315965B2 (en) | 2004-02-04 | 2008-01-01 | Network Appliance, Inc. | Method and system for storing data using a continuous data protection system |
US7406488B2 (en) | 2004-02-04 | 2008-07-29 | Netapp | Method and system for maintaining data in a continuous data protection system |
US7966293B1 (en) | 2004-03-09 | 2011-06-21 | Netapp, Inc. | System and method for indexing a backup using persistent consistency point images |
US7277905B2 (en) | 2004-03-31 | 2007-10-02 | Microsoft Corporation | System and method for a consistency check of a database backup |
US7246258B2 (en) | 2004-04-28 | 2007-07-17 | Lenovo (Singapore) Pte. Ltd. | Minimizing resynchronization time after backup system failures in an appliance-based business continuance architecture |
US7266655B1 (en) | 2004-04-29 | 2007-09-04 | Veritas Operating Corporation | Synthesized backup set catalog |
US7356729B2 (en) | 2004-06-14 | 2008-04-08 | Lucent Technologies Inc. | Restoration of network element through employment of bootable image |
US20060013462A1 (en) | 2004-07-15 | 2006-01-19 | Navid Sadikali | Image display system and method |
US7389314B2 (en) | 2004-08-30 | 2008-06-17 | Corio, Inc. | Database backup, refresh and cloning system and method |
US7979404B2 (en) | 2004-09-17 | 2011-07-12 | Quest Software, Inc. | Extracting data changes and storing data history to allow for instantaneous access to and reconstruction of any point-in-time data |
JP4325524B2 (en) | 2004-09-29 | 2009-09-02 | 日本電気株式会社 | Switch device and system, backup and restore method and program |
US7546323B1 (en) | 2004-09-30 | 2009-06-09 | Emc Corporation | System and methods for managing backup status reports |
US7401192B2 (en) | 2004-10-04 | 2008-07-15 | International Business Machines Corporation | Method of replicating a file using a base, delta, and reference file |
WO2007089217A2 (en) | 2004-11-05 | 2007-08-09 | Kabushiki Kaisha Toshiba | Network discovery mechanisms |
US7814057B2 (en) | 2005-04-05 | 2010-10-12 | Microsoft Corporation | Page recovery using volume snapshots and logs |
US7693138B2 (en) | 2005-07-18 | 2010-04-06 | Broadcom Corporation | Method and system for transparent TCP offload with best effort direct placement of incoming traffic |
US20070038884A1 (en) | 2005-08-10 | 2007-02-15 | Spare Backup, Inc. | System and method of remote storage of data using client software |
US7743038B1 (en) | 2005-08-24 | 2010-06-22 | Lsi Corporation | Inode based policy identifiers in a filing system |
US8429630B2 (en) | 2005-09-15 | 2013-04-23 | Ca, Inc. | Globally distributed utility computing cloud |
US9063881B2 (en) * | 2010-04-26 | 2015-06-23 | Cleversafe, Inc. | Slice retrieval in accordance with an access sequence in a dispersed storage network |
US20070112895A1 (en) | 2005-11-04 | 2007-05-17 | Sun Microsystems, Inc. | Block-based incremental backup |
JP4546387B2 (en) | 2005-11-17 | 2010-09-15 | 富士通株式会社 | Backup system, method and program |
US7730425B2 (en) | 2005-11-30 | 2010-06-01 | De Los Reyes Isabelo | Function-oriented user interface |
US20070204153A1 (en) | 2006-01-04 | 2007-08-30 | Tome Agustin J | Trusted host platform |
US20070180207A1 (en) | 2006-01-18 | 2007-08-02 | International Business Machines Corporation | Secure RFID backup/restore for computing/pervasive devices |
US7667686B2 (en) | 2006-02-01 | 2010-02-23 | Memsic, Inc. | Air-writing and motion sensing input for portable devices |
US7676763B2 (en) | 2006-02-21 | 2010-03-09 | Sap Ag | Method and system for providing an outwardly expandable radial menu |
US20070208918A1 (en) | 2006-03-01 | 2007-09-06 | Kenneth Harbin | Method and apparatus for providing virtual machine backup |
US20070220029A1 (en) | 2006-03-17 | 2007-09-20 | Novell, Inc. | System and method for hierarchical storage management using shadow volumes |
JP4911576B2 (en) | 2006-03-24 | 2012-04-04 | 株式会社メガチップス | Information processing apparatus and write-once memory utilization method |
US7650369B2 (en) | 2006-03-30 | 2010-01-19 | Fujitsu Limited | Database system management method and database system |
US7552044B2 (en) | 2006-04-21 | 2009-06-23 | Microsoft Corporation | Simulated storage area network |
US7945726B2 (en) | 2006-05-08 | 2011-05-17 | Emc Corporation | Pre-allocation and hierarchical mapping of data blocks distributed from a first processor to a second processor for use in a file system |
US7653832B2 (en) | 2006-05-08 | 2010-01-26 | Emc Corporation | Storage array virtualization using a storage block mapping protocol client and server |
US8949312B2 (en) | 2006-05-25 | 2015-02-03 | Red Hat, Inc. | Updating clients from a server |
US7568124B2 (en) | 2006-06-02 | 2009-07-28 | Microsoft Corporation | Driving data backups with data source tagging |
US8302091B2 (en) | 2006-06-05 | 2012-10-30 | International Business Machines Corporation | Installation of a bootable image for modifying the operational environment of a computing system |
US7624134B2 (en) | 2006-06-12 | 2009-11-24 | International Business Machines Corporation | Enabling access to remote storage for use with a backup program |
US7873601B1 (en) | 2006-06-29 | 2011-01-18 | Emc Corporation | Backup of incremental metadata in block based backup systems |
JP2008015768A (en) | 2006-07-05 | 2008-01-24 | Hitachi Ltd | Storage system and data management method using the same |
US7783956B2 (en) | 2006-07-12 | 2010-08-24 | Cronera Systems Incorporated | Data recorder |
US20080027998A1 (en) | 2006-07-27 | 2008-01-31 | Hitachi, Ltd. | Method and apparatus of continuous data protection for NAS |
US7809688B2 (en) | 2006-08-04 | 2010-10-05 | Apple Inc. | Managing backup of content |
US7752487B1 (en) | 2006-08-08 | 2010-07-06 | Open Invention Network, Llc | System and method for managing group policy backup |
AU2007295949B2 (en) | 2006-09-12 | 2009-08-06 | Adams Consulting Group Pty. Ltd. | Method system and apparatus for handling information |
US8332442B1 (en) | 2006-09-26 | 2012-12-11 | Symantec Corporation | Automated restoration of links when restoring individual directory service objects |
US7769731B2 (en) | 2006-10-04 | 2010-08-03 | International Business Machines Corporation | Using file backup software to generate an alert when a file modification policy is violated |
US7832008B1 (en) | 2006-10-11 | 2010-11-09 | Cisco Technology, Inc. | Protection of computer resources |
US8117163B2 (en) | 2006-10-31 | 2012-02-14 | Carbonite, Inc. | Backup and restore system for a computer |
JP4459215B2 (en) | 2006-11-09 | 2010-04-28 | 株式会社ソニー・コンピュータエンタテインメント | GAME DEVICE AND INFORMATION PROCESSING DEVICE |
US7620765B1 (en) | 2006-12-15 | 2009-11-17 | Symantec Operating Corporation | Method to delete partial virtual tape volumes |
US20080154979A1 (en) | 2006-12-21 | 2008-06-26 | International Business Machines Corporation | Apparatus, system, and method for creating a backup schedule in a san environment based on a recovery plan |
WO2008085205A2 (en) | 2006-12-29 | 2008-07-17 | Prodea Systems, Inc. | System and method for providing network support services and premises gateway support infrastructure |
US8880480B2 (en) | 2007-01-03 | 2014-11-04 | Oracle International Corporation | Method and apparatus for data rollback |
US7647338B2 (en) | 2007-02-21 | 2010-01-12 | Microsoft Corporation | Content item query formulation |
US20080229050A1 (en) | 2007-03-13 | 2008-09-18 | Sony Ericsson Mobile Communications Ab | Dynamic page on demand buffer size for power savings |
US9497028B1 (en) * | 2007-05-03 | 2016-11-15 | Google Inc. | System and method for remote storage auditing |
US7974950B2 (en) | 2007-06-05 | 2011-07-05 | International Business Machines Corporation | Applying a policy criteria to files in a backup image |
US8010900B2 (en) | 2007-06-08 | 2011-08-30 | Apple Inc. | User interface for electronic backup |
US7631155B1 (en) | 2007-06-30 | 2009-12-08 | Emc Corporation | Thin provisioning of a file system and an iSCSI LUN through a common mechanism |
US8676273B1 (en) | 2007-08-24 | 2014-03-18 | Iwao Fujisaki | Communication device |
TW200917063A (en) | 2007-10-02 | 2009-04-16 | Sunonwealth Electr Mach Ind Co | Survey method for a patent searching result |
JP4412509B2 (en) | 2007-10-05 | 2010-02-10 | 日本電気株式会社 | Storage system capacity expansion control method |
US8117164B2 (en) | 2007-12-19 | 2012-02-14 | Microsoft Corporation | Creating and utilizing network restore points |
US9503354B2 (en) | 2008-01-17 | 2016-11-22 | Aerohive Networks, Inc. | Virtualization of networking services |
JP2009205333A (en) | 2008-02-27 | 2009-09-10 | Hitachi Ltd | Computer system, storage device, and data management method |
JP4481338B2 (en) | 2008-03-28 | 2010-06-16 | 株式会社日立製作所 | Backup system, storage device, and data backup method |
JP4413976B2 (en) | 2008-05-23 | 2010-02-10 | 株式会社東芝 | Information processing apparatus and version upgrade method for information processing apparatus |
US9038087B2 (en) * | 2008-06-18 | 2015-05-19 | Microsoft Technology Licensing, Llc | Fence elision for work stealing |
US20090319653A1 (en) | 2008-06-20 | 2009-12-24 | International Business Machines Corporation | Server configuration management method |
US8826181B2 (en) | 2008-06-28 | 2014-09-02 | Apple Inc. | Moving radial menus |
US8245156B2 (en) | 2008-06-28 | 2012-08-14 | Apple Inc. | Radial menu selection |
US8060476B1 (en) | 2008-07-14 | 2011-11-15 | Quest Software, Inc. | Backup systems and methods for a virtual computing environment |
US8103718B2 (en) * | 2008-07-31 | 2012-01-24 | Microsoft Corporation | Content discovery and transfer between mobile communications nodes |
US9177271B2 (en) | 2008-08-14 | 2015-11-03 | Hewlett-Packard Development Company, L.P. | Heterogeneous information technology (IT) infrastructure management orchestration |
US8117410B2 (en) | 2008-08-25 | 2012-02-14 | Vmware, Inc. | Tracking block-level changes using snapshots |
US8279174B2 (en) | 2008-08-27 | 2012-10-02 | Lg Electronics Inc. | Display device and method of controlling the display device |
WO2010036889A1 (en) * | 2008-09-25 | 2010-04-01 | Bakbone Software, Inc. | Remote backup and restore |
US8099572B1 (en) | 2008-09-30 | 2012-01-17 | Emc Corporation | Efficient backup and restore of storage objects in a version set |
US8495624B2 (en) | 2008-10-23 | 2013-07-23 | International Business Machines Corporation | Provisioning a suitable operating system environment |
US20100104105A1 (en) | 2008-10-23 | 2010-04-29 | Digital Cinema Implementation Partners, Llc | Digital cinema asset management system |
US20100114832A1 (en) * | 2008-10-31 | 2010-05-06 | Lillibridge Mark D | Forensic snapshot |
US20100179973A1 (en) | 2008-12-31 | 2010-07-15 | Herve Carruzzo | Systems, methods, and computer programs for delivering content via a communications network |
US9383897B2 (en) | 2009-01-29 | 2016-07-05 | International Business Machines Corporation | Spiraling radial menus in computer systems |
US8352717B2 (en) | 2009-02-09 | 2013-01-08 | Cs-Solutions, Inc. | Recovery system using selectable and configurable snapshots |
US8819113B2 (en) | 2009-03-02 | 2014-08-26 | Kaseya International Limited | Remote provisioning of virtual machines |
US8504785B1 (en) | 2009-03-10 | 2013-08-06 | Symantec Corporation | Method and apparatus for backing up to tape drives with minimum write speed |
US8370835B2 (en) | 2009-03-12 | 2013-02-05 | Arend Erich Dittmer | Method for dynamically generating a configuration for a virtual machine with a virtual hard disk in an external storage device |
US8099391B1 (en) | 2009-03-17 | 2012-01-17 | Symantec Corporation | Incremental and differential backups of virtual machine files |
US8260742B2 (en) * | 2009-04-03 | 2012-09-04 | International Business Machines Corporation | Data synchronization and consistency across distributed repositories |
JP5317807B2 (en) | 2009-04-13 | 2013-10-16 | 株式会社日立製作所 | File control system and file control computer used therefor |
US20100268689A1 (en) | 2009-04-15 | 2010-10-21 | Gates Matthew S | Providing information relating to usage of a simulated snapshot |
US8601389B2 (en) | 2009-04-30 | 2013-12-03 | Apple Inc. | Scrollable menus and toolbars |
US8200926B1 (en) * | 2009-05-28 | 2012-06-12 | Symantec Corporation | Methods and systems for creating full backups |
US8549432B2 (en) | 2009-05-29 | 2013-10-01 | Apple Inc. | Radial menus |
US8345707B2 (en) * | 2009-06-03 | 2013-01-01 | Voxer Ip Llc | Method for synchronizing data maintained at a plurality of nodes |
US8321688B2 (en) | 2009-06-12 | 2012-11-27 | Microsoft Corporation | Secure and private backup storage and processing for trusted computing and data services |
US8533608B1 (en) | 2009-06-29 | 2013-09-10 | Generation E Consulting | Run-book automation platform with actionable document |
US8457018B1 (en) * | 2009-06-30 | 2013-06-04 | Emc Corporation | Merkle tree reference counts |
US20100333116A1 (en) | 2009-06-30 | 2010-12-30 | Anand Prahlad | Cloud gateway system for managing data storage to cloud storage sites |
US8244914B1 (en) | 2009-07-31 | 2012-08-14 | Symantec Corporation | Systems and methods for restoring email databases |
JP2011039804A (en) | 2009-08-12 | 2011-02-24 | Hitachi Ltd | Backup management method based on failure contents |
US8209568B2 (en) | 2009-08-21 | 2012-06-26 | Novell, Inc. | System and method for implementing an intelligent backup technique for cluster resources |
US20110055471A1 (en) * | 2009-08-28 | 2011-03-03 | Jonathan Thatcher | Apparatus, system, and method for improved data deduplication |
US9094292B2 (en) | 2009-08-31 | 2015-07-28 | Accenture Global Services Limited | Method and system for providing access to computing resources |
US8335784B2 (en) | 2009-08-31 | 2012-12-18 | Microsoft Corporation | Visual search and three-dimensional results |
US8645647B2 (en) | 2009-09-02 | 2014-02-04 | International Business Machines Corporation | Data storage snapshot with reduced copy-on-write |
JP2013011919A (en) | 2009-09-17 | 2013-01-17 | Hitachi Ltd | Storage apparatus and snapshot control method of the same |
US8767593B1 (en) | 2009-10-13 | 2014-07-01 | Signal Perfection, Ltd. | Method for managing, scheduling, monitoring and controlling audio and video communication and data collaboration |
US8589913B2 (en) | 2009-10-14 | 2013-11-19 | Vmware, Inc. | Tracking block-level writes |
US8856080B2 (en) | 2009-10-30 | 2014-10-07 | Microsoft Corporation | Backup using metadata virtual hard drive and differential virtual hard drive |
US8296410B1 (en) | 2009-11-06 | 2012-10-23 | Carbonite, Inc. | Bandwidth management in a client/server environment |
US8572337B1 (en) | 2009-12-14 | 2013-10-29 | Symantec Corporation | Systems and methods for performing live backups |
US9465532B2 (en) | 2009-12-18 | 2016-10-11 | Synaptics Incorporated | Method and apparatus for operating in pointing and enhanced gesturing modes |
US8190574B2 (en) | 2010-03-02 | 2012-05-29 | Storagecraft Technology Corporation | Systems, methods, and computer-readable media for backup and restoration of computer information |
CA2794339C (en) | 2010-03-26 | 2017-02-21 | Carbonite, Inc. | Transfer of user data between logical data sites |
US8935212B2 (en) | 2010-03-29 | 2015-01-13 | Carbonite, Inc. | Discovery of non-standard folders for backup |
WO2011123089A1 (en) | 2010-03-29 | 2011-10-06 | Carbonite, Inc. | Managing backup sets based on user feedback |
US8037345B1 (en) | 2010-03-31 | 2011-10-11 | Emc Corporation | Deterministic recovery of a file system built on a thinly provisioned logical volume having redundant metadata |
US8224935B1 (en) * | 2010-05-12 | 2012-07-17 | Symantec Corporation | Systems and methods for efficiently synchronizing configuration data within distributed computing systems |
US9298563B2 (en) | 2010-06-01 | 2016-03-29 | Hewlett Packard Enterprise Development Lp | Changing a number of disk agents to backup objects to a storage device |
WO2011159284A1 (en) | 2010-06-15 | 2011-12-22 | Hewlett-Packard Development Company, L. P. | Volume management |
US8773370B2 (en) | 2010-07-13 | 2014-07-08 | Apple Inc. | Table editing systems with gesture-based insertion and deletion of columns and rows |
US20120065802A1 (en) | 2010-09-14 | 2012-03-15 | Joulex, Inc. | System and methods for automatic power management of remote electronic devices using a mobile device |
US8606752B1 (en) | 2010-09-29 | 2013-12-10 | Symantec Corporation | Method and system of restoring items to a database while maintaining referential integrity |
US9705730B1 (en) | 2013-05-07 | 2017-07-11 | Axcient, Inc. | Cloud storage using Merkle trees |
US8924360B1 (en) | 2010-09-30 | 2014-12-30 | Axcient, Inc. | Systems and methods for restoring a file |
US8589350B1 (en) | 2012-04-02 | 2013-11-19 | Axcient, Inc. | Systems, methods, and media for synthesizing views of file system backups |
US8954544B2 (en) | 2010-09-30 | 2015-02-10 | Axcient, Inc. | Cloud-based virtual machines and offices |
US10284437B2 (en) | 2010-09-30 | 2019-05-07 | Efolder, Inc. | Cloud-based virtual machines and offices |
US9235474B1 (en) | 2011-02-17 | 2016-01-12 | Axcient, Inc. | Systems and methods for maintaining a virtual failover volume of a target computing system |
JP5816424B2 (en) | 2010-10-05 | 2015-11-18 | 富士通株式会社 | Information processing device, tape device, and program |
US8904126B2 (en) | 2010-11-16 | 2014-12-02 | Actifio, Inc. | System and method for performing a plurality of prescribed data management functions in a manner that reduces redundant access operations to primary storage |
US8417674B2 (en) * | 2010-11-16 | 2013-04-09 | Actifio, Inc. | System and method for creating deduplicated copies of data by sending difference data between near-neighbor temporal states |
US8495262B2 (en) | 2010-11-23 | 2013-07-23 | International Business Machines Corporation | Using a table to determine if user buffer is marked copy-on-write |
US8635187B2 (en) | 2011-01-07 | 2014-01-21 | Symantec Corporation | Method and system of performing incremental SQL server database backups |
US8412680B1 (en) | 2011-01-20 | 2013-04-02 | Commvault Systems, Inc | System and method for performing backup operations and reporting the results thereof |
US9311324B2 (en) * | 2011-01-26 | 2016-04-12 | Mitre Corporation | Synchronizing data among a federation of servers with intermittent or low signal bandwidth |
US8510597B2 (en) | 2011-02-08 | 2013-08-13 | Wisconsin Alumni Research Foundation | Providing restartable file systems within computing devices |
US20120210398A1 (en) | 2011-02-14 | 2012-08-16 | Bank Of America Corporation | Enhanced Backup and Retention Management |
US8458137B2 (en) | 2011-02-22 | 2013-06-04 | Bank Of America Corporation | Backup and retention monitoring |
WO2012127476A1 (en) | 2011-03-21 | 2012-09-27 | Hewlett-Packard Development Company, L.P. | Data backup prioritization |
US8621274B1 (en) | 2011-05-18 | 2013-12-31 | Netapp Inc. | Virtual machine fault tolerance |
EP2724264B1 (en) * | 2011-06-23 | 2020-12-16 | Red Hat, Inc. | Client-based data replication |
US8966457B2 (en) | 2011-11-15 | 2015-02-24 | Global Supercomputing Corporation | Method and system for converting a single-threaded software program into an application-specific supercomputer |
AU2012347866A1 (en) * | 2011-12-05 | 2014-07-24 | Doyenz Incorporated | Universal pluggable cloud disaster recovery system |
US8600947B1 (en) | 2011-12-08 | 2013-12-03 | Symantec Corporation | Systems and methods for providing backup interfaces |
US20130166511A1 (en) | 2011-12-21 | 2013-06-27 | International Business Machines Corporation | Determining an overall assessment of a likelihood of a backup set resulting in a successful restore |
US9298715B2 (en) | 2012-03-07 | 2016-03-29 | Commvault Systems, Inc. | Data storage system utilizing proxy device for storage operations |
KR101930263B1 (en) * | 2012-03-12 | 2018-12-18 | 삼성전자주식회사 | Apparatus and method for managing contents in a cloud gateway |
US9183094B2 (en) | 2012-05-25 | 2015-11-10 | Symantec Corporation | Backup image duplication |
US20140089619A1 (en) * | 2012-09-27 | 2014-03-27 | Infinera Corporation | Object replication framework for a distributed computing environment |
US20140149358A1 (en) | 2012-11-29 | 2014-05-29 | Longsand Limited | Configuring computing devices using a template |
US9021452B2 (en) | 2012-12-27 | 2015-04-28 | Commvault Systems, Inc. | Automatic identification of storage requirements, such as for use in selling data storage management solutions |
US9336226B2 (en) | 2013-01-11 | 2016-05-10 | Commvault Systems, Inc. | Criteria-based data synchronization management |
US9031829B2 (en) | 2013-02-08 | 2015-05-12 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9940069B1 (en) * | 2013-02-27 | 2018-04-10 | EMC IP Holding Company LLC | Paging cache for storage system |
US9110964B1 (en) * | 2013-03-05 | 2015-08-18 | Emc Corporation | Metadata optimization for network replication using differential encoding |
US9397907B1 (en) | 2013-03-07 | 2016-07-19 | Axcient, Inc. | Protection status determinations for computing devices |
US9292153B1 (en) | 2013-03-07 | 2016-03-22 | Axcient, Inc. | Systems and methods for providing efficient and focused visualization of data |
US20160110261A1 (en) | 2013-05-07 | 2016-04-21 | Axcient, Inc. | Cloud storage using merkle trees |
US9774410B2 (en) | 2014-06-10 | 2017-09-26 | PB, Inc. | Radiobeacon data sharing by forwarding low energy transmissions to a cloud host |
US9954946B2 (en) * | 2015-11-24 | 2018-04-24 | Netapp, Inc. | Directory level incremental replication |
- 2013-05-07: US13/889,164, US9705730B1 (Active)
- 2015-09-24: US14/864,850, US20170090786A1 (Abandoned)
- 2015-12-21: US14/977,614, US20190108103A9 (Abandoned)
- 2017-05-18: US15/599,417, US10599533B2 (Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8639917B1 (en) * | 2009-10-20 | 2014-01-28 | Vmware, Inc. | Streaming a desktop image over wide area networks in which the desktop image is segmented into a prefetch set of files, streaming set of files and leave-behind set of files |
US8745003B1 (en) * | 2011-05-13 | 2014-06-03 | Emc Corporation | Synchronization of storage using comparisons of fingerprints of blocks |
US20120330904A1 (en) * | 2011-06-27 | 2012-12-27 | International Business Machines Corporation | Efficient file system object-based deduplication |
US20140101113A1 (en) * | 2012-10-08 | 2014-04-10 | Symantec Corporation | Locality Aware, Two-Level Fingerprint Caching |
US20140244599A1 (en) * | 2013-02-22 | 2014-08-28 | Symantec Corporation | Deduplication storage system with efficient reference updating and space reclamation |
US20150112939A1 (en) * | 2013-10-18 | 2015-04-23 | Solidfire, Inc. | Incremental block level backup |
Cited By (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11016859B2 (en) | 2008-06-24 | 2021-05-25 | Commvault Systems, Inc. | De-duplication systems and methods for application-specific data |
US10540327B2 (en) | 2009-07-08 | 2020-01-21 | Commvault Systems, Inc. | Synchronized data deduplication |
US11288235B2 (en) | 2009-07-08 | 2022-03-29 | Commvault Systems, Inc. | Synchronized data deduplication |
US10126973B2 (en) | 2010-09-30 | 2018-11-13 | Commvault Systems, Inc. | Systems and methods for retaining and using data block signatures in data protection operations |
US10284437B2 (en) | 2010-09-30 | 2019-05-07 | Efolder, Inc. | Cloud-based virtual machines and offices |
US9898225B2 (en) | 2010-09-30 | 2018-02-20 | Commvault Systems, Inc. | Content aligned block-based deduplication |
US9898478B2 (en) * | 2010-12-14 | 2018-02-20 | Commvault Systems, Inc. | Distributed deduplicated storage system |
US10191816B2 (en) | 2010-12-14 | 2019-01-29 | Commvault Systems, Inc. | Client-side repository in a networked deduplicated storage system |
US11169888B2 (en) | 2010-12-14 | 2021-11-09 | Commvault Systems, Inc. | Client-side repository in a networked deduplicated storage system |
US20150205815A1 (en) * | 2010-12-14 | 2015-07-23 | Commvault Systems, Inc. | Distributed deduplicated storage system |
US10740295B2 (en) | 2010-12-14 | 2020-08-11 | Commvault Systems, Inc. | Distributed deduplicated storage system |
US11422976B2 (en) | 2010-12-14 | 2022-08-23 | Commvault Systems, Inc. | Distributed deduplicated storage system |
US9858156B2 (en) | 2012-06-13 | 2018-01-02 | Commvault Systems, Inc. | Dedicated client-side signature generator in a networked storage system |
US10176053B2 (en) | 2012-06-13 | 2019-01-08 | Commvault Systems, Inc. | Collaborative restore in a networked storage system |
US10387269B2 (en) | 2012-06-13 | 2019-08-20 | Commvault Systems, Inc. | Dedicated client-side signature generator in a networked storage system |
US10956275B2 (en) | 2012-06-13 | 2021-03-23 | Commvault Systems, Inc. | Collaborative restore in a networked storage system |
US9785647B1 (en) | 2012-10-02 | 2017-10-10 | Axcient, Inc. | File system virtualization |
US9852140B1 (en) | 2012-11-07 | 2017-12-26 | Axcient, Inc. | Efficient file replication |
US11169714B1 (en) | 2012-11-07 | 2021-11-09 | Efolder, Inc. | Efficient file replication |
US10229133B2 (en) | 2013-01-11 | 2019-03-12 | Commvault Systems, Inc. | High availability distributed deduplicated storage system |
US11157450B2 (en) | 2013-01-11 | 2021-10-26 | Commvault Systems, Inc. | High availability distributed deduplicated storage system |
US10003646B1 (en) | 2013-03-07 | 2018-06-19 | Efolder, Inc. | Protection status determinations for computing devices |
US9998344B2 (en) | 2013-03-07 | 2018-06-12 | Efolder, Inc. | Protection status determinations for computing devices |
US10599533B2 (en) | 2013-05-07 | 2020-03-24 | Efolder, Inc. | Cloud storage using merkle trees |
US9705730B1 (en) | 2013-05-07 | 2017-07-11 | Axcient, Inc. | Cloud storage using Merkle trees |
US11188504B2 (en) | 2014-03-17 | 2021-11-30 | Commvault Systems, Inc. | Managing deletions from a deduplication database |
US10445293B2 (en) | 2014-03-17 | 2019-10-15 | Commvault Systems, Inc. | Managing deletions from a deduplication database |
US10380072B2 (en) | 2014-03-17 | 2019-08-13 | Commvault Systems, Inc. | Managing deletions from a deduplication database |
US11119984B2 (en) | 2014-03-17 | 2021-09-14 | Commvault Systems, Inc. | Managing deletions from a deduplication database |
US10474638B2 (en) | 2014-10-29 | 2019-11-12 | Commvault Systems, Inc. | Accessing a file system using tiered deduplication |
US11113246B2 (en) | 2014-10-29 | 2021-09-07 | Commvault Systems, Inc. | Accessing a file system using tiered deduplication |
US11921675B2 (en) | 2014-10-29 | 2024-03-05 | Commvault Systems, Inc. | Accessing a file system using tiered deduplication |
US9934238B2 (en) | 2014-10-29 | 2018-04-03 | Commvault Systems, Inc. | Accessing a file system using tiered deduplication |
US10339106B2 (en) | 2015-04-09 | 2019-07-02 | Commvault Systems, Inc. | Highly reusable deduplication database after disaster recovery |
US11301420B2 (en) | 2015-04-09 | 2022-04-12 | Commvault Systems, Inc. | Highly reusable deduplication database after disaster recovery |
US10481825B2 (en) | 2015-05-26 | 2019-11-19 | Commvault Systems, Inc. | Replication using deduplicated secondary copy data |
US10481826B2 (en) | 2015-05-26 | 2019-11-19 | Commvault Systems, Inc. | Replication using deduplicated secondary copy data |
US10481824B2 (en) | 2015-05-26 | 2019-11-19 | Commvault Systems, Inc. | Replication using deduplicated secondary copy data |
US10877856B2 (en) | 2015-12-30 | 2020-12-29 | Commvault Systems, Inc. | System for redirecting requests after a secondary storage computing device failure |
US10592357B2 (en) | 2015-12-30 | 2020-03-17 | Commvault Systems, Inc. | Distributed file system in a distributed deduplication data storage system |
US10255143B2 (en) | 2015-12-30 | 2019-04-09 | Commvault Systems, Inc. | Deduplication replication in a distributed deduplication data storage system |
US10310953B2 (en) | 2015-12-30 | 2019-06-04 | Commvault Systems, Inc. | System for redirecting requests after a secondary storage computing device failure |
US10061663B2 (en) | 2015-12-30 | 2018-08-28 | Commvault Systems, Inc. | Rebuilding deduplication data in a distributed deduplication data storage system |
US10956286B2 (en) | 2015-12-30 | 2021-03-23 | Commvault Systems, Inc. | Deduplication replication in a distributed deduplication data storage system |
US10904338B2 (en) | 2016-03-22 | 2021-01-26 | International Business Machines Corporation | Identifying data for deduplication in a network storage environment |
US10574751B2 (en) * | 2016-03-22 | 2020-02-25 | International Business Machines Corporation | Identifying data for deduplication in a network storage environment |
US20180109501A1 (en) * | 2016-10-17 | 2018-04-19 | Microsoft Technology Licensing, Llc | Migration containers |
US10673823B2 (en) * | 2016-10-17 | 2020-06-02 | Microsoft Technology Licensing, Llc | Migration containers |
US11010485B1 (en) * | 2017-03-02 | 2021-05-18 | Apple Inc. | Cloud messaging system |
US12001579B1 (en) | 2017-03-02 | 2024-06-04 | Apple Inc. | Cloud messaging system |
CN109408279A (en) * | 2017-08-16 | 2019-03-01 | 北京京东尚科信息技术有限公司 | Data back up method and device |
US11620065B2 (en) | 2017-10-24 | 2023-04-04 | Bottomline Technologies Limited | Variable length deduplication of stored data |
EP3477462A3 (en) * | 2017-10-24 | 2019-06-12 | Bottomline Technologies (DE), Inc. | Tenant aware, variable length, deduplication of stored data |
US10282129B1 (en) | 2017-10-24 | 2019-05-07 | Bottomline Technologies (De), Inc. | Tenant aware, variable length, deduplication of stored data |
US11194497B2 (en) | 2017-10-24 | 2021-12-07 | Bottomline Technologies, Inc. | Variable length deduplication of stored data |
US10884643B2 (en) | 2017-10-24 | 2021-01-05 | Bottomline Technologies Limited | Variable length deduplication of stored data |
US11604583B2 (en) | 2017-11-28 | 2023-03-14 | Pure Storage, Inc. | Policy based data tiering |
US11392553B1 (en) | 2018-04-24 | 2022-07-19 | Pure Storage, Inc. | Remote data management |
US11436344B1 (en) | 2018-04-24 | 2022-09-06 | Pure Storage, Inc. | Secure encryption in deduplication cluster |
US12067131B2 (en) | 2018-04-24 | 2024-08-20 | Pure Storage, Inc. | Transitioning leadership in a cluster of nodes |
US10671370B2 (en) * | 2018-05-30 | 2020-06-02 | Red Hat, Inc. | Distributing file system states |
CN109614036A (en) * | 2018-11-16 | 2019-04-12 | 新华三技术有限公司成都分公司 | The dispositions method and device of memory space |
US11010258B2 (en) | 2018-11-27 | 2021-05-18 | Commvault Systems, Inc. | Generating backup copies through interoperability between components of a data storage management system and appliances for data storage and deduplication |
US11681587B2 (en) | 2018-11-27 | 2023-06-20 | Commvault Systems, Inc. | Generating copies through interoperability between a data storage management system and appliances for data storage and deduplication |
US11698727B2 (en) | 2018-12-14 | 2023-07-11 | Commvault Systems, Inc. | Performing secondary copy operations based on deduplication performance |
US12067242B2 (en) | 2018-12-14 | 2024-08-20 | Commvault Systems, Inc. | Performing secondary copy operations based on deduplication performance |
US11829251B2 (en) | 2019-04-10 | 2023-11-28 | Commvault Systems, Inc. | Restore using deduplicated secondary copy data |
US11463264B2 (en) | 2019-05-08 | 2022-10-04 | Commvault Systems, Inc. | Use of data block signatures for monitoring in an information management system |
US11681586B2 (en) * | 2019-06-28 | 2023-06-20 | Rubrik, Inc. | Data management system with limited control of external compute and storage resources |
US20230273864A1 (en) * | 2019-06-28 | 2023-08-31 | Rubrik, Inc. | Data management system with limited control of external compute and storage resources |
US20200409796A1 (en) * | 2019-06-28 | 2020-12-31 | Rubrik, Inc. | Data management system with limited control of external compute and storage resources |
US11914554B2 (en) | 2019-06-28 | 2024-02-27 | Rubrik, Inc. | Adaptable multi-layered storage for deduplicating electronic messages |
US11675741B2 (en) | 2019-06-28 | 2023-06-13 | Rubrik, Inc. | Adaptable multi-layered storage for deduplicating electronic messages |
US10921987B1 (en) * | 2019-07-31 | 2021-02-16 | EMC IP Holding Company LLC | Deduplication of large block aggregates using representative block digests |
US11442896B2 (en) | 2019-12-04 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods for optimizing restoration of deduplicated data stored in cloud-based storage resources |
US11868214B1 (en) * | 2020-02-02 | 2024-01-09 | Veritas Technologies Llc | Methods and systems for affinity aware container prefetching |
US11687424B2 (en) | 2020-05-28 | 2023-06-27 | Commvault Systems, Inc. | Automated media agent state management |
Also Published As
Publication number | Publication date |
---|---|
US20190108103A9 (en) | 2019-04-11 |
US10599533B2 (en) | 2020-03-24 |
US9705730B1 (en) | 2017-07-11 |
US20170257254A1 (en) | 2017-09-07 |
US20170177452A1 (en) | 2017-06-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20170090786A1 (en) | Distributed and Deduplicating Data Storage System and Methods of Use | |
US9110603B2 (en) | Identifying modified chunks in a data set for storage | |
US11080232B2 (en) | Backup and restoration for a deduplicated file system | |
CN111492354B (en) | Database metadata in non-volatile storage | |
US8874532B2 (en) | Managing dereferenced chunks in a deduplication system | |
US9575978B2 (en) | Restoring objects in a client-server environment | |
US9305005B2 (en) | Merging entries in a deduplication index | |
US8396841B1 (en) | Method and system of multi-level and multi-mode cloud-based deduplication | |
US9910906B2 (en) | Data synchronization using redundancy detection | |
US20160110261A1 (en) | Cloud storage using merkle trees | |
US9262431B2 (en) | Efficient data deduplication in a data storage network | |
US10754731B1 (en) | Compliance audit logging based backup | |
US10284433B2 (en) | Data synchronization using redundancy detection | |
KR20130120516A (en) | Content based file chunking | |
JP2015525419A (en) | Advanced data management virtualization system | |
US11474733B2 (en) | Public cloud provider cost optimization for writing data blocks directly to object storage | |
US9749193B1 (en) | Rule-based systems for outcome-based data protection | |
CN103067519A (en) | Method and device of data distribution storage under heterogeneous platform | |
US11392868B1 (en) | Data retention cost control for data written directly to object storage | |
US11093342B1 (en) | Efficient deduplication of compressed files | |
US9971797B1 (en) | Method and system for providing clustered and parallel data mining of backup data | |
US10108647B1 (en) | Method and system for providing instant access of backup data | |
US9830471B1 (en) | Outcome-based data protection using multiple data protection systems | |
US20170124107A1 (en) | Data deduplication storage system and process | |
US20210342301A1 (en) | Filesystem managing metadata operations corresponding to a file in another filesystem |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: AXCIENT, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PARAB, NITIN;BROWN, AARON;VAN DYCK, DANE;AND OTHERS;REEL/FRAME:036983/0436 Effective date: 20150924 |
AS | Assignment | Owner name: STRUCTURED ALPHA LP, CANADA Free format text: SECURITY INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:042542/0364 Effective date: 20170530 |
AS | Assignment | Owner name: SILVER LAKE WATERMAN FUND, L.P., CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:042577/0901 Effective date: 20170530 |
AS | Assignment | Owner name: AXCIENT, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P.;REEL/FRAME:043106/0389 Effective date: 20170726 |
AS | Assignment | Owner name: AXCIENT, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:STRUCTURED ALPHA LP;REEL/FRAME:043840/0227 Effective date: 20171011 |
AS | Assignment | Owner name: AXCI (AN ABC) LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:044367/0507 Effective date: 20170726 Owner name: AXCIENT HOLDINGS, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCI (AN ABC) LLC;REEL/FRAME:044368/0556 Effective date: 20170726 Owner name: EFOLDER, INC., GEORGIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCIENT HOLDINGS, LLC;REEL/FRAME:044370/0412 Effective date: 20170901 |
AS | Assignment | Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:EFOLDER, INC.;REEL/FRAME:044563/0633 Effective date: 20160725 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: MUFG UNION BANK, N.A., ARIZONA Free format text: SECURITY INTEREST;ASSIGNOR:EFOLDER, INC.;REEL/FRAME:061559/0703 Effective date: 20221027 |
AS | Assignment | Owner name: EFOLDER, INC., COLORADO Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION;REEL/FRAME:061634/0623 Effective date: 20221027 |
AS | Assignment | Owner name: EFOLDER, INC., COLORADO Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:U.S. BANK NATIONAL ASSOCIATION FORMERLY MUFG UNION BANK, N.A.;REEL/FRAME:068680/0802 Effective date: 20240919 |