US20170090786A1 - Distributed and Deduplicating Data Storage System and Methods of Use


Info

Publication number
US20170090786A1
Authority
US
United States
Prior art keywords
chunks
signature
data stream
input data
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/864,850
Inventor
Nitin Parab
Aaron Brown
Dane Van Dyck
Sagar Dixit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Axci (an Abc) LLC
Efolder Inc
Original Assignee
Axcient Inc
Application filed by Axcient Inc filed Critical Axcient Inc
Assigned to AXCIENT, INC. reassignment AXCIENT, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BROWN, AARON, DIXIT, SAGAR, PARAB, NITIN, VAN DYCK, DANE
Priority to US14/977,614 (US20190108103A9)
Priority to US15/360,836 (US10284437B2)
Publication of US20170090786A1
Assigned to STRUCTURED ALPHA LP reassignment STRUCTURED ALPHA LP SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AXCIENT, INC.
Assigned to SILVER LAKE WATERMAN FUND, L.P. reassignment SILVER LAKE WATERMAN FUND, L.P. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AXCIENT, INC.
Assigned to AXCIENT, INC. reassignment AXCIENT, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: SILVER LAKE WATERMAN FUND, L.P.
Assigned to AXCIENT, INC. reassignment AXCIENT, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: STRUCTURED ALPHA LP
Assigned to AXCIENT HOLDINGS, LLC reassignment AXCIENT HOLDINGS, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AXCI (AN ABC) LLC
Assigned to AXCI (AN ABC) LLC reassignment AXCI (AN ABC) LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AXCIENT, INC.
Assigned to EFOLDER, INC. reassignment EFOLDER, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AXCIENT HOLDINGS, LLC
Assigned to WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT reassignment WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EFOLDER, INC.
Assigned to MUFG UNION BANK, N.A. reassignment MUFG UNION BANK, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EFOLDER, INC.
Assigned to EFOLDER, INC. reassignment EFOLDER, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: WELLS FARGO BANK, NATIONAL ASSOCIATION


Classifications

    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/162 Delete operations
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0641 De-duplication techniques
    • G06F3/0643 Management of files
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G06F2201/84 Using snapshots, i.e. a logical point-in-time copy of the data

Definitions

  • the present technology may be generally described as providing systems and methods for distributing and deduplicating data storage.
  • Creating large backup data stores that are efficient in terms of data storage and data retrieval is a complex process, especially for systems that store petabytes of data or greater. Additional complexities are introduced when these large backup data stores use deduplication, such as when only unique data blocks are stored. Additionally, backup data stores that use deduplication are not currently suitable for storing data using, for example, distributed hash tables (“DHT”), as a DHT may destroy the locality of the data and the index used to track the data as it is distributed to the data store.
  • the present technology may be directed to methods that comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
  • the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) segmenting the input data stream into chunks; (c) creating a signature for each of the chunks; (d) distributing each chunk to one of a plurality of containers, each container comprising a container identifier; and (e) creating a locality index that includes a mapping of a chunk signature and a container identifier.
  • the present technology may be directed to systems that comprise: (a) a processor; (b) logic encoded in one or more tangible media for execution by the processor and when executed operable to perform operations comprising: (i) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (ii) comparing the signature to signatures of data included in a deduplicated backup data store; (iii) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (iv) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (v) distributing the unique data to the deduplicated backup data store.
  • the present technology may be directed to a non-transitory machine-readable storage medium having embodied thereon a program.
  • the program may be executed by a machine to perform a method.
  • the method may comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
  • the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) separating the input data stream into chunks; (c) performing one or more of an exact and an approximate matching of the chunks of the input data stream to chunks stored in a deduplicated backup data store to determine unique chunks; (d) determining one or more locations in the deduplicated backup data store for the unique chunks; (e) updating an index to include the unique chunks with their locations; and (f) distributing the unique chunks to the deduplicated backup data store according to the index.
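The steps recited above can be sketched in code. The sketch below assumes fixed-size chunking, SHA-1 signatures, and a plain dictionary standing in for the deduplicated backup data store; the claims leave the chunking method, signature function, and store layout open.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative; the application leaves the chunking method open

def chunk_stream(data: bytes) -> list:
    """Step (b): separate the input data stream into chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def signature(chunk: bytes) -> str:
    """Generate a signature representing the data in a chunk (SHA-1 here)."""
    return hashlib.sha1(chunk).hexdigest()

def deduplicate(stream: bytes, store: dict) -> list:
    """Steps (c)-(f): exact matching of chunk signatures against the
    deduplicated backup data store; only unique chunks are distributed.
    Returns the signatures of the chunks that were newly stored."""
    unique = []
    for chunk in chunk_stream(stream):
        sig = signature(chunk)
        if sig not in store:      # chunk not yet in the backup data store
            store[sig] = chunk    # distribute the unique chunk
            unique.append(sig)
    return unique

# A second stream that mostly repeats the first contributes almost nothing new.
store = {}
first = deduplicate(b"A" * 8192 + b"B" * 4096, store)
second = deduplicate(b"A" * 8192 + b"C" * 4096, store)
```

Here the first stream stores two unique chunks (its two identical "A" chunks share one signature), and the second stream stores only its single new "C" chunk.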
  • FIG. 1 is a block diagram of an exemplary architecture in which embodiments of the present technology may be practiced
  • FIG. 2 is a flowchart of an exemplary method of exact matching of chunks of data to determine unique chunks
  • FIG. 3 is a flowchart of an exemplary method for providing a distributed and deduplicated data store
  • FIG. 4 is a flowchart of an example method of the present technology.
  • FIG. 5 is another example method of the present technology for storing input streams from two separate file modification operations of a client.
  • FIG. 6 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.
  • some storage systems may deduplicate block storage, where only unique data blocks are stored. This allows the system to reduce the overall amount of data blocks stored compared to systems that store complete data sets.
  • each backup (e.g., snapshot or mirror) taken of a physical system must be stored in order to allow the physical system to be restored back to a given point in time in the past, as described above.
  • chunks may be distributed into a data storage cloud.
  • each block of data may be hashed to form the index key for a DHT and the data itself is stored as the value of the key.
  • the combination of data blocks and hash values is used to create a DHT. While the effectiveness of the methods and systems described herein may be advantageously leveraged within systems or processes that use DHTs, the present technology is not limited to these types of systems and processes. Thus, descriptions of DHTs included herein are merely provided as an exemplary use of the present technology.
  • While storage of data using DHTs can be effective in load balancing IO load across distributed nodes, unfortunately, when a DHT is used the temporal locality of the data is not maintained spatially on the disk. This is, in part, due to the fact that DHTs use the hash of the data to determine the location of the data, and cryptographic hashes are by design random. For example, when multiple snapshots of a physical system are taken over time, random operations are performed on the snapshots when DHTs are used. These random operations are inefficient when compared to sequential operations. In short, DHTs are less than optimal for building deduplicated storage systems. That is, deduplicated storage systems rely on maintaining the temporal locality of the data spatially on the disk.
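The locality problem can be seen directly by simulating hash-based placement: temporally adjacent chunks of one stream scatter across nodes. The node count and placement rule below are illustrative assumptions, not the DHT of any particular system.

```python
import hashlib

NUM_NODES = 16  # hypothetical number of storage nodes in the DHT

def dht_node(chunk: bytes) -> int:
    """A DHT derives a chunk's location from its hash; because
    cryptographic hashes are by design random, the placement is too."""
    digest = hashlib.sha1(chunk).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES

# Ten temporally adjacent chunks from a single input stream...
chunks = [bytes([i]) * 64 for i in range(10)]
placements = [dht_node(c) for c in chunks]
# ...land on unrelated nodes, so restoring the stream requires random
# operations rather than one sequential read.
```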
  • locality can be described in terms of temporality or space. For example, if a user modifies multiple files at the same time, it will be understood or assumed that the modification of these files is related to one another. By way of example, the user could be updating multiple spreadsheets within a given period of time. These spreadsheets may all be related to the same project or task that the user is working on. These file changes can be transmitted over the network efficiently in an input stream. The present technology will store these changes spatially together on the backup store, but their spatial proximity to one another on the backup store is due to their temporal adjacency relating to how they are used.
  • a DHT may randomly distribute the changes to the files anywhere in the backup store, which increases data fragmentation and slows down retrieval.
  • when one file is requested from the backup store, the backup store will automatically pre-fetch the files that were determined to be changed at the same time as the requested file. Again, this benefit is possible because temporal locality (context) is determined and maintained. Even if the user does not utilize the additional files, the likelihood that they may be utilized is sufficient to justify pre-fetching the files in anticipation of use.
  • these processes greatly improve file retrieval and replication methods of backup stores.
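The pre-fetch behavior described above can be sketched as follows. The container layout, file names, and single shared cache are hypothetical; the point is that one sequential container read also warms the cache with the files that changed at the same time.

```python
# Hypothetical layout: files modified together were stored in the same container.
containers = {
    0: {"q3_budget.xlsx": b"budget data", "q3_forecast.xlsx": b"forecast data"},
    1: {"notes.txt": b"unrelated notes"},
}
file_to_container = {name: cid
                     for cid, files in containers.items() for name in files}
cache = {}

def fetch(name: str) -> bytes:
    """Serve one file and pre-fetch its container neighbors: files that
    were changed at the same time are spatially adjacent in the backup
    store, so they are likely to be requested next."""
    cid = file_to_container[name]
    cache.update(containers[cid])  # one sequential container read warms the cache
    return cache[name]
```

After fetching one spreadsheet, its temporally related neighbor is already cached, while the unrelated file in the other container is not.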
  • the index created for the blocks of the changed files also maintains context and locality due to the manner in which it is created.
  • the updates to the index occur temporally when changes are transferred to the backup store.
  • Architecture 100 may include a deduplicated backup data store 105 ; hereinafter “data store 105 .”
  • the data store 105 may be implemented within a cloud-based computing environment.
  • a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices.
  • systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
  • the cloud may be formed, for example, by a network of servers, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
  • the data store 105 may include a block store 115 that stores unique blocks of data for one or more objects, such as a file, a group of files, or an entire disk.
  • the block store 115 may comprise a plurality of containers 120 a - n , which are utilized to store data chunks that are separated from the input data stream, as will be described in greater detail below.
  • the term “container” may also be referred to as an “extent.”
  • objects written to the block store 115 are immutable.
  • a new object identifier may be generated and provided back to the object owner.
  • the responsibility of implementing a traditional interface, in which object identifiers do not change on update, falls to the application/client.
  • the data store 105 may provide ‘mutable’ metadata storage where the client/application can manage immutable objects which are mapped to mutable object identifiers and other application specific metadata.
  • the block store 115 may include immutable object addressable block storage.
  • the block store 115 may form an underlying storage foundation that allows for the storing of blocks of objects.
  • the identifiers of the blocks are a unique representation of the object, generated for example by using an SHA1 hash function.
  • the present technology may also use other cryptographic hash functions that would be known to one of ordinary skill in the art with the present disclosure before them.
  • the architecture 100 may include a deduplication system, hereinafter referred to as system 125 that provides distributed and deduplicated data storage.
  • the system 125 receives input data streams from a client device 130 .
  • an input data stream may include a snapshot or an incremental file for the client device 130 .
  • the client device may include an end user computing system, an appliance, such as a backup appliance, a server, or any other computing device that may include objects such as files, directories, disks, and so forth.
  • the API may encapsulate messages and their respective operations, allowing for efficient writing of objects over a network, such as network 135 .
  • the network 135 may comprise a local area network (“LAN”), a wide area network (“WAN”), or any other private or public network, such as the Internet.
  • the system 125 may divide or separate an input data stream into a plurality of chunks, also referred to as blocks, segments, pieces, and so forth. Any method for separating the input data stream into chunks that would be known to one of ordinary skill in the art may also likewise be utilized in accordance with the present technology.
  • the containers 120 a - n may also be referred to as blobs.
  • Containers 120 a - n may be filled with chunks that are received sequentially around the same time, thus maintaining both temporal and spatial locality within the same container.
  • each of the chunks may be encrypted or otherwise hashed so as to create a unique identifier for the chunk of data.
  • a chunk may be hashed using SHA1 to produce a SHA1 key value for the chunk.
  • the input data stream may arrive at the system 125 in an already-chunked manner.
  • each of the hashed chunk values may be incorporated by the system 125 into Merkle nodes, and the Merkle nodes may be arranged into a Merkle tree at the data store 105.
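A minimal sketch of arranging hashed chunk values into a Merkle tree. Pairwise SHA-1 concatenation, duplicating the last node on odd-sized levels, is one conventional construction; the application does not fix a specific one.

```python
import hashlib

def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def merkle_root(chunk_hashes: list) -> str:
    """Combine chunk hashes pairwise, level by level, into a single root."""
    level = list(chunk_hashes)
    while len(level) > 1:
        if len(level) % 2:                 # odd-sized level: duplicate the last node
            level.append(level[-1])
        level = [sha1((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

leaves = [sha1(b"chunk-%d" % i) for i in range(4)]
root = merkle_root(leaves)

# Changing any one chunk changes the root, which lets changed data be
# recognized at the tree level without comparing every chunk.
modified = leaves[:]
modified[2] = sha1(b"chunk-2-modified")
```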
  • the system 125 may generate a signature for each extent using technologies other than cryptographic hash functions.
  • the signature is a representation of the data included in the extent.
  • the system 125 may apply an algorithm that is similar to an algorithm used for facial recognition. For example, in facial recognition, a signature for a face of an individual included in an image file may be generated. This signature may be compared to facial signatures in other image files to determine if facial signatures included in these additional image files correspond to the facial signature of the individual.
  • the “signature” is a mathematical representation of the unique facial features of the individual. These unique facial features convert into unique mathematical values that may be used to locate the individual in other image files.
  • extents include data chunks that can be distinguished from other chunks on the basis of unique data features.
  • a signature for an extent would include mathematical representations of these unique features such that comparing a signature for the extent to other signatures of other extents may allow for the system 125 to determine similar or dissimilar extents.
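One way to build such a similarity signature is min-hashing over overlapping windows of an extent; this is an assumed stand-in for the unnamed algorithm, chosen because, like the facial-recognition analogy, it converts an extent's distinguishing features into mathematical values that can be compared across extents.

```python
import hashlib

def minhash_signature(extent: bytes, num_hashes: int = 8, shingle: int = 8) -> tuple:
    """A similarity signature for an extent: for each of several salted
    hash functions, keep the smallest hash value seen over all
    overlapping windows. Similar extents share many of these minima;
    dissimilar extents share almost none."""
    mins = []
    for salt in range(num_hashes):
        best = None
        for i in range(len(extent) - shingle + 1):
            digest = hashlib.sha1(bytes([salt]) + extent[i:i + shingle]).digest()
            value = int.from_bytes(digest[:8], "big")
            if best is None or value < best:
                best = value
        mins.append(best)
    return tuple(mins)

def similarity(a: tuple, b: tuple) -> float:
    """Fraction of matching minima: an estimate of extent similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

base = (b"It is a truth universally acknowledged, that a single man in "
        b"possession of a good fortune, must be in want of a wife.")
near = base[:-5] + b"boat."          # the same extent with a small change
far = bytes(range(8, 128))           # unrelated data

sig_base = minhash_signature(base)
sig_near = minhash_signature(near)
sig_far = minhash_signature(far)
```

Comparing `sig_base` against `sig_near` and `sig_far` lets a system rank which stored extents most resemble an incoming one before doing any exact comparison.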
  • because chunks are placed sequentially (in the order received relative to the input stream) into containers 120 a - n and each chunk is provided with a unique identifier, such as a hash value, locality of the chunks may be maintained.
  • a locality index may be managed by the system 125 that maps each chunk to its corresponding container based upon the chunk identifier.
  • locality of data chunks is a function of the order in which the chunks are received, as well as the chunk identifiers used to distinguish chunks from one another.
  • the locality index may comprise a sparse index when the locality index becomes too large and cumbersome to maintain in memory.
  • the sparse index may map only the chunk signature with a container identifier.
  • the system 125 may split the locality index into chunks and these chunks may also be stored in the containers, along with the chunks created from the input stream.
  • system 125 may also manage a container index for each container that provides an exact or approximate location for each chunk within the container.
  • the index may specify the offset and length of each chunk within the container.
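The two indices described above, a locality index mapping each chunk signature to a container identifier and a per-container index giving each chunk's offset and length, can be sketched together. The capacity of four chunks per container and the SHA-1 identifiers are illustrative assumptions.

```python
import hashlib

CONTAINER_CAPACITY = 4  # illustrative: chunks per container before opening a new one

class BlockStore:
    """Sequential container placement with a locality index
    (signature -> container id) and per-container indices
    (signature -> (offset, length))."""
    def __init__(self):
        self.containers = [bytearray()]   # container 0 starts open
        self.counts = [0]
        self.locality_index = {}          # chunk signature -> container id
        self.container_index = [{}]       # per container: signature -> (offset, length)

    def put(self, chunk: bytes) -> str:
        sig = hashlib.sha1(chunk).hexdigest()
        if sig in self.locality_index:    # deduplicate: store unique chunks only
            return sig
        if self.counts[-1] >= CONTAINER_CAPACITY:   # full: open a new container
            self.containers.append(bytearray())
            self.counts.append(0)
            self.container_index.append({})
        cid = len(self.containers) - 1
        offset = len(self.containers[cid])
        self.containers[cid] += chunk     # sequential placement preserves locality
        self.counts[cid] += 1
        self.locality_index[sig] = cid
        self.container_index[cid][sig] = (offset, len(chunk))
        return sig

    def get(self, sig: str) -> bytes:
        cid = self.locality_index[sig]                 # which container
        offset, length = self.container_index[cid][sig]  # where inside it
        return bytes(self.containers[cid][offset:offset + length])

store = BlockStore()
sigs = [store.put(bytes([i]) * 32) for i in range(6)]
```

Storing six chunks fills container 0 with the first four and opens container 1 for the rest; retrieval consults the locality index first, then the container index.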
  • the system 125 may also separate the subsequent input streams into chunks and generate signatures for these chunks.
  • when signatures for chunks of a subsequent input data stream are compared to signatures for chunks of a previous input data stream, differences deduced by the system 125 in these signatures may indicate that data in a particular chunk has changed.
  • the system 125 may then obtain these changed chunks and store data from these changed chunks in the data store 105 .
  • the ability for the system 125 to recognize changed data allows the system 125 to store only unique data in the data store 105 (e.g., deduplicated data).
  • the system 125 may employ either exact or approximated deduplication methods. In some instances, the system 125 may also use approximated deduplication methods initially, followed by a more robust exact matching deduplication method at a later time, as a means of verification.
  • the system 125 may compare the signature of an extent to signatures for similar extents stored in the data store 105. Any difference in signatures between similar extents for the same object, such as a file, indicates that the data of the object has changed.
  • the system 125 may establish rules that allow the system 125 to quickly process input data streams to determine if unique data blocks exist in the input data stream. If the comparison between signatures indicates that the input data stream is not likely to include unique data, the system 125 may ignore the input data stream. Conversely, if the comparison between signatures indicates that the input data stream is likely to include unique data, the system 125 may further examine the input data stream to determine which chunks of data have changed.
  • the system 125 may also process the input data stream using the exact deduplication method described below.
  • the system 125 may compare signatures of chunks of an input data stream to node signatures of similar chunks stored in the data store 105 .
  • the system 125 may check matches at the chunk or extent level using hash values associated with chunks. That is, each block or chunk of data included in an extent may be associated with its own signature or identifier.
  • each chunk may be associated with a unique hash value of the data included in that chunk. Any change in the data of a chunk will change the hash value of the chunk.
  • the system 125 can use the comparison of the signatures of the chunks to determine if data has changed in a chunk.
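Change detection between a previous and a subsequent input data stream then reduces to comparing chunk signatures position by position; fixed-size chunks and SHA-1 are assumptions, as above.

```python
import hashlib

def chunk_signatures(data: bytes, size: int = 64) -> list:
    """Signatures for fixed-size chunks of an input data stream."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def changed_chunks(previous: list, current: list) -> list:
    """Indices of chunks whose signatures differ between streams; any
    change in a chunk's data changes its hash, so only these chunks
    (plus any appended ones) need to be stored."""
    changed = [i for i, (old, new) in enumerate(zip(previous, current))
               if old != new]
    changed.extend(range(len(previous), len(current)))  # appended chunks
    return changed

prev_sigs = chunk_signatures(b"A" * 64 + b"B" * 64 + b"C" * 64)
curr_sigs = chunk_signatures(b"A" * 64 + b"X" * 64 + b"C" * 64 + b"D" * 64)
```

Here only the modified second chunk and the appended fourth chunk are flagged for storage; the unchanged chunks are deduplicated away.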
  • the system 125 may load the input data stream and selected data from the data store 105 into cache memory. Processing the input data stream and the selected data in cache may allow for faster and more efficient data analysis by the system 125.
  • the system 125 may utilize information indicative of the client device or object stored on the client device to “warm up” the data loaded into the cache. That is, instead of examining an entire input data stream, the system 125 may understand that the input data stream came from a particular customer or client device. Additionally, the system 125 may know that the input data stream refers to a particular object. Thus, the system 125 may not need to compare signatures for each block (e.g., chunk) of a client device to determine unique blocks. The system 125 , in effect, narrows the comparison down to the most likely candidate chunks or extents stored in the data store 105 .
  • the system 125 may select extents by comparing root (or head) signatures for a chunk of an input data stream to root (or head) signatures of extents stored in the data store 105. Extents that have matching signatures may be ignored, as the blocks corresponding thereto are already present. This process is known as deduplication. That is, only unique data need be transmitted and stored after its identification.
  • the system 125 may determine an appropriate location for the unique block(s) in the data store 105 and update an index to include metadata indicative of a location of the unique block(s). The unique block(s) may then be distributed by the system 125 to the data store 105 according to the locations recorded in the index.
  • the system 125 may store links to multiple containers into a single index.
  • This single index may be referred to as a locality sensitive index.
  • the locality sensitive index is an index that allows various local indices to be tied together into a single index, thus preserving the locality of the individual indices while allowing for interrelation of the same.
  • the system 125 allows for the use of chunks while preserving the index and locality required for the deduplicated backup data store, as described in greater detail above.
  • FIG. 2 illustrates an exemplary method for maintaining locality of an input stream of data.
  • the method may comprise an initial step 205 of receiving an input stream, such as a backup of a local machine.
  • the method may comprise a step 210 of splitting the input stream into a plurality of chunks, according to any desired process.
  • the method may comprise an optional step 215 of creating an identifier for each chunk. As mentioned above, this identifier may comprise a signature or a cryptographic hash value.
  • the method may comprise a step 220 of placing each of the chunks into a container in a sequential manner.
  • Each container may be assigned a size, and when the container is full, additional chunks may be placed into a newly opened container.
  • containers may be filled sequentially.
  • the method may include a step 225 of generating a locality index that maps the container in which a chunk is placed. Again, this locality is based on the temporal adjacency of the chunks in the input stream due to their association with a particular file modification process occurring on the client.
  • chunk “locality” within the system is a function of both the order in which the chunk is received relative to the input stream, as well as a container location of the chunk after placement into a container. Locality preservation is enhanced by tracking chunks using their calculated, created, or assigned identifier. For example, a SHA1 key value for a chunk may be linked to the container in which the chunk has been placed.
  • the method may comprise a step 230 of generating a container index that includes a location of the chunks within their respective containers.
  • the container index may include an offset and a length for each chunk in the container.
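  • The steps of FIG. 2 described above can be sketched as follows. Fixed-size chunking and the SHA1 identifier are simplifying assumptions (the text allows any splitting process and any identifier), and the container capacity is an arbitrary illustrative value.

```python
import hashlib

CONTAINER_SIZE = 4 * 1024 * 1024  # assumed fixed container capacity
CHUNK_SIZE = 64 * 1024            # fixed-size chunking for simplicity

def store_stream(stream: bytes):
    """Split an input stream into chunks, place them sequentially into
    containers, and build the locality and container indices (FIG. 2)."""
    containers = [[]]          # list of containers, each a list of chunks
    fill = 0                   # bytes used in the currently open container
    locality_index = {}        # chunk signature -> container number
    container_index = {}       # chunk signature -> (offset, length)

    for pos in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[pos:pos + CHUNK_SIZE]
        signature = hashlib.sha1(chunk).hexdigest()    # step 215 identifier
        if fill + len(chunk) > CONTAINER_SIZE:         # container is full:
            containers.append([])                      # open the next one
            fill = 0
        containers[-1].append(chunk)                   # step 220 placement
        locality_index[signature] = len(containers) - 1    # step 225
        container_index[signature] = (fill, len(chunk))    # step 230
        fill += len(chunk)
    return containers, locality_index, container_index
```

Because chunks are placed in the order they arrive, temporally adjacent chunks of the input stream end up spatially adjacent within a container, which is the locality property the method preserves.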
  • FIG. 3 is a flowchart of an exemplary method for managing a deduplicated backup data store.
  • the method may comprise a step 305 of storing an initial backup of a client device such as an end user computing system.
  • the initial backup may comprise not only blocks of data but also associated Merkle nodes, which when combined with the blocks of data comprise a distributed hash table.
  • the Merkle node is a representation or hash value of the names of the individual data blocks that comprise the files of the client.
  • the method may then comprise a step 310 of receiving an input data stream from the client device.
  • the method may separate the input data stream into chunks in step 315 .
  • the method may then include a step 320 of hashing the chunks to create a key to index the data block.
  • the index may include not only the hashes of data blocks, but also hashes of Merkle nodes. As mentioned previously, sequential chunks may be combined into an extent to maintain their temporal relatedness (which enables and enhances pre-fetching as needed). The extent itself may also be hashed.
  • the method may include a step 325 of approximating deduplication of the chunks (or extent) by generating a signature for the input data stream.
  • This signature may be compared against the signatures of other extents stored in the deduplicated backup data store. Again, the comparison of signatures may be performed at the chunk level or alternatively at the extent level.
  • the method may comprise a step 330 of selecting a signature based upon the step of comparing the signature to signatures of extents.
  • the method may comprise a step 335 of comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique blocks included in the at least a portion of the input data stream. This delineation between unique and non-unique data chunks is used in deduplicating the input data stream to ensure that only unique chunks (e.g., changed data) are stored in deduplicated backup data store.
  • the method may comprise a step 340 of updating an index to reflect the inclusion of the new unique chunks in the deduplicated backup data store.
  • the index provides a location of the unique blocks, which have been distributed to the deduplicated backup data store in a step 345 .
  • the deduplicated backup data store referenced in step 345 may also comprise a plurality of DHTs which are linked together using a locality sensitive index that preserves the locality and index of each DHT.
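  • A minimal sketch of the deduplication loop of FIG. 3, under the assumption that an exact match on a chunk's SHA1 key means the chunk is already stored; the list-based store and its slot numbering are purely illustrative stand-ins for the deduplicated backup data store and its index.

```python
import hashlib

def backup(chunks, store, index):
    """Deduplicate an already-chunked input stream against a backup store:
    hash each chunk (step 320), keep only chunks whose keys are not yet in
    the index (steps 325/335), record their locations (step 340), and
    distribute them to the store (step 345)."""
    unique = 0
    for chunk in chunks:
        key = hashlib.sha1(chunk).hexdigest()
        if key in index:            # already stored: deduplicated away
            continue
        location = len(store)       # next free slot in this toy store
        store.append(chunk)         # distribute the unique chunk
        index[key] = location       # update the index with its location
        unique += 1
    return unique

store, index = [], {}
first = backup([b"alpha", b"beta", b"alpha"], store, index)
second = backup([b"beta", b"gamma"], store, index)
print(first, second, len(store))  # 2 1 3
```

The second backup stores only the one chunk not seen before, which is the network and storage saving the passage describes.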
  • the input data stream is created when a user performs a file modification process to one or more files.
  • the user may edit several spreadsheets at the same time (or in close temporal proximity, such as within a few seconds or minutes of one another).
  • the plurality of files need not be the same type.
  • the user can edit a spreadsheet and word processing document together.
  • the changes to these files would be assembled and streamed as an input data stream.
  • the input data stream can be checked against the stored signature for the client to determine what parts of the input data stream need be stored in the backup store.
  • the input data stream can be transmitted as the file modifications occur or only after a signature comparison has been completed. For example, a prior signature of a backup for the client may have been taken at an earlier point in time. A comparison of a new signature for the client against the old signature stored on the file replication store (e.g., backup store) would indicate that the files were modified. The changed data would then be transmitted over the network to the file replication store.
  • the method includes a step of generating 405 an input signature for at least a portion of an input data stream from a client.
  • the input signature is a representation of data included in the input data stream.
  • the method also includes a step of comparing 410 the input signature to stored signatures of data included in a deduplicated backup data store. This process allows the system to find the signature of the client that was previously stored on the backup store.
  • the method includes the system selecting 415 a stored signature based upon the step of comparing the input signature to the stored signatures of data included in a deduplicated backup data store.
  • the method includes comparing 420 data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream.
  • the method includes distributing the unique data to the deduplicated backup data store.
  • the unique data that has not been stored previously is transmitted over the network to the backup data store.
  • This method provides a network optimization technique, ensuring that only new, unique data is transmitted over the network for any given backup or replication procedure.
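  • The signature-driven flow of FIG. 4 might be sketched as below. The signature form (a set of chunk SHA1 keys) and the Jaccard-style overlap used to select the closest stored signature are assumptions for illustration; the patent does not prescribe a particular signature or comparison metric.

```python
import hashlib

def signature(chunks):
    """Assumed signature form: the set of SHA1 keys of a stream's chunks."""
    return {hashlib.sha1(c).hexdigest() for c in chunks}

def replicate(input_chunks, stored_signatures):
    """Generate an input signature (step 405), compare it to the stored
    signatures (410), select the closest match (415), and keep only the
    chunks the selected backup does not already hold (420/425)."""
    input_sig = signature(input_chunks)
    # Step 415: pick the stored signature with the greatest overlap
    # (Jaccard similarity is an assumed metric, not the patent's).
    best = max(
        stored_signatures,
        key=lambda cid: len(input_sig & stored_signatures[cid]) /
                        len(input_sig | stored_signatures[cid]))
    stored_keys = stored_signatures[best]
    # Steps 420/425: chunks absent from the selected backup are unique
    unique = [c for c in input_chunks
              if hashlib.sha1(c).hexdigest() not in stored_keys]
    return best, unique

stored = {"client-a": signature([b"one", b"two"]),
          "client-b": signature([b"three"])}
best, unique = replicate([b"one", b"two", b"four"], stored)
print(best, unique)  # client-a [b'four']
```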
  • input data streams are transmitted to the backup data store only upon the occurrence of a file modification process occurring on the client.
  • a new input data stream is created and transmitted for storage.
  • FIG. 5 illustrates an example method for storing input data streams of multiple file modification operations that occur on a client.
  • a first file modification process occurs at a first point in time. This first file modification process occurs for a first set of files.
  • a second file modification process occurs for a second set of files.
  • Temporal context and locality can be maintained for each of these file modification processes by storing the data in the input data streams in their own extents (e.g., containers).
  • the method can begin with a step of receiving 505 a first input data stream at a first point in time.
  • the first point in time is associated with a first file modification operation for a first set of files occurring on a client.
  • the method includes segmenting 510 the first input data stream into chunks, as well as creating 515 a signature for each of the chunks. Indeed, this could include creating a SHA1 hash value, as an example.
  • the method includes distributing 520 each chunk to one of a first plurality of containers.
  • Each container comprises a container identifier and the first plurality of containers is proximate to one another on a backup data store.
  • the temporal locality of the chunks in the input data stream is represented as spatial locality on the backup data store.
  • the method includes creating 525 a locality index that includes a mapping of a chunk signature and a container identifier. To be sure, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the first input data stream.
  • the method includes receiving 530 a second input data stream at a second point in time.
  • the second and first points in time are different from one another because they are associated with different file modification processes.
  • the second point in time is associated with a second file modification operation for a second set of files occurring on a client.
  • the method includes segmenting 535 the second input data stream into chunks, and creating 540 a signature for each of the chunks.
  • the method comprises distributing 545 each chunk to one of a second plurality of containers.
  • each container comprises a container identifier.
  • the second plurality of containers is proximate to one another on a backup data store for ease of retrieval and pre-fetching as described above.
  • the method also includes creating 550 a locality index that includes a mapping of a chunk signature and a container identifier. Again, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the second input data stream.
  • FIG. 6 illustrates an exemplary computing system 600 that may be used to implement an embodiment of the present technology.
  • the computing system 600 of FIG. 6 includes one or more processors 610 and memory 620 .
  • Main memory 620 stores, in part, instructions and data for execution by processor 610 .
  • Main memory 620 can store the executable code when the system 600 is in operation.
  • the system 600 of FIG. 6 may further include a mass storage device 630 , portable storage medium drive(s) 640 , output devices 650 , user input devices 660 , a graphics display 670 , and other peripheral devices 680 .
  • the system 600 may also comprise network storage 645 .
  • The components shown in FIG. 6 are depicted as being connected via a single bus 690 .
  • the components may be connected through one or more data transport means.
  • Processor unit 610 and main memory 620 may be connected via a local microprocessor bus, and the mass storage device 630 , peripheral device(s) 680 , portable storage device 640 , and graphics display 670 may be connected via one or more input/output (I/O) buses.
  • Mass storage device 630 , which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610 . Mass storage device 630 can store the system software for implementing embodiments of the present technology for purposes of loading that software into main memory 620 .
  • Portable storage device 640 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computing system 600 of FIG. 6 .
  • the system software for implementing embodiments of the present technology may be stored on such a portable medium and input to the computing system 600 via the portable storage device 640 .
  • Input devices 660 provide a portion of a user interface.
  • Input devices 660 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys.
  • the system 600 as shown in FIG. 6 includes output devices 650 . Suitable output devices include speakers, printers, network interfaces, and monitors.
  • Graphics display 670 may include a liquid crystal display (LCD) or other suitable display device. Graphics display 670 receives textual and graphical information, and processes the information for output to the display device.
  • Peripherals 680 may include any type of computer support device to add additional functionality to the computing system.
  • Peripheral device(s) 680 may include a modem or a router.
  • the components contained in the computing system 600 of FIG. 6 are those typically found in computing systems that may be suitable for use with embodiments of the present technology and are intended to represent a broad category of such computer components that are well known in the art.
  • the computing system 600 of FIG. 6 can be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system.
  • the computer can also include different bus configurations, networked platforms, multi-processor platforms, etc.
  • Various operating systems can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
  • Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium).
  • the instructions may be retrieved and executed by the processor.
  • Some examples of storage media are memory devices, tapes, disks, and the like.
  • the instructions are operational when executed by the processor to direct the processor to operate in accord with the technology. Those skilled in the art are familiar with instructions, processor(s), and storage media.
  • Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
  • Volatile media include dynamic memory, such as system RAM.
  • Transmission media include coaxial cables, copper wire and fiber optics, among others, including the wires that comprise one embodiment of a bus.
  • Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASHEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
  • a bus carries the data to system RAM, from which a CPU retrieves and executes the instructions.
  • the instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.


Abstract

Systems and methods for distributing and deduplicating a data store are provided herein. An exemplary method for distributing and deduplicating stored data may include receiving an input data stream, segmenting the input data stream into chunks, creating a signature for each of the chunks, distributing each chunk to one of a plurality of containers, each container having a container identifier, and creating an index that includes a mapping of a chunk signature and a container identifier.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This non-provisional U.S. patent application is related to non-provisional U.S. patent application Ser. No. 13/889,164, filed on May 7, 2013, entitled “Cloud Storage Using Merkle Trees,” which is hereby incorporated by reference herein in its entirety.
  • FIELD OF THE INVENTION
  • The present technology may be generally described as providing systems and methods for distributing and deduplicating data storage.
  • BACKGROUND
  • Creating large backup data stores that are efficient in terms of data storage and data retrieval is a complex process, especially for systems that store petabytes of data or greater. Additional complexities are introduced when these large backup data stores use deduplication, such as when only unique data blocks are stored. Additionally, backup data stores that use deduplication are not currently suitable for storing data using, for example, distributed hash tables (“DHT”), as the DHT may destroy the locality of the data and the index used to track the data as it is distributed to the data store.
  • SUMMARY OF THE PRESENT TECHNOLOGY
  • According to some embodiments, the present technology may be directed to methods that comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
  • According to some embodiments, the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) segmenting the input data stream into chunks; (c) creating a signature for each of the chunks; (d) distributing each chunk to one of a plurality of containers, each container comprising a container identifier; and (e) creating a locality index that includes a mapping of a chunk signature and a container identifier.
  • According to some embodiments, the present technology may be directed to systems that comprise: (a) a processor; (b) logic encoded in one or more tangible media for execution by the processor and when executed operable to perform operations comprising: (i) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (ii) comparing the signature to signatures of data included in a deduplicated backup data store; (iii) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (iv) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (v) distributing the unique data to the deduplicated backup data store.
  • According to some embodiments, the present technology may be directed to a non-transitory machine-readable storage medium having embodied thereon a program. In some embodiments the program may be executed by a machine to perform a method. The method may comprise: (a) generating a signature for at least a portion of an input data stream, the signature including a representation of data included in the input data stream; (b) comparing the signature to signatures of data included in a deduplicated backup data store; (c) selecting a signature based upon the step of comparing the signature to signatures of data included in a deduplicated backup data store; (d) comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and (e) distributing the unique data to the deduplicated backup data store.
  • According to some embodiments, the present technology may be directed to methods that comprise: (a) receiving an input data stream; (b) separating the input data stream into chunks; (c) performing one or more of an exact and an approximate matching of the chunks of the input data stream to chunks stored in a deduplicated backup data store to determine unique chunks; (d) determining one or more locations in the deduplicated backup data store for the unique chunks; (e) updating an index to include the unique chunks with their locations; and (f) distributing the unique chunks to the deduplicated backup data store according to the index.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Certain embodiments of the present technology are illustrated by the accompanying figures. It will be understood that the figures are not necessarily to scale and that details not necessary for an understanding of the technology or that render other details difficult to perceive may be omitted. It will be understood that the technology is not necessarily limited to the particular embodiments illustrated herein.
  • FIG. 1 is a block diagram of an exemplary architecture in which embodiments of the present technology may be practiced;
  • FIG. 2 is a flowchart of an exemplary method of exact matching of chunks of data to determine unique chunks;
  • FIG. 3 is a flowchart of an exemplary method for providing a distributed and deduplicated data store; and
  • FIG. 4 is a flowchart of an example method of the present technology.
  • FIG. 5 is another example method of the present technology for storing input streams from two separate file modification operations of a client.
  • FIG. 6 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.
  • DESCRIPTION OF EXEMPLARY EMBODIMENTS
  • While this technology is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail several specific embodiments with the understanding that the present disclosure is to be considered as an exemplification of the principles of the technology and is not intended to limit the technology to the embodiments illustrated.
  • It will be understood that like or analogous elements and/or components, referred to herein, may be identified throughout the drawings with like reference characters. It will be further understood that several of the figures are merely schematic representations of the present technology. As such, some of the components may have been distorted from their actual scale for pictorial clarity.
  • Generally speaking, building large data storage systems that allow for efficient storage and retrieval of data is a complex endeavor. In general, when data is received, it may be separated into chunks and the chunks may then be transmitted to a storage system. Some of these data storage systems create an index for all chunks that are received and distributed. A metadata server may maintain the indexes and perform operations on the chunks. Thus, a malfunction of the metadata server may result in a loss of the chunks stored in the storage system, either an actual loss of the data or a loss in the ability to track the location of the data in the storage system.
  • Additionally, some storage systems may deduplicate block storage, where only unique data blocks are stored. This allows the system to reduce the overall amount of data blocks stored compared to systems that store complete data sets. When deduplication is not utilized, each backup (e.g., snapshot or mirror) taken of a physical system must be stored in order to allow the physical system to be restored back to a given point in time in the past, as described above.
  • While the use of distributed hash tables (“DHT”) to store data is known, the use of DHTs is currently incompatible with systems that deduplicate data blocks. Advantageously, DHTs allow load balancing within storage systems, where chunks may be distributed into a data storage cloud. In one embodiment, each block of data may be hashed to form the index key for a DHT and the data itself is stored as the value of the key. The combination of data blocks and hash values are used to create a DHT. While the effectiveness of the methods and systems described herein may be advantageously leveraged within systems or processes that use DHTs, the present technology is not limited to these types of systems and processes. Thus, descriptions of DHTs included herein are merely provided as an exemplary use of the present technology.
  • While storage of data using DHTs can be effective in balancing IO load across distributed nodes, unfortunately, when a DHT is used the temporal locality of the data is not maintained spatially on the disk. This is, in part, due to the fact that DHTs use the hash of the data to determine the location of the data, and cryptographic hashes are by design random. For example, when multiple snapshots of a physical system are taken over time, random operations are performed on the snapshots when DHTs are used. These random operations are inefficient when compared to sequential operations. In short, DHTs are less than optimal for building deduplicated storage systems. That is, deduplicated storage systems rely on maintaining the temporal locality of the data spatially on the disk.
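  • A minimal sketch of the DHT scheme described above: each block is hashed to form its index key, and the block itself is stored as the value. The modulo rule that assigns a key to a node is an illustrative assumption. Because cryptographic hashes are effectively random, consecutive blocks of one input stream scatter across nodes, which is exactly the loss of spatial locality at issue.

```python
import hashlib

NUM_NODES = 4
nodes = [{} for _ in range(NUM_NODES)]  # one key/value table per node

def dht_put(block: bytes) -> str:
    """Hash the block to form its index key; store the block as the value."""
    key = hashlib.sha1(block).hexdigest()
    node = int(key, 16) % NUM_NODES     # the hash decides where data lands
    nodes[node][key] = block
    return key

def dht_get(key: str) -> bytes:
    """The key alone determines which node holds the block."""
    return nodes[int(key, 16) % NUM_NODES][key]

key = dht_put(b"snapshot block")
assert dht_get(key) == b"snapshot block"
```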
  • To be sure, as described herein, locality can be described in terms of temporality or space. For example, if a user modifies multiple files at the same time, it will be understood or assumed that the modification of these files is related to one another. By way of example, the user could be updating multiple spreadsheets within a given period of time. These spreadsheets may all be related to the same project or task that the user is working on. These file changes can be transmitted over the network efficiently in an input stream. The present technology will store these changes spatially together on the backup store, but their spatial proximity to one another on the backup store is due to their temporal adjacency relating to how they are used.
  • If these changes are stored in close spatial proximity on the backup store, context (the fact that they were modified together) is maintained. When the user requests this data from the backup store, the replication or retrieval process can be executed efficiently because all changes to the files were stored in close proximity to one another on the backup store. In contrast, a DHT may randomly distribute the changes to the files anywhere in the backup store, which increases data fragmentation and slows down retrieval.
  • In some embodiments, when one file is requested from the backup store, the backup store will automatically pre-fetch the files that were determined to be changed at the same time as the requested file. Again, this benefit is possible because temporal locality (context) is determined and maintained. Even if the user does not utilize the additional files, the likelihood that they may be utilized is sufficient to justify pre-fetching the files in anticipation of use. Advantageously, these processes greatly improve the file retrieval and replication methods of backup stores.
  • The index created for the blocks of the changed files also maintains context and locality due to the manner in which it is created. The updates to the index occur temporally when changes are transferred to the backup store.
  • These and other advantages of the present technology will be discussed in greater detail herein.
  • Referring now to the drawings, and more particularly, to FIG. 1, which includes a schematic diagram of an exemplary architecture 100 for practicing the present invention. Architecture 100 may include a deduplicated backup data store 105; hereinafter “data store 105.” In some instances, the data store 105 may be implemented within a cloud-based computing environment. In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices. For example, systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
  • The cloud may be formed, for example, by a network of servers, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.
  • In some instances, the data store 105 may include a block store 115 that stores unique blocks of data for one or more objects, such as a file, a group of files, or an entire disk. For example, the block store 115 may comprise a plurality of containers 120 a-n, which are utilized to store data chunks that are separated from the input data stream, as will be described in greater detail below. A “container” may also be referred to as an “extent.”
  • In some instances, objects written to the block store 115 are immutable. When the present technology updates an existing object to generate a new object, a new object identifier may be generated and provided back to the object owner.
  • In some instances, the responsibility of implementing a traditional interface where object identifiers do not change on update is facilitated by the application/client. In other embodiments, the data store 105 may provide ‘mutable’ metadata storage where the client/application can manage immutable objects which are mapped to mutable object identifiers and other application specific metadata.
  • According to some embodiments, the block store 115 may include immutable object addressable block storage. The block store 115 may form an underlying storage foundation that allows for the storing of blocks of objects. The identifiers of the blocks are a unique representation of the object, generated for example by using an SHA1 hash function. The present technology may also use other cryptographic hash functions that would be known to one of ordinary skill in the art with the present disclosure before them.
  • The architecture 100 may include a deduplication system, hereinafter referred to as system 125 that provides distributed and deduplicated data storage.
  • In some instances, the system 125 receives input data streams from a client device 130. For example, an input data stream may include a snapshot or an incremental file for the client device 130. The client device may include an end user computing system, an appliance, such as a backup appliance, a server, or any other computing device that may include objects such as files, directories, disks, and so forth.
  • In some instances, an API may encapsulate messages and their respective operations, allowing for efficient writing of objects over a network, such as network 135. In some instances, the network 135 may comprise a local area network (“LAN”), a wide area network (“WAN”), or any other private or public network, such as the Internet.
  • The system 125 may divide or separate an input data stream into a plurality of chunks, also referred to as blocks, segments, pieces, and so forth. Any method for separating the input data stream into chunks that would be known to one of ordinary skill in the art may likewise be utilized in accordance with the present technology. As each chunk is received (or created), the chunks are passed to containers 120 a-n, which may also be referred to as blobs. Containers 120 a-n may be filled with chunks that are received sequentially around the same time, thus maintaining temporal locality as well as spatial locality within the same container. Additionally, each of the chunks may be encrypted or otherwise hashed so as to create a unique identifier for the chunk of data. For example, a chunk may be hashed using SHA1 to produce a SHA1 key value for the chunk. In some instances, the input data stream may arrive at the system 125 in an already-chunked manner. Optionally, each of the hashed chunk values may be incorporated by the system 125 into Merkle nodes, and the Merkle nodes may be arranged into a Merkle tree at the data store 105.
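  • The chunking, hashing, and sequential container placement described above can be sketched as follows. This is a minimal illustration; the fixed chunk size, container capacity, and helper names are assumptions for the example, not details taken from the patent (real systems may instead use content-defined chunking):

```python
import hashlib

CHUNK_SIZE = 4096        # illustrative fixed chunk size (an assumption)
CONTAINER_CAPACITY = 4   # chunks per container, kept tiny for the example

def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Separate the input data stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_key(chunk: bytes) -> str:
    """Create a unique identifier (SHA1 key value) for a chunk."""
    return hashlib.sha1(chunk).hexdigest()

def fill_containers(chunks):
    """Place chunks into containers in the order received, so that the
    temporal locality of the stream becomes spatial locality on disk."""
    containers = []
    for i, chunk in enumerate(chunks):
        if i % CONTAINER_CAPACITY == 0:
            containers.append([])          # open a fresh container when full
        containers[-1].append((chunk_key(chunk), chunk))
    return containers

stream = b"x" * 10000
containers = fill_containers(split_into_chunks(stream))
```

Because chunks are appended strictly in arrival order, chunks modified together land in the same (or adjacent) containers, which is what later enables pre-fetching.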
  • Additional details regarding the creation of Merkle trees and the transmission of data over a network using such Merkle trees can be found in co-pending non-provisional U.S. patent application Ser. No. 13/889,164, filed on May 7, 2013, entitled “Cloud Storage Using Merkle Trees,” which is hereby incorporated by reference herein in its entirety.
  • According to some embodiments, the system 125 may generate a signature for each extent using technologies other than cryptographic hashing functions. The signature is a representation of the data included in the extent. In some instances, to generate the signature, the system 125 may apply an algorithm that is similar to an algorithm used for facial recognition. For example, in facial recognition, a signature for a face of an individual included in an image file may be generated. This signature may be compared to facial signatures in other image files to determine if the facial signatures included in these additional image files correspond to the facial signature of the individual. Thus, the “signature” is a mathematical representation of the unique facial features of the individual. These unique facial features convert into unique mathematical values that may be used to locate the individual in other image files.
  • Similarly, extents include data chunks that can be distinguished from other chunks on the basis of unique data features. A signature for an extent would include mathematical representations of these unique features such that comparing a signature for the extent to other signatures of other extents may allow for the system 125 to determine similar or dissimilar extents.
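  • One way such a similarity signature could be realized is a min-hash-style sketch over the chunk digests of an extent: similar extents share many of the same small digests, so overlapping signatures indicate similar data. The construction below is purely illustrative; the patent does not mandate this particular algorithm:

```python
import hashlib

def extent_signature(chunks, k=8):
    """Illustrative similarity signature for an extent: keep the k smallest
    SHA1 digests of its chunks (a min-hash-style idea, used here only as an
    example of a non-cryptographic 'signature' of the extent's contents)."""
    digests = sorted(hashlib.sha1(c).digest() for c in chunks)
    return tuple(digests[:k])

def similarity(sig_a, sig_b):
    """Fraction of signature entries shared between two extents."""
    shared = len(set(sig_a) & set(sig_b))
    return shared / max(len(sig_a), 1)

# Two extents differing in one chunk out of four score 0.75, not 0 or 1,
# which is what lets the system judge "similar" rather than only "equal".
a = [b"one", b"two", b"three", b"four"]
b = [b"one", b"two", b"three", b"five"]
```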
  • Because chunks are placed sequentially (in order received relative to the input stream) into containers 120 a-n and each chunk is provided with a unique identifier, such as a hash value, locality of the chunks may be maintained. A locality index may be managed by the system 125 that maps each chunk to its corresponding container based upon the chunk identifier. Thus, locality of data chunks is a function of the order in which the chunks are received, as well as the chunk identifiers used to distinguish chunks from one another.
  • According to some embodiments, the locality index may comprise a sparse index when the locality index becomes too large and cumbersome to maintain in memory. For example, the sparse index may map only the chunk signature with a container identifier. Also, in some instances, the system 125 may split the locality index into chunks and these chunks may also be stored in the containers, along with the chunks created from the input stream.
  • In addition to the locality index, the system 125 may also manage a container index for each container that provides an exact or approximate location for each chunk within the container. For example, the index may specify the offset and length of each chunk within the container.
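  • Taken together, the locality index and the per-container indices might look like the following sketch. The container layout (lists of key/chunk pairs) and the function name are assumptions for illustration:

```python
def build_indices(containers):
    """Build (1) a locality index mapping chunk key -> container id, and
    (2) a per-container index mapping chunk key -> (offset, length)."""
    locality_index = {}
    container_indices = {}
    for cid, container in enumerate(containers):
        offset = 0
        per_container = {}
        for key, chunk in container:      # chunks stored in arrival order
            locality_index[key] = cid
            per_container[key] = (offset, len(chunk))
            offset += len(chunk)
        container_indices[cid] = per_container
    return locality_index, container_indices
```

Looking up a chunk is then two hops: the locality index names the container, and that container's index gives the exact offset and length within it.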
  • In some instances, when the system 125 receives subsequent input data streams (e.g., subsequent snapshots) for the client device 130, the system may also separate the subsequent input streams into chunks and generate signatures for these chunks. When signatures for chunks of a subsequent input data stream are compared to signatures for chunks of a previous input data stream, differences deduced by the system 125 in these signatures may indicate that data in a particular chunk has changed. Thus, the system 125 may then obtain these changed chunks and store data from these changed chunks in the data store 105. The ability for the system 125 to recognize changed data allows the system 125 to store only unique data in the data store 105 (e.g., deduplicated data).
  • When comparing signatures and/or data between an input data stream and deduplicated data that is stored in the data store 105, the system 125 may employ either exact or approximated deduplication methods. In some instances, the system 125 may also use approximated deduplication methods initially, followed by a more robust exact matching deduplication method at a later time, as a means of verification.
  • With regard to approximate deduplication methods, the system 125 may compare the signature of an extent to signatures of similar extents stored in the data store 105. Any difference in signatures between similar extents for the same object, such as a file, indicates that the data of the object has changed.
  • In some instances, the system 125 may establish rules that allow the system 125 to quickly process input data streams to determine if unique data blocks exist in the input data stream. If the comparison between signatures indicates that the input data stream is not likely to include unique data, the system 125 may ignore the input data stream. Conversely, if the comparison between signatures indicates that the input data stream is likely to include unique data, the system 125 may further examine the input data stream to determine which chunks of data have changed.
  • For example, if the signature of an input data stream is determined by the system 125 to be sufficiently different from a signature of an extent for the same object stored in the data store 105, the system 125 may also process the input data stream using the exact deduplication method described below.
  • With regard to exact match deduplication methods, the system 125 may compare signatures of chunks of an input data stream to node signatures of similar chunks stored in the data store 105. The system 125 may check matches at the chunk or extent level using hash values associated with chunks. That is, each block or chunk of data included in an extent may be associated with its own signature or identifier. This identifier may be a unique hash value of the data included in a particular chunk of data. Any change in the data of a chunk will change the hash value of the chunk. The system 125 can use the comparison of the signatures of the chunks to determine if data has changed in a chunk.
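  • In a sketch, the exact-match path reduces to a set-membership test on chunk keys; only chunks whose keys are absent from the store are treated as unique (the function and variable names here are assumptions for illustration):

```python
import hashlib

def unique_chunks(input_chunks, stored_keys):
    """Exact-match deduplication: a chunk is unique if and only if its SHA1
    key is not already present in the store's key set. Newly seen keys are
    added so later duplicates within the same stream are also suppressed."""
    new = []
    for chunk in input_chunks:
        key = hashlib.sha1(chunk).hexdigest()
        if key not in stored_keys:
            new.append((key, chunk))
            stored_keys.add(key)
    return new
```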
  • It will be understood that examining and comparing data streams at the block level via signature comparison allows exact matching, not simply because the comparison is being performed at a more granular level but also because any change in data for the same data block will produce different chunks having different hash values relative to one another.
  • According to some embodiments, the system 125 may load the input data stream and selected data from the data store 105 into cache memory. Processing the input data stream and selected data from the data store 105 may allow for faster and more efficient data analysis by the system 125.
  • In some embodiments, the system 125 may utilize information indicative of the client device or object stored on the client device to “warm up” the data loaded into the cache. That is, instead of examining an entire input data stream, the system 125 may understand that the input data stream came from a particular customer or client device. Additionally, the system 125 may know that the input data stream refers to a particular object. Thus, the system 125 may not need to compare signatures for each block (e.g., chunk) of a client device to determine unique blocks. The system 125, in effect, narrows the comparison down to the most likely candidate chunks or extents stored in the data store 105. In some instances, the system 125 may select extents by comparing root (or head) signatures for a chunk of an input data stream to root (or head) signatures of extents stored in the data store 105. Extents that have matching signatures may be ignored as the blocks corresponding thereto are already present. This process is known as deduplication. That is, only unique data need be transmitted and stored after its identification.
  • After unique blocks have been determined from the input data stream, the system 125 may determine an appropriate location for the unique block(s) in the data store 105 and update an index to include metadata indicative of a location of the unique block(s). The unique block(s) may then be distributed by the system 125 to the data store 105 according to the locations recorded in the index.
  • In some instances, the system 125 may store links to multiple containers into a single index. This single index may be referred to as a locality sensitive index. The locality sensitive index is an index that allows various local indices to be tied together into a single index, thus preserving the locality of the individual indices while allowing for interrelation of the same. Thus, the system 125 allows for the use of chunks while preserving the index and locality required for the deduplicated backup data store, as described in greater detail above.
  • FIG. 2 illustrates an exemplary method for maintaining locality of an input stream of data. The method may comprise an initial step 205 of receiving an input stream, such as a backup of a local machine. The method may comprise a step 210 of splitting the input stream into a plurality of chunks, according to any desired process. The method may comprise an optional step 215 of creating an identifier for each chunk. As mentioned above, this identifier may comprise a signature or a cryptographic hash value. As the input stream is chunked, the method may comprise a step 220 of placing each of the chunks into a container in a sequential manner.
  • Each container may be assigned a size and when the container is full, additional chunks may be placed into an open container. Thus, containers may be filled sequentially. As chunks are placed into containers, the method may include a step 225 of generating a locality index that maps the container in which a chunk is placed. Again, this locality is based on the temporal adjacency of the chunks in the input stream due to their association with a particular file modification process occurring on the client. In sum, chunk “locality” within the system is a function of both the order in which the chunk is received relative to the input stream, as well as a container location of the chunk after placement into a container. Locality preservation is enhanced by tracking chunks using their calculated, created, or assigned identifier. For example, a SHA1 key value for a chunk may be linked to the container in which the chunk has been placed.
  • Additionally, the method may comprise a step 230 of generating a container index that includes a location of the chunks within their respective containers. As mentioned previously, the container index may include an offset and a length for each chunk in the container.
  • FIG. 3 is a flowchart of an exemplary method for managing a deduplicated backup data store. The method may comprise a step 305 of storing an initial backup of a client device such as an end user computing system. The initial backup may comprise not only blocks of data but also associated Merkle nodes, which when combined with the blocks of data comprise a distributed hash table. Again, the Merkle node is a representation or hash value of the names of the individual data blocks that comprise the files of the client.
  • The method may then comprise a step 310 of receiving an input data stream from the client device. In some embodiments, the method may separate the input data stream into chunks in step 315. Once separated into chunks, the method may then include a step 320 of hashing the chunks to create a key to index the data block. According to some embodiments, the index may include not only the hashes of data blocks, but also hashes of Merkle nodes. As mentioned previously, sequential chunks may be combined into an extent to maintain their temporal relatedness (which enables and enhances pre-fetching as needed). The extent itself may also be hashed.
  • In some instances, the method may include a step 325 of approximating deduplication of the chunks (or extent) by generating a signature for the input data stream. This signature may be compared against the signatures of other extents stored in the deduplicated backup data store. Again, the comparison of signatures may be performed at the chunk level or alternatively at the extent level.
  • Next, the method may comprise a step 330 of selecting a signature based upon the step of comparing the signature to signatures of extents. After selection of a signature, the method may comprise a step 335 of comparing data associated with the selected signature to the at least a portion of the input data stream to determine unique blocks included in the at least a portion of the input data stream. This delineation between unique and non-unique data chunks is used in deduplicating the input data stream to ensure that only unique chunks (e.g., changed data) are stored in deduplicated backup data store.
  • In some instances, the method may comprise a step 340 of updating an index to reflect the inclusion of the new unique chunks in the deduplicated backup data store. The index provides a location of the unique blocks, which have been distributed to the deduplicated backup data store in a step 345. According to some embodiments, step 345 may also include a plurality of DHTs which are linked together using a locality sensitive index that preserves locality and index of each DHT.
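  • Steps 325 through 345 — approximate selection followed by exact comparison — can be sketched as below. The `store` layout, mapping an extent name to a set of chunk keys, is an assumption made only for this illustration:

```python
import hashlib

def approximate_then_exact(input_chunks, store):
    """Sketch of steps 325-345: select the most similar stored extent by
    signature overlap (approximate deduplication), then determine unique
    blocks by exact chunk-key comparison against that extent.
    `store` maps extent name -> set of SHA1 keys (an assumed layout)."""
    input_keys = {hashlib.sha1(c).hexdigest(): c for c in input_chunks}
    # Step 330: select the stored extent sharing the most chunk keys.
    best = max(store, key=lambda name: len(store[name] & input_keys.keys()),
               default=None)
    known = store.get(best, set())
    # Step 335: exact comparison delineates unique from non-unique chunks.
    return {k: c for k, c in input_keys.items() if k not in known}
```

Only the returned chunks (the changed data) would then be distributed to the deduplicated backup data store in step 345 and recorded in the index in step 340.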
  • Referring now to FIG. 4, an example method for storing an input data stream in a de-duplicated manner is illustrated. For context, the input data stream is created when a user performs a file modification process to one or more files. For example, the user may edit several spreadsheets at the same time (or in close temporal proximity, such as within a few seconds or minutes of one another). To be sure, the plurality of files need not be of the same type. For example, the user can edit a spreadsheet and a word processing document together. The changes to these files would be assembled and streamed as an input data stream. In other embodiments, as illustrated in FIG. 4, the input data stream can be checked against the stored signature for the client to determine what parts of the input data stream need be stored in the backup store.
  • The input data stream can be transmitted as the file modifications occur or only after a signature comparison has been completed. For example, a prior signature of a backup for the client may have been taken at an earlier point in time. A comparison of a new signature for the client against the old signature stored on the file replication store (e.g., backup store) would indicate that the files were modified. The changed data would then be transmitted over the network to the file replication store.
  • Once the input data stream is received, the method of FIG. 4 is executed.
  • The method includes a step of generating 405 an input signature for at least a portion of an input data stream from a client. To be sure, the input signature is a representation of data included in the input data stream.
  • The method also includes a step of comparing 410 the input signature to stored signatures of data included in a deduplicated backup data store. This process allows the system to find the signature of the client that was previously stored on the backup store.
  • The method includes the system selecting 415 a stored signature based upon the step of comparing the input signature to the stored signatures of data included in a deduplicated backup data store.
  • To ensure that only changed data that has not already been stored on the backup data store is transmitted to the backup data store, the method includes comparing 420 data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream.
  • Next, the method includes distributing the unique data to the deduplicated backup data store. Advantageously, only the unique data that has not been stored previously is transmitted over the network to the backup data store. This method provides a network optimization technique, ensuring that only new, unique data is transmitted over the network for any given backup or replication procedure.
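  • The FIG. 4 flow can be sketched as follows: gate on a whole-stream signature first, and when it differs, transmit only the chunks not already present. The stored-signature layout (a root digest plus a set of chunk keys) is an assumption for the example, not a structure specified by the patent:

```python
import hashlib

def plan_transmission(client_chunks, stored_signature):
    """Illustrative FIG. 4 sketch: generate an input signature (405), compare
    it to the stored signature (410/415), and return only the unique chunks
    that must cross the network (420 onward)."""
    keys = [hashlib.sha1(c).hexdigest() for c in client_chunks]
    # A simple "root" signature over the ordered chunk keys (an assumption).
    input_signature = hashlib.sha1("".join(keys).encode()).hexdigest()
    if input_signature == stored_signature["root"]:
        return []                       # nothing changed; send nothing
    return [(k, c) for k, c in zip(keys, client_chunks)
            if k not in stored_signature["chunk_keys"]]
```

When the signatures match, no data at all is transmitted, which is the network optimization the text describes.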
  • As mentioned above, input data streams are transmitted to the backup data store only upon the occurrence of a file modification process occurring on the client. Thus, as each file modification process occurs at the client, a new input data stream is created and transmitted for storage.
  • FIG. 5 illustrates an example method for storing input data streams of multiple file modification operations that occur on a client. For purposes of this example, a first file modification process occurs at a first point in time. This first file modification process occurs for a first set of files. At a second point in time, a second file modification process occurs for a second set of files. Temporal context and locality can be maintained for each of these file modification processes by storing the data in the input data streams in their own extents (e.g., containers).
  • Thus, the method can begin with a step of receiving 505 a first input data stream at a first point in time. The first point in time is associated with a first file modification operation for a first set of files occurring on a client. Next, the method includes segmenting 510 the first input data stream into chunks, as well as creating 515 a signature for each of the chunks. Indeed, this could include creating a SHA1 hash value, as an example.
  • Next, the method includes distributing 520 each chunk to one of a first plurality of containers. Each container comprises a container identifier and the first plurality of containers is proximate to one another on a backup data store. Thus, the temporal locality of the chunks in the input data stream is represented as spatial locality on the backup data store.
  • Next, the method includes creating 525 a locality index that includes a mapping of a chunk signature and a container identifier. To be sure, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the first input data stream.
  • After this process is complete, a second file modification process occurs on the client. Thus, a second de-duplicating replication process for this new file modification process ensues.
  • The method includes receiving 530 a second input data stream at a second point in time. The second and first points in time are different from one another because they are associated with different file modification processes.
  • To be sure, the second point in time is associated with a second file modification operation for a second set of files occurring on a client. Next, the method includes segmenting 535 the second input data stream into chunks, and creating 540 a signature for each of the chunks.
  • Next, the method comprises distributing 545 each chunk to one of a second plurality of containers. As mentioned above, each container comprises a container identifier. The second plurality of containers is proximate to one another on a backup data store for ease of retrieval and pre-fetching as described above.
  • The method also includes creating 550 a locality index that includes a mapping of a chunk signature and a container identifier. Again, the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the second input data stream.
  • FIG. 6 illustrates an exemplary computing system 600 that may be used to implement an embodiment of the present technology. The computing system 600 of FIG. 6 includes one or more processors 610 and memory 620. Main memory 620 stores, in part, instructions and data for execution by processor 610. Main memory 620 can store the executable code when the system 600 is in operation. The system 600 of FIG. 6 may further include a mass storage device 630, portable storage medium drive(s) 640, output devices 650, user input devices 660, a graphics display 670, and other peripheral devices 680. The system 600 may also comprise network storage 645.
  • The components shown in FIG. 6 are depicted as being connected via a single bus 690. The components may be connected through one or more data transport means. Processor unit 610 and main memory 620 may be connected via a local microprocessor bus, and the mass storage device 630, peripheral device(s) 680, portable storage device 640, and graphics display 670 may be connected via one or more input/output (I/O) buses.
  • Mass storage device 630, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 610. Mass storage device 630 can store the system software for implementing embodiments of the present technology for purposes of loading that software into main memory 620.
  • Portable storage device 640 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computing system 600 of FIG. 6. The system software for implementing embodiments of the present technology may be stored on such a portable medium and input to the computing system 600 via the portable storage device 640.
  • Input devices 660 provide a portion of a user interface. Input devices 660 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additionally, the system 600 as shown in FIG. 6 includes output devices 650. Suitable output devices include speakers, printers, network interfaces, and monitors.
  • Graphics display 670 may include a liquid crystal display (LCD) or other suitable display device. Graphics display 670 receives textual and graphical information, and processes the information for output to the display device.
  • Peripherals 680 may include any type of computer support device to add additional functionality to the computing system. Peripheral device(s) 680 may include a modem or a router.
  • The components contained in the computing system 600 of FIG. 6 are those typically found in computing systems that may be suitable for use with embodiments of the present technology and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computing system 600 of FIG. 6 can be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system. The computer can also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
  • Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium). The instructions may be retrieved and executed by the processor. Some examples of storage media are memory devices, tapes, disks, and the like. The instructions are operational when executed by the processor to direct the processor to operate in accord with the technology. Those skilled in the art are familiar with instructions, processor(s), and storage media.
  • It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the technology. The terms “computer-readable storage medium” and “computer-readable storage media” as used herein refer to any medium or media that participate in providing instructions to a CPU for execution. Such media can take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk. Volatile media include dynamic memory, such as system RAM. Transmission media include coaxial cables, copper wire and fiber optics, among others, including the wires that comprise one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
  • Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution. A bus carries the data to system RAM, from which a CPU retrieves and executes the instructions. The instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Exemplary embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
  • Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. The descriptions are not intended to limit the scope of the technology to the particular forms set forth herein. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments. It should be understood that the above description is illustrative and not restrictive. To the contrary, the present descriptions are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the technology as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art. The scope of the technology should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

Claims (18)

What is claimed is:
1. A method, comprising:
generating an input signature for at least a portion of an input data stream from a client, the input signature including a representation of data included in the input data stream;
comparing the input signature to stored signatures of data included in a deduplicated backup data store;
selecting a stored signature based upon the step of comparing the input signature to the stored signatures of data included in the deduplicated backup data store;
comparing data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and
distributing the unique data to the deduplicated backup data store.
2. The method according to claim 1, wherein generating an input signature comprises applying a hashing algorithm to the at least a portion of an input data stream.
3. The method according to claim 1, wherein comparing data associated with the selected stored signature to the at least a portion of the input data stream comprises:
performing an exact comparison between data of the selected stored signature to data of the at least a portion of the input data stream;
ignoring data of the at least a portion of the input data stream that is an exact match to data of the selected stored signature; and
storing in the deduplicated backup data store, data of the at least a portion of the input data stream that is not an exact match to data of the selected stored signature.
4. A method comprising:
receiving an input data stream;
segmenting the input data stream into chunks;
creating a signature for each of the chunks;
distributing each chunk to one of a plurality of containers, each container comprising a container identifier; and
creating a locality index that includes a mapping of a chunk signature and a container identifier, wherein the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the input data stream.
5. The method according to claim 4, further comprising creating a container index that includes an offset and a length for each chunk included in a container.
6. The method according to claim 4, wherein the signatures of the chunks each comprise a cryptographic hash value.
7. The method according to claim 5, further comprising:
receiving a request for a file associated with one of the chunks;
pre-fetching remaining ones of the chunks or associated files; and
providing the requested file to the client.
8. The method according to claim 7, further comprising providing one or more of the remaining ones of the chunks or associated files from the pre-fetch when requested by the client.
9. A system, comprising:
a processor;
logic encoded in one or more tangible media for execution by the processor and when executed operable to perform operations comprising:
generating an input signature for at least a portion of an input data stream from a client, the input signature including a representation of data included in the input data stream;
comparing the input signature to stored signatures of data included in a deduplicated backup data store;
selecting a stored signature based upon the step of comparing the input signature to the stored signatures of data included in the deduplicated backup data store;
comparing data associated with the selected stored signature to the at least a portion of the input data stream to determine unique data included in the at least a portion of the input data stream; and
distributing the unique data to the deduplicated backup data store.
10. The system according to claim 9, wherein the deduplicated backup data store resides within a cloud.
11. The system according to claim 9, wherein generating an input signature includes the processor further executing the logic to perform operations of applying a hashing algorithm to the at least a portion of an input data stream.
12. The system according to claim 9, wherein comparing data of the selected stored signature to the at least a portion of the input data stream includes the processor further executing the logic to perform operations of:
performing an exact comparison between data of the selected stored signature to data of the at least a portion of the input data stream;
ignoring data of the at least a portion of the input data stream that exactly matches data of the selected stored signature; and
storing in the deduplicated backup data store, data of the at least a portion of the input data stream that does not exactly match data of the selected stored signature.
13. The system according to claim 9, wherein the processor further executes the logic to perform operations of:
receiving the input data stream;
segmenting the input data stream into chunks;
creating an extent from sequential chunks; and
hashing each chunk to create a signature, each signature comprising a hash value for data included in the chunk.
14. The system according to claim 13, wherein the processor further executes the logic to perform operations of:
distributing unique chunks to the backup data store in proximity to one another; and
creating an index that includes a location of each of the unique distributed chunks.
15. The system according to claim 14, wherein the processor further executes the logic to perform operations of:
creating a distributed hash table link for the unique distributed chunks; and
combining distributed hash table links into a localized distributed hash table.
16. The system according to claim 9, wherein the processor further executes the logic to perform operations of selecting a stored signature based upon information indicative of an object to which the input data stream belongs.
17. A method, comprising:
receiving an input data stream;
separating the input data stream into chunks;
performing one or more of an exact and an approximate matching of the chunks of the input data stream to chunks stored in a deduplicated backup data store to determine unique chunks;
determining one or more locations in the deduplicated backup data store for the unique chunks;
updating an index to include the unique chunks with their locations; and
distributing the unique chunks to the deduplicated backup data store according to the index.
18. A method, comprising:
receiving a first input data stream at a first point in time, the first point in time being associated with a first file modification operation for a first set of files occurring on a client;
segmenting the first input data stream into chunks;
creating a signature for each of the chunks;
distributing each chunk to one of a first plurality of containers, each container comprising a container identifier, the first plurality of containers being proximate to one another on a backup data store;
creating a locality index that includes a mapping of a chunk signature and a container identifier, wherein the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the first input data stream;
receiving a second input data stream at a second point in time, the second point in time being associated with a second file modification operation for a second set of files occurring on a client;
segmenting the second input data stream into chunks;
creating a signature for each of the chunks;
distributing each chunk to one of a second plurality of containers, each container comprising a container identifier, the second plurality of containers being proximate to one another on a backup data store; and
creating a locality index that includes a mapping of a chunk signature and a container identifier, wherein the chunk signatures and container identifiers for each of the chunks are related to one another because they were created from the second input data stream.
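The signature-based deduplication flow recited in claims 1-3 (generate an input signature, compare it against stored signatures, verify a candidate with an exact comparison, and distribute only the unique data) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `dedup_ingest` name and the in-memory `store` dictionary are hypothetical stand-ins for the deduplicated backup data store, and SHA-256 stands in for whatever signature function an embodiment uses.

```python
import hashlib

def dedup_ingest(portion: bytes, store: dict) -> bool:
    """Ingest one portion of an input data stream; return True if it was unique.

    `store` maps signature -> data and plays the role of the deduplicated
    backup data store (illustrative only).
    """
    signature = hashlib.sha256(portion).hexdigest()     # input signature (claim 1)
    candidate = store.get(signature)                    # compare to stored signatures
    if candidate is not None and candidate == portion:  # exact comparison (claim 3)
        return False    # exact match: ignore the duplicate data
    # Unique data (no signature match, or the rare hash collision, which a
    # real store would disambiguate): distribute it to the store.
    store[signature] = portion
    return True
```

Because the exact byte comparison follows the signature lookup, a false signature match never causes duplicate data to be silently treated as already stored; the signature only narrows the search.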
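Claims 4-6 and 18 describe segmenting an input data stream into chunks, signing each chunk with a cryptographic hash, distributing sequential chunks into containers kept near one another, and maintaining both a locality index (chunk signature to container identifier) and a per-container index of offsets and lengths. A rough Python sketch follows, with assumptions flagged: fixed-size chunking (embodiments may equally use variable-size chunking), SHA-256 as the cryptographic hash, and illustrative names and capacities throughout.

```python
import hashlib

CHUNK_SIZE = 4096        # assumed fixed chunk size; the claims do not mandate one
CONTAINER_CAPACITY = 4   # chunks per container, kept small for illustration

def segment(stream: bytes, size: int = CHUNK_SIZE):
    """Segment the input data stream into chunks (claim 4)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

def store_stream(stream: bytes):
    """Distribute chunks to containers and build the two indexes."""
    containers = {}        # container identifier -> chunk bytes laid out contiguously
    container_index = {}   # container identifier -> [(signature, offset, length)] (claim 5)
    locality_index = {}    # chunk signature -> container identifier (claims 4 and 18)
    for n, chunk in enumerate(segment(stream)):
        signature = hashlib.sha256(chunk).hexdigest()  # cryptographic hash signature (claim 6)
        container_id = n // CONTAINER_CAPACITY         # sequential chunks share a container,
                                                       # preserving stream locality
        data = containers.setdefault(container_id, bytearray())
        container_index.setdefault(container_id, []).append(
            (signature, len(data), len(chunk)))        # offset and length within the container
        data.extend(chunk)
        locality_index[signature] = container_id
    return containers, container_index, locality_index
```

Restoring a file then needs only the locality index to find the right container and the container index to seek to the chunk; and because a stream's chunks land in adjacent containers, the pre-fetch of claims 7-8 can read neighboring chunks in the same pass.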
US14/864,850 2010-09-30 2015-09-24 Distributed and Deduplicating Data Storage System and Methods of Use Abandoned US20170090786A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/977,614 US20190108103A9 (en) 2013-05-07 2015-12-21 Computing device replication using file system change detection methods and systems
US15/360,836 US10284437B2 (en) 2010-09-30 2016-11-23 Cloud-based virtual machines and offices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/889,164 US9705730B1 (en) 2013-05-07 2013-05-07 Cloud storage using Merkle trees

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/889,164 Continuation-In-Part US9705730B1 (en) 2010-09-30 2013-05-07 Cloud storage using Merkle trees

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/977,614 Continuation-In-Part US20190108103A9 (en) 2013-05-07 2015-12-21 Computing device replication using file system change detection methods and systems

Publications (1)

Publication Number Publication Date
US20170090786A1 true US20170090786A1 (en) 2017-03-30

Family

ID=58409284

Family Applications (4)

Application Number Title Priority Date Filing Date
US13/889,164 Active 2034-11-09 US9705730B1 (en) 2010-09-30 2013-05-07 Cloud storage using Merkle trees
US14/864,850 Abandoned US20170090786A1 (en) 2010-09-30 2015-09-24 Distributed and Deduplicating Data Storage System and Methods of Use
US14/977,614 Abandoned US20190108103A9 (en) 2013-05-07 2015-12-21 Computing device replication using file system change detection methods and systems
US15/599,417 Active US10599533B2 (en) 2013-05-07 2017-05-18 Cloud storage using merkle trees

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/889,164 Active 2034-11-09 US9705730B1 (en) 2010-09-30 2013-05-07 Cloud storage using Merkle trees

Family Applications After (2)

Application Number Title Priority Date Filing Date
US14/977,614 Abandoned US20190108103A9 (en) 2013-05-07 2015-12-21 Computing device replication using file system change detection methods and systems
US15/599,417 Active US10599533B2 (en) 2013-05-07 2017-05-18 Cloud storage using merkle trees

Country Status (1)

Country Link
US (4) US9705730B1 (en)

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150205815A1 (en) * 2010-12-14 2015-07-23 Commvault Systems, Inc. Distributed deduplicated storage system
US9705730B1 (en) 2013-05-07 2017-07-11 Axcient, Inc. Cloud storage using Merkle trees
US9785647B1 (en) 2012-10-02 2017-10-10 Axcient, Inc. File system virtualization
US9852140B1 (en) 2012-11-07 2017-12-26 Axcient, Inc. Efficient file replication
US9858156B2 (en) 2012-06-13 2018-01-02 Commvault Systems, Inc. Dedicated client-side signature generator in a networked storage system
US9898225B2 (en) 2010-09-30 2018-02-20 Commvault Systems, Inc. Content aligned block-based deduplication
US9934238B2 (en) 2014-10-29 2018-04-03 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US20180109501A1 (en) * 2016-10-17 2018-04-19 Microsoft Technology Licensing, Llc Migration containers
US9998344B2 (en) 2013-03-07 2018-06-12 Efolder, Inc. Protection status determinations for computing devices
US10061663B2 (en) 2015-12-30 2018-08-28 Commvault Systems, Inc. Rebuilding deduplication data in a distributed deduplication data storage system
US10126973B2 (en) 2010-09-30 2018-11-13 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US10191816B2 (en) 2010-12-14 2019-01-29 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
CN109408279A (en) * 2017-08-16 2019-03-01 北京京东尚科信息技术有限公司 Data back up method and device
US10229133B2 (en) 2013-01-11 2019-03-12 Commvault Systems, Inc. High availability distributed deduplicated storage system
CN109614036A (en) * 2018-11-16 2019-04-12 新华三技术有限公司成都分公司 The dispositions method and device of memory space
US10284437B2 (en) 2010-09-30 2019-05-07 Efolder, Inc. Cloud-based virtual machines and offices
US10282129B1 (en) 2017-10-24 2019-05-07 Bottomline Technologies (De), Inc. Tenant aware, variable length, deduplication of stored data
US10339106B2 (en) 2015-04-09 2019-07-02 Commvault Systems, Inc. Highly reusable deduplication database after disaster recovery
US10380072B2 (en) 2014-03-17 2019-08-13 Commvault Systems, Inc. Managing deletions from a deduplication database
US10481825B2 (en) 2015-05-26 2019-11-19 Commvault Systems, Inc. Replication using deduplicated secondary copy data
US10540327B2 (en) 2009-07-08 2020-01-21 Commvault Systems, Inc. Synchronized data deduplication
US10574751B2 (en) * 2016-03-22 2020-02-25 International Business Machines Corporation Identifying data for deduplication in a network storage environment
US10671370B2 (en) * 2018-05-30 2020-06-02 Red Hat, Inc. Distributing file system states
US20200409796A1 (en) * 2019-06-28 2020-12-31 Rubrik, Inc. Data management system with limited control of external compute and storage resources
US10921987B1 (en) * 2019-07-31 2021-02-16 EMC IP Holding Company LLC Deduplication of large block aggregates using representative block digests
US11010485B1 (en) * 2017-03-02 2021-05-18 Apple Inc. Cloud messaging system
US11010258B2 (en) 2018-11-27 2021-05-18 Commvault Systems, Inc. Generating backup copies through interoperability between components of a data storage management system and appliances for data storage and deduplication
US11016859B2 (en) 2008-06-24 2021-05-25 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US11392553B1 (en) 2018-04-24 2022-07-19 Pure Storage, Inc. Remote data management
US11436344B1 (en) 2018-04-24 2022-09-06 Pure Storage, Inc. Secure encryption in deduplication cluster
US11442896B2 (en) 2019-12-04 2022-09-13 Commvault Systems, Inc. Systems and methods for optimizing restoration of deduplicated data stored in cloud-based storage resources
US11463264B2 (en) 2019-05-08 2022-10-04 Commvault Systems, Inc. Use of data block signatures for monitoring in an information management system
US11604583B2 (en) 2017-11-28 2023-03-14 Pure Storage, Inc. Policy based data tiering
US11675741B2 (en) 2019-06-28 2023-06-13 Rubrik, Inc. Adaptable multi-layered storage for deduplicating electronic messages
US11687424B2 (en) 2020-05-28 2023-06-27 Commvault Systems, Inc. Automated media agent state management
US11698727B2 (en) 2018-12-14 2023-07-11 Commvault Systems, Inc. Performing secondary copy operations based on deduplication performance
US11829251B2 (en) 2019-04-10 2023-11-28 Commvault Systems, Inc. Restore using deduplicated secondary copy data
US11868214B1 (en) * 2020-02-02 2024-01-09 Veritas Technologies Llc Methods and systems for affinity aware container prefetching

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10366102B2 (en) * 2014-02-19 2019-07-30 Snowflake Inc. Resource management systems and methods
US10223394B1 (en) * 2015-03-24 2019-03-05 Amazon Technologies, Inc. Data reconciliation
KR101977109B1 (en) * 2015-11-17 2019-08-28 (주)마크애니 Large simultaneous digital signature service system based on hash function and method thereof
US10242065B1 (en) * 2016-06-30 2019-03-26 EMC IP Holding Company LLC Combining merkle trees in graph databases
US10867040B2 (en) * 2016-10-17 2020-12-15 Datto, Inc. Systems and methods for detecting ransomware infection
US10909105B2 (en) * 2016-11-28 2021-02-02 Sap Se Logical logging for in-memory metadata store
US10291408B2 (en) 2016-12-23 2019-05-14 Amazon Technologies, Inc. Generation of Merkle trees as proof-of-work
US20180181310A1 (en) * 2016-12-23 2018-06-28 Cloudendure Ltd. System and method for disk identification in a cloud based computing environment
US10511445B1 (en) 2017-01-05 2019-12-17 Amazon Technologies, Inc. Signature compression for hash-based signature schemes
US10608824B1 (en) * 2017-01-09 2020-03-31 Amazon Technologies, Inc. Merkle signature scheme tree expansion
US10652330B2 (en) 2017-01-15 2020-05-12 Google Llc Object storage in cloud with reference counting using versions
US11163721B1 (en) * 2017-04-25 2021-11-02 EMC IP Holding Company LLC Snapshot change list and file system indexing
US10387271B2 (en) * 2017-05-10 2019-08-20 Elastifile Ltd. File system storage in cloud using data and metadata merkle trees
US10649852B1 (en) * 2017-07-14 2020-05-12 EMC IP Holding Company LLC Index metadata for inode based backups
US10545696B2 (en) * 2017-11-14 2020-01-28 Samsung Electronics Co., Ltd. Data deduplication using KVSSD
US11177961B2 (en) * 2017-12-07 2021-11-16 Nec Corporation Method and system for securely sharing validation information using blockchain technology
CN108228767B (en) * 2017-12-27 2022-03-15 中国地质大学(武汉) Method and device for directionally deleting files by smart phone and storage device
US10754737B2 (en) * 2018-06-12 2020-08-25 Dell Products, L.P. Boot assist metadata tables for persistent memory device updates during a hardware fault
US11163750B2 (en) 2018-09-27 2021-11-02 International Business Machines Corporation Dynamic, transparent manipulation of content and/or namespaces within data storage systems
US11474912B2 (en) * 2019-01-31 2022-10-18 Rubrik, Inc. Backup and restore of files with multiple hard links
US11392541B2 (en) 2019-03-22 2022-07-19 Hewlett Packard Enterprise Development Lp Data transfer using snapshot differencing from edge system to core system
US10990675B2 (en) 2019-06-04 2021-04-27 Datto, Inc. Methods and systems for detecting a ransomware attack using entropy analysis and file update patterns
US11616810B2 (en) 2019-06-04 2023-03-28 Datto, Inc. Methods and systems for ransomware detection, isolation and remediation
US11347881B2 (en) 2020-04-06 2022-05-31 Datto, Inc. Methods and systems for detecting ransomware attack in incremental backup
US11048693B2 (en) 2019-06-05 2021-06-29 International Business Machines Corporation Resolution of ordering inversions
CN110493325B (en) * 2019-07-31 2020-12-29 创新先进技术有限公司 Block chain state data synchronization method and device and electronic equipment
US11467775B2 (en) 2019-10-15 2022-10-11 Hewlett Packard Enterprise Development Lp Virtual persistent volumes for containerized applications
US11392458B2 (en) * 2019-10-25 2022-07-19 EMC IP Holding Company LLC Reconstructing lost data objects by generating virtual user files from available nodes within a cluster
US11461362B2 (en) * 2020-01-29 2022-10-04 EMC IP Holding Company LLC Merkle super tree for synchronizing data buckets of unlimited size in object storage systems
US11455319B2 (en) * 2020-01-29 2022-09-27 EMC IP Holding Company LLC Merkle tree forest for synchronizing data buckets of unlimited size in object storage systems
US11645161B2 (en) 2020-03-26 2023-05-09 Hewlett Packard Enterprise Development Lp Catalog of files associated with snapshots
US11687267B2 (en) 2020-04-14 2023-06-27 Hewlett Packard Enterprise Development Lp Containerized application manifests and virtual persistent volumes
US11693573B2 (en) 2020-06-18 2023-07-04 Hewlett Packard Enterprise Development Lp Relaying storage operation requests to storage systems using underlying volume identifiers
US11755229B2 (en) * 2020-06-25 2023-09-12 EMC IP Holding Company LLC Archival task processing in a data storage system
US10990676B1 (en) * 2020-07-01 2021-04-27 Morgan Stanley Services Group Inc. File collection method for subsequent malware detection
US10860717B1 (en) * 2020-07-01 2020-12-08 Morgan Stanley Services Group Inc. Distributed system for file analysis and malware detection
US11481371B2 (en) 2020-07-27 2022-10-25 Hewlett Packard Enterprise Development Lp Storage system capacity usage estimation
US11960773B2 (en) 2020-07-31 2024-04-16 Hewlett Packard Enterprise Development Lp Modifying virtual persistent volumes based on analysis of performance metrics
US20220197944A1 (en) * 2020-12-22 2022-06-23 Netapp Inc. File metadata service
CN112699084B (en) * 2021-01-06 2022-10-28 青岛海尔科技有限公司 File cleaning method and device, storage medium and electronic device
US11599506B1 (en) * 2021-10-28 2023-03-07 EMC IP Holding Company LLC Source namespace and file copying
US11481372B1 (en) * 2022-04-13 2022-10-25 Aleksei Neganov Systems and methods for indexing multi-versioned data
US20230359593A1 (en) * 2022-05-09 2023-11-09 Netapp, Inc. Object versioning support for a file system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120330904A1 (en) * 2011-06-27 2012-12-27 International Business Machines Corporation Efficient file system object-based deduplication
US8639917B1 (en) * 2009-10-20 2014-01-28 Vmware, Inc. Streaming a desktop image over wide area networks in which the desktop image is segmented into a prefetch set of files, streaming set of files and leave-behind set of files
US20140101113A1 (en) * 2012-10-08 2014-04-10 Symantec Corporation Locality Aware, Two-Level Fingerprint Caching
US8745003B1 (en) * 2011-05-13 2014-06-03 Emc Corporation Synchronization of storage using comparisons of fingerprints of blocks
US20140244599A1 (en) * 2013-02-22 2014-08-28 Symantec Corporation Deduplication storage system with efficient reference updating and space reclamation
US20150112939A1 (en) * 2013-10-18 2015-04-23 Solidfire, Inc. Incremental block level backup

Family Cites Families (208)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5379412A (en) 1992-04-20 1995-01-03 International Business Machines Corporation Method and system for dynamic allocation of buffer storage space during backup copying
JP3497886B2 (en) 1994-05-10 2004-02-16 富士通株式会社 Server data linking device
US5574905A (en) 1994-05-26 1996-11-12 International Business Machines Corporation Method and apparatus for multimedia editing and data recovery
US5860107A (en) 1996-10-07 1999-01-12 International Business Machines Corporation Processor and method for store gathering through merged store operations
US6272492B1 (en) 1997-11-21 2001-08-07 Ibm Corporation Front-end proxy for transparently increasing web server functionality
US9292111B2 (en) 1998-01-26 2016-03-22 Apple Inc. Gesturing with a multipoint sensing device
US6205527B1 (en) 1998-02-24 2001-03-20 Adaptec, Inc. Intelligent backup and restoring system and method for implementing the same
US6122629A (en) 1998-04-30 2000-09-19 Compaq Computer Corporation Filesystem data integrity in a single system image environment
US6604236B1 (en) 1998-06-30 2003-08-05 Iora, Ltd. System and method for generating file updates for files stored on read-only media
US6233589B1 (en) 1998-07-31 2001-05-15 Novell, Inc. Method and system for reflecting differences between two files
EP0981099A3 (en) 1998-08-17 2004-04-21 Connected Place Limited A method of and an apparatus for merging a sequence of delta files
AU6104800A (en) 1999-07-16 2001-02-05 Intertrust Technologies Corp. Trusted storage systems and methods
AU2001229332A1 (en) 2000-01-10 2001-07-24 Connected Corporation Administration of a differential backup system in a client-server environment
US6651075B1 (en) 2000-02-16 2003-11-18 Microsoft Corporation Support for multiple temporal snapshots of same volume
US20010056503A1 (en) 2000-04-27 2001-12-27 Hibbard Richard J. Network interface device having primary and backup interfaces for automatic dial backup upon loss of a primary connection and method of using same
US6971018B1 (en) 2000-04-28 2005-11-29 Microsoft Corporation File protection service for a computer system
EP1168174A1 (en) 2000-06-19 2002-01-02 Hewlett-Packard Company, A Delaware Corporation Automatic backup/recovery process
US6950871B1 (en) 2000-06-29 2005-09-27 Hitachi, Ltd. Computer system having a storage area network and method of handling data in the computer system
US6918091B2 (en) 2000-11-09 2005-07-12 Change Tools, Inc. User definable interface system, method and computer program product
WO2002077862A1 (en) 2001-03-27 2002-10-03 British Telecommunications Public Limited Company File synchronisation
US20030011638A1 (en) 2001-07-10 2003-01-16 Sun-Woo Chung Pop-up menu system
US7216135B2 (en) 2002-02-15 2007-05-08 International Business Machines Corporation File system for providing access to a snapshot dataset where disk address in the inode is equal to a ditto address for indicating that the disk address is invalid disk address
US6877048B2 (en) 2002-03-12 2005-04-05 International Business Machines Corporation Dynamic memory allocation between inbound and outbound buffers in a protocol handler
US7165154B2 (en) 2002-03-18 2007-01-16 Net Integration Technologies Inc. System and method for data backup
US7051050B2 (en) 2002-03-19 2006-05-23 Network Appliance, Inc. System and method for restoring a single file from a snapshot
US7058656B2 (en) 2002-04-11 2006-06-06 Sun Microsystems, Inc. System and method of using extensions in a data structure without interfering with applications unaware of the extensions
US7058902B2 (en) 2002-07-30 2006-06-06 Microsoft Corporation Enhanced on-object context menus
US7024581B1 (en) 2002-10-09 2006-04-04 Xpoint Technologies, Inc. Data processing recovery system and method spanning multiple operating system
US7055010B2 (en) 2002-11-06 2006-05-30 Synology Inc. Snapshot facility allowing preservation of chronological views on block drives
JP2004171249A (en) 2002-11-20 2004-06-17 Hitachi Ltd Backup execution decision method for database
US7624143B2 (en) 2002-12-12 2009-11-24 Xerox Corporation Methods, apparatus, and program products for utilizing contextual property metadata in networked computing environments
US7809693B2 (en) 2003-02-10 2010-10-05 Netapp, Inc. System and method for restoring data on demand for instant volume restoration
US7320009B1 (en) 2003-03-28 2008-01-15 Novell, Inc. Methods and systems for file replication utilizing differences between versions of files
US7558927B2 (en) 2003-05-06 2009-07-07 Aptare, Inc. System to capture, transmit and persist backup and recovery meta data
EP1620778B1 (en) 2003-05-06 2013-12-25 Aptare, Inc. System to capture, transmit and persist backup and recovery meta data
US7328366B2 (en) 2003-06-06 2008-02-05 Cascade Basic Research Corp. Method and system for reciprocal data backup
US20050010835A1 (en) 2003-07-11 2005-01-13 International Business Machines Corporation Autonomic non-invasive backup and storage appliance
US7398285B2 (en) 2003-07-30 2008-07-08 International Business Machines Corporation Apparatus and system for asynchronous replication of a hierarchically-indexed data store
US20050193235A1 (en) 2003-08-05 2005-09-01 Miklos Sandorfi Emulated storage system
US7225208B2 (en) 2003-09-30 2007-05-29 Iron Mountain Incorporated Systems and methods for backing up data files
JP4267420B2 (en) 2003-10-20 2009-05-27 株式会社日立製作所 Storage apparatus and backup acquisition method
US7188118B2 (en) 2003-11-26 2007-03-06 Veritas Operating Corporation System and method for detecting file content similarity within a file system
JP4319017B2 (en) 2003-12-02 2009-08-26 株式会社日立製作所 Storage system control method, storage system, and storage device
EP1538536A1 (en) 2003-12-05 2005-06-08 Sony International (Europe) GmbH Visualization and control techniques for multimedia digital content
US20050152192A1 (en) 2003-12-22 2005-07-14 Manfred Boldy Reducing occupancy of digital storage devices
US7406488B2 (en) 2004-02-04 2008-07-29 Netapp Method and system for maintaining data in a continuous data protection system
US7315965B2 (en) 2004-02-04 2008-01-01 Network Appliance, Inc. Method and system for storing data using a continuous data protection system
US7966293B1 (en) 2004-03-09 2011-06-21 Netapp, Inc. System and method for indexing a backup using persistent consistency point images
US7277905B2 (en) 2004-03-31 2007-10-02 Microsoft Corporation System and method for a consistency check of a database backup
US7246258B2 (en) 2004-04-28 2007-07-17 Lenovo (Singapore) Pte. Ltd. Minimizing resynchronization time after backup system failures in an appliance-based business continuance architecture
US7266655B1 (en) 2004-04-29 2007-09-04 Veritas Operating Corporation Synthesized backup set catalog
US7356729B2 (en) 2004-06-14 2008-04-08 Lucent Technologies Inc. Restoration of network element through employment of bootable image
US20060013462A1 (en) 2004-07-15 2006-01-19 Navid Sadikali Image display system and method
US7389314B2 (en) 2004-08-30 2008-06-17 Corio, Inc. Database backup, refresh and cloning system and method
US7979404B2 (en) 2004-09-17 2011-07-12 Quest Software, Inc. Extracting data changes and storing data history to allow for instantaneous access to and reconstruction of any point-in-time data
JP4325524B2 (en) 2004-09-29 2009-09-02 日本電気株式会社 Switch device and system, backup and restore method and program
US7546323B1 (en) 2004-09-30 2009-06-09 Emc Corporation System and methods for managing backup status reports
US7401192B2 (en) 2004-10-04 2008-07-15 International Business Machines Corporation Method of replicating a file using a base, delta, and reference file
WO2007089217A2 (en) 2004-11-05 2007-08-09 Kabushiki Kaisha Toshiba Network discovery mechanisms
US7814057B2 (en) 2005-04-05 2010-10-12 Microsoft Corporation Page recovery using volume snapshots and logs
US7693138B2 (en) 2005-07-18 2010-04-06 Broadcom Corporation Method and system for transparent TCP offload with best effort direct placement of incoming traffic
US20070038884A1 (en) 2005-08-10 2007-02-15 Spare Backup, Inc. System and method of remote storage of data using client software
US7743038B1 (en) 2005-08-24 2010-06-22 Lsi Corporation Inode based policy identifiers in a filing system
US8429630B2 (en) 2005-09-15 2013-04-23 Ca, Inc. Globally distributed utility computing cloud
US20070112895A1 (en) 2005-11-04 2007-05-17 Sun Microsystems, Inc. Block-based incremental backup
JP4546387B2 (en) 2005-11-17 2010-09-15 富士通株式会社 Backup system, method and program
US7730425B2 (en) 2005-11-30 2010-06-01 De Los Reyes Isabelo Function-oriented user interface
US20070204153A1 (en) 2006-01-04 2007-08-30 Tome Agustin J Trusted host platform
US20070180207A1 (en) 2006-01-18 2007-08-02 International Business Machines Corporation Secure RFID backup/restore for computing/pervasive devices
US7667686B2 (en) 2006-02-01 2010-02-23 Memsic, Inc. Air-writing and motion sensing input for portable devices
US7676763B2 (en) 2006-02-21 2010-03-09 Sap Ag Method and system for providing an outwardly expandable radial menu
US20070208918A1 (en) 2006-03-01 2007-09-06 Kenneth Harbin Method and apparatus for providing virtual machine backup
US20070220029A1 (en) 2006-03-17 2007-09-20 Novell, Inc. System and method for hierarchical storage management using shadow volumes
JP4911576B2 (en) 2006-03-24 2012-04-04 株式会社メガチップス Information processing apparatus and write-once memory utilization method
US7650369B2 (en) 2006-03-30 2010-01-19 Fujitsu Limited Database system management method and database system
US7552044B2 (en) 2006-04-21 2009-06-23 Microsoft Corporation Simulated storage area network
US7653832B2 (en) 2006-05-08 2010-01-26 Emc Corporation Storage array virtualization using a storage block mapping protocol client and server
US7945726B2 (en) 2006-05-08 2011-05-17 Emc Corporation Pre-allocation and hierarchical mapping of data blocks distributed from a first processor to a second processor for use in a file system
US8949312B2 (en) 2006-05-25 2015-02-03 Red Hat, Inc. Updating clients from a server
US7568124B2 (en) 2006-06-02 2009-07-28 Microsoft Corporation Driving data backups with data source tagging
US8302091B2 (en) 2006-06-05 2012-10-30 International Business Machines Corporation Installation of a bootable image for modifying the operational environment of a computing system
US7624134B2 (en) 2006-06-12 2009-11-24 International Business Machines Corporation Enabling access to remote storage for use with a backup program
US7873601B1 (en) 2006-06-29 2011-01-18 Emc Corporation Backup of incremental metadata in block based backup systems
JP2008015768A (en) 2006-07-05 2008-01-24 Hitachi Ltd Storage system and data management method using the same
US7783956B2 (en) 2006-07-12 2010-08-24 Cronera Systems Incorporated Data recorder
US20080027998A1 (en) 2006-07-27 2008-01-31 Hitachi, Ltd. Method and apparatus of continuous data protection for NAS
US7809688B2 (en) 2006-08-04 2010-10-05 Apple Inc. Managing backup of content
US7752487B1 (en) 2006-08-08 2010-07-06 Open Invention Network, Llc System and method for managing group policy backup
AU2007295949B2 (en) 2006-09-12 2009-08-06 Adams Consulting Group Pty. Ltd. Method system and apparatus for handling information
US8332442B1 (en) 2006-09-26 2012-12-11 Symantec Corporation Automated restoration of links when restoring individual directory service objects
US7769731B2 (en) 2006-10-04 2010-08-03 International Business Machines Corporation Using file backup software to generate an alert when a file modification policy is violated
US7832008B1 (en) 2006-10-11 2010-11-09 Cisco Technology, Inc. Protection of computer resources
US8117163B2 (en) 2006-10-31 2012-02-14 Carbonite, Inc. Backup and restore system for a computer
JP4459215B2 (en) 2006-11-09 2010-04-28 株式会社ソニー・コンピュータエンタテインメント GAME DEVICE AND INFORMATION PROCESSING DEVICE
US7620765B1 (en) 2006-12-15 2009-11-17 Symantec Operating Corporation Method to delete partial virtual tape volumes
US20080154979A1 (en) 2006-12-21 2008-06-26 International Business Machines Corporation Apparatus, system, and method for creating a backup schedule in a san environment based on a recovery plan
WO2008085201A2 (en) 2006-12-29 2008-07-17 Prodea Systems, Inc. Managed file backup and restore at remote storage locations through multi-services gateway device at user premises
US8880480B2 (en) 2007-01-03 2014-11-04 Oracle International Corporation Method and apparatus for data rollback
US7647338B2 (en) 2007-02-21 2010-01-12 Microsoft Corporation Content item query formulation
US20080229050A1 (en) 2007-03-13 2008-09-18 Sony Ericsson Mobile Communications Ab Dynamic page on demand buffer size for power savings
US9497028B1 (en) * 2007-05-03 2016-11-15 Google Inc. System and method for remote storage auditing
US7974950B2 (en) 2007-06-05 2011-07-05 International Business Machines Corporation Applying a policy criteria to files in a backup image
US8010900B2 (en) 2007-06-08 2011-08-30 Apple Inc. User interface for electronic backup
US7631155B1 (en) 2007-06-30 2009-12-08 Emc Corporation Thin provisioning of a file system and an iSCSI LUN through a common mechanism
US8676273B1 (en) 2007-08-24 2014-03-18 Iwao Fujisaki Communication device
TW200917063A (en) 2007-10-02 2009-04-16 Sunonwealth Electr Mach Ind Co Survey method for a patent searching result
JP4412509B2 (en) 2007-10-05 2010-02-10 日本電気株式会社 Storage system capacity expansion control method
US8117164B2 (en) 2007-12-19 2012-02-14 Microsoft Corporation Creating and utilizing network restore points
US9503354B2 (en) 2008-01-17 2016-11-22 Aerohive Networks, Inc. Virtualization of networking services
JP2009205333A (en) 2008-02-27 2009-09-10 Hitachi Ltd Computer system, storage device, and data management method
JP4481338B2 (en) 2008-03-28 2010-06-16 株式会社日立製作所 Backup system, storage device, and data backup method
JP4413976B2 (en) 2008-05-23 2010-02-10 株式会社東芝 Information processing apparatus and version upgrade method for information processing apparatus
US9038087B2 (en) * 2008-06-18 2015-05-19 Microsoft Technology Licensing, Llc Fence elision for work stealing
US20090319653A1 (en) 2008-06-20 2009-12-24 International Business Machines Corporation Server configuration management method
US8245156B2 (en) 2008-06-28 2012-08-14 Apple Inc. Radial menu selection
US8826181B2 (en) 2008-06-28 2014-09-02 Apple Inc. Moving radial menus
US8060476B1 (en) 2008-07-14 2011-11-15 Quest Software, Inc. Backup systems and methods for a virtual computing environment
US8103718B2 (en) * 2008-07-31 2012-01-24 Microsoft Corporation Content discovery and transfer between mobile communications nodes
US9177271B2 (en) 2008-08-14 2015-11-03 Hewlett-Packard Development Company, L.P. Heterogeneous information technology (IT) infrastructure management orchestration
US8117410B2 (en) 2008-08-25 2012-02-14 Vmware, Inc. Tracking block-level changes using snapshots
US8279174B2 (en) 2008-08-27 2012-10-02 Lg Electronics Inc. Display device and method of controlling the display device
US8452731B2 (en) * 2008-09-25 2013-05-28 Quest Software, Inc. Remote backup and restore
US8099572B1 (en) 2008-09-30 2012-01-17 Emc Corporation Efficient backup and restore of storage objects in a version set
US20100104105A1 (en) 2008-10-23 2010-04-29 Digital Cinema Implementation Partners, Llc Digital cinema asset management system
US8495624B2 (en) 2008-10-23 2013-07-23 International Business Machines Corporation Provisioning a suitable operating system environment
US20100114832A1 (en) * 2008-10-31 2010-05-06 Lillibridge Mark D Forensic snapshot
US20100179973A1 (en) 2008-12-31 2010-07-15 Herve Carruzzo Systems, methods, and computer programs for delivering content via a communications network
US9383897B2 (en) 2009-01-29 2016-07-05 International Business Machines Corporation Spiraling radial menus in computer systems
US8352717B2 (en) 2009-02-09 2013-01-08 Cs-Solutions, Inc. Recovery system using selectable and configurable snapshots
US8819113B2 (en) 2009-03-02 2014-08-26 Kaseya International Limited Remote provisioning of virtual machines
US8504785B1 (en) 2009-03-10 2013-08-06 Symantec Corporation Method and apparatus for backing up to tape drives with minimum write speed
US8370835B2 (en) 2009-03-12 2013-02-05 Arend Erich Dittmer Method for dynamically generating a configuration for a virtual machine with a virtual hard disk in an external storage device
US8099391B1 (en) 2009-03-17 2012-01-17 Symantec Corporation Incremental and differential backups of virtual machine files
US8260742B2 (en) * 2009-04-03 2012-09-04 International Business Machines Corporation Data synchronization and consistency across distributed repositories
JP5317807B2 (en) 2009-04-13 2013-10-16 株式会社日立製作所 File control system and file control computer used therefor
US20100268689A1 (en) 2009-04-15 2010-10-21 Gates Matthew S Providing information relating to usage of a simulated snapshot
US8601389B2 (en) 2009-04-30 2013-12-03 Apple Inc. Scrollable menus and toolbars
US8200926B1 (en) * 2009-05-28 2012-06-12 Symantec Corporation Methods and systems for creating full backups
US8549432B2 (en) 2009-05-29 2013-10-01 Apple Inc. Radial menus
US8345707B2 (en) * 2009-06-03 2013-01-01 Voxer Ip Llc Method for synchronizing data maintained at a plurality of nodes
US8321688B2 (en) 2009-06-12 2012-11-27 Microsoft Corporation Secure and private backup storage and processing for trusted computing and data services
US8533608B1 (en) 2009-06-29 2013-09-10 Generation E Consulting Run-book automation platform with actionable document
US8285681B2 (en) 2009-06-30 2012-10-09 Commvault Systems, Inc. Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites
US8457018B1 (en) * 2009-06-30 2013-06-04 Emc Corporation Merkle tree reference counts
US8244914B1 (en) 2009-07-31 2012-08-14 Symantec Corporation Systems and methods for restoring email databases
JP2011039804A (en) 2009-08-12 2011-02-24 Hitachi Ltd Backup management method based on failure contents
US8209568B2 (en) 2009-08-21 2012-06-26 Novell, Inc. System and method for implementing an intelligent backup technique for cluster resources
US20110055471A1 (en) * 2009-08-28 2011-03-03 Jonathan Thatcher Apparatus, system, and method for improved data deduplication
US9086928B2 (en) 2009-08-31 2015-07-21 Accenture Global Services Limited Provisioner within cloud console—defining images of an enterprise to be operable on different cloud computing providers
US8335784B2 (en) 2009-08-31 2012-12-18 Microsoft Corporation Visual search and three-dimensional results
US8645647B2 (en) 2009-09-02 2014-02-04 International Business Machines Corporation Data storage snapshot with reduced copy-on-write
JP2013011919A (en) 2009-09-17 2013-01-17 Hitachi Ltd Storage apparatus and snapshot control method of the same
US8767593B1 (en) 2009-10-13 2014-07-01 Signal Perfection, Ltd. Method for managing, scheduling, monitoring and controlling audio and video communication and data collaboration
US8589913B2 (en) 2009-10-14 2013-11-19 Vmware, Inc. Tracking block-level writes
US8856080B2 (en) 2009-10-30 2014-10-07 Microsoft Corporation Backup using metadata virtual hard drive and differential virtual hard drive
US8296410B1 (en) 2009-11-06 2012-10-23 Carbonite, Inc. Bandwidth management in a client/server environment
US8572337B1 (en) 2009-12-14 2013-10-29 Symantec Corporation Systems and methods for performing live backups
US9465532B2 (en) 2009-12-18 2016-10-11 Synaptics Incorporated Method and apparatus for operating in pointing and enhanced gesturing modes
US8190574B2 (en) 2010-03-02 2012-05-29 Storagecraft Technology Corporation Systems, methods, and computer-readable media for backup and restoration of computer information
WO2011119173A1 (en) 2010-03-26 2011-09-29 Carbonite, Inc. Transfer of user data between logical data sites
WO2011123090A1 (en) 2010-03-29 2011-10-06 Carbonite, Inc. Discovery of non-standard folders for backup
WO2011123089A1 (en) 2010-03-29 2011-10-06 Carbonite, Inc. Managing backup sets based on user feedback
US8037345B1 (en) 2010-03-31 2011-10-11 Emc Corporation Deterministic recovery of a file system built on a thinly provisioned logical volume having redundant metadata
US9047218B2 (en) * 2010-04-26 2015-06-02 Cleversafe, Inc. Dispersed storage network slice name verification
US8224935B1 (en) * 2010-05-12 2012-07-17 Symantec Corporation Systems and methods for efficiently synchronizing configuration data within distributed computing systems
US9298563B2 (en) 2010-06-01 2016-03-29 Hewlett Packard Enterprise Development Lp Changing a number of disk agents to backup objects to a storage device
WO2011159284A1 (en) 2010-06-15 2011-12-22 Hewlett-Packard Development Company, L. P. Volume management
US8773370B2 (en) 2010-07-13 2014-07-08 Apple Inc. Table editing systems with gesture-based insertion and deletion of columns and rows
US20120065802A1 (en) 2010-09-14 2012-03-15 Joulex, Inc. System and methods for automatic power management of remote electronic devices using a mobile device
US8606752B1 (en) 2010-09-29 2013-12-10 Symantec Corporation Method and system of restoring items to a database while maintaining referential integrity
US9235474B1 (en) 2011-02-17 2016-01-12 Axcient, Inc. Systems and methods for maintaining a virtual failover volume of a target computing system
US10284437B2 (en) 2010-09-30 2019-05-07 Efolder, Inc. Cloud-based virtual machines and offices
US8954544B2 (en) 2010-09-30 2015-02-10 Axcient, Inc. Cloud-based virtual machines and offices
US9705730B1 (en) 2013-05-07 2017-07-11 Axcient, Inc. Cloud storage using Merkle trees
US8589350B1 (en) 2012-04-02 2013-11-19 Axcient, Inc. Systems, methods, and media for synthesizing views of file system backups
US8924360B1 (en) 2010-09-30 2014-12-30 Axcient, Inc. Systems and methods for restoring a file
JP5816424B2 (en) 2010-10-05 2015-11-18 富士通株式会社 Information processing device, tape device, and program
US8904126B2 (en) 2010-11-16 2014-12-02 Actifio, Inc. System and method for performing a plurality of prescribed data management functions in a manner that reduces redundant access operations to primary storage
US8417674B2 (en) * 2010-11-16 2013-04-09 Actifio, Inc. System and method for creating deduplicated copies of data by sending difference data between near-neighbor temporal states
US8495262B2 (en) 2010-11-23 2013-07-23 International Business Machines Corporation Using a table to determine if user buffer is marked copy-on-write
US8635187B2 (en) 2011-01-07 2014-01-21 Symantec Corporation Method and system of performing incremental SQL server database backups
US8412680B1 (en) 2011-01-20 2013-04-02 Commvault Systems, Inc System and method for performing backup operations and reporting the results thereof
US9311324B2 (en) * 2011-01-26 2016-04-12 Mitre Corporation Synchronizing data among a federation of servers with intermittent or low signal bandwidth
US8510597B2 (en) 2011-02-08 2013-08-13 Wisconsin Alumni Research Foundation Providing restartable file systems within computing devices
US20120210398A1 (en) 2011-02-14 2012-08-16 Bank Of America Corporation Enhanced Backup and Retention Management
US8458137B2 (en) 2011-02-22 2013-06-04 Bank Of America Corporation Backup and retention monitoring
CN103548000A (en) 2011-03-21 2014-01-29 惠普发展公司,有限责任合伙企业 Data backup prioritization
US8621274B1 (en) 2011-05-18 2013-12-31 Netapp Inc. Virtual machine fault tolerance
EP2724264B1 (en) * 2011-06-23 2020-12-16 Red Hat, Inc. Client-based data replication
US8966457B2 (en) 2011-11-15 2015-02-24 Global Supercomputing Corporation Method and system for converting a single-threaded software program into an application-specific supercomputer
WO2013086040A2 (en) * 2011-12-05 2013-06-13 Doyenz Incorporated Universal pluggable cloud disaster recovery system
US8600947B1 (en) 2011-12-08 2013-12-03 Symantec Corporation Systems and methods for providing backup interfaces
US20130166511A1 (en) 2011-12-21 2013-06-27 International Business Machines Corporation Determining an overall assessment of a likelihood of a backup set resulting in a successful restore
US9298715B2 (en) 2012-03-07 2016-03-29 Commvault Systems, Inc. Data storage system utilizing proxy device for storage operations
KR101930263B1 (en) * 2012-03-12 2018-12-18 삼성전자주식회사 Apparatus and method for managing contents in a cloud gateway
US9274897B2 (en) 2012-05-25 2016-03-01 Symantec Corporation Backup policy migration and image duplication
US20140089619A1 (en) * 2012-09-27 2014-03-27 Infinera Corporation Object replication framework for a distributed computing environment
US20140149358A1 (en) 2012-11-29 2014-05-29 Longsand Limited Configuring computing devices using a template
US9021452B2 (en) 2012-12-27 2015-04-28 Commvault Systems, Inc. Automatic identification of storage requirements, such as for use in selling data storage management solutions
US9336226B2 (en) 2013-01-11 2016-05-10 Commvault Systems, Inc. Criteria-based data synchronization management
US9031829B2 (en) 2013-02-08 2015-05-12 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9940069B1 (en) * 2013-02-27 2018-04-10 EMC IP Holding Company LLC Paging cache for storage system
US9110964B1 (en) * 2013-03-05 2015-08-18 Emc Corporation Metadata optimization for network replication using differential encoding
US9292153B1 (en) 2013-03-07 2016-03-22 Axcient, Inc. Systems and methods for providing efficient and focused visualization of data
US9397907B1 (en) 2013-03-07 2016-07-19 Axcient, Inc. Protection status determinations for computing devices
US20160110261A1 (en) 2013-05-07 2016-04-21 Axcient, Inc. Cloud storage using merkle trees
US9774410B2 (en) 2014-06-10 2017-09-26 PB, Inc. Radiobeacon data sharing by forwarding low energy transmissions to a cloud host
US9954946B2 (en) * 2015-11-24 2018-04-24 Netapp, Inc. Directory level incremental replication

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8639917B1 (en) * 2009-10-20 2014-01-28 Vmware, Inc. Streaming a desktop image over wide area networks in which the desktop image is segmented into a prefetch set of files, streaming set of files and leave-behind set of files
US8745003B1 (en) * 2011-05-13 2014-06-03 Emc Corporation Synchronization of storage using comparisons of fingerprints of blocks
US20120330904A1 (en) * 2011-06-27 2012-12-27 International Business Machines Corporation Efficient file system object-based deduplication
US20140101113A1 (en) * 2012-10-08 2014-04-10 Symantec Corporation Locality Aware, Two-Level Fingerprint Caching
US20140244599A1 (en) * 2013-02-22 2014-08-28 Symantec Corporation Deduplication storage system with efficient reference updating and space reclamation
US20150112939A1 (en) * 2013-10-18 2015-04-23 Solidfire, Inc. Incremental block level backup

Cited By (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11016859B2 (en) 2008-06-24 2021-05-25 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US11288235B2 (en) 2009-07-08 2022-03-29 Commvault Systems, Inc. Synchronized data deduplication
US10540327B2 (en) 2009-07-08 2020-01-21 Commvault Systems, Inc. Synchronized data deduplication
US9898225B2 (en) 2010-09-30 2018-02-20 Commvault Systems, Inc. Content aligned block-based deduplication
US10284437B2 (en) 2010-09-30 2019-05-07 Efolder, Inc. Cloud-based virtual machines and offices
US10126973B2 (en) 2010-09-30 2018-11-13 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US20150205815A1 (en) * 2010-12-14 2015-07-23 Commvault Systems, Inc. Distributed deduplicated storage system
US11169888B2 (en) 2010-12-14 2021-11-09 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US9898478B2 (en) * 2010-12-14 2018-02-20 Commvault Systems, Inc. Distributed deduplicated storage system
US10740295B2 (en) 2010-12-14 2020-08-11 Commvault Systems, Inc. Distributed deduplicated storage system
US11422976B2 (en) 2010-12-14 2022-08-23 Commvault Systems, Inc. Distributed deduplicated storage system
US10191816B2 (en) 2010-12-14 2019-01-29 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US9858156B2 (en) 2012-06-13 2018-01-02 Commvault Systems, Inc. Dedicated client-side signature generator in a networked storage system
US10956275B2 (en) 2012-06-13 2021-03-23 Commvault Systems, Inc. Collaborative restore in a networked storage system
US10176053B2 (en) 2012-06-13 2019-01-08 Commvault Systems, Inc. Collaborative restore in a networked storage system
US10387269B2 (en) 2012-06-13 2019-08-20 Commvault Systems, Inc. Dedicated client-side signature generator in a networked storage system
US9785647B1 (en) 2012-10-02 2017-10-10 Axcient, Inc. File system virtualization
US11169714B1 (en) 2012-11-07 2021-11-09 Efolder, Inc. Efficient file replication
US9852140B1 (en) 2012-11-07 2017-12-26 Axcient, Inc. Efficient file replication
US11157450B2 (en) 2013-01-11 2021-10-26 Commvault Systems, Inc. High availability distributed deduplicated storage system
US10229133B2 (en) 2013-01-11 2019-03-12 Commvault Systems, Inc. High availability distributed deduplicated storage system
US9998344B2 (en) 2013-03-07 2018-06-12 Efolder, Inc. Protection status determinations for computing devices
US10003646B1 (en) 2013-03-07 2018-06-19 Efolder, Inc. Protection status determinations for computing devices
US9705730B1 (en) 2013-05-07 2017-07-11 Axcient, Inc. Cloud storage using Merkle trees
US10599533B2 (en) 2013-05-07 2020-03-24 Efolder, Inc. Cloud storage using merkle trees
US10445293B2 (en) 2014-03-17 2019-10-15 Commvault Systems, Inc. Managing deletions from a deduplication database
US11188504B2 (en) 2014-03-17 2021-11-30 Commvault Systems, Inc. Managing deletions from a deduplication database
US11119984B2 (en) 2014-03-17 2021-09-14 Commvault Systems, Inc. Managing deletions from a deduplication database
US10380072B2 (en) 2014-03-17 2019-08-13 Commvault Systems, Inc. Managing deletions from a deduplication database
US10474638B2 (en) 2014-10-29 2019-11-12 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US11921675B2 (en) 2014-10-29 2024-03-05 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US11113246B2 (en) 2014-10-29 2021-09-07 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US9934238B2 (en) 2014-10-29 2018-04-03 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US11301420B2 (en) 2015-04-09 2022-04-12 Commvault Systems, Inc. Highly reusable deduplication database after disaster recovery
US10339106B2 (en) 2015-04-09 2019-07-02 Commvault Systems, Inc. Highly reusable deduplication database after disaster recovery
US10481825B2 (en) 2015-05-26 2019-11-19 Commvault Systems, Inc. Replication using deduplicated secondary copy data
US10481826B2 (en) 2015-05-26 2019-11-19 Commvault Systems, Inc. Replication using deduplicated secondary copy data
US10481824B2 (en) 2015-05-26 2019-11-19 Commvault Systems, Inc. Replication using deduplicated secondary copy data
US10255143B2 (en) 2015-12-30 2019-04-09 Commvault Systems, Inc. Deduplication replication in a distributed deduplication data storage system
US10310953B2 (en) 2015-12-30 2019-06-04 Commvault Systems, Inc. System for redirecting requests after a secondary storage computing device failure
US10956286B2 (en) 2015-12-30 2021-03-23 Commvault Systems, Inc. Deduplication replication in a distributed deduplication data storage system
US10877856B2 (en) 2015-12-30 2020-12-29 Commvault Systems, Inc. System for redirecting requests after a secondary storage computing device failure
US10061663B2 (en) 2015-12-30 2018-08-28 Commvault Systems, Inc. Rebuilding deduplication data in a distributed deduplication data storage system
US10592357B2 (en) 2015-12-30 2020-03-17 Commvault Systems, Inc. Distributed file system in a distributed deduplication data storage system
US10904338B2 (en) 2016-03-22 2021-01-26 International Business Machines Corporation Identifying data for deduplication in a network storage environment
US10574751B2 (en) * 2016-03-22 2020-02-25 International Business Machines Corporation Identifying data for deduplication in a network storage environment
US20180109501A1 (en) * 2016-10-17 2018-04-19 Microsoft Technology Licensing, Llc Migration containers
US10673823B2 (en) * 2016-10-17 2020-06-02 Microsoft Technology Licensing, Llc Migration containers
US12001579B1 (en) 2017-03-02 2024-06-04 Apple Inc. Cloud messaging system
US11010485B1 (en) * 2017-03-02 2021-05-18 Apple Inc. Cloud messaging system
CN109408279A (en) * 2017-08-16 2019-03-01 北京京东尚科信息技术有限公司 Data back up method and device
US11620065B2 (en) 2017-10-24 2023-04-04 Bottomline Technologies Limited Variable length deduplication of stored data
EP3477462A3 (en) * 2017-10-24 2019-06-12 Bottomline Technologies (DE), Inc. Tenant aware, variable length, deduplication of stored data
US10282129B1 (en) 2017-10-24 2019-05-07 Bottomline Technologies (De), Inc. Tenant aware, variable length, deduplication of stored data
US11194497B2 (en) 2017-10-24 2021-12-07 Bottomline Technologies, Inc. Variable length deduplication of stored data
US10884643B2 (en) 2017-10-24 2021-01-05 Bottomline Technologies Limited Variable length deduplication of stored data
US11604583B2 (en) 2017-11-28 2023-03-14 Pure Storage, Inc. Policy based data tiering
US11392553B1 (en) 2018-04-24 2022-07-19 Pure Storage, Inc. Remote data management
US11436344B1 (en) 2018-04-24 2022-09-06 Pure Storage, Inc. Secure encryption in deduplication cluster
US10671370B2 (en) * 2018-05-30 2020-06-02 Red Hat, Inc. Distributing file system states
CN109614036A (en) * 2018-11-16 2019-04-12 新华三技术有限公司成都分公司 The dispositions method and device of memory space
US11010258B2 (en) 2018-11-27 2021-05-18 Commvault Systems, Inc. Generating backup copies through interoperability between components of a data storage management system and appliances for data storage and deduplication
US11681587B2 (en) 2018-11-27 2023-06-20 Commvault Systems, Inc. Generating copies through interoperability between a data storage management system and appliances for data storage and deduplication
US11698727B2 (en) 2018-12-14 2023-07-11 Commvault Systems, Inc. Performing secondary copy operations based on deduplication performance
US11829251B2 (en) 2019-04-10 2023-11-28 Commvault Systems, Inc. Restore using deduplicated secondary copy data
US11463264B2 (en) 2019-05-08 2022-10-04 Commvault Systems, Inc. Use of data block signatures for monitoring in an information management system
US11681586B2 (en) * 2019-06-28 2023-06-20 Rubrik, Inc. Data management system with limited control of external compute and storage resources
US11675741B2 (en) 2019-06-28 2023-06-13 Rubrik, Inc. Adaptable multi-layered storage for deduplicating electronic messages
US20200409796A1 (en) * 2019-06-28 2020-12-31 Rubrik, Inc. Data management system with limited control of external compute and storage resources
US11914554B2 (en) 2019-06-28 2024-02-27 Rubrik, Inc. Adaptable multi-layered storage for deduplicating electronic messages
US10921987B1 (en) * 2019-07-31 2021-02-16 EMC IP Holding Company LLC Deduplication of large block aggregates using representative block digests
US11442896B2 (en) 2019-12-04 2022-09-13 Commvault Systems, Inc. Systems and methods for optimizing restoration of deduplicated data stored in cloud-based storage resources
US11868214B1 (en) * 2020-02-02 2024-01-09 Veritas Technologies Llc Methods and systems for affinity aware container prefetching
US11687424B2 (en) 2020-05-28 2023-06-27 Commvault Systems, Inc. Automated media agent state management

Also Published As

Publication number Publication date
US20170257254A1 (en) 2017-09-07
US20170177452A1 (en) 2017-06-22
US10599533B2 (en) 2020-03-24
US20190108103A9 (en) 2019-04-11
US9705730B1 (en) 2017-07-11

Similar Documents

Publication Publication Date Title
US20170090786A1 (en) Distributed and Deduplicating Data Storage System and Methods of Use
US9110603B2 (en) Identifying modified chunks in a data set for storage
US11080232B2 (en) Backup and restoration for a deduplicated file system
US8874532B2 (en) Managing dereferenced chunks in a deduplication system
US9575978B2 (en) Restoring objects in a client-server environment
US9305005B2 (en) Merging entries in a deduplication index
US8396841B1 (en) Method and system of multi-level and multi-mode cloud-based deduplication
US9910906B2 (en) Data synchronization using redundancy detection
US20160110261A1 (en) Cloud storage using merkle trees
US9262431B2 (en) Efficient data deduplication in a data storage network
US10284433B2 (en) Data synchronization using redundancy detection
KR20130120516A (en) Content based file chunking
JP2015525419A (en) Advanced data management virtualization system
US10754731B1 (en) Compliance audit logging based backup
US9749193B1 (en) Rule-based systems for outcome-based data protection
CN103067519A (en) Method and device of data distribution storage under heterogeneous platform
US11474733B2 (en) Public cloud provider cost optimization for writing data blocks directly to object storage
US11093342B1 (en) Efficient deduplication of compressed files
US9971797B1 (en) Method and system for providing clustered and parallel data mining of backup data
US10108647B1 (en) Method and system for providing instant access of backup data
US20170124107A1 (en) Data deduplication storage system and process
US11392868B1 (en) Data retention cost control for data written directly to object storage
US20210342301A1 (en) Filesystem managing metadata operations corresponding to a file in another filesystem
US20230350762A1 (en) Targeted deduplication using server-side group fingerprints for virtual synthesis
US10719406B1 (en) Enhanced fingerprint computation for de-duplicated data

Legal Events

Date Code Title Description
AS Assignment

Owner name: AXCIENT, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PARAB, NITIN;BROWN, AARON;VAN DYCK, DANE;AND OTHERS;REEL/FRAME:036983/0436

Effective date: 20150924

AS Assignment

Owner name: STRUCTURED ALPHA LP, CANADA

Free format text: SECURITY INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:042542/0364

Effective date: 20170530

AS Assignment

Owner name: SILVER LAKE WATERMAN FUND, L.P., CALIFORNIA

Free format text: SECURITY INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:042577/0901

Effective date: 20170530

AS Assignment

Owner name: AXCIENT, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P.;REEL/FRAME:043106/0389

Effective date: 20170726

AS Assignment

Owner name: AXCIENT, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:STRUCTURED ALPHA LP;REEL/FRAME:043840/0227

Effective date: 20171011

AS Assignment

Owner name: AXCI (AN ABC) LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCIENT, INC.;REEL/FRAME:044367/0507

Effective date: 20170726

Owner name: AXCIENT HOLDINGS, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCI (AN ABC) LLC;REEL/FRAME:044368/0556

Effective date: 20170726

Owner name: EFOLDER, INC., GEORGIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AXCIENT HOLDINGS, LLC;REEL/FRAME:044370/0412

Effective date: 20170901

AS Assignment

Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT, CALIFORNIA

Free format text: SECURITY INTEREST;ASSIGNOR:EFOLDER, INC.;REEL/FRAME:044563/0633

Effective date: 20160725

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MUFG UNION BANK, N.A., ARIZONA

Free format text: SECURITY INTEREST;ASSIGNOR:EFOLDER, INC.;REEL/FRAME:061559/0703

Effective date: 20221027

AS Assignment

Owner name: EFOLDER, INC., COLORADO

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION;REEL/FRAME:061634/0623

Effective date: 20221027