US20110161723A1 - Disaster recovery using local and cloud spanning deduplicated storage system - Google Patents

Disaster recovery using local and cloud spanning deduplicated storage system

Info

Publication number
US20110161723A1
Authority
US
Grant status
Application
Patent type
Prior art keywords
data
storage
spanning
storage interface
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12942988
Inventor
Greg Taleck
Vivasvat Keswani
Nitin Parab
James Mace
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Riverbed Technology Inc
Original Assignee
Riverbed Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1453Management of the data involved in backup or backup restore using de-duplication of the data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30067File systems; File servers
    • G06F17/30129Details of further file system functionalities
    • G06F17/3015Redundancy elimination performed by the file system
    • G06F17/30156De-duplication implemented within the file system, e.g. based on file segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1451Management of the data involved in backup or backup restore by selection of backup contents
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated

Abstract

A spanning storage interface facilitates the use of cloud storage services by storage clients and may perform data deduplication. The spanning storage interface may include local storage for caching data from storage clients. A disaster recovery application includes at least first and second spanning storage interfaces at first and second network locations. The second spanning storage interface is provided for at least disaster recovery operations. The second spanning storage interface includes second local storage for improving data access performance. A copy of the local cache of the first spanning storage interface is transferred to the second local storage while the first network location is operating. In the event of a disaster affecting the first network location, the second spanning storage interface can provide data access to the first network location's data with improved performance from using the copy of local cache in the second local storage.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Patent Application No. 61/315,392, filed Mar. 18, 2010 and entitled “WAN-OPTIMIZED LOCAL AND CLOUD SPANNING DEDUPLICATED STORAGE SYSTEM” and to U.S. Provisional Patent Application No. 61/290,334, filed Dec. 28, 2009 and entitled “DEDUPLICATED OBJECT STORAGE SYSTEM AND APPLICATIONS,” which are incorporated by reference herein for all purposes. This application is related to U.S. patent application Ser. No. ______ [Docket Number R001510US], filed ______, and entitled “WAN-OPTIMIZED LOCAL AND CLOUD SPANNING DEDUPLICATED STORAGE SYSTEM,” which is incorporated by reference herein for all purposes.
  • BACKGROUND OF THE INVENTION
  • The present invention relates generally to data storage systems, and systems and methods to improve storage efficiency, compactness, performance, reliability, and compatibility. In general, data storage systems receive and store all or portions of arbitrary sets or streams of data. Data storage systems also retrieve all or portions of arbitrary sets or streams of data. A data storage system provides data storage and retrieval to one or more storage clients, such as user and server computers. Stored data may be referenced by unique identifiers and/or addresses or indices. In some implementations, the data storage system uses a file system to organize data sets into files. Files may be identified and accessed by a file system path, which may include a file name and one or more hierarchical file system directories.
  • Many data storage systems are tasked with handling enormous amounts of data. Additionally, data storage systems often provide data access to large numbers of simultaneous users and software applications. Users and software applications may access the file system via local communications connections, such as a high-speed data bus within a single computer; local area network connections, such as an Ethernet networking or storage area network (SAN) connection; and wide area network connections, such as the Internet, cellular data networks, and other low-bandwidth, high-latency data communications networks.
  • Cloud storage services are one type of data storage available via a wide-area network. Cloud storage services provide storage to users in the form of a virtualized storage device available via the Internet. In general, users access cloud storage to store and retrieve data using web services protocols, such as REST or SOAP. Cloud storage service providers manage the operation and maintenance of the physical data storage devices. Users of cloud storage can avoid the initial and ongoing costs associated with buying and maintaining storage devices. Cloud storage services typically charge users for consumption of storage resources, such as storage space and/or transfer bandwidth, on a marginal or subscription basis, with little or no upfront costs. In addition to the cost and administrative advantages, cloud storage services often provide dynamically scalable capacity to meet their users' changing needs.
  • The term “data deduplication” refers to some process of eliminating redundant data for the purposes of storage or communication. Data deduplicating storage typically compares incoming data with the data already stored, and only stores the portions of the incoming data that do not match data already stored in the data storage system. Data deduplicating storage maintains metadata to determine when portions of data are no longer in use by any files or other data entities.
  • The CPU and I/O requirements for supporting an extremely large data deduplicating storage are significant, and are difficult to satisfy through vertical scaling of a single device. As a result, prior spanning storage interfaces may impose severe throughput, latency, and other performance penalties on storage clients. Additionally, performance considerations limit the amount and types of optimizations and compression applied by prior spanning storage interfaces.
  • Additionally, prior spanning storage interfaces have difficulty operating with cloud storage systems. Data deduplication often requires frequent comparisons of incoming data with previously-stored data to identify redundant data. However, cloud data storage is accessible only via a wide-area network, such as the Internet, with significant latency and bandwidth limitations as compared with local-area and storage-area networks. Therefore, prior spanning storage interfaces have poor performance when used with cloud storage systems.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The invention will be described with reference to the drawings, in which:
  • FIG. 1 illustrates an example of spanning storage interface according to an embodiment of the invention;
  • FIG. 2 illustrates example data structures used by a spanning storage interface according to an embodiment of the invention;
  • FIGS. 3A-3B illustrate a method of converting a data stream into deduplicated data according to an embodiment of the invention;
  • FIG. 4 illustrates a method of retrieving an original data stream from deduplicated data according to an embodiment of the invention;
  • FIG. 5 illustrates a method of deleting a data stream from a spanning storage interface according to an embodiment of the invention;
  • FIG. 6 illustrates a computer system suitable for implementing embodiments of the invention; and
  • FIG. 7 illustrates an example disaster recovery application of a spanning storage interface according to an embodiment of the invention.
  • SUMMARY
  • Embodiments of the invention include a spanning storage interface adapted to facilitate the use of cloud storage services by storage clients. A spanning storage interface presents one or more data interfaces to storage clients at a network location. These data interfaces may include file, object, data backup, archival, and storage block based interfaces. Each of these data interfaces allows storage clients to store and retrieve data using non-cloud based protocols. This allows storage clients to store and retrieve data in the cloud storage service using their native or built-in functions, rather than having to be rewritten and/or reconfigured to operate with a cloud storage service.
  • To improve performance of the spanning storage interface, an embodiment of the invention performs data deduplication on data received from storage clients. Once the received data has been deduplicated, the spanning storage interface may transfer the deduplicated version of the data to the cloud storage service. By transferring data in deduplicated form to and from the cloud storage service, these embodiments of the invention improve storage performance by reducing the time and network bandwidth required to access data, as well as reducing the total amount of storage required. If a storage client wishes to access data previously stored in the cloud storage service, the spanning storage interface retrieves the corresponding deduplicated data and reconstructs the original data.
  • In an embodiment, the spanning storage interface may include local storage for storing a copy of all or a portion of the data from storage clients. The local storage may be used as a local cache of frequently accessed data. In a further embodiment, the local cache stores data in its deduplicated form.
  • The spanning storage interface may operate with multiple cloud storage services to provide storage clients with a range of storage options. In a further embodiment, the spanning storage interface may send different portions of the received data to different cloud storage services based on user specified attributes or criteria, such as all or a portion of the file path associated with the received data.
  • In an embodiment, two or more spanning storage interfaces may be used in a disaster recovery application. A disaster recovery application may be used to provide redundant data access to storage clients in the event that the storage clients and/or cloud spanning storage interface at a first network location are disabled, destroyed, or otherwise inaccessible or inoperable. A disaster recovery application includes at least first and second spanning storage interfaces at first and second network locations. The second spanning storage interface is provided for at least disaster recovery operations. The second spanning storage interface includes second local storage for improving data access performance. A copy of the local cache of the first spanning storage interface is transferred to the second local storage while the first network location is operating. In the event of a disaster affecting the first network location, the second spanning storage interface can provide data access to the first network location's data with the improved performance benefit of using the copy of the local cache in the second local storage.
  • Embodiments of the disaster recovery application may use the second network location as a dedicated disaster recovery network location. Alternatively, the second network location may also optionally be used with one or more of its own local storage clients. In this further example, the second spanning storage interface performs data deduplication and facilitates cloud storage for data from storage clients at the second network location in addition to acting as a disaster recovery system for the first network location. In yet a further embodiment, the first spanning storage interface may act as a disaster recovery system for the second spanning storage interface, just as the second spanning storage interface may act as a disaster recovery system for the first spanning storage interface. This pairing of spanning storage interfaces for disaster recovery may be extended to three or more network locations.
  • DETAILED DESCRIPTION
  • FIG. 1 illustrates an example of spanning storage interface 100 according to an embodiment of the invention. An example installation of the spanning storage interface 100 includes one or more client systems 105, which may include client computers, server computers, and standalone network devices. Client systems 105 are connected with a spanning storage interface 125 via a local-area network and/or a storage area network 115. Cloud storage 175 is connected with the spanning storage interface 125 by at least a wide-area network 177 and optionally an additional local area network. Cloud storage 175 includes a cloud storage interface 180 for communicating with the spanning storage interface 125 via wide-area network 177 and at least one physical data storage device 185 for storing data.
  • Embodiments of spanning storage interface 100 may support a variety of different storage applications using cloud data storage, including general data storage, data backup, disaster recovery, and deduplicated cloud data storage. In the case of general data storage applications, a client, such as client 105 c, may communicate with the spanning storage interface 125 via a file system protocol, such as CIFS or NFS, or a block-based storage protocol, such as iSCSI or iFCP. Data backup and disaster recovery applications may also use these protocols or specific backup and recovery protocols, such as VTL or OST. For backup applications, a client system 105 a may include a backup agent 110 for initiating data backups. The backup agent 110 may communicate directly with the spanning storage interface 125 or a backup server 105 b, which in spanning storage interface 100 is equivalent to a client. For cloud storage applications, a client 105 c may communicate with the spanning storage interface 125 via a web services protocol, such as SOAP or REST. The web services protocol may present a virtualized storage device to client 105 c. The web services protocol used by clients 105 to communicate with the spanning storage interface 125 may be the same as or different from the protocol used by the spanning storage interface 125 to communicate with the cloud storage 175.
  • Embodiments of the spanning storage interface 100 may optimize data access to cloud storage 175 in a number of different ways. An embodiment of the spanning storage interface 125 may present clients 105 with a file system, backup device, storage array, or other data storage interface, while transparently storing and retrieving data using the cloud storage 175 via the wide-area network 177. In a further embodiment, the spanning storage interface 125 may perform data deduplication on data received from clients 105, thereby reducing the amount of storage capacity required in cloud storage 175. Additionally, because the bandwidth of the wide-area network is often limited, data deduplication by the spanning storage interface 125 increases the data access performance, as perceived by the clients 105. In still a further embodiment, the spanning storage interface 125 may locally cache a portion of the clients' data using local storage 170. The locally cached data may be accessed rapidly, further improving the perceived data access performance. As described in detail below, the spanning storage interface 125 may use a variety of different criteria for selecting the portion of the clients' data to cache locally and may locally cache data in a deduplicated form to reduce the required capacity of local storage 170.
  • An embodiment of spanning storage interface 125 includes one or more front end interfaces 130 for communicating with one or more client systems 105. Examples of front end interfaces 130 include a backup front end interface 130 a, a file system front end interface 130 b, a cloud storage front end interface 130 c, a file archival front end interface 130 d, and an object front end interface 130 e. An example backup front end interface 130 a enables backup applications, such as a backup agent 110 and/or a backup server 105 b, to store and retrieve data to and from the cloud storage 175 using data backup and recovery protocols such as VTL or OST. In this example, the backup front end interface 130 a allows the spanning storage interface 125 and cloud storage 175 to appear to clients 105 as a backup storage device.
  • An example file system front end interface 130 b enables clients 105 to store and retrieve data to and from the cloud data storage 175 using a file system protocol, such as CIFS or NFS, or a block-based storage protocol, such as iSCSI or iFCP. In this example, the file system front end interface 130 b allows the spanning storage interface 125 and cloud storage 175 to appear to clients 105 as one or more storage devices, such as a CIFS or NFS storage volume or an iSCSI or Fibre Channel logical unit number (LUN).
  • An example cloud storage front end interface 130 c enables clients 105 to store and retrieve data to and from the cloud data storage 175 using a cloud storage protocol or API. Typically, cloud storage protocols or APIs are implemented using a web services protocol, such as SOAP or REST. In this example, the cloud storage front end interface 130 c allows the spanning storage interface 125 and cloud storage 175 to appear to clients 105 as one or more cloud storage services. By using spanning storage interface 125 to provide a cloud storage interface to clients 105, rather than letting clients 105 communicate directly with the cloud storage 175, the spanning storage interface 125 may perform data deduplication, local caching, and/or translation between different cloud storage protocols.
  • An example file archival front end interface 130 d enables clients 105 to store and retrieve file archives. Clients 105 may use the spanning storage interface 125 and the cloud storage 175 to store and retrieve files or other data in one or more archive files. The file archival front end interface 130 d allows clients 105 to store archive files using cloud storage 175 using archive file interfaces, rather than a cloud storage interface. Additionally, the spanning storage interface 125 may perform data deduplication and local caching of the file archives.
  • An example object front end interface 130 e enables clients to store and retrieve data in any arbitrary format, such as object formats and blobs or binary large objects. The object front end interface 130 e allows clients 105 to store data in arbitrary formats, such as object formats or blobs, using cloud storage 175 using object protocols, such as object serialization or blob storage protocols, rather than a cloud storage protocol. Additionally, the spanning storage interface 125 may perform data deduplication and local caching of the object or blob data.
  • An example block storage protocol front end interface 130 f enables clients to store and retrieve data using block-based storage protocols, such as iSCSI. In an embodiment, the block storage protocol front end interface 130 f appears to clients 105 as one or more logical storage volumes, such as iSCSI LUNs.
  • In an embodiment, spanning storage interface 125 also includes one or more shell file systems 145. Shell file system 145 includes a representation of the entities, such as files, directories, objects, blobs, and file archives, stored by clients 105 via the front end interfaces 130. In an embodiment, the shell file system 145 includes entities stored by the clients 105 in a shell form. In this embodiment, each entity, such as a file or other entity, is represented by a “shell” entity that does not include the data contents of the original entity. For example, a shell file in the shell file system 145 includes the same name, file path, and file metadata as the original file. However, the shell file does not include the actual file data, which is stored in the cloud storage 175. It should be noted that although the size of the shell file is less than the size of the actual stored file (in either its original or deduplicated format), an embodiment of the shell file system 145 sets the file size metadata attribute of the shell file to the size of the original file. In a further embodiment, each entity in the shell file system 145, such as a file, directory, object, blob, or file archive, may include additional metadata for use by the spanning storage interface 125 to access the corresponding data from the cloud storage 175.
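  • By way of a non-limiting illustration, a shell entity of the kind described above might be sketched as follows. The class name `ShellFile`, the helper `make_shell`, and the `cloud_key` field are hypothetical names chosen for this sketch and do not appear in the embodiments; the only behavior taken from the description is that the shell carries the original file's path and metadata, reports the original file's size, and holds none of the file's contents.

```python
from dataclasses import dataclass, field


@dataclass
class ShellFile:
    """Shell entity: carries the original file's metadata but none of its contents."""
    path: str        # same name and file path as the original file
    size: int        # reported size is that of the ORIGINAL file, not of the shell
    mtime: float     # an example of ordinary file metadata
    cloud_key: str   # hypothetical extra metadata used to locate the data in cloud storage
    extra: dict = field(default_factory=dict)  # any further metadata (assumption)


def make_shell(path: str, contents: bytes, mtime: float, cloud_key: str) -> ShellFile:
    """Build the shell for a file whose contents are stored elsewhere."""
    return ShellFile(path=path, size=len(contents), mtime=mtime, cloud_key=cloud_key)
```

  • In this sketch, a directory listing served from shell entities would show each file at its original size even though no file data is held in the shell itself.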
  • In an embodiment, storage blocks provided to the spanning storage interface through the block storage protocol front end interface 130 f may bypass the shell file system 145. In this embodiment, data received by the spanning storage interface in the form of storage blocks are grouped together, for example in groups of fixed size and in order of receipt. Data deduplication is then applied to each group of storage blocks and the resulting deduplicated data is transferred to the cloud storage service. In this embodiment, the spanning storage interface 125 maintains a table or other data structure that associates storage block addresses or identifiers with corresponding deduplicated storage data, so that the spanning storage interface 125 can retrieve and reconstruct the appropriate data when a storage client requests access to a previously stored storage block.
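  • The block path described above may be sketched, under stated assumptions, as follows. The class `BlockFrontEnd`, the group size of four blocks, and the use of a SHA-256 digest as the identifier of deduplicated data are all assumptions made for illustration; an in-memory dictionary stands in for the deduplicated storage that would otherwise be transferred to the cloud storage service.

```python
import hashlib
from typing import Dict, List, Tuple

GROUP_SIZE = 4  # blocks per group; a fixed group size is assumed for this sketch


class BlockFrontEnd:
    """Groups incoming storage blocks in order of receipt and deduplicates each group."""

    def __init__(self) -> None:
        self.pending: List[Tuple[int, bytes]] = []  # (block address, data) not yet deduplicated
        self.segment_store: Dict[str, bytes] = {}   # label -> data (stand-in for deduplicated storage)
        self.block_table: Dict[int, str] = {}       # block address -> label of its deduplicated data

    def write_block(self, address: int, data: bytes) -> None:
        self.pending.append((address, data))
        if len(self.pending) >= GROUP_SIZE:
            self.flush()

    def flush(self) -> None:
        """Deduplicate the current group and record each block in the table."""
        for address, data in self.pending:
            label = hashlib.sha256(data).hexdigest()
            self.segment_store.setdefault(label, data)  # identical blocks are stored once
            self.block_table[address] = label
        self.pending.clear()

    def read_block(self, address: int) -> bytes:
        """Return the most recently written data for a block address."""
        for pending_address, data in reversed(self.pending):
            if pending_address == address:
                return data
        return self.segment_store[self.block_table[address]]
```

  • The table `block_table` plays the role of the data structure that associates storage block addresses with deduplicated data, so that a later read of a block address can be resolved without the shell file system.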
  • An embodiment of the spanning storage interface 125 includes a deduplication module 150 for deduplicating data received from clients 105. Deduplication module 150 analyzes data from clients 105 and compares incoming data with previously stored data to eliminate redundant data for the purposes of storage or communication. Data deduplication reduces the amount of storage capacity used by cloud storage 175 to store clients' data. Also, because wide-area network 177 typically has bandwidth limitations, the reduction of data size due to data deduplication also reduces the amount of time required to transfer data between clients 105 and the cloud storage 175. Additionally, deduplication module 150 retrieves deduplicated data from the cloud storage 175 and converts it back to its original form for use by clients 105.
  • In an embodiment, deduplication module 150 performs data deduplication on incoming data and temporarily stores this deduplicated data locally, such as on local storage 170. Local storage 170 may be a physical storage device connected with or integrated within the spanning storage interface 125. Local storage 170 is accessed from spanning storage interface 125 by a local storage interface 160, such as an internal or external data storage interface, or via a local-area network.
  • In an embodiment, the cloud storage 175 includes a complete and authoritative version of the clients' data. In a further embodiment, the spanning storage interface 125 may maintain local copies of some or all of the clients' data for the purpose of caching. In this embodiment, the spanning storage interface 125 uses the local storage 170 to cache client data. The spanning storage interface 125 may cache data in its deduplicated format to reduce local storage requirements or increase the effective cache size. In this embodiment, the spanning storage interface 125 may use a variety of criteria for selecting portions of the deduplicated client data for caching. For example, if the spanning storage interface 125 is used for general file storage or as a cloud storage interface, the spanning storage interface may select a specific amount or percentage of the client data for local caching. In another example, the data selected for local caching may be based on usage patterns of client data, such as frequently or recently used data. Caching criteria may be based on elapsed time and/or the type of data. In another example, the spanning storage interface 125 may maintain locally cached copies of the most recent data backups from clients, such as the most recent full backup and the previous week's incremental backups.
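  • As one illustrative possibility for the "recently used data" criterion mentioned above, a local cache of deduplicated segments could evict the least recently used segment once a byte budget is exceeded. The class `SegmentCache` and its byte-based capacity are assumptions of this sketch; the embodiments leave the caching criteria open, and other policies (percentage of client data, data type, most recent backups) would be implemented differently.

```python
from collections import OrderedDict
from typing import Optional


class SegmentCache:
    """Least-recently-used cache of deduplicated segments, bounded by total bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()  # oldest entry first

    def get(self, label: str) -> Optional[bytes]:
        data = self._items.get(label)
        if data is not None:
            self._items.move_to_end(label)  # mark as most recently used
        return data

    def put(self, label: str, data: bytes) -> None:
        old = self._items.pop(label, None)
        if old is not None:
            self.used_bytes -= len(old)
        self._items[label] = data
        self.used_bytes += len(data)
        # Evict least recently used segments until the cache fits its budget.
        while self.used_bytes > self.capacity_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
```

  • A cache miss in this sketch (a `None` result from `get`) corresponds to the case in which the segment would be retrieved from the cloud storage, which holds the complete and authoritative version of the data.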
  • In an embodiment, replication module 155 transfers locally stored deduplicated data from the spanning storage interface 125 to the cloud storage 175. Embodiments of the deduplication module 150 and the replication module 155 may operate in parallel and/or asynchronously, so that the bandwidth limitations of wide-area network 177 do not interfere with the throughput of the deduplication module 150. The operation of embodiments of deduplication module 150 and replication module 155 is described in detail below.
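  • The decoupling of deduplication from replication may be illustrated with a simple producer and consumer arrangement. This is only a sketch under stated assumptions: the function `run_pipeline`, the in-memory queue, and the `upload` callback (standing in for a transfer over the wide-area network) are hypothetical, and exact-match comparison of whole segments stands in for the deduplication step.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_pipeline(segments: Iterable[bytes], upload: Callable[[bytes], None]) -> List[bytes]:
    """Deduplicate segments on the calling thread while a worker thread uploads them."""
    outbox: "queue.Queue" = queue.Queue()  # deduplicated segments awaiting replication
    local_store: List[bytes] = []          # stand-in for local storage

    def replicate() -> None:
        while True:
            item = outbox.get()
            if item is None:               # sentinel: no more segments will arrive
                return
            upload(item)                   # may be slow; does not block deduplication

    worker = threading.Thread(target=replicate)
    worker.start()

    seen = set()
    for seg in segments:
        if seg not in seen:                # only new data is stored and queued
            seen.add(seg)
            local_store.append(seg)
            outbox.put(seg)

    outbox.put(None)
    worker.join()
    return local_store
```

  • Because the queue is unbounded in this sketch, a slow `upload` delays only the worker thread; the loop that deduplicates incoming segments proceeds at its own rate.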
  • An embodiment of spanning storage interface 125 includes a cloud storage backend interface 165 for communicating data between the spanning storage interface 125 and the cloud storage 175. Embodiments of the cloud storage backend interface 165 may use cloud storage protocols or APIs and/or web services protocols, such as SOAP or REST, to store and retrieve data from the cloud storage 175. In an embodiment, the replication module 155 transfers deduplicated data from local storage 170 to cloud storage 175 using the cloud storage backend interface 165. In an embodiment, the deduplication module 150 retrieves deduplicated data from the cloud storage 175 using the cloud storage backend interface 165.
  • An embodiment of the spanning storage interface 125 may be configured to operate with multiple cloud storage services. In an embodiment, the spanning storage interface 125 may transfer all or portions of the deduplicated data to two or more cloud storage services. In another embodiment, the spanning storage interface 125 may transfer different portions of the deduplicated data to different cloud storage services, such as transferring a first portion of the deduplicated storage data to a first cloud storage service, a second portion of the deduplicated storage data to a second cloud storage service, and so forth.
  • Different cloud storage services may have different advantages and/or disadvantages, such as cost, bandwidth, reliability, and replication policies. In this embodiment, a system administrator or other user may identify the different portions of data and designate the cloud storage service to be used to store deduplicated versions of these portions of the data, thereby tailoring the usage of different cloud storage services to data storage needs. The user may identify different portions of data and associated cloud storage services based on file or object name, file or object type, file directory or path, contents of the data, and/or any other criteria or attribute of the data, storage client, cloud storage service, or the spanning storage interface 125.
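  • For illustration only, the designation of cloud storage services by file path or file type could be expressed as an ordered list of pattern rules. The rule format, the service names `service-a`, `service-b`, and `service-c`, and the function `select_service` are hypothetical; the embodiments do not prescribe any particular rule syntax.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Ordered (pattern, service) rules; the first matching rule wins. All names are examples.
RULES: List[Tuple[str, str]] = [
    ("/finance/*", "service-a"),  # everything under a given directory
    ("*.bak", "service-b"),       # a given file type, wherever it is stored
]
DEFAULT_SERVICE = "service-c"     # used when no rule matches


def select_service(path: str,
                   rules: List[Tuple[str, str]] = RULES,
                   default: str = DEFAULT_SERVICE) -> str:
    """Return the cloud storage service designated for the data at `path`."""
    for pattern, service in rules:
        if fnmatchcase(path, pattern):
            return service
    return default
```

  • Rules based on other attributes named above, such as the contents of the data or the identity of the storage client, would require additional inputs beyond the path used in this sketch.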
  • In yet a further embodiment, system administrators or other users may specify quotas for cloud storage access based on the total amount of data received from storage clients or the amount of deduplicated data transferred to the one or more cloud storage services. In this embodiment, if a data transfer exceeds or is anticipated to exceed a specified quota, the spanning storage interface 125 may abandon the storage operation and return an error message or other notification to the storage client. Embodiments may allow users to specify quotas for each storage client, a group of two or more storage clients, all of the storage clients at a network location, or based on criteria or attributes associated with the cloud storage service, spanning storage interface, and/or data, such as file or object names, file or object types, file directories or paths, and contents of the data.
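  • A per-client quota check of the kind described above might look like the following sketch. The names `QuotaTracker` and `QuotaExceeded` are hypothetical, quotas are assumed to be expressed in bytes per storage client, and clients without a configured quota are assumed to be unlimited; group, location, and attribute-based quotas are omitted.

```python
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when a storage operation would exceed a configured quota."""


class QuotaTracker:
    """Tracks bytes charged to each storage client against an optional limit."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)     # client -> quota in bytes
        self.used: Dict[str, int] = {}  # client -> bytes charged so far

    def charge(self, client: str, nbytes: int) -> int:
        """Record a transfer, or abandon it if it would exceed the client's quota."""
        limit = self.limits.get(client)
        used = self.used.get(client, 0)
        if limit is not None and used + nbytes > limit:
            raise QuotaExceeded(
                f"{client}: {used + nbytes} bytes would exceed quota of {limit}")
        self.used[client] = used + nbytes
        return self.used[client]
```

  • Whether `nbytes` counts data as received from the storage client or data as transferred after deduplication is a policy choice; the sketch accepts either, since both bases are contemplated above.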
  • In an embodiment, the spanning storage interface 125 performs data deduplication by segmenting an incoming data stream to aid data compression. For example, segmentation may be designed to produce many identical segments when the data stream includes redundant data. Multiple instances of redundant data may be represented by referencing a single copy of this data.
  • Additionally, a data stream may be segmented based on data types to aid data compression, such that different data types are in different segments. Different data compression techniques may then be applied to each segment. Data compression may also determine the length of data segments. For example, data compression may be applied to a data stream until a segment boundary is reached or the segment including the compressed data reaches a predetermined size, such as 4 KB. The size threshold for compressed data segments may be based on optimizing disk or data storage device access.
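  • The specification does not prescribe a particular segmentation technique. Purely as an assumed example, the following sketch places segment boundaries where a rolling sum over the most recent bytes meets a fixed condition, so that boundaries depend on content rather than on position. The function segment_stream and all of its parameters are hypothetical.

```python
def segment_stream(data, window=16, divisor=64, min_size=32, max_size=4096):
    """Split bytes into segments at content-defined boundaries."""
    segments = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # drop the byte leaving the window
        length = i - start + 1
        at_boundary = rolling % divisor == divisor - 1
        if (length >= min_size and at_boundary) or length >= max_size:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])     # trailing partial segment
    return segments
```

  • Concatenating the returned segments in order yields the original input; the maximum size plays the role of the predetermined size threshold mentioned above, although this sketch applies it to uncompressed data.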
  • Regardless of the technique used to segment data in the data stream, the result is a segmented data stream having its data represented as segments. In some embodiments of the invention, data segmentation occurs in memory and the segmented data stream is not written back to data storage in this form. Each segment is associated with a label. Labels are smaller in size than the segments they represent. The segmented data stream is then replaced with deduplicated data in the form of a label map and segment storage. Label map includes a sequence of labels corresponding with the sequence of data segments identified in the segmented data stream. Segment storage includes copies of the segment labels and corresponding segment data. Using the label map and the data segment storage, a storage system can reconstruct the original data stream by matching in sequence each label in a label map with its corresponding segment data from the data segment storage. In an embodiment, the deduplication module 150 and/or one or more other modules of the spanning storage interface 125 reconstruct all or a portion of the original data stream in response to a data access request from a storage client.
  • Embodiments of the invention attempt (but do not always succeed) to assign a single label to each unique data segment. Because the segmentation of the data stream produces many identical segments when the data stream includes redundant data, these embodiments allow a single label and one copy of the corresponding segment data to represent many instances of this segment data at multiple locations in the data stream. For example, a label map may include multiple instances of a given label at different locations. Each instance of this label represents an instance of the corresponding segment data. Because the label is smaller than the corresponding segment data, representing redundant segment data using multiple instances of the same label results in a substantial size reduction of the data stream.
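  • The relationship between a segmented data stream, its label map, and segment storage may be illustrated with the following minimal sketch. Integer labels assigned in order of first appearance are an assumption made for brevity, and the function names deduplicate and reconstruct are hypothetical.

```python
def deduplicate(segments):
    """Replace a segmented stream with a label map and segment storage."""
    label_map = []          # sequence of labels, one per segment in the stream
    segment_store = {}      # label -> single stored copy of the segment data
    label_for_data = {}     # segment data -> label already assigned to it
    for seg in segments:
        label = label_for_data.get(seg)
        if label is None:
            label = len(segment_store)      # hypothetical sequential label
            label_for_data[seg] = label
            segment_store[label] = seg
        label_map.append(label)
    return label_map, segment_store


def reconstruct(label_map, segment_store):
    """Rebuild the original stream by resolving each label in sequence."""
    return b"".join(segment_store[label] for label in label_map)
```

  • In this sketch, a stream of four segments in which three are identical is represented by four labels and two stored segments.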
  • FIGS. 2, 3A-3B, 4, and 5 illustrate the operation of the deduplication module 150 and the replication module 155 according to an embodiment of the invention. FIG. 2 illustrates example data structures 200 used by a spanning storage interface according to an embodiment of the invention. An embodiment of spanning storage interface 200 includes both memory 205, which has high performance but relatively low capacity, and disk storage 210, which has high capacity but relatively low performance.
  • Memory 205 includes a slab cache data structure 215. The slab cache 215 is adapted to store a set of labels 220 and a corresponding set of data segments 225. In typical applications, the sets of labels 220 and data segments 225 stored in the slab cache 215 represent only a small fraction of the total number of data segments and labels used to represent stored data. A complete set of the labels and data segments is stored in disk storage 210.
  • An embodiment of the slab cache 215 also includes segment metadata 230, which specifies characteristics of the data segments 225. In an embodiment, the segment metadata 230 includes the lengths of the data segments 225; hashes or other characterizations of the contents of the data segments 225; and/or anchor indicators, which indicate whether a particular data segment has been designated as a representative example of the contents of a data segment slab file, as discussed in detail below.
  • An embodiment of the slab cache 215 also includes data segment reference count values. The spanning storage interface 200 recognizes that some data segments are used in multiple places in one or more data streams. For at least some of the data segments, an embodiment of the spanning storage interface 200 maintains counts, referred to as reference counts, of the number of times these data segments are used. As discussed in detail below, if a data stream includes a data segment previously defined, an embodiment of the spanning storage interface 200 may increment the reference count value associated with this data segment. Conversely, if a data stream is deleted from the spanning storage interface 200, an embodiment of the spanning storage interface 200 may decrement the reference count values associated with the data segments included in the deleted data stream. If the reference count value of a data segment drops to zero, the data segment and label may be deleted and its storage space reallocated.
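  • A minimal sketch of reference counting, assuming each stored data stream is represented by its label map, might look as follows. The class RefCounts is hypothetical; labels whose count drops to zero are returned so that their storage space may be reallocated.

```python
class RefCounts:
    def __init__(self):
        self.counts = {}     # label -> number of uses across stored label maps

    def add_stream(self, label_map):
        """Increment the count once per instance of each label."""
        for label in label_map:
            self.counts[label] = self.counts.get(label, 0) + 1

    def delete_stream(self, label_map):
        """Decrement counts; return labels whose storage may be reclaimed."""
        freed = []
        for label in label_map:
            self.counts[label] -= 1
            if self.counts[label] == 0:
                del self.counts[label]
                freed.append(label)
        return freed
```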
  • In addition to the slab cache 215, an embodiment of the spanning storage interface 200 includes a reverse map cache 240. In an embodiment, the reverse map cache 240 maps the contents of a data segment to a label, for the labels stored in the slab cache 215. In an embodiment, a hashing or other data characterization technique is applied to segment data. The resulting value is used as an index in the reverse map cache 240 to identify an associated label in the slab cache 215. If the hash or other value derived from the segment data matches an entry in the reverse map cache 240, then this data segment has been previously defined and is stored in the slab cache 215. If the hash or other value derived from the segment data does not match any entry in the reverse map cache 240, then this data segment is not currently stored in the slab cache 215. Because the slab cache 215 only includes a portion of the total number of labels used to represent data segments, a data segment that does not match a reverse map cache entry may either have not been previously defined or may have been previously defined but not loaded into the slab cache 215.
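  • As an illustrative sketch, a reverse map cache may be modeled as a dictionary keyed by a hash of the segment contents. The choice of SHA-256 and the class ReverseMapCache are assumptions; the specification refers only to a hashing or other data characterization technique.

```python
import hashlib


class ReverseMapCache:
    def __init__(self):
        self._by_hash = {}   # content hash -> label held in the slab cache

    @staticmethod
    def characterize(segment):
        """Characterize segment contents; SHA-256 is an assumed choice."""
        return hashlib.sha256(segment).digest()

    def add(self, segment, label):
        self._by_hash[self.characterize(segment)] = label

    def lookup(self, segment):
        """Return the cached label, or None if no entry matches.

        None does not mean the segment is new: it may be defined in a slab
        file that has not been loaded into the slab cache.
        """
        return self._by_hash.get(self.characterize(segment))
```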
  • In an embodiment, memory 205 of the spanning storage interface 200 also includes an anchor cache 245. Anchor cache 245 is similar to reverse map cache 240; however, anchor cache 245 matches the contents of data segments with representative data segments in data segment slab files stored on disk storage 210. A complete set of data segments is stored in one or more data segment slab files in disk storage 210. In an embodiment, one or more representative data segments from each data segment slab file are selected by the spanning storage interface 200. The spanning storage interface 200 determines hash or other data characterization values for these selected representative data segments and stores these values along with data identifying the file or disk storage location including this data segment in the anchor cache 245. In an embodiment, the data identifying the file or disk storage location of a representative data segment may be its associated label. The spanning storage interface 200 uses the anchor cache 245 to determine if a data segment from a data stream matches a data segment from another data stream previously stored in disk storage but not currently stored in the slab cache.
  • In an embodiment, potential representative data segments are identified during segmentation of a data stream. As discussed in detail below, when one or more potential representative data segments are later stored in disk storage 210, for example in a data segment slab file, an embodiment of the spanning storage interface 200 selects one or more of these potential representative data segments for inclusion in the anchor cache.
  • A variety of criteria and types of analysis may be used alone or together in various combinations to identify representative data segments in data streams and/or in data segment slab files stored in disk storage 210. For example, the spanning storage interface 200 selects the first unique data segment in a data stream as a representative data segment. In another example, the spanning storage interface 200 uses the content of the data stream to identify potential representative data segments. In still another example, the spanning storage interface 200 uses criteria based on metadata such as a file type, data type, or other attributes provided with a data stream to identify potential representative data segments. For example, data segments including specific sequences of data and/or located at specific locations within a data stream of a given type may be designated as representative data segments based on criteria or heuristics used by the spanning storage interface 200. In a further example, a random selection of unique segments in a data stream or a data segment slab file may be designated as representative data segments. In yet a further example, representative data segments may be selected at specific locations of data segment slab files, such as the middle data segment in a slab file.
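  • One possible arrangement of an anchor cache is sketched below, using two of the example criteria above: the first segment and the middle segment of each slab file are taken as representatives. The slab file names, the use of SHA-256, and the functions select_anchors, build_anchor_cache, and slab_for_segment are hypothetical.

```python
import hashlib


def select_anchors(slab_segments):
    """Pick representative segments: here, the first and the middle segment."""
    if not slab_segments:
        return []
    picks = {0, len(slab_segments) // 2}
    return [slab_segments[i] for i in sorted(picks)]


def build_anchor_cache(slab_files):
    """Map hashes of representative segments to the slab file holding them."""
    cache = {}
    for slab_name, segments in slab_files.items():
        for seg in select_anchors(segments):
            cache[hashlib.sha256(seg).digest()] = slab_name
    return cache


def slab_for_segment(anchor_cache, segment):
    """Return the slab file to load if the segment matches an anchor, else None."""
    return anchor_cache.get(hashlib.sha256(segment).digest())
```

  • A segment that matches an anchor identifies a slab file that is likely to contain other matching segments, while a segment that is stored in a slab file but is not a representative returns no match in this sketch.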
  • Disk storage 210 stores a complete set of data segments and associated labels used to represent all of the data streams stored by spanning storage interface 200. In an embodiment, disk storage 210 may be comprised of multiple physical and/or logical storage devices. In a further embodiment, disk storage 210 may be implemented using a storage area network.
  • Disk storage 210 includes one or more data segment slab files 250. Each data segment slab file 250 includes a segment index 255 and a set of data segments 265. The segment index 255 specifies the location of each data segment within the data segment slab file. Data segment slab file 250 also includes segment metadata 260, similar to the segment metadata 230 discussed above. In an embodiment, segment metadata 260 in the data segment slab file 250 is a subset of the segment metadata in the slab cache 215 to improve compression performance. In this embodiment, the spanning storage interface 200 may recompute or recreate the remaining metadata attribute values for data segments upon transferring data segments into the slab cache 215.
  • Additionally, data segment slab file 250 may include data segment reference count values 270 for some or all of the data segments 265. In an embodiment, slab file 250 may include slab file metadata 275, such as a list of data segments to be deleted from the slab file 250.
  • Disk storage 210 includes one or more label map container files 280. Each label map container file 280 includes one or more label maps 290. Each of the label maps 290 corresponds with all or a portion of a deduplicated data stream stored by the spanning storage interface 200. Each of the label maps 290 includes a sequence of one or more labels corresponding with the sequence of data segments in all or a portion of a deduplicated data stream. In an embodiment, each label map also includes a label map table of contents providing the offset or relative position of sections of the label map sequence with respect to the original data stream. In one implementation, the label maps are compressed in sections, and the label map table of contents provides offsets or relative locations of sections of the label map sequence relative to the uncompressed data stream. The label map table of contents may be used to allow random or non-sequential access to a deduplicated data stream.
  • Additionally, label map container file 280 may include label map container index 285 that specifies the location of each label map within the label map container file.
  • In an embodiment, label names are used not only to identify data segments, but also to locate data segments and their containing data segment slab files. For example, labels may be assigned to data segments during segmentation. Each label name may include a prefix portion and a suffix portion. The prefix portion of the label name may correspond with the file system path and/or file name of the data segment slab file used to store its associated segment. All of the data segments associated with the same label prefix may be stored in the same data segment slab file. The suffix portion of the label name may be used to specify the location of the data segment within its data segment slab file. The suffix portion of the label name may be used directly as an index or location value of its data segment or indirectly in conjunction with segment index data in the slab file. In this implementation, the complete label name associated with a data segment does not need to be stored in the slab file. Instead, the label name is represented implicitly by the storage location of the slab file and the data segment within the slab file. In a further embodiment, label names are assigned sequentially in one or more namespaces or sequences to facilitate this usage.
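  • For illustration, assume a hypothetical label format of the form "prefix:suffix", in which the prefix is the slab file path without its extension and the suffix is the index of the segment within that slab file. Neither this format nor the ".slab" extension is taken from the specification.

```python
def locate_segment(label):
    """Split a label into (slab file path, index of the segment in that file)."""
    prefix, sep, suffix = label.rpartition(":")
    if not sep or not prefix or not suffix.isdigit():
        raise ValueError(f"malformed label: {label!r}")
    return prefix + ".slab", int(suffix)


def next_label(prefix, count_in_slab):
    """Assign labels sequentially within a slab file's namespace."""
    return f"{prefix}:{count_in_slab}"
```

  • Because the label itself encodes the location, the slab file in this sketch need not store label names alongside segment data.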
  • An embodiment similarly uses data stream identifiers to not only identify deduplicated data streams but to locate label maps and their containing label map containers. For example, a data stream identifier is assigned to a data stream during deduplication. Each data stream identifier name may include a prefix portion and a suffix portion. The prefix portion of the data stream identifier may correspond with the file system path and/or file name of the label map container used to store the label map representing the data stream. The suffix portion of the data stream identifier may be used to directly or indirectly specify the location of the label map within its label map container file. In a further embodiment, data stream identifiers are assigned sequentially in one or more namespaces or sequences to facilitate this usage.
  • Embodiments of the spanning storage interface 200 may specify the sizes, location, alignment, and optionally padding of data in data segment slab files 250 and label map container files 280 to optimize the performance of disk storage 210. For example, segment reference counts are frequently updated, so these may be located at the end of the data segment slab file 250 to improve update performance. In another example, data segments may be sized and aligned according to the sizes and boundaries of clusters or blocks in the disk storage 210 to improve access performance and reduce wasted storage space.
  • FIG. 3A illustrates a method 300 of converting a data stream into deduplicated data according to an embodiment of the invention. An embodiment of method 300 may be executed at least in part by a deduplication module included in a spanning storage interface. Step 305 receives all or a portion of a data stream. The data stream may be any type or format of data, including files and objects. In an embodiment, a deduplicating storage interface client provides the data stream to the spanning storage interface.
  • Step 310 uses a segmentation technique to generate one or more data segments from the data stream or portion thereof received by step 305.
  • Step 315 determines if any of the generated data segments are referenced by the anchor cache of the spanning storage interface. In an embodiment, step 315 compares a hash or other characterization of the contents of each of the data segments with entries of the anchor cache. If the hash of the data segment matches an entry of the anchor cache, then the data segment is referenced by the anchor cache. In a further embodiment, if the hash of a data segment matches an entry of the anchor cache, step 315 then compares the segment length and/or the contents of the data segment with the corresponding data segment stored in a slab file to verify that the data segment from the data stream and the previously generated instance of the data segment are identical.
  • In an embodiment, a copy of only a portion of the data segments used for data deduplication is stored locally. The full and authoritative set of data segments is stored in one or more slab files stored in the cloud storage. Because the cloud storage is accessed via a wide-area network, there are often substantial bandwidth and latency restrictions on accessing slab files from cloud storage. In an embodiment, if a data segment from the data stream matches an entry from the anchor cache, step 315 selects the slab file associated with this anchor cache entry for processing by method 350, as discussed below. In an embodiment, method 350 may retrieve one or more slab files selected by step 315 from the cloud storage in parallel and/or asynchronously with the execution of method 300.
  • Step 325 determines if any of the data segments generated in step 310 match a data segment referenced by the reverse map in memory. In an embodiment, step 325 is similar to step 315. Step 325 compares a hash or other characterization of the contents of the data segment with entries of the reverse map. In a further embodiment, if the hash of the data segment matches an entry of the reverse map (and/or previously matched an entry of the anchor cache), step 325 also compares the segment length and/or the contents of the data segment with the corresponding data segment stored in the slab cache to verify that the data segment from the data stream and the cached data segment are identical.
  • For each of the data segments from the data stream that match previously generated data segments in the slab cache, step 325 associates these data segments from the data stream with the labels assigned to their counterparts in the slab cache. Step 330 increments the reference counts for these labels based on the number of instances of their associated data segment in the data stream. For example, step 330 increments the reference count by one for each instance of the generated data segment in the data stream.
  • Conversely, if one or more of the data segments from the data stream are not referenced by the reverse map, then step 335 assigns new labels to these newly generated data segments. These new labels assigned by step 335 are referred to as provisional labels. As discussed below, method 350 may replace provisional labels assigned by step 335 with previously generated labels corresponding with identical data segments in slab files retrieved from the cloud storage. Step 335 then adds the new data segments and their assigned provisional labels to the slab cache in memory. For each newly added data segment and provisional label, step 335 generates segment metadata and adds it to the slab cache. Step 335 also initializes a reference count in the slab cache for each of the newly added data segments, setting each newly added provisional label's reference count to correspond with the number of currently known instances of the corresponding data segment in the data stream. For example, step 335 may initialize a reference count associated with a new provisional label and data segment to one, if the data segment occurs only once in the data stream or portion thereof received by step 305. In another example, step 335 may initialize the reference count associated with a new provisional label and data segment to a number greater than one if this data segment is used multiple times in the received portion of the data stream. Step 335 also adds the new provisional labels and hashes or other data characterizations of the new data segments to the reverse map in memory.
  • Following steps 330 or 335, the slab cache in memory has been updated with all of the data segments generated by step 310 from the received portion of the data stream, either by incrementing the reference counts of previously generated labels or adding new provisional labels and associated data segments to the slab cache. In a further embodiment, the updates to the slab cache in memory are stored in local disk storage for further processing and eventual copying to the cloud storage. In an embodiment, method 300 stores a copy of any new data segments and associated metadata in local disk storage in one or more new slab files. Additionally, any changes to previously-generated data segment metadata, such as updates in reference counts, may be stored in local storage as well.
  • Step 340 adds the sequence of labels associated with the data segments generated by step 310 to a label map. The sequence of labels may include both previously generated labels and/or provisional labels, depending upon the contents of the current data stream and any previously processed data streams. Step 340 adds labels to the label map in the same sequence as their corresponding data segments are found in the data stream.
  • Decision block 345 determines if all of the data in the data stream has been processed by steps 310 to 340. If all of the data in the data stream has not been processed, method 300 returns to step 305 to receive another portion of the data stream and to generate and process additional data segments.
  • If all of the data stream has been processed, method 300 proceeds to step 350. Step 350 adds the completed label map to a label map container file in the local disk storage. Step 350 assigns the data stream and its corresponding label map a data stream identifier. In an embodiment, the data stream identifier specifies the identity and/or the location of the label map container file in the disk storage. Step 350 may store the data stream identifier in the metadata of the corresponding file in the shell file system, such as in a reparse point in an NTFS file system or an extended attribute in an ext3 file system. Following step 350, the spanning storage interface 125 may delete the original data stream from memory or disk storage, as this data stream is now stored in deduplicated form by the spanning storage interface.
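  • The handling of each data segment in steps 325 through 340 of method 300 may be sketched as follows, assuming the slab cache, reverse map, and reference counts are plain dictionaries. The function ingest, the "prov-" label format, and the use of SHA-256 are hypothetical, and the anchor cache check of step 315 is omitted.

```python
import hashlib


def ingest(segments, slab_cache, reverse_map, ref_counts, next_id):
    """Process a batch of segments and return the sequence of labels.

    slab_cache: label -> segment data; reverse_map: content hash -> label;
    ref_counts: label -> count; next_id: one-element list used as a counter.
    """
    labels = []
    for seg in segments:
        key = hashlib.sha256(seg).digest()
        label = reverse_map.get(key)
        if label is not None and slab_cache[label] == seg:   # verify contents
            ref_counts[label] += 1                           # step 330
        else:
            label = f"prov-{next_id[0]}"                     # step 335
            next_id[0] += 1
            slab_cache[label] = seg
            reverse_map[key] = label
            ref_counts[label] = 1
        labels.append(label)                                 # step 340
    return labels
```

  • In this sketch, a second instance of a new segment within the same batch matches the provisional entry just added, so its reference count is incremented rather than a second provisional label being assigned.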
  • FIG. 3B illustrates a method 350 for transferring deduplicated data from a spanning storage interface to cloud storage. An embodiment of method 350 may be executed by a replication module operating in parallel and/or asynchronously with a deduplication module. As described above, an embodiment of the spanning storage interface includes a local copy of only a portion of the data segments used for data deduplication. The full and authoritative set of data segments is stored in one or more slab files stored in the cloud storage. Thus, this embodiment of the spanning storage interface should copy any newly added data segments or updated segment metadata to the cloud storage as soon as possible, so that the cloud storage includes a complete and authoritative set of the data segments, associated labels, and label metadata, such as reference counts.
  • In an embodiment, a complete set of slab files, including at least all of the data segments used to store a deduplicated version of the client's data, is stored in cloud storage. If step 315 in method 300 matches a data segment to an entry of the anchor cache, then the data of this segment has been previously associated with a label. To optimize the data deduplication, this previously associated label should be associated with the new data segment. Additionally, because the anchor cache only includes a representative sample of data segments in the slab file, it is likely that other data segments in the slab file associated with the matching anchor cache entry may also match other recently received data segments. Thus, step 355 retrieves one or more slab files previously selected for retrieval by step 315 in method 300.
  • In an embodiment, step 355 retrieves one or more previously selected slab files from cloud storage via the wide-area network. In an embodiment, step 355 uses the label name of the matching anchor cache entry to identify and optionally locate the data segment slab file including the previously generated instance of the data segment. In a further embodiment, copies of some of the slab files may be stored locally. In this embodiment, step 355 determines if any of the selected slab files have local copies. Step 355 then retrieves any selected slab files that do not have copies stored locally from the cloud storage.
  • Step 360 processes the selected and retrieved slab files. In an embodiment, step 360 retrieves all of the data segments included in this data segment slab file from disk storage and adds them to the slab cache in memory. Step 360 also retrieves and/or regenerates the labels and segment metadata for these data segments and adds these to the slab cache. Step 360 retrieves the segment reference counts for these data segments from the data segment slab file and adds these to the slab cache in memory. Step 360 also updates the reverse map cache with the labels and hashes or other data characterizations of the retrieved data segments.
  • In method 300, data segments that do not match reverse map cache entries are assigned provisional labels. Data segments assigned provisional labels may include data segments matching an anchor cache entry as well as data segments that do not match any anchor cache entries. Step 365 identifies the provisional labels, if any, in one or more newly created label maps and/or label map container files.
  • Step 370 compares the data segments associated with the provisional labels with the updated reverse map cache. Step 370 ignores the reverse map cache entries associated with provisional labels in this comparison; instead, step 370 determines if any provisionally labeled data segments are identical to previously generated data segments. In an embodiment, step 370 compares a hash or other characterization of the contents of these provisionally labeled data segments with the non-provisional entries of the reverse map cache, which are cache entries that are not associated with provisional labels. In a further embodiment, if the hash of the data segment matches an entry of the reverse map, step 370 also compares the segment lengths and/or the contents of these provisionally labeled data segments with the corresponding non-provisional data segments stored in the slab cache to verify that the data segment from the data stream and the cached data segment are identical.
  • For data segments that do not match cached data segments in the slab cache, an embodiment of step 375 may change their associated labels to non-provisional status. An embodiment of step 375 may update the label map, label map container file, slab file, slab cache and/or reverse map cache with this change in status.
  • For data segments that do match cached data segments in the slab cache, an embodiment of step 380 replaces the associated provisional labels in label maps with the matching non-provisional labels. As a result of step 380, a provisional label referencing a recently created data segment is replaced with a non-provisional label referencing a previously generated segment. However, no data is lost by step 380, because the contents of the provisional data segment are identical to the previously generated non-provisional data segment, as determined by step 370.
  • Step 385 removes and discards data segments that are associated with provisional labels and that match previously generated non-provisional data segments. In an embodiment, step 385 removes these provisional data segments from a slab file stored locally by a spanning storage interface. In a further embodiment, step 385 removes the provisional data segment and its associated provisional label from the slab cache and reverse map, respectively. These provisional labels and data segments may be removed because they are duplicative of previously generated data segments and labels. In an embodiment, step 385 updates the previously generated non-provisional label and data segment metadata. For example, if a provisional label is associated with a reference count, which indicates how many times this provisional label is used in one or more label maps, then step 385 may add this reference count to the reference count of the matching previously-generated non-provisional label. As a result, the reference count of this non-provisional label will be equal to the total number of instances of this segment data, regardless of whether these instances were previously associated with the provisional label or the non-provisional label.
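  • The reconciliation of provisional labels in steps 370 through 385 may be sketched as follows, assuming dictionary-based structures and SHA-256 content hashes. The function reconcile and its parameters are hypothetical; in particular, retrieved stands for the labels and data segments loaded from slab files fetched from cloud storage.

```python
import hashlib


def reconcile(label_map, slab_cache, ref_counts, provisional, retrieved):
    """Replace provisional labels that duplicate retrieved segments.

    Returns (updated label map, provisional labels confirmed as unique).
    """
    by_hash = {hashlib.sha256(seg).digest(): lbl
               for lbl, seg in retrieved.items()}
    replace = {}
    for prov in sorted(provisional):
        seg = slab_cache[prov]
        match = by_hash.get(hashlib.sha256(seg).digest())
        if match is not None and retrieved[match] == seg:    # step 370
            replace[prov] = match
    for prov, match in replace.items():
        # Step 385: fold the provisional count into the matching label
        # and discard the duplicate provisional segment.
        ref_counts[match] = ref_counts.get(match, 0) + ref_counts.pop(prov)
        del slab_cache[prov]
        slab_cache[match] = retrieved[match]
    confirmed = set(provisional) - set(replace)              # step 375
    new_map = [replace.get(lbl, lbl) for lbl in label_map]   # step 380
    return new_map, confirmed
```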
  • Step 390 identifies changes in the locally stored label map container files and slab files in comparison with their counterparts (if any) stored in the cloud storage. The changes identified by step 390 may include new label map container files and new slab files, as well as modified versions of label map container files and slab files previously stored in cloud storage. Step 395 transfers the new and changed label map container files and slab files to the cloud storage. In an embodiment, step 395 only communicates the changed or new data to the cloud storage.
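  • One way to identify new and changed files, offered only as an assumption, is to compare a digest of each local file with the digest recorded for the copy last committed to cloud storage. The function changed_files and the digest bookkeeping are hypothetical; the specification does not state how step 390 detects changes.

```python
import hashlib


def changed_files(local, cloud_digests):
    """Return names of local files that are new or differ from the cloud copy.

    local: name -> file bytes; cloud_digests: name -> SHA-256 hex digest of
    the copy last committed to cloud storage.
    """
    changed = []
    for name, data in sorted(local.items()):
        if cloud_digests.get(name) != hashlib.sha256(data).hexdigest():
            changed.append(name)
    return changed
```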
  • Following step 395, the cloud storage includes a complete and authoritative version of the label maps and data segments. Thus, the slab files and label map container files stored in the cloud storage may be used to reconstruct any or all of the data previously stored by the clients via the spanning storage interface. In a further embodiment, step 395 may use atomic operations to update or add label map container and slab files in the cloud storage. In this embodiment, new and changed data is first uploaded to the cloud storage and then committed. If the transfer of data is interrupted before the commitment, for example due to a system or network failure, the previous versions of the label map container and slab files stored in the cloud storage will not be corrupted and may be used to restore client data at the same or a different location. This allows the spanning storage interface to use cloud storage as a deduplicated disaster data recovery facility.
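  • The upload-then-commit behavior may be modeled with a toy in-memory object store in which uploaded data becomes visible only upon commitment. The class StagedStore is hypothetical and does not correspond to the interface of any particular cloud storage service.

```python
class StagedStore:
    """Toy object store: uploads are staged and become visible only on commit."""

    def __init__(self):
        self.committed = {}   # name -> bytes visible to readers
        self._staged = {}     # name -> bytes uploaded but not yet committed

    def upload(self, name, data):
        self._staged[name] = data

    def commit(self):
        """Make all staged uploads visible together."""
        self.committed.update(self._staged)
        self._staged.clear()

    def abort(self):
        """Model an interrupted transfer: staged data is simply discarded."""
        self._staged.clear()
```

  • In this model, an interruption before commit leaves the previously committed versions untouched, which is the property relied upon for disaster recovery.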
  • Following step 395, the spanning storage interface may delete some or all of the local copies of slab files and label map container files. In a further embodiment, the spanning storage interface may maintain local copies of some or all of the slab files and label map container files for the purpose of caching. The local caching may use the local storage associated with the spanning storage interface. The spanning storage interface may cache data in its deduplicated format to reduce local storage requirements or increase the effective cache size. In this embodiment, the spanning storage interface may use a variety of criteria for selecting portions of the deduplicated client data for caching. For example, if the spanning storage interface is used for general file storage or as a cloud storage interface, the spanning storage interface may select a specific amount or percentage of the client data for local caching. In another example, the data selected for local caching may be based on usage patterns of client data, such as frequently or recently used data. Caching criteria may be based on elapsed time and/or the type of data. In another example, the spanning storage interface may maintain locally cached copies of the most recent data backups from clients, such as the most recent full backup and the previous week's incremental backups.
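  • As one assumed caching criterion among those described, the following sketch retains the most recently used files that fit within a fixed local storage budget. The function select_for_cache and the tuple layout are hypothetical.

```python
def select_for_cache(files, budget_bytes):
    """Keep the most recently used files whose total size fits the budget.

    files: list of (name, size_bytes, last_access_time) tuples.
    """
    kept, used = [], 0
    # Consider files from most to least recently accessed.
    for name, size, _ in sorted(files, key=lambda f: f[2], reverse=True):
        if used + size <= budget_bytes:
            kept.append(name)
            used += size
    return kept
```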
  • FIG. 4 illustrates a method 400 of retrieving an original data stream from deduplicated data according to an embodiment of the invention. In an embodiment, step 405 receives a data access request from a client.
  • Step 410 identifies a label map associated with the requested data. For example, if the data access request is for a file in the shell file system, an embodiment of step 410 retrieves a data stream identifier from the metadata of this shell file. Step 410 then retrieves the label map associated with the data stream identifier from memory, disk storage, or cloud storage. The label map includes a sequence of labels corresponding with a sequence of data segments representing the data stream. In an embodiment, the data stream identifier specifies the identity and/or the location of the label map container file in the disk or cloud storage. For example, a prefix portion of the data stream identifier may correspond with the file system path and/or file name or cloud data identifier of the label map container file used to store the label map representing the data stream. A suffix portion of the data stream identifier may be used to directly or indirectly specify the location of the label map within its label map container file.
  • Upon retrieving the label map associated with the data stream identifier, step 415 selects the next label in sequence in the label map. In an embodiment, method 400 may receive the data stream identifier with a request for the entire data stream. In this embodiment, the first iteration of step 415 selects the first label in the label map.
  • In another embodiment, method 400 may receive a data stream identifier with a request for only a portion of the data stream. In this embodiment, step 415 selects the first label corresponding with the beginning of the requested portion of the data stream. In an embodiment, each label map includes a label map table of contents providing the offset or relative position of each instance of a label with respect to the original data stream. The label map table of contents may be used to allow random or non-sequential access to a deduplicated data stream. In an embodiment, the requested portion of the data stream is specified with a starting data stream address or offset and/or an ending data stream offset or address. Step 415 uses this label map table of contents to identify the label corresponding with the starting data stream address or offset.
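  • One way to picture the label map table of contents is as a sorted list of starting offsets, one per label, which can be searched with a binary search. This is a sketch under that assumption; the function name `first_label_index` and the list representation are hypothetical, and the sketch does not check whether the requested offset lies beyond the end of the stream.

```python
from bisect import bisect_right


def first_label_index(toc_offsets, start):
    """Return the index of the label whose segment contains offset `start`.

    toc_offsets[i] is the offset in the original data stream at which the
    data segment for label i begins; the list is sorted and starts at 0.
    """
    if start < 0 or not toc_offsets:
        raise ValueError("offset must be non-negative and the table non-empty")
    # Rightmost entry whose starting offset is <= start.
    return bisect_right(toc_offsets, start) - 1
```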
  • Decision block 420 determines if the data segment corresponding with the selected label is already stored in the slab cache in memory. In an embodiment, decision block 420 searches for the selected label in the slab cache to make this determination. If the data segment corresponding with the selected label is already stored in the slab cache in memory, then method 400 proceeds to step 430.
  • Conversely, if the data segment corresponding with the selected label is not stored in the slab cache in memory, step 425 accesses a slab data file including a previously generated instance of the data segment corresponding with the selected label. In an embodiment, step 425 uses the label name to identify and optionally locate the data segment slab file including the previously generated instance of the data segment. Step 425 may retrieve the slab file from cloud storage. In a further embodiment, step 425 first checks to see if the required slab file is cached locally by the spanning storage interface; if so, then step 425 retrieves the data segment from the local copy of the slab file, rather than from the cloud storage.
  • Step 425 retrieves at least the data segment corresponding with the selected label from its data segment slab file and adds it to the slab cache in memory. In an embodiment, step 425 retrieves all of the data segments included in this data segment slab file from local storage or cloud storage and adds them to the slab cache in memory. Step 425 also retrieves and/or generates the labels and segment metadata for the retrieved data segments and adds these to the slab cache. Step 425 retrieves the segment reference counts for these data segments from the data segment slab file and adds these to the slab cache in memory. Step 425 also updates the reverse map cache with the labels and hashes or other data characterizations of the retrieved data segments.
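  • The lookup order described for decision block 420 and step 425 (slab cache in memory, then a locally cached slab file, then cloud storage) can be sketched as below. Plain dictionaries stand in for the slab cache, local storage, and cloud storage, and `slab_of` is a hypothetical helper that maps a label to the name of its slab file. Reference counts, segment metadata, and the reverse map cache are omitted from this sketch.

```python
def load_segment(label, slab_cache, local_slabs, cloud_slabs, slab_of):
    """Return the data segment for `label`, filling the slab cache on a miss.

    slab_cache:  dict label -> segment bytes (in memory)
    local_slabs: dict slab name -> {label: segment} (local copies, may be partial)
    cloud_slabs: dict slab name -> {label: segment} (stand-in for cloud storage)
    """
    if label in slab_cache:                 # decision block 420: cache hit
        return slab_cache[label]
    slab_name = slab_of(label)              # the label name identifies the slab file
    slab = local_slabs.get(slab_name)       # prefer a local copy if one exists
    if slab is None:
        slab = cloud_slabs[slab_name]       # otherwise retrieve from cloud storage
    slab_cache.update(slab)                 # load every segment of the slab file
    return slab_cache[label]
```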
  • Step 430 retrieves the data segment corresponding with the selected label from the slab cache. Step 435 adds all or a portion of this data segment to a data stream buffer or other data structure used to reconstruct the requested data stream. In an embodiment, steps 430 and 435 decompress the contents of the data segment prior to adding it to the data stream buffer. In another embodiment, data segments are decompressed upon being initially added to the slab cache. In still another embodiment, one or more data segments are decompressed after being added to the data stream buffer.
  • In an embodiment, method 400 may receive a request for only a portion of the data stream. In this embodiment, step 435 may need to remove the beginning of a data segment if the data segment is the first data segment in the requested portion of the data stream, such that the beginning of the data stream buffer matches the beginning of the requested portion of the data stream. Similarly, step 435 may need to remove the end of a data segment if the data segment is the last data segment in the requested portion of the data stream, such that the end of the data stream buffer matches the end of the requested portion of the data stream.
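  • The trimming of the first and last data segments can be sketched as follows, assuming the segments are already decompressed and supplied in stream order. The function name `read_range` and the half-open `[start, end)` convention are choices made for this sketch.

```python
def read_range(segments, start, end):
    """Return bytes [start, end) of the stream formed by concatenating `segments`."""
    out = bytearray()
    pos = 0                                  # stream offset of the current segment
    for seg in segments:
        seg_start, seg_end = pos, pos + len(seg)
        pos = seg_end
        if seg_end <= start:
            continue                         # entirely before the requested portion
        if seg_start >= end:
            break                            # entirely after the requested portion
        lo = max(start, seg_start) - seg_start   # drop the beginning of the first segment
        hi = min(end, seg_end) - seg_start       # drop the end of the last segment
        out += seg[lo:hi]
    return bytes(out)
```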
  • Decision block 440 determines if all of the labels corresponding with the requested data in the data stream have been processed by steps 410 to 435. If any of the labels corresponding with the requested data in the data stream have not yet been processed, method 400 returns to step 415 to process additional labels from the label map associated with the data stream.
  • Once all of the labels associated with the requested portion of the data stream have been processed, method 400 proceeds to step 445. Step 445 returns the data stream to the deduplicating storage interface client or other entity providing the data stream. Embodiments of method 400 may output the data stream in its entirety in step 445 or output portions of the requested portion of the data stream in step 445 in parallel with performing the other steps of method 400 to reconstruct other portions of the requested portion of the data stream. For example, step 425 may be performed asynchronously with other steps of method 400 so that slab files may be retrieved from the cloud storage in the background while the spanning storage interface processes other labels in the label map.
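  • The asynchronous retrieval mentioned in the last example can be sketched with a thread pool: every distinct slab file needed by the label map is requested up front, and segments are emitted in label order as their slab files arrive. The names `reconstruct`, `slab_of`, and `fetch_slab` are hypothetical, and a thread pool is only one possible way to overlap retrieval with processing.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(labels, slab_of, fetch_slab, max_workers=4):
    """Yield data segments in label order while slab fetches run in the background.

    labels:     list of labels from the label map, in stream order
    slab_of:    maps a label to the name of its slab file (assumed helper)
    fetch_slab: retrieves a slab file as a dict {label: segment bytes}
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for label in labels:
            name = slab_of(label)
            if name not in futures:          # request each slab file only once
                futures[name] = pool.submit(fetch_slab, name)
        for label in labels:
            # Blocks only if this label's slab file has not arrived yet.
            slab = futures[slab_of(label)].result()
            yield slab[label]
```

  • Because the function is a generator, a caller can forward each segment to the client as soon as it is available, which corresponds to outputting portions of the data stream in parallel with the remaining steps.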
  • FIG. 5 illustrates a method 500 of deleting a data stream from a spanning storage interface according to an embodiment of the invention. In an embodiment, step 505 receives a data stream identifier from a deduplicating storage interface client.
  • Step 510 retrieves the label map associated with the data stream identifier from memory or disk storage. The label map includes a sequence of labels corresponding with a sequence of data segments representing the data stream. In an embodiment, the data stream identifier specifies the identity and/or the location of the label map container file in the disk storage. For example, a prefix portion of the data stream identifier may correspond with the file system path and/or file name of the label map container used to store the label map representing the data stream. A suffix portion of the data stream identifier may be used to directly or indirectly specify the location of the label map within its label map container file.
  • Upon retrieving the label map associated with the data stream identifier, step 515 selects the next label in sequence in the label map. In an embodiment, the first iteration of step 515 selects the first label in the label map.
  • Decision block 520 determines if the data segment corresponding with the selected label is already stored in the slab cache in memory. In an embodiment, decision block 520 searches for the selected label in the slab cache to make this determination. If the data segment corresponding with the selected label is already stored in the slab cache in memory, then method 500 proceeds to step 530.
  • Conversely, if the data segment corresponding with the selected label is not stored in the slab cache in memory, step 525 accesses a slab data file including a previously generated instance of the data segment corresponding with the selected label. In an embodiment, step 525 uses the label name to identify and optionally locate the data segment slab file including the previously generated instance of the data segment.
  • Step 525 retrieves at least the data segment corresponding with the selected label from its data segment slab file and adds it to the slab cache in memory. In an embodiment, step 525 retrieves all of the data segments included in this data segment slab file from disk storage or cloud storage and adds them to the slab cache in memory. Step 525 also retrieves and/or generates the labels and segment metadata for the retrieved data segments and adds these to the slab cache. Step 525 retrieves the segment reference counts for these data segments from the data segment slab file and adds these to the slab cache in memory. Step 525 also updates the reverse map cache with the labels and hashes or other data characterizations of the retrieved data segments.
  • Step 530 decrements the reference count in the slab cache associated with the selected label. In an embodiment, if the reference count of a label is decremented to zero, then the label and its data segment are marked for deletion from the slab cache and its data segment slab file.
  • Decision block 535 determines if all of the labels in the label map have been processed by steps 510 to 530. If any of the labels in the label map have not yet been processed, method 500 returns to step 515 to process additional labels from the label map associated with the data stream.
  • Once all of the labels associated with the label map have been processed, method 500 proceeds to step 540. Step 540 updates the data segment slab files including any data segments affected by the deletion operation. In an embodiment, step 540 writes the updated and decremented reference counts for data segments associated with the label map back to their respective data segment slab files. In an embodiment, if the reference count of a data segment has been decremented to zero, an embodiment of step 540 marks this data segment for deletion from the data segment slab file. In a further embodiment, a garbage collection process removes unneeded data segments and associated reference counts and segment metadata from data segment slab files. An embodiment of step 540 transfers the updated slab files to the cloud storage.
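  • The reference counting performed by steps 515 through 540 can be sketched as a single pass over the label map. A dictionary stands in for the reference counts held in the slab cache; writing the counts back to slab files and transferring them to cloud storage are not modeled. The function name `delete_stream` is hypothetical.

```python
def delete_stream(label_map, ref_counts):
    """Decrement the count for every label occurrence in `label_map`.

    Returns the set of labels whose reference count reached zero, that is,
    the labels whose data segments would be marked for deletion.
    """
    marked = set()
    for label in label_map:                  # a label may occur more than once
        ref_counts[label] -= 1
        if ref_counts[label] == 0:
            marked.add(label)
    return marked
```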
  • Step 545 updates the label map container file to remove the label map associated with the data stream identifier. In an embodiment, if the disk storage supports sparse files, the label map may be deleted directly without rewriting the label map container file. In another embodiment, if sparse files are not supported by the disk storage, then unneeded label maps are marked for deletion. A garbage collection process, similar to that used by embodiments of step 540, may be used to remove unnecessary label maps by rewriting label map container files when the number or proportion of label maps marked for deletion exceeds a threshold. An embodiment of step 545 transfers the updated label map container files to the cloud storage.
  • In an embodiment, steps 525, 540, and 545 may perform transfers to and from the cloud storage via the wide-area network in parallel and/or asynchronously with other steps of method 500. Similarly to step 390 above, steps 540 and 545 may identify changes in the locally stored label map container files and slab files in comparison with their counterparts (if any) stored in the cloud storage. Steps 540 and 545 transfer the changed label map container files and slab files to the cloud storage. In an embodiment, steps 540 and 545 communicate only the changed or new data to the cloud storage.
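  • The text does not say how changed files are identified. One plausible approach, shown here purely as an assumption, is to compare a digest of each local file with a digest recorded for its cloud counterpart and transfer only the files that differ or are new. The function name `changed_files` and the use of SHA-256 are choices made for this sketch.

```python
import hashlib


def changed_files(local_files, cloud_digests):
    """Return the names of local files that differ from, or are missing in, the cloud.

    local_files:   dict name -> file contents (bytes)
    cloud_digests: dict name -> hex SHA-256 digest of the cloud copy, if any
    """
    changed = []
    for name, data in local_files.items():
        if cloud_digests.get(name) != hashlib.sha256(data).hexdigest():
            changed.append(name)
    return sorted(changed)
```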
  • Embodiments of method 500 may return a deletion confirmation to the deduplicating storage interface client or other entity. In one embodiment, the deletion confirmation is provided following the successful retrieval of the label map corresponding with the data stream identifier in step 510. The remainder of method 500 may be performed as a background or low priority process by the deduplication and/or replication modules without impacting the performance of the client. In another embodiment, the deletion confirmation is returned to the client following the completion of method 500.
  • A further embodiment of method 500 may allow for deletion of a specified portion of data from a data stream. In this embodiment, for data segments that are partially contained within the specified portion of the data stream, the data from these data segments is retrieved and truncated so that only data outside of the specified portion of the data stream remains. This modified data is then re-encoded as one or more revised data segments and corresponding labels, which may be new to the spanning storage interface or may match previously created data segments, as described above. The labels representing data segments contained wholly or partially within the specified portion of a data stream are removed from the label map. The reference counts of these data segments are updated accordingly. The label map is rewritten to remove unused labels and to add labels for revised data segments.
  • In an embodiment, one or more garbage collection processes remove unneeded data segments, labels, and metadata from caches and files. Embodiments of the garbage collection process or processes may be performed independently of the above methods, for example as background or low-priority processes. Alternatively, some or all of the garbage collection processes may be performed as part of the above methods in creating or updating the slab and/or label map container files on disk storage and/or the slab cache and anchor caches in memory.
  • For example, a garbage collection process may remove unneeded data segments and associated reference counts and segment metadata from the data segment slab files. In an embodiment, the garbage collection process determines if the number or proportion of data segments marked for deletion in a data segment slab file exceeds a threshold. If this threshold is exceeded, then the entire data segment slab file is rewritten, with the data segments marked for deletion omitted from the rewritten data segment slab file.
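  • The threshold test for rewriting a data segment slab file can be sketched as follows. A dictionary stands in for the slab file, and the 50% default threshold is an arbitrary value chosen for this sketch, not one given in the text.

```python
def compact_slab(slab, marked, threshold=0.5):
    """Rewrite `slab` without marked segments if enough of it is marked.

    slab:   dict label -> data segment
    marked: set of labels marked for deletion
    Returns (slab, rewritten), where `rewritten` tells whether a new slab was built.
    """
    if not slab:
        return slab, False
    dead = sum(1 for label in slab if label in marked)
    if dead / len(slab) <= threshold:
        return slab, False                   # below threshold: leave the file as is
    kept = {label: seg for label, seg in slab.items() if label not in marked}
    return kept, True
```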
  • In another example, a garbage collection process removes labels from the anchor cache after the corresponding data segments have been loaded into the slab cache. In an embodiment, a garbage collection process uses label metadata attributes to identify labels in the slab cache corresponding with representative data segments and then compares these identified labels with the labels in the anchor cache. If a label in the anchor cache matches a label in the slab cache, the garbage collection process removes this label from the anchor cache, as this data segment is now loaded into memory in the slab cache.
  • In many applications, some data segments may be used more frequently than other data segments. Typical frequently-used data segments can include data corresponding to repeating data patterns, such as data segments consisting entirely of null values or other data or file-format specific motifs.
  • To improve performance, an embodiment of the deduplicating data storage system stores frequently-used data segments separately from less-used data segments. In an embodiment, the deduplicating data storage system monitors the reference counts associated with data segments. When the reference count of a data segment is increased above a threshold value, that data segment is designated as a frequently-used data segment. An embodiment moves or copies this data segment to a separate slab file reserved for frequently-used data segments. The frequently-used data segment is relabeled as it is transferred to the frequently-used data segment slab file.
  • In an embodiment, the frequently-used data segment slab file is similar to other data segment slab files, such as data segment slab file 250 discussed above. In still a further embodiment, data segment reference counts are not maintained or updated for frequently-used data segments; accordingly, data segment reference counts may be omitted from the frequently-used data segment slab file.
  • Embodiments of the invention may store frequently-used data segments in memory for improved performance using a variety of different techniques. In a first embodiment, all of the frequently-used data segments and their associated labels and metadata from one or more frequently-used data segment slab files may be loaded into the slab cache or a separate frequently-used data segment cache during the initialization of the deduplication data storage system. In another embodiment, hashes or other data characterizations of all of the frequently-used data segments and their associated labels from one or more frequently-used data segment slab files are initially loaded into the anchor cache or a separate, similar cache. In this embodiment, the data associated with a frequently-used data segment is loaded into the slab cache as needed, in a similar manner as with other data segments as described above.
  • In an embodiment, frequently-used data segments stored in the slab cache are accessed for deduplicating additional data streams and retrieving deduplicated data in a similar manner as other data segments, as described above. However, in an embodiment, data segment reference counts are not maintained or updated in memory for frequently-used data segments. Therefore, an embodiment of the deduplicating data storage system does not increment an associated data segment reference count when a frequently-used data segment is used to deduplicate an additional data stream and does not decrement an associated data segment reference count when a data stream including a frequently-used data segment is deleted.
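  • The handling of frequently-used data segments described above can be sketched as a small bookkeeping class: a segment is promoted once its reference count rises above a threshold, and from then on its count is neither incremented nor decremented. The class name `SegmentStore` and its methods are hypothetical, and the relabeling that accompanies the move to the frequently-used slab file is omitted from this sketch.

```python
class SegmentStore:
    """Tracks reference counts and promotes heavily referenced segments."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.ref_counts = {}      # label -> count, regular segments only
        self.frequent = set()     # labels designated as frequently used

    def add_reference(self, label):
        if label in self.frequent:
            return                # counts are not maintained after promotion
        count = self.ref_counts.get(label, 0) + 1
        if count > self.threshold:
            self.ref_counts.pop(label, None)
            self.frequent.add(label)
        else:
            self.ref_counts[label] = count

    def drop_reference(self, label):
        if label in self.frequent:
            return                # deleting a stream does not affect these segments
        self.ref_counts[label] -= 1
```

  • A consequence visible in the sketch is that a promoted segment is never marked for deletion by reference counting, which is consistent with omitting reference counts from the frequently-used data segment slab file.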
  • Embodiments of the deduplicating data storage system may be used in a variety of data storage applications to store files, objects, databases, or any other type or arrangement of data in a deduplicated form.
  • FIG. 6 illustrates a computer system suitable for implementing embodiments of the invention. FIG. 6 is a block diagram of a computer system 2000, such as a personal computer or other digital device, suitable for practicing an embodiment of the invention. Embodiments of computer system 2000 may include dedicated networking devices, such as wireless access points, network switches, hubs, routers, hardware firewalls, WAN and LAN network traffic optimizers and accelerators, network attached storage devices, storage array network interfaces, and combinations thereof.
  • Computer system 2000 includes a central processing unit (CPU) 2005 for running software applications and optionally an operating system. CPU 2005 may be comprised of one or more processing cores. Memory 2010 stores applications and data for use by the CPU 2005. Examples of memory 2010 include dynamic and static random access memory. Storage 2015 provides non-volatile storage for applications and data and may include fixed or removable hard disk drives, flash memory devices, ROM memory, and CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other magnetic, optical, or solid state storage devices.
  • In a further embodiment, CPU 2005 may execute virtual machine software applications to create one or more virtual processors capable of executing additional software applications and optional additional operating systems. Virtual machine applications can include interpreters, recompilers, and just-in-time compilers to assist in executing software applications within virtual machines. Additionally, one or more CPUs 2005 or associated processing cores can include virtualization specific hardware, such as additional register sets, memory address manipulation hardware, additional virtualization-specific processor instructions, and virtual machine state maintenance and migration hardware.
  • Optional user input devices 2020 communicate user inputs from one or more users to the computer system 2000, examples of which may include keyboards, mice, joysticks, digitizer tablets, touch pads, touch screens, still or video cameras, and/or microphones. In an embodiment, user input devices may be omitted and computer system 2000 may present a user interface to a user over a network, for example using a web page or network management protocol and network management software applications.
  • Computer system 2000 includes one or more network interfaces 2025 that allow computer system 2000 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Computer system 2000 may support a variety of networking protocols at one or more levels of abstraction. For example, computer system 2000 may support networking protocols at one or more layers of the seven layer OSI network model. An embodiment of network interface 2025 includes one or more wireless network interfaces adapted to communicate with wireless clients and with other wireless networking devices using radio waves, for example using the 802.11 family of protocols, such as 802.11a, 802.11b, 802.11g, and 802.11n.
  • An embodiment of the computer system 2000 may also include one or more wired networking interfaces, such as one or more Ethernet connections to communicate with other networking devices via local or wide-area networks.
  • The components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, are connected via one or more data buses 2060. Additionally, some or all of the components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, may be integrated together into one or more integrated circuits or integrated circuit packages. Furthermore, some or all of the components of computer system 2000 may be implemented as application specific integrated circuits (ASICs) and/or programmable logic.
  • FIG. 7 illustrates an example disaster recovery application 700 of a spanning storage interface according to an embodiment of the invention. Disaster recovery application 700 may be used to provide redundant data access to storage clients in the event that the storage clients and/or cloud spanning storage interface at a first network location are disabled, destroyed, or otherwise inaccessible or inoperable.
  • In example disaster recovery application 700, a first network location A 705 includes a first spanning storage interface 710. Spanning storage interface 710 provides storage access to one or more storage clients, such as storage client 720A and backup server 720B, via a local area network and/or a storage area network. Spanning storage interface 710 deduplicates data received from storage clients and transfers the deduplicated data via the wide area network 780 to one or more cloud storage services, such as cloud storage services 770 and 775, for storage. The spanning storage interface 710 may also retrieve deduplicated data via the wide area network 780 from one or more cloud storage services and reconstruct this data in its original form to provide to storage clients.
  • As discussed above, the spanning storage interface 710 includes local storage 715 to improve data access performance. Local storage 715 includes a local cache A 725 of a portion of the storage data provided by storage clients at network location A 705.
  • To provide disaster recovery, example application 700 includes a second network location B 735. Network location B 735 includes a second spanning storage interface 740. Spanning storage interface 740 is provided for disaster recovery operations and may be used to access the data associated with the first network location A 705 in the event that network location A 705 is disabled, destroyed, or otherwise inaccessible or inoperable.
  • To provide disaster recovery data access, the second spanning storage interface 740 can access deduplicated data stored in one or more of the cloud storage services 770 and/or 775 via wide-area network 780. The second spanning storage interface 740 reconstructs the original data from the retrieved deduplicated data and provides it to storage clients.
  • The second spanning storage interface 740 includes local storage B 745 for improving data access performance. In an embodiment, a copy 760 of some or all of the local cache A 725 used by the first spanning storage interface 710 is transferred to the local storage B 745 while the first network location 705 is operating. In the event of a disaster affecting the first network location 705, the second spanning storage interface 740 can provide data access to the first network location's data with the improved performance benefit provided by the copy of local cache A 760 in its local storage B 745.
  • Network location B 735 may be a dedicated disaster recovery network location. Alternatively, network location B may also optionally be used with one or more local storage clients, such as storage client 750A and backup server 750B. In this further example, the second spanning storage interface B 740 performs data deduplication and facilitates cloud storage for data from storage clients 750. Like the first spanning storage interface 710, the second spanning storage interface B 740 in this example deduplicates second data received from storage clients at network location B 735 and transfers this second deduplicated data via the wide area network 780 to one or more cloud storage services, such as cloud storage services 770 and 775, for storage. The second spanning storage interface 740 may also retrieve second deduplicated data via the wide area network 780 from one or more cloud storage services and reconstruct this second data in its original form to provide to storage clients at the second network location B 735. To improve the performance of the second spanning storage interface 740, its local storage B 745 may include a local cache B 765, which includes a portion of the storage data provided by storage clients at network location B 735.
  • In yet a further embodiment, spanning storage interfaces 710 and 740 can operate in a paired disaster recovery configuration. For example, the second spanning storage interface 740 at network location B 735 may act as disaster recovery for the first spanning storage interface 710 at the first network location A 705. As described above, the local storage B 745 at the second network location B 735 may include a copy 760 of the local cache A 725 used by the first spanning storage interface 710. The copy 760 of local cache A in local storage B 745 improves the initial performance of the second spanning storage interface 740 in the event that it is required to substitute for the first spanning storage interface 710.
  • Similarly, in the paired disaster recovery configuration, first spanning storage interface 710 may act as disaster recovery for the second spanning storage interface 740. In the event that the second spanning storage interface 740 is destroyed, disabled, or otherwise unavailable to its storage clients, the first spanning storage interface 710 may provide access to storage data associated with the network location B 735. Additionally, the local storage A 715 includes a copy 730 of the local cache B 765 used by the second spanning storage interface 740. The copy 730 of the local cache B 765 is transferred to the local storage A 715 while the second spanning storage interface 740 is operating. The copied version of local cache B 730 in local storage A 715 improves the initial performance of the first spanning storage interface 710 in the event that it is required to substitute for the second spanning storage interface 740.
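  • The paired configuration can be sketched with two objects that share a cloud store and each hold a copy of the other's local cache. The `Site` class and its methods are hypothetical, a dictionary stands in for the cloud storage service, and deduplication is not modeled. The counter shows why the cache copy improves initial performance after failover: reads served from the copy avoid a round trip over the wide-area network.

```python
class Site:
    """A network location with a spanning storage interface and local storage."""

    def __init__(self, name, cloud):
        self.name = name
        self.cloud = cloud            # shared dict standing in for cloud storage
        self.local_cache = {}         # cache of this site's own data
        self.peer_cache_copy = {}     # copy of the partner site's local cache
        self.cloud_reads = 0          # reads that had to go to cloud storage

    def write(self, key, data):
        self.cloud[key] = data        # data is always stored in the cloud
        self.local_cache[key] = data  # and, in this sketch, always cached locally

    def replicate_cache_to(self, peer):
        # Performed while this site is still operating.
        peer.peer_cache_copy = dict(self.local_cache)

    def read_for_peer(self, key):
        # Used when this site substitutes for its unavailable partner.
        if key in self.peer_cache_copy:
            return self.peer_cache_copy[key]
        self.cloud_reads += 1
        return self.cloud[key]
```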
  • In a further embodiment, the paired disaster recovery configuration can be extended to include additional network locations, with local storage at each network location including a copy of at least one (and possibly more than one) local cache from other spanning storage interfaces.
  • In an embodiment, copies of local caches of spanning storage interfaces may be transferred directly between network locations. For example, spanning storage interfaces at different network locations may communicate with each other to transfer and update copies of their local caches at other network locations. In another embodiment, a spanning storage interface can retrieve a portion of the deduplicated data from a cloud storage service to recreate a copy of a local cache of another spanning storage interface.
  • Further embodiments can be envisioned to one of ordinary skill in the art. In other embodiments, combinations or sub-combinations of the above disclosed invention can be advantageously made. The block diagrams of the architecture and flow charts are grouped for ease of understanding. However it should be understood that combinations of blocks, additions of new blocks, re-arrangement of blocks, and the like are contemplated in alternative embodiments of the present invention.
  • The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

Claims (21)

  1. A disaster recovery system comprising:
    a first spanning storage interface at a first network location, wherein the first spanning storage interface is adapted to receive first data from at least a first storage client at the first network location and to transfer a deduplicated version of the first data to a cloud storage service via a wide-area network;
    a first local data storage at the first network location, wherein the first local data storage includes a copy of a portion of the first data;
    a second spanning storage interface at a second network location, wherein the second spanning storage interface is adapted to provide access to the first data if the first spanning storage interface is unavailable; and
    a second local data storage at the second network location, wherein the second local data storage includes a second copy of the portion of the first data.
  2. The disaster recovery system of claim 1, wherein the copy of the portion of the first data is stored in deduplicated form.
  3. The disaster recovery system of claim 1, wherein the first spanning storage interface is adapted to transfer the copy of the portion of the first data to the second local data storage while it is available.
  4. The disaster recovery system of claim 1, wherein the second network location is connected with the first network location via a wide-area network.
  5. The disaster recovery system of claim 1, wherein the second spanning storage interface is adapted to access the deduplicated version of the first data from the cloud storage service via the wide-area network.
  6. The disaster recovery system of claim 1, wherein the second spanning storage interface is adapted to update the deduplicated version of the first data in the cloud storage service if the first spanning storage interface is unavailable.
  7. The disaster recovery system of claim 1, wherein the second spanning storage interface is adapted to receive second data from at least a second storage client at the second network location and to transfer a deduplicated version of the second data to a cloud storage service via the wide-area network.
  8. The disaster recovery system of claim 7, wherein the second local data storage includes a copy of a portion of the second data.
  9. The disaster recovery system of claim 8, wherein the copy of the portion of the second data is stored in deduplicated form.
  10. The disaster recovery system of claim 7, wherein the first spanning storage interface is adapted to provide access to the second data if the second spanning storage interface is unavailable.
  11. The disaster recovery system of claim 10, wherein the first local data storage includes a second copy of the portion of the second data.
  12. The disaster recovery system of claim 11, wherein the second spanning storage interface is adapted to transfer the copy of the portion of the second data to the first local data storage while it is available.
  13. The disaster recovery system of claim 2, wherein the copy of the portion of the first data includes data segments and labels.
  14. The disaster recovery system of claim 13, wherein the copy of the portion of the first data includes segment reference counts.
  15. A method of improving performance of disaster recovery systems, the method comprising:
    receiving, with a first spanning storage interface, first data from at least a first storage client at the first network location;
    transferring a deduplicated version of the first data to a cloud storage service via a wide-area network;
    storing a portion of the first data in a first local data storage at the first network location; and
    transferring a copy of the portion of the first data to a second local data storage at a second network location, wherein the second network location includes a second spanning storage interface adapted to provide access to the first data if the first spanning storage interface is unavailable.
  16. 16. The method of claim 15, comprising:
    receiving, with the second spanning storage interface, second data from at least a second storage client at the second network location;
    transferring a deduplicated version of the second data to the cloud storage service via the wide-area network;
    storing a portion of the second data in the second local data storage at the second network location; and
    transferring a copy of the portion of the second data to the first local data storage at a first network location.
  17. 17. The method of claim 16, wherein the first spanning storage interface is adapted to provide access to the second data if the second spanning storage interface is unavailable.
  18. 18. The method of claim 15, wherein the copy of the portion of the first data is stored in deduplicated form.
  19. 19. The method of claim 18, wherein the copy of the portion of the first data includes data segments and labels.
  20. 20. The method of claim 19, wherein the copy of the portion of the first data includes segment reference counts.
  21. 21. The method of claim 15, wherein the first and second network locations are connected via a wide-area network.
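
The method of claims 15 to 20 can be illustrated with a minimal sketch. It is not the implementation described in the application: the class name `SpanningStorageInterface`, the fixed 4-byte segment size, the use of SHA-256 digests as segment labels, and the in-memory dictionaries standing in for the cloud storage service and the local data storage are all assumptions made for illustration. Reference counts are only incremented here; reclaiming segments when data is overwritten is left out.

```python
import hashlib
from collections import Counter

SEGMENT_SIZE = 4  # assumed tiny fixed-size segments, for illustration only


class SpanningStorageInterface:
    """Hypothetical stand-in for a spanning storage interface at one site."""

    def __init__(self, cloud):
        self.cloud = cloud          # shared dict standing in for the cloud service
        self.local = {}             # local data storage: label -> segment bytes
        self.refcounts = Counter()  # segment reference counts
        self.files = {}             # name -> ordered list of segment labels
        self.peer = None            # interface at the other network location
        self.available = True

    def write(self, name, data):
        """Receive client data, deduplicate it, and replicate to the peer."""
        labels = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            label = hashlib.sha256(segment).hexdigest()  # assumed label scheme
            labels.append(label)
            self.refcounts[label] += 1
            if label not in self.cloud:   # only new segments cross the WAN
                self.cloud[label] = segment
            self.local[label] = segment   # keep a local copy of the portion
        self.files[name] = labels
        if self.peer is not None and self.peer.available:
            segments = {label: self.local[label] for label in labels}
            self.peer.receive_copy(name, labels, segments)

    def receive_copy(self, name, labels, segments):
        """Store the peer's segments, labels, and reference counts locally."""
        self.local.update(segments)
        self.refcounts.update(labels)
        self.files[name] = list(labels)

    def read(self, name):
        """Reassemble data, preferring local segments over the cloud copy."""
        if not self.available:
            raise ConnectionError("spanning storage interface unavailable")
        return b"".join(
            self.local[label] if label in self.local else self.cloud[label]
            for label in self.files[name]
        )


# Two sites share one cloud store; the first site fails after a write.
cloud = {}
site_a = SpanningStorageInterface(cloud)
site_b = SpanningStorageInterface(cloud)
site_a.peer, site_b.peer = site_b, site_a

site_a.write("backup.img", b"abcdabcdefgh")
site_a.available = False
recovered = site_b.read("backup.img")
```

In this sketch the second interface can serve the first site's data immediately after the failure because the segments, labels, and reference counts were copied to its local storage when the data was written, so no bulk download from the cloud is needed first.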
US12942988 2009-12-28 2010-11-09 Disaster recovery using local and cloud spanning deduplicated storage system Abandoned US20110161723A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US29033409 2009-12-28 2009-12-28
US31539210 2010-03-18 2010-03-18
US12942988 US20110161723A1 (en) 2009-12-28 2010-11-09 Disaster recovery using local and cloud spanning deduplicated storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12942988 US20110161723A1 (en) 2009-12-28 2010-11-09 Disaster recovery using local and cloud spanning deduplicated storage system

Publications (1)

Publication Number Publication Date
US20110161723A1 (en) 2011-06-30

Family

ID=44188686

Family Applications (4)

Application Number Status Publication Number Priority Date Filing Date Title
US12895835 Active 2031-10-12 US9501365B2 (en) 2009-12-28 2010-09-30 Cloud-based disaster recovery of backup data and metadata
US12895811 Active 2031-01-23 US8694469B2 (en) 2009-12-28 2010-09-30 Cloud synthetic backups
US12942988 Abandoned US20110161723A1 (en) 2009-12-28 2010-11-09 Disaster recovery using local and cloud spanning deduplicated storage system
US12942991 Abandoned US20110161291A1 (en) 2009-12-28 2010-11-09 Wan-optimized local and cloud spanning deduplicated storage system

Country Status (2)

Country Link
US (4) US9501365B2 (en)
WO (1) WO2011082123A1 (en)

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110173405A1 (en) * 2010-01-13 2011-07-14 International Business Machines Corporation System and method for reducing latency time with cloud services
WO2012023050A2 (en) 2010-08-20 2012-02-23 Overtis Group Limited Secure cloud computing system and method
US20120054325A1 (en) * 2010-08-31 2012-03-01 Backa Bruce R System and Method for In-Place Data Migration
US20120094637A1 (en) * 2010-10-15 2012-04-19 Microsoft Corporation Mobile Messaging Message Notifications Processing
US20120150954A1 (en) * 2010-12-09 2012-06-14 Quantum Corporation Adaptive collaborative de-duplication
US20120150949A1 (en) * 2010-12-14 2012-06-14 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US20120150826A1 (en) * 2010-12-14 2012-06-14 Commvault Systems, Inc. Distributed deduplicated storage system
US20120173656A1 (en) * 2010-12-29 2012-07-05 Sorenson Iii James Christopher Reduced Bandwidth Data Uploading in Data Systems
US20120257820A1 (en) * 2011-04-07 2012-10-11 Microsoft Corporation Image analysis tools
US20120284555A1 (en) * 2011-05-02 2012-11-08 International Business Machines Corporation Optimizing disaster recovery systems during takeover operations
US20130006943A1 (en) * 2011-06-30 2013-01-03 International Business Machines Corporation Hybrid data backup in a networked computing environment
US20130151884A1 (en) * 2011-12-09 2013-06-13 Promise Technology, Inc. Cloud data storage system
US8484505B1 (en) 2010-09-30 2013-07-09 Emc Corporation Self recovery
US8504870B2 (en) 2010-09-30 2013-08-06 Emc Corporation Optimized recovery
US20130238574A1 (en) * 2010-10-11 2013-09-12 Estsoft Corp. Cloud system and file compression and transmission method in a cloud system
US8549350B1 (en) 2010-09-30 2013-10-01 Emc Corporation Multi-tier recovery
US8572340B2 (en) 2010-09-30 2013-10-29 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US8577851B2 (en) 2010-09-30 2013-11-05 Commvault Systems, Inc. Content aligned block-based deduplication
US20130297884A1 (en) * 2012-05-07 2013-11-07 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US20130339310A1 (en) * 2012-06-13 2013-12-19 Commvault Systems, Inc. Restore using a client side signature repository in a networked storage system
US8713364B1 (en) * 2010-09-30 2014-04-29 Emc Corporation Unified recovery
US20140164354A1 (en) * 2012-12-07 2014-06-12 International Business Machines Corporation Optimized query ordering for file path indexing in a content repository
US20140359420A1 (en) * 2013-06-04 2014-12-04 Beijing Founder Electronics Co., Ltd. Disaster Recovery Method and Apparatus Used in Document Editing and Storage Medium
US8930306B1 (en) 2009-07-08 2015-01-06 Commvault Systems, Inc. Synchronized data deduplication
US8943023B2 (en) 2010-12-29 2015-01-27 Amazon Technologies, Inc. Receiver-side data deduplication in data systems
US8943356B1 (en) 2010-09-30 2015-01-27 Emc Corporation Post backup catalogs
US8949661B1 (en) 2010-09-30 2015-02-03 Emc Corporation Federation of indices
US20150052322A1 (en) * 2013-08-16 2015-02-19 Red Hat Israel, Ltd. Systems and methods for memory deduplication by origin host in virtual machine live migration
US20150052323A1 (en) * 2013-08-16 2015-02-19 Red Hat Israel, Ltd. Systems and methods for memory deduplication by destination host in virtual machine live migration
US20150134861A1 (en) * 2013-11-14 2015-05-14 Humax Co., Ltd. Personal cloud storage chain service system and method
US9128948B1 (en) * 2010-09-15 2015-09-08 Symantec Corporation Integration of deduplicating backup server with cloud storage
US9170892B2 (en) 2010-04-19 2015-10-27 Microsoft Technology Licensing, Llc Server failure recovery
US9195549B1 (en) 2010-09-30 2015-11-24 Emc Corporation Unified recovery
US9298723B1 (en) 2012-09-19 2016-03-29 Amazon Technologies, Inc. Deduplication architecture
US9317377B1 (en) * 2011-03-23 2016-04-19 Riverbed Technology, Inc. Single-ended deduplication using cloud storage protocol
US9361328B1 (en) * 2013-01-28 2016-06-07 Veritas Us Ip Holdings Llc Selection of files for archival or deduplication
US9372726B2 (en) 2013-01-09 2016-06-21 The Research Foundation For The State University Of New York Gang migration of virtual machines using cluster-wide deduplication
US9390052B1 (en) * 2012-12-19 2016-07-12 Amazon Technologies, Inc. Distributed caching system
US20160219123A1 (en) * 2015-01-28 2016-07-28 Red Hat, Inc. Cache Data Validation
US9405763B2 (en) 2008-06-24 2016-08-02 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US9442806B1 (en) * 2010-11-30 2016-09-13 Veritas Technologies Llc Block-level deduplication
US9454441B2 (en) 2010-04-19 2016-09-27 Microsoft Technology Licensing, Llc Data layout for recovery and durability
US20160306841A1 (en) * 2015-04-14 2016-10-20 Microsoft Technology Licensing, Llc Collection record for overlapping data stream collections
US9575673B2 (en) 2014-10-29 2017-02-21 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US9621399B1 (en) 2012-12-19 2017-04-11 Amazon Technologies, Inc. Distributed caching system
US9633056B2 (en) 2014-03-17 2017-04-25 Commvault Systems, Inc. Maintaining a deduplication database
US9632707B2 (en) 2012-05-07 2017-04-25 International Business Machines Corporation Enhancing tiering storage performance
US9633033B2 (en) 2013-01-11 2017-04-25 Commvault Systems, Inc. High availability distributed deduplicated storage system
US9645944B2 (en) 2012-05-07 2017-05-09 International Business Machines Corporation Enhancing data caching performance
US9766929B2 (en) 2015-04-14 2017-09-19 Microsoft Technology Licensing, Llc Processing of data stream collection record sequence
US9778856B2 (en) 2012-08-30 2017-10-03 Microsoft Technology Licensing, Llc Block-level access to parallel storage
US9798631B2 (en) 2014-02-04 2017-10-24 Microsoft Technology Licensing, Llc Block storage by decoupling ordering from durability
US9813529B2 (en) 2011-04-28 2017-11-07 Microsoft Technology Licensing, Llc Effective circuits in packet-switched networks
US9823842B2 (en) 2014-05-12 2017-11-21 The Research Foundation For The State University Of New York Gang migration of virtual machines using cluster-wide deduplication
US9959137B2 (en) 2015-04-14 2018-05-01 Microsoft Technology Licensing, Llc Transaction redo using skip element for object
US10031814B2 (en) 2015-04-14 2018-07-24 Microsoft Technology Licensing, Llc Collection record location as log tail beginning
US10057366B2 (en) * 2015-12-31 2018-08-21 Hughes Network Systems, Llc Accurate caching in adaptive video streaming based on collision resistant hash applied to segment contents and ephemeral request and URL data
US10061663B2 (en) 2015-12-30 2018-08-28 Commvault Systems, Inc. Rebuilding deduplication data in a distributed deduplication data storage system
US10102251B2 (en) 2015-04-14 2018-10-16 Microsoft Technology Licensing, Llc Lockless open collection data structure
US10133768B2 (en) 2015-04-14 2018-11-20 Microsoft Technology Licensing, Llc Latest external dependee entity in transaction record

Families Citing this family (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1802155A1 (en) * 2005-12-21 2007-06-27 Cronto Limited System and method for dynamic multifactor authentication
US9690790B2 (en) 2007-03-05 2017-06-27 Dell Software Inc. Method and apparatus for efficiently merging, storing and retrieving incremental data
JP4691602B2 (en) * 2009-03-19 2011-06-01 富士通株式会社 Data backup method and information processing apparatus
US8762348B2 (en) * 2009-06-09 2014-06-24 Emc Corporation Segment deduplication system with compression of segments
US8731190B2 (en) * 2009-06-09 2014-05-20 Emc Corporation Segment deduplication system with encryption and compression of segments
US8401181B2 (en) * 2009-06-09 2013-03-19 Emc Corporation Segment deduplication system with encryption of segments
US9176824B1 (en) 2010-03-12 2015-11-03 Carbonite, Inc. Methods, apparatus and systems for displaying retrieved files from storage on a remote user device
US8266280B2 (en) * 2010-03-17 2012-09-11 International Business Machines Corporation System and method for a storage area network virtualization optimization
US8818956B2 (en) * 2010-03-26 2014-08-26 Carbonite, Inc. Transfer of user data between logical data sites
US20110246721A1 (en) * 2010-03-31 2011-10-06 Sony Corporation Method and apparatus for providing automatic synchronization appliance
US8234372B2 (en) * 2010-05-05 2012-07-31 Go Daddy Operating Company, LLC Writing a file to a cloud storage solution
US8719223B2 (en) 2010-05-06 2014-05-06 Go Daddy Operating Company, LLC Cloud storage solution for reading and writing files
US8495022B1 (en) * 2010-05-13 2013-07-23 Symantec Corporation Systems and methods for synthetic backups
US8396839B1 (en) * 2010-06-25 2013-03-12 Emc Corporation Representing de-duplicated file data
US8868726B1 (en) * 2010-07-02 2014-10-21 Symantec Corporation Systems and methods for performing backups
US9678688B2 (en) * 2010-07-16 2017-06-13 EMC IP Holding Company LLC System and method for data deduplication for disk storage subsystems
US8650159B1 (en) * 2010-08-26 2014-02-11 Symantec Corporation Systems and methods for managing data in cloud storage using deduplication techniques
US8392376B2 (en) * 2010-09-03 2013-03-05 Symantec Corporation System and method for scalable reference management in a deduplication based storage system
US9400799B2 (en) * 2010-10-04 2016-07-26 Dell Products L.P. Data block migration
US9690499B1 (en) * 2010-11-04 2017-06-27 Veritas Technologies Systems and methods for cloud-based data protection storage
US20120117029A1 (en) * 2010-11-08 2012-05-10 Stephen Gold Backup policies for using different storage tiers
US8682873B2 (en) * 2010-12-01 2014-03-25 International Business Machines Corporation Efficient construction of synthetic backups within deduplication storage system
US9430330B1 (en) * 2010-12-29 2016-08-30 Netapp, Inc. System and method for managing environment metadata during data backups to a storage system
US8442952B1 (en) * 2011-03-30 2013-05-14 Emc Corporation Recovering in deduplication systems
US8868859B2 (en) * 2011-06-03 2014-10-21 Apple Inc. Methods and apparatus for multi-source restore
US9465696B2 (en) * 2011-06-03 2016-10-11 Apple Inc. Methods and apparatus for multi-phase multi-source backup
US9411687B2 (en) 2011-06-03 2016-08-09 Apple Inc. Methods and apparatus for interface in multi-phase restore
US8819471B2 (en) 2011-06-03 2014-08-26 Apple Inc. Methods and apparatus for power state based backup
US9118642B2 (en) 2011-06-05 2015-08-25 Apple Inc. Asset streaming
US8843443B1 (en) 2011-06-30 2014-09-23 Emc Corporation Efficient backup of virtual data
US9311327B1 (en) 2011-06-30 2016-04-12 Emc Corporation Updating key value databases for virtual backups
US8671075B1 (en) 2011-06-30 2014-03-11 Emc Corporation Change tracking indices in virtual machines
US8849769B1 (en) * 2011-06-30 2014-09-30 Emc Corporation Virtual machine file level recovery
US8849777B1 (en) 2011-06-30 2014-09-30 Emc Corporation File deletion detection in key value databases for virtual backups
US9229951B1 (en) 2011-06-30 2016-01-05 Emc Corporation Key value databases for virtual backups
US8949829B1 (en) 2011-06-30 2015-02-03 Emc Corporation Virtual machine disaster recovery
US9158632B1 (en) 2011-06-30 2015-10-13 Emc Corporation Efficient file browsing using key value databases for virtual backups
US8762349B2 (en) * 2011-07-14 2014-06-24 Dell Products L.P. Intelligent deduplication data prefetching
US9515884B2 (en) * 2011-09-14 2016-12-06 I.T. Analyzer Ltd. System and method for evaluating coverage of services by components of an IT infrastructure
CN102999398B (en) * 2011-09-15 2014-06-11 腾讯科技(深圳)有限公司 Method, system and device for user system recovery
US9014023B2 (en) 2011-09-15 2015-04-21 International Business Machines Corporation Mobile network services in a mobile data network
US8959223B2 (en) * 2011-09-29 2015-02-17 International Business Machines Corporation Automated high resiliency system pool
US8996700B2 (en) 2011-09-29 2015-03-31 International Business Machines Corporation Automated workload performance and availability optimization based on hardware affinity
US8549108B2 (en) * 2011-09-29 2013-10-01 Riverbed Technology, Inc. Optimized prefetching of compound data
WO2013049611A1 (en) * 2011-09-30 2013-04-04 Google Inc. Cloud storage of game state
US8843621B2 (en) 2011-10-25 2014-09-23 International Business Machines Corporation Event prediction and preemptive action identification in a networked computing environment
WO2013065084A1 (en) * 2011-11-01 2013-05-10 Hitachi, Ltd. Information system and method for managing data
US10021696B2 (en) * 2011-11-16 2018-07-10 International Business Machines Corporation Data caching at the edge of a mobile data network
US8971192B2 (en) 2011-11-16 2015-03-03 International Business Machines Corporation Data breakout at the edge of a mobile data network
EP2780796A4 (en) * 2011-11-18 2015-07-08 Dell Software Inc Method of and system for merging, storing and retrieving incremental backup data
US20130151483A1 (en) * 2011-12-07 2013-06-13 Quantum Corporation Adaptive experience based De-duplication
US9177011B2 (en) * 2011-12-22 2015-11-03 Magnet Forensics Inc. Systems and methods for locating application specific data
US8442945B1 (en) * 2012-01-03 2013-05-14 Don Doerner No touch synthetic full backup
US8892526B2 (en) * 2012-01-11 2014-11-18 Timothy STOAKES Deduplication seeding
US9158568B2 (en) 2012-01-30 2015-10-13 Hewlett-Packard Development Company, L.P. Input/output operations at a virtual block device of a storage server
US9098325B2 (en) 2012-02-28 2015-08-04 Hewlett-Packard Development Company, L.P. Persistent volume at an offset of a virtual block device of a storage server
US10133748B2 (en) * 2012-03-06 2018-11-20 International Business Machines Corporation Enhancing data retrieval performance in deduplication systems
US20130238832A1 (en) * 2012-03-07 2013-09-12 Netapp, Inc. Deduplicating hybrid storage aggregate
WO2013136339A1 (en) 2012-03-15 2013-09-19 Hewlett-Packard Development Company, L.P. Regulating replication operation
US9292815B2 (en) * 2012-03-23 2016-03-22 Commvault Systems, Inc. Automation of data storage activities
WO2013153584A1 (en) * 2012-04-13 2013-10-17 Hitachi, Ltd. Storage device
US8903764B2 (en) 2012-04-25 2014-12-02 International Business Machines Corporation Enhanced reliability in deduplication technology over storage clouds
US9633032B2 (en) * 2012-04-30 2017-04-25 Quantum Corporation Object synthesis
US9183094B2 (en) 2012-05-25 2015-11-10 Symantec Corporation Backup image duplication
US8521153B1 (en) 2012-06-18 2013-08-27 International Business Machines Corporation Using the maintenance channel in a mobile data network to provide subscriber data when a cache miss occurs
CN103514064B (en) 2012-06-28 2016-03-16 国际商业机器公司 Method and apparatus for recording backup information
US9251114B1 (en) 2012-10-12 2016-02-02 Egnyte, Inc. Systems and methods for facilitating access to private files using a cloud storage system
US20140115290A1 (en) * 2012-10-19 2014-04-24 Dell Products L.P. System and method for migration of digital assets
US20140122569A1 (en) * 2012-10-30 2014-05-01 Microsoft Corporation Bridging on premise and cloud systems via canonical cache
US9160809B2 (en) 2012-11-26 2015-10-13 Go Daddy Operating Company, LLC DNS overriding-based methods of accelerating content delivery
US9384208B2 (en) 2013-01-22 2016-07-05 Go Daddy Operating Company, LLC Configuring a cached website file removal using a pulled data list
US9141669B2 (en) 2013-01-22 2015-09-22 Go Daddy Operating Company, LLC Configuring an origin server content delivery using a pulled data list
US10042907B2 (en) * 2012-11-29 2018-08-07 Teradata Us, Inc. Providing metadata to database systems and environments with multiple processing units or modules
US9385915B2 (en) 2012-11-30 2016-07-05 Netapp, Inc. Dynamic caching technique for adaptively controlling data block copies in a distributed data processing system
US9122696B2 (en) 2012-12-06 2015-09-01 International Business Machines Corporation Sharing electronic file metadata in a networked computing environment
US9542423B2 (en) 2012-12-31 2017-01-10 Apple Inc. Backup user interface
US9459856B2 (en) * 2013-01-02 2016-10-04 International Business Machines Corporation Effective migration and upgrade of virtual machines in cloud environments
US9678971B2 (en) 2013-01-10 2017-06-13 International Business Machines Corporation Packing deduplicated data in a self-contained deduplicated repository
US9300748B2 (en) 2013-01-16 2016-03-29 Cisco Technology, Inc. Method for optimizing WAN traffic with efficient indexing scheme
US9306997B2 (en) 2013-01-16 2016-04-05 Cisco Technology, Inc. Method for optimizing WAN traffic with deduplicated storage
US9509736B2 (en) 2013-01-16 2016-11-29 Cisco Technology, Inc. Method for optimizing WAN traffic
US9015527B2 (en) * 2013-01-29 2015-04-21 Hewlett-Packard Development Company, L.P. Data backup and recovery
US9438493B2 (en) 2013-01-31 2016-09-06 Go Daddy Operating Company, LLC Monitoring network entities via a central monitoring system
US9864755B2 (en) 2013-03-08 2018-01-09 Go Daddy Operating Company, LLC Systems for associating an online file folder with a uniform resource locator
US9195672B1 (en) * 2013-03-14 2015-11-24 Emc Corporation Selective fragmentation repair
US9483494B1 (en) * 2013-03-14 2016-11-01 Emc Corporation Opportunistic fragmentation repair
US9354983B1 (en) * 2013-03-15 2016-05-31 Entreda, Inc. Integrated it service provisioning and management
US9454326B1 (en) * 2013-03-31 2016-09-27 Emc Corporation File metro cluster for site failover of data storage system
US9170996B2 (en) 2013-05-16 2015-10-27 Bank Of America Corporation Content interchange bus
US9858153B2 (en) 2013-05-29 2018-01-02 Microsoft Technology Licensing, Llc Service-based backup data restoring to devices
CN104216793B (en) 2013-05-31 2017-10-17 国际商业机器公司 Application backup, recovery methods and equipment
US9384234B2 (en) 2013-06-13 2016-07-05 Bank Of America Corporation Identification of load utility
US9384223B2 (en) 2013-06-13 2016-07-05 Bank Of America Corporation Automation of MLOAD and TPUMP conversion
US10031961B1 (en) * 2013-06-20 2018-07-24 Ca, Inc. Systems and methods for data replication
US9503541B2 (en) * 2013-08-21 2016-11-22 International Business Machines Corporation Fast mobile web applications using cloud caching
US9785643B1 (en) * 2013-09-06 2017-10-10 Veritas Technologies Llc Systems and methods for reclaiming storage space in deduplicating data systems
US20150074275A1 (en) * 2013-09-10 2015-03-12 International Business Machines Corporation Mobile application data storage allocation
CN103685453B (en) * 2013-09-11 2016-08-03 华中科技大学 One kind of cloud storage system metadata acquisition method
US9858322B2 (en) 2013-11-11 2018-01-02 Amazon Technologies, Inc. Data stream ingestion and persistence techniques
US20150134723A1 (en) * 2013-11-11 2015-05-14 Microsoft Corporation Geo-distributed disaster recovery for interactive cloud applications
US9720989B2 (en) 2013-11-11 2017-08-01 Amazon Technologies, Inc. Dynamic partitioning techniques for data streams
US9794135B2 (en) * 2013-11-11 2017-10-17 Amazon Technologies, Inc. Managed service for acquisition, storage and consumption of large-scale data streams
US20150227543A1 (en) * 2014-02-11 2015-08-13 Atlantis Computing, Inc. Method and apparatus for replication of files and file systems using a deduplication key space
CN103778034B (en) * 2014-02-26 2017-12-01 广州杰赛科技股份有限公司 Based system and method for disaster recovery data backup cloud storage
US9660933B2 (en) 2014-04-17 2017-05-23 Go Daddy Operating Company, LLC Allocating and accessing hosting server resources via continuous resource availability updates
US9501211B2 (en) 2014-04-17 2016-11-22 GoDaddy Operating Company, LLC User input processing for allocation of hosting server resources
CN105022741B (en) * 2014-04-23 2018-09-28 苏宁易购集团股份有限公司 Method and system for compressing and storing method and system for cloud
US20160357477A1 (en) * 2014-05-30 2016-12-08 Hitachi, Ltd. Method and apparatus of data deduplication storage system
US20170155639A1 (en) * 2014-06-10 2017-06-01 Alcatel Lucent Secure unified cloud storage
US9491241B1 (en) * 2014-06-30 2016-11-08 EMC IP Holding Company LLC Data storage system with native representational state transfer-based application programming interface
CN105373445A (en) * 2014-07-04 2016-03-02 施耐德电气工业公司 A backup and recovery method for PLC/HMI device files
US9864658B1 (en) * 2014-12-01 2018-01-09 Vce Company, Llc Automation of deduplication storage capacity sizing and trending analysis
CN104778095B (en) * 2015-01-20 2017-11-17 成都携恩科技有限公司 One kind of cloud data management platform
US9892003B2 (en) * 2015-02-11 2018-02-13 International Business Machines Corporation Method for automatically configuring backup client systems and backup server systems in a backup environment
US9940234B2 (en) * 2015-03-26 2018-04-10 Pure Storage, Inc. Aggressive data deduplication using lazy garbage collection
CN106295386A (en) 2015-06-02 2017-01-04 阿里巴巴集团控股有限公司 Data file protection method and apparatus and terminal device
US9894510B1 (en) * 2015-08-10 2018-02-13 Acronis International Gmbh Event-based data backup and recovery for mobile devices
US9804957B1 (en) * 2015-10-01 2017-10-31 EMC IP Holding Company LLC Block tracking data validation backup model
US9430337B1 (en) 2016-01-07 2016-08-30 International Business Machines Corporation Disaster recovery as a dynamic service
WO2017127124A1 (en) * 2016-01-22 2017-07-27 Hewlett Packard Enterprise Development Lp Ranking backup files
US20170277597A1 (en) * 2016-03-25 2017-09-28 Netapp, Inc. Efficient creation of multiple retention period based representations of a dataset backup
US10095428B1 (en) 2016-03-30 2018-10-09 EMC IP Holding Company LLC Live migration of a tree of replicas in a storage system
US9959063B1 (en) 2016-03-30 2018-05-01 EMC IP Holding Company LLC Parallel migration of multiple consistency groups in a storage system
US9959073B1 (en) 2016-03-30 2018-05-01 EMC IP Holding Company LLC Detection of host connectivity for data migration in a storage system
US20170286442A1 (en) * 2016-03-31 2017-10-05 Microsoft Technology Licensing, Llc File system support for file-level ghosting
US10083067B1 (en) 2016-06-29 2018-09-25 EMC IP Holding Company LLC Thread management in a storage system
US10048874B1 (en) 2016-06-29 2018-08-14 EMC IP Holding Company LLC Flow control with a dynamic window in a storage system with latency guarantees
US9983937B1 (en) 2016-06-29 2018-05-29 EMC IP Holding Company LLC Smooth restart of storage clusters in a storage system
US10013200B1 (en) 2016-06-29 2018-07-03 EMC IP Holding Company LLC Early compression prediction in a storage system with granular block sizes
US20180121454A1 (en) * 2016-10-28 2018-05-03 Netapp, Inc. Reducing stable data eviction with synthetic baseline snapshot and eviction state refresh

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050021869A1 (en) * 2003-06-27 2005-01-27 Aultman Joseph L. Business enterprise backup and recovery system and method
US20070220326A1 (en) * 2006-02-14 2007-09-20 Kenta Ninose Storage system and recovery method thereof
US20090182953A1 (en) * 2004-12-23 2009-07-16 Solera Networks. Inc. Method and apparatus for network packet capture distributed storage system
US20090204718A1 (en) * 2008-02-08 2009-08-13 Lawton Kevin P Using memory equivalency across compute clouds for accelerated virtual memory migration and memory de-duplication
US20090217091A1 (en) * 2008-02-26 2009-08-27 Kddi Corporation Data backing up for networked storage devices using de-duplication technique
US20090259882A1 (en) * 2008-04-15 2009-10-15 Dot Hill Systems Corporation Apparatus and method for identifying disk drives with unreported data corruption
US20090276771A1 (en) * 2005-09-15 2009-11-05 3Tera, Inc. Globally Distributed Utility Computing Cloud
US7620775B1 (en) * 2004-03-26 2009-11-17 Emc Corporation System and method for managing storage networks and providing virtualization of resources in such a network using one or more ASICs
US20100031086A1 (en) * 2008-07-31 2010-02-04 Andrew Charles Leppard Repair of a corrupt data segment used by a de-duplication engine
US20100257403A1 (en) * 2009-04-03 2010-10-07 Microsoft Corporation Restoration of a system from a set of full and partial delta system snapshots across a distributed system
US20100274772A1 (en) * 2009-04-23 2010-10-28 Allen Samuels Compressed data objects referenced via address references and compression references

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6161109A (en) * 1998-04-16 2000-12-12 International Business Machines Corporation Accumulating changes in a database management system by copying the data object to the image copy if the data object identifier of the data object is greater than the image identifier of the image copy
US6647399B2 (en) * 1999-11-29 2003-11-11 International Business Machines Corporation Method, system, program, and data structures for naming full backup versions of files and related deltas of the full backup versions
GB0207969D0 (en) * 2002-04-08 2002-05-15 Ibm Data processing arrangement and method
US8280926B2 (en) * 2003-08-05 2012-10-02 Sepaton, Inc. Scalable de-duplication mechanism
US7231502B2 (en) * 2004-02-04 2007-06-12 Falcon Stor Software, Inc. Method and system for storing data
JP2005301497A (en) * 2004-04-08 2005-10-27 Hitachi Ltd Storage management system, restoration method and its program
US8224784B2 (en) * 2004-08-13 2012-07-17 Microsoft Corporation Combined computer disaster recovery and migration tool for effective disaster recovery as well as the backup and migration of user- and system-specific information
US7979404B2 (en) * 2004-09-17 2011-07-12 Quest Software, Inc. Extracting data changes and storing data history to allow for instantaneous access to and reconstruction of any point-in-time data
US7529785B1 (en) * 2006-02-28 2009-05-05 Symantec Corporation Efficient backups using dynamically shared storage pools in peer-to-peer networks
US9930099B2 (en) * 2007-05-08 2018-03-27 Riverbed Technology, Inc. Hybrid segment-oriented file server and WAN accelerator
US8046331B1 (en) * 2007-05-25 2011-10-25 Symantec Corporation Method and apparatus for recreating placeholders
US20090204650A1 (en) * 2007-11-15 2009-08-13 Attune Systems, Inc. File Deduplication using Copy-on-Write Storage Tiers
US8117164B2 (en) * 2007-12-19 2012-02-14 Microsoft Corporation Creating and utilizing network restore points
US8166257B1 (en) * 2008-01-24 2012-04-24 Network Appliance, Inc. Automated continuous provisioning of a data storage system
US20090222509A1 (en) * 2008-02-29 2009-09-03 Chao King System and Method for Sharing Storage Devices over a Network
US8074049B2 (en) * 2008-08-26 2011-12-06 Nine Technology, Llc Online backup system with global two staged deduplication without using an indexing database
US20100082700A1 (en) * 2008-09-22 2010-04-01 Riverbed Technology, Inc. Storage system for data virtualization and deduplication
US8620884B2 (en) * 2008-10-24 2013-12-31 Microsoft Corporation Scalable blob storage integrated with scalable structured storage
US9176978B2 (en) * 2009-02-05 2015-11-03 Roderick B. Wideman Classifying data for deduplication and storage
US9275067B2 (en) * 2009-03-16 2016-03-01 International Busines Machines Corporation Apparatus and method to sequentially deduplicate data
US8769055B2 (en) * 2009-04-24 2014-07-01 Microsoft Corporation Distributed backup and versioning
US8769049B2 (en) * 2009-04-24 2014-07-01 Microsoft Corporation Intelligent tiers of backup data
US20110055471A1 (en) * 2009-08-28 2011-03-03 Jonathan Thatcher Apparatus, system, and method for improved data deduplication
US8856080B2 (en) * 2009-10-30 2014-10-07 Microsoft Corporation Backup using metadata virtual hard drive and differential virtual hard drive
US9191437B2 (en) * 2009-12-09 2015-11-17 International Business Machines Corporation Optimizing data storage among a plurality of data storage repositories
US9489266B2 (en) * 2009-12-11 2016-11-08 Google Inc. System and method of storing backup image catalog

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050021869A1 (en) * 2003-06-27 2005-01-27 Aultman Joseph L. Business enterprise backup and recovery system and method
US7620775B1 (en) * 2004-03-26 2009-11-17 Emc Corporation System and method for managing storage networks and providing virtualization of resources in such a network using one or more ASICs
US20090182953A1 (en) * 2004-12-23 2009-07-16 Solera Networks, Inc. Method and apparatus for network packet capture distributed storage system
US20090276771A1 (en) * 2005-09-15 2009-11-05 3Tera, Inc. Globally Distributed Utility Computing Cloud
US20070220326A1 (en) * 2006-02-14 2007-09-20 Kenta Ninose Storage system and recovery method thereof
US20090204718A1 (en) * 2008-02-08 2009-08-13 Lawton Kevin P Using memory equivalency across compute clouds for accelerated virtual memory migration and memory de-duplication
US20090217091A1 (en) * 2008-02-26 2009-08-27 Kddi Corporation Data backing up for networked storage devices using de-duplication technique
US20090259882A1 (en) * 2008-04-15 2009-10-15 Dot Hill Systems Corporation Apparatus and method for identifying disk drives with unreported data corruption
US20100031086A1 (en) * 2008-07-31 2010-02-04 Andrew Charles Leppard Repair of a corrupt data segment used by a de-duplication engine
US20100257403A1 (en) * 2009-04-03 2010-10-07 Microsoft Corporation Restoration of a system from a set of full and partial delta system snapshots across a distributed system
US20100274772A1 (en) * 2009-04-23 2010-10-28 Allen Samuels Compressed data objects referenced via address references and compression references

Cited By (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9405763B2 (en) 2008-06-24 2016-08-02 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US8930306B1 (en) 2009-07-08 2015-01-06 Commvault Systems, Inc. Synchronized data deduplication
US20110173405A1 (en) * 2010-01-13 2011-07-14 International Business Machines Corporation System and method for reducing latency time with cloud services
US9098456B2 (en) * 2010-01-13 2015-08-04 International Business Machines Corporation System and method for reducing latency time with cloud services
US9170892B2 (en) 2010-04-19 2015-10-27 Microsoft Technology Licensing, Llc Server failure recovery
US9454441B2 (en) 2010-04-19 2016-09-27 Microsoft Technology Licensing, Llc Data layout for recovery and durability
WO2012023050A2 (en) 2010-08-20 2012-02-23 Overtis Group Limited Secure cloud computing system and method
US9239690B2 (en) * 2010-08-31 2016-01-19 Bruce R. Backa System and method for in-place data migration
US20120054325A1 (en) * 2010-08-31 2012-03-01 Backa Bruce R System and Method for In-Place Data Migration
US9128948B1 (en) * 2010-09-15 2015-09-08 Symantec Corporation Integration of deduplicating backup server with cloud storage
US9639289B2 (en) 2010-09-30 2017-05-02 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US9195549B1 (en) 2010-09-30 2015-11-24 Emc Corporation Unified recovery
US9619480B2 (en) 2010-09-30 2017-04-11 Commvault Systems, Inc. Content aligned block-based deduplication
US8549350B1 (en) 2010-09-30 2013-10-01 Emc Corporation Multi-tier recovery
US8504870B2 (en) 2010-09-30 2013-08-06 Emc Corporation Optimized recovery
US8484505B1 (en) 2010-09-30 2013-07-09 Emc Corporation Self recovery
US9898225B2 (en) 2010-09-30 2018-02-20 Commvault Systems, Inc. Content aligned block-based deduplication
US8572340B2 (en) 2010-09-30 2013-10-29 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US8577851B2 (en) 2010-09-30 2013-11-05 Commvault Systems, Inc. Content aligned block-based deduplication
US8578109B2 (en) 2010-09-30 2013-11-05 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US8949661B1 (en) 2010-09-30 2015-02-03 Emc Corporation Federation of indices
US8943356B1 (en) 2010-09-30 2015-01-27 Emc Corporation Post backup catalogs
US10126973B2 (en) 2010-09-30 2018-11-13 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US9239687B2 (en) 2010-09-30 2016-01-19 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
US9110602B2 (en) 2010-09-30 2015-08-18 Commvault Systems, Inc. Content aligned block-based deduplication
US8713364B1 (en) * 2010-09-30 2014-04-29 Emc Corporation Unified recovery
US20130238574A1 (en) * 2010-10-11 2013-09-12 Estsoft Corp. Cloud system and file compression and transmission method in a cloud system
US20120094637A1 (en) * 2010-10-15 2012-04-19 Microsoft Corporation Mobile Messaging Message Notifications Processing
US8934925B2 (en) * 2010-10-15 2015-01-13 Microsoft Corporation Mobile messaging message notifications processing
US9442806B1 (en) * 2010-11-30 2016-09-13 Veritas Technologies Llc Block-level deduplication
US8849898B2 (en) * 2010-12-09 2014-09-30 Jeffrey Vincent TOFANO Adaptive collaborative de-duplication
US20120150954A1 (en) * 2010-12-09 2012-06-14 Quantum Corporation Adaptive collaborative de-duplication
US9104623B2 (en) 2010-12-14 2015-08-11 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US20120150826A1 (en) * 2010-12-14 2012-06-14 Commvault Systems, Inc. Distributed deduplicated storage system
US20120150949A1 (en) * 2010-12-14 2012-06-14 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US8954446B2 (en) * 2010-12-14 2015-02-10 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US20120150818A1 (en) * 2010-12-14 2012-06-14 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US9116850B2 (en) 2010-12-14 2015-08-25 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
US9020900B2 (en) * 2010-12-14 2015-04-28 Commvault Systems, Inc. Distributed deduplicated storage system
US9898478B2 (en) 2010-12-14 2018-02-20 Commvault Systems, Inc. Distributed deduplicated storage system
US8943023B2 (en) 2010-12-29 2015-01-27 Amazon Technologies, Inc. Receiver-side data deduplication in data systems
US20120173656A1 (en) * 2010-12-29 2012-07-05 Sorenson Iii James Christopher Reduced Bandwidth Data Uploading in Data Systems
US9794191B2 (en) 2010-12-29 2017-10-17 Amazon Technologies, Inc. Reduced bandwidth data uploading in data systems
US9116909B2 (en) * 2010-12-29 2015-08-25 Amazon Technologies, Inc. Reduced bandwidth data uploading in data systems
US9317377B1 (en) * 2011-03-23 2016-04-19 Riverbed Technology, Inc. Single-ended deduplication using cloud storage protocol
US20120257820A1 (en) * 2011-04-07 2012-10-11 Microsoft Corporation Image analysis tools
US9813529B2 (en) 2011-04-28 2017-11-07 Microsoft Technology Licensing, Llc Effective circuits in packet-switched networks
US9983964B2 (en) 2011-05-02 2018-05-29 International Business Machines Corporation Optimizing disaster recovery systems during takeover operations
US20120284555A1 (en) * 2011-05-02 2012-11-08 International Business Machines Corporation Optimizing disaster recovery systems during takeover operations
US8671308B2 (en) * 2011-05-02 2014-03-11 International Business Machines Corporation Optimizing disaster recovery systems during takeover operations
US9361189B2 (en) 2011-05-02 2016-06-07 International Business Machines Corporation Optimizing disaster recovery systems during takeover operations
US20130006943A1 (en) * 2011-06-30 2013-01-03 International Business Machines Corporation Hybrid data backup in a networked computing environment
US9122642B2 (en) 2011-06-30 2015-09-01 International Business Machines Corporation Hybrid data backup in a networked computing environment
US8775376B2 (en) * 2011-06-30 2014-07-08 International Business Machines Corporation Hybrid data backup in a networked computing environment
US8943355B2 (en) * 2011-12-09 2015-01-27 Promise Technology, Inc. Cloud data storage system
US20130151884A1 (en) * 2011-12-09 2013-06-13 Promise Technology, Inc. Cloud data storage system
US9697139B2 (en) 2012-05-07 2017-07-04 International Business Machines Corporation Enhancing data caching performance
US9645944B2 (en) 2012-05-07 2017-05-09 International Business Machines Corporation Enhancing data caching performance
US20130297884A1 (en) * 2012-05-07 2013-11-07 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US9898419B2 (en) 2012-05-07 2018-02-20 International Business Machines Corporation Enhancing data caching performance
US9098424B2 (en) * 2012-05-07 2015-08-04 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US9110815B2 (en) * 2012-05-07 2015-08-18 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US9495294B2 (en) 2012-05-07 2016-11-15 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US20130297569A1 (en) * 2012-05-07 2013-11-07 International Business Machines Corporation Enhancing data processing performance by cache management of fingerprint index
US9632707B2 (en) 2012-05-07 2017-04-25 International Business Machines Corporation Enhancing tiering storage performance
US9251186B2 (en) 2012-06-13 2016-02-02 Commvault Systems, Inc. Backup using a client-side signature repository in a networked storage system
US20130339310A1 (en) * 2012-06-13 2013-12-19 Commvault Systems, Inc. Restore using a client side signature repository in a networked storage system
US9858156B2 (en) 2012-06-13 2018-01-02 Commvault Systems, Inc. Dedicated client-side signature generator in a networked storage system
US9218376B2 (en) 2012-06-13 2015-12-22 Commvault Systems, Inc. Intelligent data sourcing in a networked storage system
US9218374B2 (en) 2012-06-13 2015-12-22 Commvault Systems, Inc. Collaborative restore in a networked storage system
US9218375B2 (en) 2012-06-13 2015-12-22 Commvault Systems, Inc. Dedicated client-side signature generator in a networked storage system
US9778856B2 (en) 2012-08-30 2017-10-03 Microsoft Technology Licensing, Llc Block-level access to parallel storage
US9298723B1 (en) 2012-09-19 2016-03-29 Amazon Technologies, Inc. Deduplication architecture
US20140164354A1 (en) * 2012-12-07 2014-06-12 International Business Machines Corporation Optimized query ordering for file path indexing in a content repository
US9323761B2 (en) * 2012-12-07 2016-04-26 International Business Machines Corporation Optimized query ordering for file path indexing in a content repository
US9990397B2 (en) 2012-12-07 2018-06-05 International Business Machines Corporation Optimized query ordering for file path indexing in a content repository
US9621399B1 (en) 2012-12-19 2017-04-11 Amazon Technologies, Inc. Distributed caching system
US9390052B1 (en) * 2012-12-19 2016-07-12 Amazon Technologies, Inc. Distributed caching system
US9372726B2 (en) 2013-01-09 2016-06-21 The Research Foundation For The State University Of New York Gang migration of virtual machines using cluster-wide deduplication
US9633033B2 (en) 2013-01-11 2017-04-25 Commvault Systems, Inc. High availability distributed deduplicated storage system
US9665591B2 (en) 2013-01-11 2017-05-30 Commvault Systems, Inc. High availability distributed deduplicated storage system
US9361328B1 (en) * 2013-01-28 2016-06-07 Veritas Us Ip Holdings Llc Selection of files for archival or deduplication
US9442907B2 (en) * 2013-06-04 2016-09-13 Peking University Founder Group Co., Ltd. Disaster recovery method and apparatus used in document editing and storage medium
US20140359420A1 (en) * 2013-06-04 2014-12-04 Beijing Founder Electronics Co., Ltd. Disaster Recovery Method and Apparatus Used in Document Editing and Storage Medium
US20150052322A1 (en) * 2013-08-16 2015-02-19 Red Hat Israel, Ltd. Systems and methods for memory deduplication by origin host in virtual machine live migration
US9459902B2 (en) * 2013-08-16 2016-10-04 Red Hat Israel, Ltd. Memory duplication by destination host in virtual machine live migration
US9454400B2 (en) * 2013-08-16 2016-09-27 Red Hat Israel, Ltd. Memory duplication by origin host in virtual machine live migration
US20150052323A1 (en) * 2013-08-16 2015-02-19 Red Hat Israel, Ltd. Systems and methods for memory deduplication by destination host in virtual machine live migration
US20150134861A1 (en) * 2013-11-14 2015-05-14 Humax Co., Ltd. Personal cloud storage chain service system and method
US9798631B2 (en) 2014-02-04 2017-10-24 Microsoft Technology Licensing, Llc Block storage by decoupling ordering from durability
US10114709B2 (en) 2014-02-04 2018-10-30 Microsoft Technology Licensing, Llc Block storage by decoupling ordering from durability
US9633056B2 (en) 2014-03-17 2017-04-25 Commvault Systems, Inc. Maintaining a deduplication database
US9823842B2 (en) 2014-05-12 2017-11-21 The Research Foundation For The State University Of New York Gang migration of virtual machines using cluster-wide deduplication
US9934238B2 (en) 2014-10-29 2018-04-03 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US9575673B2 (en) 2014-10-29 2017-02-21 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US20160219123A1 (en) * 2015-01-28 2016-07-28 Red Hat, Inc. Cache Data Validation
US9959137B2 (en) 2015-04-14 2018-05-01 Microsoft Technology Licensing, Llc Transaction redo using skip element for object
US10133768B2 (en) 2015-04-14 2018-11-20 Microsoft Technology Licensing, Llc Latest external dependee entity in transaction record
US10102251B2 (en) 2015-04-14 2018-10-16 Microsoft Technology Licensing, Llc Lockless open collection data structure
US9766929B2 (en) 2015-04-14 2017-09-19 Microsoft Technology Licensing, Llc Processing of data stream collection record sequence
US20160306841A1 (en) * 2015-04-14 2016-10-20 Microsoft Technology Licensing, Llc Collection record for overlapping data stream collections
US10031814B2 (en) 2015-04-14 2018-07-24 Microsoft Technology Licensing, Llc Collection record location as log tail beginning
US10061663B2 (en) 2015-12-30 2018-08-28 Commvault Systems, Inc. Rebuilding deduplication data in a distributed deduplication data storage system
US10057366B2 (en) * 2015-12-31 2018-08-21 Hughes Network Systems, Llc Accurate caching in adaptive video streaming based on collision resistant hash applied to segment contents and ephemeral request and URL data

Also Published As

Publication number Publication date Type
WO2011082123A1 (en) 2011-07-07 application
US9501365B2 (en) 2016-11-22 grant
US20120084261A1 (en) 2012-04-05 application
US20110161291A1 (en) 2011-06-30 application
US20110161297A1 (en) 2011-06-30 application
US8694469B2 (en) 2014-04-08 grant

Similar Documents

Publication Publication Date Title
Bhagwat et al. Extreme binning: Scalable, parallel deduplication for chunk-based file backup
US7827201B1 (en) Merging containers in a multi-container system
US8527544B1 (en) Garbage collection in a storage system
US7747584B1 (en) System and method for enabling de-duplication in a storage system architecture
US20140006465A1 (en) Managing a global namespace for a distributed filesystem
US20110055471A1 (en) Apparatus, system, and method for improved data deduplication
US20130227236A1 (en) Systems and methods for storage allocation
US20070101069A1 (en) Lightweight coherency control protocol for clustered storage system
US7197490B1 (en) System and method for lazy-copy sub-volume load balancing in a network attached storage pool
US20070124341A1 (en) System and method for restoring data on demand for instant volume restoration
US8099571B1 (en) Logical block replication with deduplication
US8315985B1 (en) Optimizing the de-duplication rate for a backup stream
US20080270461A1 (en) Data containerization for reducing unused space in a file system
US7366837B2 (en) Data placement technique for striping data containers across volumes of a storage system cluster
US20140006354A1 (en) Executing a cloud command for a distributed filesystem
US8190850B1 (en) Virtual block mapping for relocating compressed and/or encrypted file data block blocks
US8539008B2 (en) Extent-based storage architecture
US7165096B2 (en) Storage area network file system
US7698501B1 (en) System and method for utilizing sparse data containers in a striped volume set
US20110107052A1 (en) Virtual Disk Mapping
US20140006357A1 (en) Restoring an archived file in a distributed filesystem
US20060248088A1 (en) System and method for multi-tiered meta-data caching and distribution in a clustered computer environment
US20130086006A1 (en) Method for removing duplicate data from a storage array
US20130060739A1 (en) Optimization of a Partially Deduplicated File
US20130097380A1 (en) Method for maintaining multiple fingerprint tables in a deduplicating storage system

Legal Events

Date Code Title Description
AS Assignment
Owner name: MORGAN STANLEY & CO. LLC, MARYLAND
Free format text: SECURITY AGREEMENT;ASSIGNORS:RIVERBED TECHNOLOGY, INC.;OPNET TECHNOLOGIES, INC.;REEL/FRAME:029646/0060
Effective date: 2012-12-18

AS Assignment
Owner name: RIVERBED TECHNOLOGY, INC., CALIFORNIA
Free format text: RELEASE OF PATENT SECURITY INTEREST;ASSIGNOR:MORGAN STANLEY & CO. LLC, AS COLLATERAL AGENT;REEL/FRAME:032113/0425
Effective date: 2013-12-20