US20160191247A1 - Client-side encryption in a deduplication backup system - Google Patents
- Publication number
- US20160191247A1 (application US15/060,396)
- Authority
- US
- United States
- Prior art keywords
- hash
- key
- backup
- storage
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1453—Management of the data involved in backup or backup restore using de-duplication of the data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3236—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3226—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3263—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving certificates, e.g. public key certificate [PKC] or attribute certificate [AC]; Public key infrastructure [PKI] arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
Definitions
- The embodiments disclosed herein relate to client-side encryption in a deduplication backup system.
- A storage is computer-readable media capable of storing data in blocks. Storages face a myriad of threats to the data they store and to their smooth and continuous operation. To mitigate these threats, a backup of the data in a storage may be created at a particular point in time to enable the restoration of the data at some future time. Such a restoration may become desirable, for example, if the storage experiences corruption of its stored data, if the storage becomes unavailable, or if a user wishes to create a second identical storage.
- A storage is typically logically divided into a finite number of fixed-length blocks.
- A storage also typically includes a file system, which tracks the locations of the blocks that are allocated to each file stored in the storage. The file system also tracks the blocks that are not allocated to any file. The file system generally tracks allocated and free blocks using specialized data structures, referred to as file system metadata. File system metadata is also stored in designated blocks in the storage.
- One common technique for backing up a source storage is file backup, which uses the file system of the source storage as a starting point and performs a backup by writing the files to a destination storage. Using this approach, individual files are backed up if they have been modified since the previous backup. File backup may be useful for finding and restoring a few lost or corrupted files. However, file backup may also incur significant bandwidth and logical overhead, because file backup requires tracking and storing information about where each file exists within the file systems of the source storage and the destination storage.
- Another common technique for backing up a source storage ignores the locations of individual files stored in the source storage and instead simply backs up all allocated blocks stored in the source storage.
- This technique is often referred to as image backup because the backup generally contains or represents an image, or copy, of the entire allocated contents of the source storage.
- With image backup, individual allocated blocks are backed up if they have been modified since the previous backup.
- Because image backup backs up all allocated blocks of the source storage, it backs up both the blocks that make up the files stored in the source storage and the blocks that make up the file system metadata.
- Image backup can be relatively fast compared to file backup because reliance on the file system is minimized.
- An image backup can also be relatively fast compared to a file backup because seeking during image backup may be reduced.
- During image backup, blocks are generally read sequentially with relatively limited seeking.
- During file backup, by contrast, the blocks that make up individual files may be scattered in the source storage, resulting in relatively extensive seeking.
- To deduplicate the blocks of a storage, a standard deduplication vault must first receive the blocks from the computer system of the storage in unencrypted form. The deduplication vault then stores each block that is unique or, if the vault supports encryption, encrypts each unique block and stores the encrypted block. In this way, the standard deduplication vault supports deduplication of blocks from multiple systems. However, because the standard deduplication vault requires access to the unencrypted blocks, at least temporarily, these blocks may be compromised if the security of the deduplication vault is compromised or faulty.
- Example embodiments described herein relate to client-side encryption in a deduplication backup system.
- The example methods disclosed herein may be employed to encrypt plain-text blocks at a source system (i.e., a client) prior to sending the blocks to a deduplication vault system.
- This client-side encryption reduces the potential for an unauthorized user to access the original plain-text blocks even where the unauthorized user has access to the deduplication vault system.
- The example methods disclosed herein may also be employed to encrypt plain-text blocks in such a way that only a single encrypted block is stored in the deduplication vault storage for each unique plain-text block that is backed up across multiple source storages of multiple clients.
- The example methods disclosed herein employ client-side encryption with deduplication, which enables sensitive blocks to remain secure within the deduplication vault storage even while redundancy within and across multiple source storages is reduced or eliminated.
- Deduplicating across multiple source storages may increase the number of blocks from a source storage that are already duplicated in the deduplication vault storage at the time that a backup of the source storage is created in the deduplication vault storage, thereby decreasing the number of blocks that must be copied from the source storage to the deduplication vault storage.
- Decreasing the number of blocks that must be copied from the source storage to the deduplication vault storage during the creation of a backup may result in decreased bandwidth overhead of transporting blocks to the deduplication vault storage and increased efficiency and speed during the creation of each backup.
- In one example embodiment, a method for client-side encryption in a deduplication backup system includes a backup phase in which various steps are performed for each allocated plain text block stored in a source storage at a point in time.
- One step includes hashing, using a first cryptographic hash function, the plain text block to generate a first hash.
- Another step includes hashing, using a second cryptographic hash function, the first hash to generate a second hash.
- Another step includes searching a key-value table of a deduplication storage to determine whether the second hash matches any key in the key-value table.
- Each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block.
- Another step includes, upon determining that the second hash does not match any key in the key-value table, encrypting, using an encrypt/decrypt function, the plain text block using the first hash as an encryption password and inserting a key-value pair into the key-value table with the key being the second hash and the value being the encrypted block.
- Another step includes inserting an entry into an image map corresponding to the source storage that includes the first hash and a position of the plain text block as stored in the source storage.
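The backup steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: SHA-256 and SHA-512 are assumed as the first and second cryptographic hash functions, plain dicts stand in for the source storage and for the vault's key-value table, and `toy_crypt` is a hypothetical, deliberately simple stand-in for the encrypt/decrypt function (an XOR keystream derived from SHA-256), not a vetted cipher. The `restore` function is also an assumption about how a restore phase could reverse the steps: the first hash in the image map both locates the encrypted block (through the second hash) and serves as the decryption password.

```python
import hashlib

BLOCK_SIZE = 512  # bytes per block (assumed)


def first_hash(block: bytes) -> bytes:
    # First cryptographic hash function (SHA-256 assumed).
    return hashlib.sha256(block).digest()


def second_hash(h1: bytes) -> bytes:
    # Second cryptographic hash function, applied to the first hash (SHA-512 assumed).
    return hashlib.sha512(h1).digest()


def toy_crypt(data: bytes, password: bytes) -> bytes:
    # TOY stand-in for the encrypt/decrypt function: XOR with a keystream of
    # SHA-256(password || counter). It is deterministic, so equal plain text
    # blocks give equal encrypted blocks, and XOR makes it its own inverse.
    # Not a vetted cipher; for illustration only.
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def backup(source_blocks: dict, key_value_table: dict) -> list:
    """Backup phase for one source storage.

    source_blocks: position -> plain text block (allocated blocks only).
    key_value_table: second hash -> encrypted block (held by the vault).
    Returns the image map as a list of (first hash, position) entries.
    """
    image_map = []
    for position, block in sorted(source_blocks.items()):
        h1 = first_hash(block)
        h2 = second_hash(h1)
        if h2 not in key_value_table:
            # Only a block whose second hash is new gets encrypted and stored.
            key_value_table[h2] = toy_crypt(block, h1)
        image_map.append((h1, position))
    return image_map


def restore(image_map: list, key_value_table: dict) -> dict:
    # Assumed restore phase: locate by second hash, decrypt with first hash.
    return {pos: toy_crypt(key_value_table[second_hash(h1)], h1)
            for h1, pos in image_map}


vault = {}
shared = bytes(range(256)) * 2  # a 512-byte block present on both clients
map_a = backup({0: b"A" * BLOCK_SIZE, 1: shared}, vault)
map_b = backup({7: shared, 8: b"B" * BLOCK_SIZE}, vault)
print(len(vault))  # prints 3: the shared block is stored only once
```

In this sketch the vault-side table only ever receives second hashes and encrypted blocks, yet two clients that never share a key still converge on the same key-value pair for an identical block, because both the password and the lookup key are derived from the block's own contents.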
- In another example embodiment, a method for client-side encryption in a deduplication backup system includes a backup phase in which various steps are performed for each allocated plain text block stored in a source storage at a point in time.
- One step includes hashing, using a first cryptographic hash function, the plain text block to generate a first hash.
- Another step includes encrypting, using an encrypt/decrypt function, the plain text block using the first hash as an encryption password.
- Another step includes hashing, using a second cryptographic hash function, the encrypted block to generate a third hash.
- Another step includes searching a key-value table of a deduplication storage to determine whether the third hash matches any key in the key-value table.
- Each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block.
- Another step includes, upon determining that the third hash does not match any key in the key-value table, inserting a key-value pair into the key-value table with the key being the third hash and the value being the encrypted block.
- Another step includes inserting an entry into an image map corresponding to the source storage that includes the first hash, the third hash, and a position of the plain text block as stored in the source storage.
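The second method differs from the first in that the block is encrypted first and the key is a hash of the encrypted block (the "third hash"). A minimal sketch under the same assumptions as before: SHA-256 and SHA-512 stand in for the first and second hash functions, dicts stand in for the storages, and `toy_crypt` is a hypothetical, non-production stand-in for the encrypt/decrypt function. `restore` and `vault_verify` are illustrative assumptions, not steps taken from the source.

```python
import hashlib


def toy_crypt(data: bytes, password: bytes) -> bytes:
    # TOY stand-in for the encrypt/decrypt function (SHA-256 counter-mode XOR).
    # Deterministic and self-inverse; not a vetted cipher.
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def backup_encrypt_then_hash(source_blocks: dict, key_value_table: dict) -> list:
    """source_blocks: position -> plain text block (allocated blocks only).
    key_value_table: third hash -> encrypted block.
    Returns image map entries (first hash, third hash, position)."""
    image_map = []
    for position, block in sorted(source_blocks.items()):
        h1 = hashlib.sha256(block).digest()       # first hash (SHA-256 assumed)
        encrypted = toy_crypt(block, h1)          # first hash as the password
        h3 = hashlib.sha512(encrypted).digest()   # third hash (SHA-512 assumed)
        if h3 not in key_value_table:
            key_value_table[h3] = encrypted
        image_map.append((h1, h3, position))
    return image_map


def restore(image_map: list, key_value_table: dict) -> dict:
    # Assumed restore phase: look up by third hash, decrypt with first hash.
    return {pos: toy_crypt(key_value_table[h3], h1) for h1, h3, pos in image_map}


def vault_verify(key_value_table: dict) -> bool:
    # The vault can check every stored value against its key without any
    # plain text, because each key is a hash of the encrypted block itself.
    return all(hashlib.sha512(v).digest() == k for k, v in key_value_table.items())


vault = {}
image = backup_encrypt_then_hash({0: b"x" * 512, 1: b"x" * 512, 2: b"y" * 512}, vault)
print(len(vault))  # prints 2: duplicate blocks 0 and 1 share one entry
```

One consequence of hashing the encrypted block, visible in `vault_verify`, is that the vault can validate its own table; in the first sketch it could not, because it never sees the first hash from which its keys are derived.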
- FIG. 1 is a schematic block diagram illustrating an example deduplication backup system
- FIGS. 2A-2D are schematic diagrams illustrating client-side encryption in a deduplication backup system
- FIGS. 3A-3B are a schematic flowchart illustrating a first example method for client-side encryption in a deduplication backup system
- FIGS. 4A-4D are schematic diagrams illustrating client-side encryption in a deduplication backup system
- FIGS. 5A-5B are a schematic flowchart illustrating a second example method for client-side encryption in a deduplication backup system
- The term “storage” refers to computer-readable media, or some logical portion thereof such as a volume, capable of storing data in blocks.
- The term “block” refers to a fixed-length discrete sequence of bits.
- The size of each block may be configured to match the standard sector size of a file system of the storage on which the block is stored. For example, the size of each block may be 512 bytes (4096 bits), where 512 bytes is the size of a standard sector.
- The term “allocated block” refers to a block in a storage that is currently tracked as storing data by a file system of the storage.
- The term “free block” refers to a block in a storage that is neither currently employed nor tracked as storing data by a file system of the storage.
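The definitions above of fixed-length blocks and allocated versus free blocks can be illustrated with a short sketch. It assumes 512-byte blocks and uses a hypothetical `allocation_bitmap` (one boolean per block) as a simplified stand-in for real file system metadata, which tracks allocation in file-system-specific structures.

```python
BLOCK_SIZE = 512  # bytes; one standard sector
assert BLOCK_SIZE * 8 == 4096  # 512 bytes = 4096 bits


def allocated_blocks(storage: bytes, allocation_bitmap: list):
    """Yield (position, block) for every allocated block of a storage.

    allocation_bitmap is a hypothetical stand-in for file system metadata:
    one bool per block, True if the file system tracks the block as allocated.
    """
    if len(storage) % BLOCK_SIZE:
        raise ValueError("storage length must be a whole number of blocks")
    if len(allocation_bitmap) != len(storage) // BLOCK_SIZE:
        raise ValueError("bitmap must have one entry per block")
    for position, allocated in enumerate(allocation_bitmap):
        if allocated:  # free blocks are skipped
            start = position * BLOCK_SIZE
            yield position, storage[start:start + BLOCK_SIZE]


storage = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))  # four blocks
print([pos for pos, _ in allocated_blocks(storage, [True, False, True, False])])
# prints [0, 2]
```

A block's byte offset is simply its position multiplied by the block size, which is what lets an image backup read allocated blocks largely sequentially.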
- The term “backup,” when used herein as a noun, refers to a copy or copies of one or more blocks from a storage.
- The term “base backup” refers to a backup of a storage that includes at least a copy of each unique allocated block of the storage at a point in time, such that the base backup can be restored on its own to recreate the state of the storage at that point in time without depending on any other backup.
- A “base backup” may also include nonunique allocated blocks and free blocks of the storage at the point in time.
- The term “incremental backup” refers to an at least partial backup of a storage that includes at least a copy of each unique allocated block of the storage that was modified between a previous point in time of a previous backup of the storage and the subsequent point in time of the incremental backup, such that the incremental backup, along with all previous backups of the storage, including an initial base backup of the storage, can be restored together to recreate the state of desired blocks of the storage at the subsequent point in time.
- The term “modified block” refers to a block that was modified either because the block was previously allocated and changed or because the block was newly allocated.
- An “incremental backup” may also include nonunique allocated blocks and free blocks of the storage that were modified between the previous point in time and the subsequent point in time. Alternatively, only “unique allocated blocks” may be included in a “base backup” or an “incremental backup,” where only a single copy of multiple duplicate allocated blocks (i.e., nonunique allocated blocks) is backed up to reduce the size of the backup.
- A “base backup” or an “incremental backup” may exclude certain undesired allocated blocks, such as blocks belonging to files whose contents are not necessary for restoration purposes, for example virtual memory pagination files and machine hibernation state files.
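The idea of backing up only unique allocated blocks, and of excluding undesired blocks, can be sketched as follows. This is an illustration under assumptions: duplicates are detected by comparing SHA-256 digests, and the `excluded` set of positions is a hypothetical way of marking blocks that belong to, for example, a paging or hibernation file.

```python
import hashlib


def deduplicated_backup(allocated: dict, excluded=frozenset()):
    """Keep one copy of each unique allocated block.

    allocated: position -> block. excluded: positions of undesired blocks
    (for example, blocks of a paging or hibernation file) that are left out.
    Returns (unique_blocks, position_map): digest -> block, position -> digest.
    """
    unique_blocks = {}
    position_map = {}
    for position, block in sorted(allocated.items()):
        if position in excluded:
            continue
        digest = hashlib.sha256(block).digest()  # duplicate detection by hash (assumed)
        unique_blocks.setdefault(digest, block)   # store only the first copy
        position_map[position] = digest           # every position stays restorable
    return unique_blocks, position_map


def rebuild(unique_blocks: dict, position_map: dict) -> dict:
    # Expand the deduplicated backup back into position -> block.
    return {pos: unique_blocks[d] for pos, d in position_map.items()}
```

The position map is what keeps a deduplicated backup restorable on its own: each position still points at its (possibly shared) block.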
- FIG. 1 is a schematic block diagram illustrating an example deduplication backup system 100.
- The example deduplication backup system 100 includes a deduplication vault system 102, a source system 104 of Company A, and a source system 106 of Company B.
- Company A may be a competitor of Company B, such that users of the source system 104 of Company A would not be authorized to access sensitive data stored in the source system 106 of Company B, and vice-versa.
- The systems 102, 104, and 106 include storages 108, 110, and 112, respectively.
- The deduplication vault storage 108 stores a base backup A and multiple incremental backups A that have been created of the source storage 110 to represent the states of the source storage 110 at various points in time.
- The base backup A represents the state of the source storage 110 at time t(0).
- The 1st incremental backup A represents the state of the source storage 110 at time t(2).
- The 2nd incremental backup A represents the state of the source storage 110 at time t(4).
- The nth incremental backup A represents the state of the source storage 110 at time t(2n).
- The deduplication vault storage 108 stores a base backup B and multiple incremental backups B that have been created of the source storage 112 to represent the states of the source storage 112 at various points in time.
- The base backup B represents the state of the source storage 112 at time t(1).
- The 1st incremental backup B represents the state of the source storage 112 at time t(3).
- The 2nd incremental backup B represents the state of the source storage 112 at time t(5).
- The nth incremental backup B represents the state of the source storage 112 at time t(2n+1).
- The deduplication vault system 102 also includes a database 114, metadata 116, and a deduplication module 118.
- The source systems 104 and 106 also include encryption modules 124 and 126, respectively. The source systems 104 and 106 are able to communicate with the deduplication vault system 102 over a network 120.
- Each of the systems 102 , 104 , and 106 may be any computing device capable of supporting a storage and communicating with other systems including, for example, file servers, web servers, personal computers, desktop computers, laptop computers, handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, smartphones, digital cameras, hard disk drives, flash memory drives, and virtual machines.
- The network 120 may be any wired or wireless communication network including, for example, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a Wireless Application Protocol (WAP) network, a Bluetooth network, an Internet Protocol (IP) network such as the internet, or some combination thereof.
- The image backups stored in the deduplication vault storage 108 may be created by the deduplication module 118.
- For example, the deduplication module 118 may be configured to execute computer instructions to perform image backup operations of creating a base backup and multiple incremental backups of the source storage 110 and of the source storage 112. It is noted that these image backups may initially be created on the source systems 104 and 106 and then copied to the deduplication vault system 102.
- For example, the base backup A may be created to capture the state of the source storage 110 at time t(0).
- This image backup operation may include the deduplication module 118 copying all allocated blocks of the source storage 110 as allocated at time t(0) and storing the allocated blocks in the deduplication vault storage 108.
- The state of the source storage 110 at time t(0) may be captured using snapshot technology in order to capture the blocks stored in the source storage 110 at time t(0) without interrupting other processes, thus avoiding downtime of the source storage 110.
- The base backup A may be very large depending on the size of the source storage 110 and the number of allocated blocks at time t(0).
- As a result, the base backup A may take a relatively long time to create and consume a relatively large amount of space in the deduplication vault storage 108, depending on how many of the blocks included in the base backup A were already duplicated in the deduplication vault storage 108 prior to the creation of the base backup A.
- The 1st incremental backup A may include only those allocated blocks from the source storage 110 that were modified between time t(0) and time t(2), and the 2nd incremental backup A may include only those allocated blocks from the source storage 110 that were modified between time t(2) and time t(4).
- As a result, each incremental backup A may take a relatively short time to create and consume a relatively small amount of storage space in the deduplication vault storage 108, depending on how many of the blocks included in that incremental backup A were already duplicated in the deduplication vault storage 108 prior to its creation.
- An nth incremental backup A may be created to capture the state of the source storage 110 at time t(2n). This may include copying only modified allocated blocks of the source storage 110 present at time t(2n), using snapshot technology, and storing the modified allocated blocks in the deduplication vault storage 108. The nth incremental backup A may include only those allocated blocks from the source storage 110 that were modified between the point in time of the backup of the source storage 110 that occurred just prior to the nth incremental backup A and time t(2n).
- The base backup B and the 1st, 2nd, and nth incremental backups B may be created in a similar manner to the base backup A and the 1st, 2nd, and nth incremental backups A, except that instead of representing the states at times t(0), t(2), t(4), and t(2n), the base backup B and the 1st, 2nd, and nth incremental backups B are created to represent the states at times t(1), t(3), t(5), and t(2n+1).
- A time with a label t(x) is at least as late as a time with a label t(x-1).
- Incremental backups may be created on an ongoing basis.
- The frequency of creating new incremental backups may be altered as desired in order to adjust the amount of data that will be lost should the source storage 110 or 112 experience corruption of its stored blocks or become unavailable at any given point in time.
- The blocks from the source storage 110 or 112 can be restored to the state at the point in time of a particular incremental backup by applying the image backups to a restore storage from oldest to newest, namely, by first applying the base backup and then applying each successive incremental backup up to the particular incremental backup.
- Alternatively, the blocks from the source storage 110 or 112 can be restored to the state at the point in time of a particular incremental backup by applying the image backups to a restore storage concurrently, namely, by concurrently applying the base backup and each successive incremental backup up to the particular incremental backup.
- The restore storage may be the source storage 110 or 112 or some other storage.
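The two restore orders described above can be sketched with dicts mapping block positions to blocks, where short byte strings stand in for blocks. The concurrent variant is one assumed reading of "concurrently": each position is resolved independently to the newest backup that contains it, so positions could be written in any order or in parallel and still yield the same result as the oldest-to-newest order.

```python
def restore_oldest_to_newest(backups: list) -> dict:
    """backups: list of dicts (position -> block), base backup first, then each
    successive incremental backup up to the one being restored."""
    restore_storage = {}
    for image in backups:
        restore_storage.update(image)  # a newer backup overwrites older blocks
    return restore_storage


def restore_concurrently(backups: list) -> dict:
    # Each position is resolved independently to the newest backup holding it,
    # so positions could be written in any order or in parallel.
    restore_storage = {}
    for image in reversed(backups):
        for position, block in image.items():
            restore_storage.setdefault(position, block)
    return restore_storage


base = {0: b"a0", 1: b"b0"}
inc1 = {1: b"b1"}
inc2 = {0: b"a2", 2: b"c2"}
print(restore_oldest_to_newest([base, inc1, inc2]))
# prints {0: b'a2', 1: b'b1', 2: b'c2'}
```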
- The source storage 110 or 112 may instead be backed up by creating a base backup and one or more decremental image backups. Decremental backups are created by initially creating a base backup to capture the state at an initial point in time, then updating the base backup to capture the state at a subsequent point in time by modifying only those blocks in the base backup that changed between the initial and subsequent points in time, and by adding to the base backup copies of any blocks newly allocated between the initial and subsequent points in time.
- Before the base backup is updated, the original blocks in the base backup that correspond to the changed blocks are copied to a decremental backup, thus enabling restoration of the source storage 110 or 112 at the initial point in time (by restoring the updated base backup and then restoring the decremental backup, or by concurrently restoring the updated base backup and the decremental backup) or at the subsequent point in time (by simply restoring the updated base backup).
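The decremental scheme described above can be sketched as a single update step. This is a minimal illustration with dicts of position to block (short byte strings stand in for blocks); the function name is hypothetical, and the sketch does not handle blocks that are freed between the two points in time.

```python
def update_base_with_decremental(base: dict, current: dict) -> dict:
    """Bring `base` (position -> block at the initial point in time) up to the
    subsequent point in time, given `current` (position -> allocated block at
    the subsequent point in time). Returns the decremental backup."""
    decremental = {}
    for position, block in current.items():
        if position not in base:
            base[position] = block                  # newly allocated: add to base
        elif base[position] != block:
            decremental[position] = base[position]  # keep original before overwrite
            base[position] = block                  # changed: modify in base
    return decremental


base = {0: b"a0", 1: b"b0"}
original = dict(base)
decremental = update_base_with_decremental(base, {0: b"a0", 1: b"b1", 2: b"c1"})
latest = dict(base)                 # subsequent point: the updated base alone
initial = {**base, **decremental}   # initial point: updated base, then decremental
```

In `initial`, position 2 still holds the newly allocated block; that block was free at the initial point in time, so its contents there do not affect the restored state of the allocated blocks.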
- Because restoring a single base backup is generally faster than restoring a base backup and one or more incremental or decremental backups, creating decremental backups instead of incremental backups may enable the most recent backup to be restored more quickly, since the most recent backup is always a base backup or an updated base backup instead of potentially being an incremental backup. Therefore, the methods disclosed herein are not limited to encrypting base and incremental backups, but may also include encrypting base and decremental backups.
- the database 114 and the metadata 116 may be employed to track information related to the source storages 110 and 112 , the deduplication vault storage 108 , and the backups of the source storages 110 and 112 that are stored in the deduplication vault storage 108 .
- the database 114 and the metadata 116 may be identical or similar in structure and function to the database 500 and the metadata 700 disclosed in related U.S. patent application Ser. No. 13/782,549, titled “MULTIPHASE DEDUPLICATION,” which was filed on Mar. 1, 2013 and is expressly incorporated herein by reference in its entirety.
- the deduplication module 118 and/or another module may restore each block that was stored in the source storage 110 or 112 at a particular point in time to a restore storage.
- the deduplication vault system 102 may be a file server
- the source system 104 may be a first desktop computer
- the source system 106 may be a second desktop computer
- the network 120 may include the internet.
- the file server may be configured to periodically back up the storages of the first and second desktop computers over the internet as part of backup jobs by creating base backups and multiple incremental backups and storing them in the storage of the file server.
- the first and second desktop computers may also be configured to track modifications to their storages between backups in order to easily and quickly identify only those blocks that were modified for use in the creation of an incremental backup.
- the file server may also be configured to restore one or more of the backups to a storage of a restore computer over the internet if the first or second desktop computer experiences corruption of its storage or if the first or second desktop computer's storage becomes unavailable.
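The change tracking mentioned above could be as simple as recording the position of every block written between backups. The following is a minimal sketch. It assumes a hypothetical `ChangeTracker` class running on the source system and integer block positions; neither is specified in the text.

```python
from typing import Set

class ChangeTracker:
    """Hypothetical client-side tracker of block positions written since the last backup."""

    def __init__(self) -> None:
        self._modified: Set[int] = set()

    def record_write(self, position: int) -> None:
        # Called whenever a block in the source storage is written.
        self._modified.add(position)

    def blocks_for_backup(self, allocated: Set[int], base: bool) -> Set[int]:
        # A base backup takes every allocated block; an incremental backup takes
        # only allocated blocks that may have been modified since the last backup.
        if base:
            return set(allocated)
        return self._modified & allocated

    def reset(self) -> None:
        # Called once a backup completes successfully.
        self._modified.clear()

tracker = ChangeTracker()
allocated = {1, 2, 4, 6, 7}
tracker.record_write(4)
tracker.record_write(7)
incremental_positions = tracker.blocks_for_backup(allocated, base=False)
```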
- any of the systems 102 , 104 , and 106 may instead include two or more storages.
- although the systems 102 , 104 , and 106 are disclosed in FIG. 1 as communicating over the network 120 , it is understood that the systems 102 and 104 or 102 and 106 may instead communicate directly with each other.
- the storage 110 or 112 may function as both a source storage and a restore storage.
- the storage 110 or 112 may function as a source storage during the creation of a backup and as a restore storage during a restoration of the backup, which may enable the storage 110 or 112 to be restored to a state of an earlier point in time.
- although the deduplication module 118 , the encryption module 124 , and the encryption module 126 are the only modules disclosed in the example deduplication backup system 100 of FIG. 1 , it is understood that the functionality of the modules 118 , 124 , and 126 may be replaced or augmented by one or more similar modules residing on any of the systems 102 , 104 , and 106 or another system.
- the deduplication vault system 102 of FIG. 1 may be configured to simultaneously back up many more source storages and/or to simultaneously restore many more restore storages.
- Having described one specific environment with respect to FIG. 1 , it is understood that the specific environment of FIG. 1 is only one of countless environments in which the example methods disclosed herein may be practiced. The scope of the example embodiments is not intended to be limited to any particular environment.
- FIGS. 2A-2D are schematic diagrams illustrating client-side encryption 200 in the deduplication backup system 100 .
- the deduplication vault storage 108 may have been seeded with common blocks and/or various image backup operations of one or more backup jobs may have transpired, which will have resulted in the insertions of various blocks into the deduplication vault storage 108 , such as the blocks at positions 108 ( 4 )- 108 ( 8 ).
- allocated blocks in the source storages 110 and 112 are identified as being appropriate for being backed up. In the case of a base backup, all allocated blocks may be identified, and in the case of an incremental, only allocated blocks that have potentially been modified may be identified.
- the client-side encryption 200 illustrates the creation of the base backup A of the source storage 110 to represent the state of the source storage 110 at time t( 0 ) in FIGS. 2A-2B , and illustrates the creation of the base backup B of the source storage 112 to represent the state of the source storage 112 at time t( 1 ) in FIGS. 2C-2D .
- although the source storages 110 and 112 are each depicted with only eight blocks and the deduplication vault storage 108 is depicted with only sixteen blocks, it is understood that the storages 108 , 110 , and 112 may include many more blocks, such as millions or billions or potentially even more blocks.
- Hash values, also referred to herein as hashes, are illustrated as “HX,” where X is a number that represents a unique hash.
- a snapshot is taken of the source storage 110 at time t( 0 ) and allocated plain text blocks at positions 110 ( 1 ), 110 ( 2 ), 110 ( 4 ), 110 ( 6 ), and 110 ( 7 ) are targeted to be included in the base backup A of the source storage 110 .
- Each of these blocks is then read from the source storage 110 , hashed, using a 1st cryptographic hash function, to generate a 1st hash, and then the 1st hash is hashed, using a 2nd cryptographic hash function, to generate a 2nd hash.
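- the key-value table of the deduplication vault storage 108 is then searched to determine whether each 2nd hash matches any key in the key-value table, where each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block.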
- the 2nd hash H 38 matches the key at position 108 ( 4 ) in the key-value table, while the 2nd hashes H 27 , H 23 , and H 29 do not match any key in the key-value table.
- for the matching 2nd hash H 38 , no encrypted block is inserted into the key-value table; instead, an entry is inserted into an image map 202 corresponding to the base backup A of the source storage 110 that includes the corresponding 1st hash H 18 and the position 110 ( 2 ) of the plain text block as stored in the source storage 110 .
- the image maps disclosed in the drawings may be implemented in the metadata 116 of the deduplication vault system 102 of FIG. 1 . Further, the image maps disclosed in the drawings may be stored in plain text or may themselves be encrypted. Also, the image maps disclosed in the drawings may each be stored locally in the source storage of the corresponding source system or may each be stored remotely in the deduplication vault storage 108 of the deduplication vault system 102 . When the image map is encrypted, it may be encrypted after the backup phases disclosed herein, and then decrypted prior to the restore phases disclosed herein.
- for each of the non-matching 2nd hashes H 27 , H 23 , and H 29 , the corresponding plain text block is encrypted, using an encrypt/decrypt function with the 1st hash as an encryption password, a key-value pair is then inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block, and an entry is inserted into the image map 202 corresponding to the source storage 110 that includes the 1st hash and the position of the plain text block as stored in the source storage 110 .
- because the block at position 110 ( 4 ) and the block at position 110 ( 7 ) are duplicates, only the first instance of this duplicate block is encrypted and inserted into the key-value table, but entries for both of the duplicate blocks are inserted into the image map 202 .
- an “encrypt/decrypt function” may actually be two separate functions, one for encrypting and another for decrypting, in which case the “encrypt/decrypt function” is the combination of an encrypt function and a decrypt function.
- each block may be processed individually through each of the steps disclosed in FIGS. 2A and 2B , and below in FIGS. 2C and 2D , instead of a step being performed concurrently on all relevant blocks.
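The text does not specify the encrypt/decrypt function. The sketch below assumes a single symmetric function, so that encrypting and decrypting are the same call. The names `keystream` and `encrypt_decrypt` are hypothetical. The construction is a toy XOR keystream derived from SHA-256 and is not a vetted cipher; a production system would use an established cipher such as AES. It illustrates the two properties the scheme relies on:

- applying the function twice with the same password restores the plain text block;
- encrypting the same block with the same password always yields the same encrypted block. This is what lets duplicate blocks such as those at positions 110 ( 4 ) and 110 ( 7 ) collapse to one stored value.

```python
import hashlib

def keystream(password: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `password` (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_decrypt(data: bytes, password: bytes) -> bytes:
    """Toy encrypt/decrypt function: XOR with a keystream derived from the password.

    XOR is its own inverse, so one function serves for both directions.
    Illustration only; not a vetted cipher.
    """
    return bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))

block = b"plain text block from position 110(4)"
first_hash = hashlib.sha256(block).digest()      # used as the encryption password
encrypted = encrypt_decrypt(block, first_hash)
decrypted = encrypt_decrypt(encrypted, first_hash)
```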
- a snapshot is then taken of the source storage 112 at time t( 1 ) and allocated plain text blocks at positions 112 ( 1 ), 112 ( 2 ), 112 ( 3 ), and 112 ( 5 ) are targeted to be included in the base backup B of the source storage 112 .
- Each of these blocks is then read from the source storage 112 , hashed, using the 1st cryptographic hash function, to generate a 1st hash, and then the 1st hash is hashed, using the 2nd cryptographic hash function, to generate a 2nd hash.
- for each 2nd hash that does not match any key in the key-value table, the corresponding plain text block is encrypted, using the encrypt/decrypt function with the 1st hash as an encryption password, and a key-value pair is inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block; for every block, an entry is inserted into the image map 204 corresponding to the source storage 112 that includes the 1st hash and the position of the plain text block as stored in the source storage 112 .
- plain-text blocks of the source storages 110 and 112 may be encrypted at the source system 104 of Company A and at the source system 106 of Company B prior to sending the blocks to the deduplication vault storage 108 .
- This client-side encryption 200 reduces the potential for an unauthorized user to access the original plain-text blocks.
- the client-side encryption 200 encrypts plain-text blocks in such a way that only a single encrypted block is stored in the deduplication vault storage 108 for each unique plain-text block that is backed up across the source storages 110 and 112 .
- the client-side encryption 200 employs client-side encryption with deduplication, which enables sensitive blocks to remain secure within the key-value table of the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated.
- FIGS. 3A-3B together form a schematic flowchart illustrating a first example method 300 for client-side encryption in the deduplication backup system 100 .
- the method 300 may be implemented, in at least some embodiments, by the deduplication module 118 of the deduplication vault system 102 , by the encryption module 124 of the source system 104 , and by the encryption module 126 of the source system 106 of FIG. 1 .
- these modules may be configured to execute computer instructions to perform operations of client-side encryption of the source storages 110 and 112 prior to being backed up into the deduplication vault storage 108 , as represented by one or more of phases 302 - 308 which are made up of the steps 310 - 364 of the method 300 .
- although illustrated as discrete phases and steps, various phases/steps may be divided into additional phases/steps, combined into fewer phases/steps, reordered, or eliminated, depending on the desired implementation.
- the method 300 will now be discussed with reference to FIGS. 1, 2A-2D, and 3A-3B .
- the method 300 may include a backup phase 302 for Company A, a restore phase 304 for Company A, a backup phase 306 for Company B, and a restore phase 308 for Company B.
- the backup phase 302 of the method 300 may include a step 310 in which an allocated plain text block is read from the source storage.
- the encryption module 124 may read, at step 310 , the plain text block at position 110 ( 1 ) or 110 ( 2 ) from the source storage 110 , as disclosed in FIG. 2A .
- the backup phase 302 of the method 300 may include a step 312 in which the plain text block is hashed, using a 1st cryptographic hash function, to generate a 1st hash.
- the encryption module 124 may hash, at step 312 , the plain text block from position 110 ( 1 ) or 110 ( 2 ) using the 1st cryptographic hash function to generate a 1st hash, such as the 1st hash H 7 or the 1st hash H 18 , as disclosed in FIG. 2A .
- the 1st cryptographic hash function may be a SHA-1, SHA-2, SHA-3, MD5, or other cryptographic hash function, for example.
- the backup phase 302 of the method 300 may include a step 314 in which the 1st hash is hashed, using a 2nd cryptographic hash function, to generate a 2nd hash.
- the encryption module 124 may hash, at step 314 , the 1st hash H 7 or the 1st hash H 18 using the 2nd cryptographic hash function to generate the 2nd hash H 27 or the 2nd hash H 38 , as disclosed in FIG. 2A .
- the 2nd cryptographic hash function may be a SHA-1, SHA-2, SHA-3, MD5, or other cryptographic hash function, for example, and may be the same as, or different from, the 1st cryptographic hash function.
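Because the 1st and 2nd cryptographic hash functions may be the same or different, an implementation might make them configurable. The sketch below uses Python's `hashlib.new`, which accepts algorithm names such as `"sha1"`, `"sha256"`, `"sha3_256"`, and `"md5"`. The helper name `derive_hashes` and the SHA-256 defaults are assumptions. SHA-1 and MD5 are listed in the text but have known collision weaknesses, which is why this sketch defaults to SHA-256. The 2nd hash is computed over the 1st hash, not over the block. A party holding only the 2nd hash, such as the vault, therefore cannot feasibly recover the 1st hash needed for decryption, provided the 2nd hash function is preimage-resistant.

```python
import hashlib
from typing import Tuple

def derive_hashes(block: bytes,
                  first_alg: str = "sha256",
                  second_alg: str = "sha256") -> Tuple[bytes, bytes]:
    """Return (1st hash, 2nd hash) for a plain text block.

    The 1st hash is computed over the block and is kept by the client as the
    encryption password; the 2nd hash is computed over the 1st hash and is
    used as the key in the vault's key-value table.
    """
    first = hashlib.new(first_alg, block).digest()
    second = hashlib.new(second_alg, first).digest()
    return first, second

h1, h2 = derive_hashes(b"example block")
h1b, h2b = derive_hashes(b"example block", second_alg="sha3_256")
```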
- the backup phase 302 of the method 300 may include a step 316 in which a key-value table of a deduplication vault is searched to determine whether the 2nd hash matches any key in the key-value table, where each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block.
- the deduplication module 118 may search, at step 316 , the key-value table of the deduplication vault storage 108 to determine that the 2nd hash H 27 does not match any key in the key-value table, or to determine that the 2nd hash H 38 does match a key at position 108 ( 4 ) in the key-value table, as disclosed in FIG. 2B .
- if the 2nd hash does not match any key in the key-value table (No at step 316 ), the backup phase 302 of the method 300 may include steps 318 and 320 . Otherwise (Yes at step 316 ), the backup phase 302 of the method 300 may proceed directly to the step 322 .
- the backup phase 302 of the method 300 may include a step 318 in which the plain text block is encrypted, using an encrypt/decrypt function, using the 1st hash as an encryption password.
- the encryption module 124 may encrypt, at step 318 , the plain text block from position 110 ( 1 ) using an encrypt/decrypt function, using the 1st hash H 7 as an encryption password, resulting in an encrypted version of the plain text block from position 110 ( 1 ), as disclosed in FIG. 2B .
- the backup phase 302 of the method 300 may include a step 320 in which a key-value pair is inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block.
- the deduplication module 118 may insert, at step 320 , a key-value pair into the key-value table at position 108 ( 1 ) with the key being the 2nd hash H 27 and the value being the encrypted version of the plain text block at position 110 ( 1 ), as disclosed in FIG. 2B .
- the backup phase 302 of the method 300 may include a step 322 in which an entry is inserted into an image map corresponding to the source storage that includes the 1st hash and a position of the plain text block as stored in the source storage.
- the deduplication module 118 may insert, at step 322 , an entry into the image map 202 corresponding to the source storage 110 that includes the 1st hash H 18 and position 110 ( 2 ), as disclosed in FIG. 2A , or that includes the 1st hash H 7 and position 110 ( 1 ), as disclosed in FIG. 2B .
- the backup phase 302 of the method 300 may include a step 324 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage.
- in the case of a base backup, all unique allocated blocks may be identified, and in the case of an incremental backup, only unique allocated blocks that have potentially been modified may be identified.
- the deduplication module 118 may determine, at step 324 , whether all of the allocated blocks at positions 110 ( 1 ), 110 ( 2 ), 110 ( 4 ), 110 ( 6 ), and 110 ( 7 ) have been read from the source storage 110 , as disclosed in FIG. 2B .
- If it is determined at step 324 that all allocated blocks have not been read from the source storage 110 (No at step 324 ), then the method 300 returns to step 310 where the next allocated block is read from the source storage 110 . Otherwise (Yes at step 324 ), the backup phase 302 of the method 300 is complete, and the method 300 proceeds to step 326 of the restore phase 304 .
- at the conclusion of the backup phase 302 , a backup of the source storage 110 will have been stored in the deduplication vault storage 108 .
- the backup of the source storage 110 as stored in the deduplication vault storage 108 has been reduced in size due to not storing multiple copies of the blocks from positions 110 ( 2 ) and 110 ( 7 ), as disclosed in FIG. 2B .
- the total overall size of the backups will likely be reduced in size due to the elimination of duplicate blocks across the backups.
- the deduplication vault storage 108 is configured to store each of the plain text blocks of the source storage 110 included in the backup as encrypted blocks, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
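Steps 310 through 324 of the backup phase 302 can be sketched as a single loop. This minimal illustration rests on several assumptions not stated in the text:

- the source storage is a dict from position to block bytes;
- the key-value table is an in-memory dict standing in for the deduplication vault storage 108;
- the image map is a list of (1st hash, position) entries;
- SHA-256 is used for both hash functions;
- a toy XOR keystream (`encrypt_decrypt`) stands in for a real cipher.

All names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

def keystream(password: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_decrypt(data: bytes, password: bytes) -> bytes:
    # Toy symmetric encrypt/decrypt function (illustration only, not a vetted cipher).
    return bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))

def backup(source: Dict[int, bytes],
           table: Dict[bytes, bytes]) -> List[Tuple[bytes, int]]:
    """Back up `source` into `table`; return the image map as (1st hash, position) entries."""
    image_map: List[Tuple[bytes, int]] = []
    for position, block in sorted(source.items()):       # step 310: read block
        h1 = hashlib.sha256(block).digest()              # step 312: 1st hash
        h2 = hashlib.sha256(h1).digest()                 # step 314: 2nd hash
        if h2 not in table:                              # step 316: search key-value table
            table[h2] = encrypt_decrypt(block, h1)       # steps 318-320: encrypt and insert
        image_map.append((h1, position))                 # step 322: image map entry
    return image_map                                     # step 324: all blocks read

vault: Dict[bytes, bytes] = {}
backup({5: b"BBBB"}, vault)                              # an earlier backup seeds the vault
source_110 = {1: b"AAAA", 2: b"BBBB", 4: b"CCCC", 6: b"DDDD", 7: b"CCCC"}
image_map_202 = backup(source_110, vault)
```

Only the 2nd hash and the encrypted block reach the table. The 1st hash, which is the decryption password, stays in the image map, so encrypting the image map at rest protects the backup.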
- the restore phase 304 of the method 300 may include a step 326 in which an entry is read from the image map.
- the deduplication module 118 may read, at step 326 , the first entry in the image map 202 , which includes the 1st hash H 18 and source position 110 ( 2 ), as disclosed in FIG. 2B .
- the restore phase 304 of the method 300 may include a step 328 in which the 1st hash included in the entry is hashed, using the 2nd cryptographic hash function, to generate the 2nd hash.
- the encryption module 124 may hash, at step 328 , the 1st hash H 18 , using the 2nd cryptographic hash function, to generate the 2nd hash H 38 , as disclosed in FIG. 2B .
- the restore phase 304 of the method 300 may include a step 330 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 2nd hash.
- the deduplication module 118 may search, at step 330 , the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108 ( 4 ) that has a key that matches the 2nd hash H 38 , as disclosed in FIG. 2B .
- the restore phase 304 of the method 300 may include a step 332 in which the encrypted block is decrypted, using the encrypt/decrypt function, and using the 1st hash as a decryption password.
- the encryption module 124 may decrypt, at step 332 , the encrypted block, using the encrypt/decrypt function, and using the 1st hash H 18 as a decryption password, resulting in the plain text block from position 110 ( 2 ) of the source storage 110 , as disclosed in FIG. 2B .
- the restore phase 304 of the method 300 may include a step 334 in which the decrypted block is stored in a restore storage at the position included in the entry.
- the encryption module 124 may store, at step 334 , the decrypted block in the source storage 110 , where the source storage 110 is functioning as a restore storage, in the position 110 ( 2 ), as disclosed in FIG. 2B .
- the restore phase 304 of the method 300 may include a step 336 in which it is determined whether all entries have been read from the image map.
- the deduplication module 118 may determine, at step 336 , whether all of the entries have been read from the image map 202 , as disclosed in FIG. 2B . If it is determined at step 336 that all entries have not been read from the image map 202 (No at step 336 ), then the method 300 returns to step 326 where the next entry is read from the image map 202 . Otherwise (Yes at step 336 ), the restore phase 304 of the method 300 is complete, and the method 300 proceeds to step 338 of the backup phase 306 .
- at the conclusion of the restore phase 304 , a backup of the source storage 110 that was stored in the deduplication vault storage 108 will have been restored to a restore storage.
- the restoration of the backup of the source storage 110 involves the backup remaining securely encrypted until being decrypted at the source system 104 , thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
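The restore phase 304 (steps 326 through 336) reverses the backup. The sketch below rests on hypothetical assumptions:

- the key-value table is a dict keyed by the 2nd hash;
- the image map is a list of (1st hash, position) entries;
- SHA-256 is used for both hash functions;
- a toy XOR keystream stands in for the encrypt/decrypt function.

All names are illustrative.

```python
import hashlib
from typing import Dict, List, Tuple

def keystream(password: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_decrypt(data: bytes, password: bytes) -> bytes:
    # Toy symmetric encrypt/decrypt function (illustration only, not a vetted cipher).
    return bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))

def restore(image_map: List[Tuple[bytes, int]],
            table: Dict[bytes, bytes],
            restore_storage: Dict[int, bytes]) -> None:
    for first_hash, position in image_map:                          # steps 326 and 336
        second_hash = hashlib.sha256(first_hash).digest()           # step 328
        encrypted = table[second_hash]                              # step 330 (KeyError if absent)
        restore_storage[position] = encrypt_decrypt(encrypted, first_hash)  # steps 332-334

# Build a tiny vault by hand: two blocks, encrypted with their own 1st hashes.
ha = hashlib.sha256(b"AAAA").digest()
hb = hashlib.sha256(b"BBBB").digest()
table = {
    hashlib.sha256(ha).digest(): encrypt_decrypt(b"AAAA", ha),
    hashlib.sha256(hb).digest(): encrypt_decrypt(b"BBBB", hb),
}
restore_storage_110: Dict[int, bytes] = {}
restore([(ha, 1), (hb, 2)], table, restore_storage_110)
```

Decryption happens only on the client, which holds the 1st hashes. The vault returns encrypted blocks and never sees plain text. A real implementation would also need to handle a missing key, which this sketch surfaces as a `KeyError`.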
- the backup phase 306 and the restore phase 308 of the method 300 are similar in many respects to the backup phase 302 and the restore phase 304 of the method 300 , the main difference being that the backup phase 306 and the restore phase 308 are performed on the source system 106 of Company B instead of on the source system 104 of Company A.
- the backup phase 306 of the method 300 may include a step 338 in which an allocated plain text block is read from the source storage.
- the encryption module 126 may read, at step 338 , the plain text block at position 112 ( 1 ) or 112 ( 2 ) from the source storage 112 , as disclosed in FIG. 2C .
- the backup phase 306 of the method 300 may include a step 340 in which the plain text block is hashed, using the same 1st cryptographic hash function used in the step 312 , to generate a 4th hash.
- the encryption module 126 may hash, at step 340 , the plain text block from position 112 ( 1 ) or 112 ( 2 ) using the 1st cryptographic hash function to generate a 4th hash, such as the 4th hash H 47 or the 4th hash H 3 , as disclosed in FIG. 2C .
- the backup phase 306 of the method 300 may include a step 342 in which the 4th hash is hashed, using the same 2nd cryptographic hash function used in step 314 , to generate a 5th hash.
- the encryption module 126 may hash, at step 342 , the 4th hash H 47 or the 4th hash H 3 using the 2nd cryptographic hash function to generate the 5th hash H 67 or the 5th hash H 23 , as disclosed in FIG. 2C .
- the backup phase 306 of the method 300 may include a step 344 in which a key-value table of a deduplication vault is searched to determine whether the 5th hash matches any key in the key-value table.
- the deduplication module 118 may search, at step 344 , the key-value table of the deduplication vault storage 108 to determine that the 5th hash H 67 does not match any key in the key-value table, or to determine that the 5th hash H 23 does match a key at position 108 ( 2 ) in the key-value table, as disclosed in FIG. 2C .
- if the 5th hash does not match any key in the key-value table (No at step 344 ), the backup phase 306 of the method 300 may include steps 346 and 348 . Otherwise (Yes at step 344 ), the backup phase 306 of the method 300 may proceed directly to the step 350 .
- the backup phase 306 of the method 300 may include a step 346 in which the plain text block is encrypted, using an encrypt/decrypt function, using the 4th hash as an encryption password.
- the encryption module 126 may encrypt, at step 346 , the plain text block from position 112 ( 1 ) using an encrypt/decrypt function, using the 4th hash H 47 as an encryption password, resulting in an encrypted version of the plain text block from position 112 ( 1 ), as disclosed in FIG. 2D .
- the backup phase 306 of the method 300 may include a step 348 in which a key-value pair is inserted into the key-value table with the key being the 5th hash and the value being the encrypted block.
- the deduplication module 118 may insert, at step 348 , a key-value pair into the key-value table at position 108 ( 9 ) with the key being the 5th hash H 67 and the value being the encrypted version of the plain text block from position 112 ( 1 ), as disclosed in FIG. 2D .
- the backup phase 306 of the method 300 may include a step 350 in which an entry is inserted into an image map corresponding to the source storage that includes the 4th hash and a position of the plain text block as stored in the source storage.
- the deduplication module 118 may insert, at step 350 , an entry into the image map 204 corresponding to the source storage 112 that includes the 4th hash H 3 and position 112 ( 2 ), as disclosed in FIG. 2C , or that includes the 4th hash H 47 and position 112 ( 1 ), as disclosed in FIG. 2D .
- the backup phase 306 of the method 300 may include a step 352 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage.
- the deduplication module 118 may determine, at step 352 , whether all of the allocated blocks at positions 112 ( 1 ), 112 ( 2 ), 112 ( 3 ), and 112 ( 5 ) have been read from the source storage 112 , as disclosed in FIG. 2D . If it is determined at step 352 that all allocated blocks have not been read from the source storage 112 (No at step 352 ), then the method 300 returns to step 338 where the next allocated block is read from the source storage 112 . Otherwise (Yes at step 352 ), the backup phase 306 of the method 300 is complete, and the method 300 proceeds to step 354 of the restore phase 308 .
- at the conclusion of the backup phase 306 , a backup of the source storage 112 will have been stored in the deduplication vault storage 108 , along with the backup of the source storage 110 .
- the backup of the source storage 112 as stored in the deduplication vault storage 108 has been reduced in size due to not storing multiple copies of the duplicate blocks from positions 112 ( 2 ) and 112 ( 5 ), as disclosed in FIG. 2D .
- the method 300 is employed to encrypt the duplicate plain-text block from position 110 ( 4 ) and position 112 ( 2 ) in such a way that only a single encrypted block is stored in position 108 ( 2 ) in the deduplication vault storage 108 for this duplicate block.
- the method 300 is employed to encrypt the duplicate plain-text block from position 110 ( 2 ) and position 112 ( 5 ) in such a way that only a single encrypted block is stored in position 108 ( 4 ) in the deduplication vault storage 108 for this duplicate block. Therefore, unlike standard deduplication vaults, which either store a single plain-text deduplicated block or store a single plain-text block in two different encrypted forms, the method 300 disclosed herein employs client-side encryption with deduplication which enables sensitive blocks to remain secure within the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated.
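To illustrate the contrast drawn above, the following sketch encrypts the same plain text block once for Company A and once for Company B under two schemes:

- **Per-company random keys** (a conventional approach, assumed here for comparison): the two encrypted forms differ, so the vault stores both.
- **Keys derived from the block's own hash** (as in the method 300): both companies produce the same 2nd hash and the same encrypted block, so the vault stores one copy.

SHA-256 and the toy XOR keystream cipher are assumptions, and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict

def keystream(password: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_decrypt(data: bytes, password: bytes) -> bytes:
    # Toy symmetric encrypt/decrypt function (illustration only, not a vetted cipher).
    return bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))

def store_convergent(table: Dict[bytes, bytes], block: bytes) -> None:
    # Password and key both derive from the block itself, so every client agrees.
    h1 = hashlib.sha256(block).digest()
    h2 = hashlib.sha256(h1).digest()
    table.setdefault(h2, encrypt_decrypt(block, h1))

def store_per_tenant(table: Dict[bytes, bytes], block: bytes, tenant_key: bytes) -> None:
    # Without a shared key, the vault can only deduplicate identical ciphertexts.
    encrypted = encrypt_decrypt(block, tenant_key)
    table.setdefault(hashlib.sha256(encrypted).digest(), encrypted)

shared_block = b"block at 110(4) and 112(2)"

convergent_vault: Dict[bytes, bytes] = {}
store_convergent(convergent_vault, shared_block)   # Company A
store_convergent(convergent_vault, shared_block)   # Company B

per_tenant_vault: Dict[bytes, bytes] = {}
store_per_tenant(per_tenant_vault, shared_block, os.urandom(32))  # Company A's key
store_per_tenant(per_tenant_vault, shared_block, os.urandom(32))  # Company B's key
```

The trade-off matches the caveat stated earlier. Under the convergent scheme, a user who already holds a given plain text block can derive its key and locate it in the vault, which is why protection extends to all blocks "except for those blocks that are included in a backup of the unauthorized user."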
- the restore phase 308 of the method 300 may include a step 354 in which an entry is read from the image map.
- the deduplication module 118 may read, at step 354 , the first entry in the image map 204 which includes the 4th hash H 3 and source position 112 ( 2 ), as disclosed in FIG. 2D .
- the restore phase 308 of the method 300 may include a step 356 in which the 4th hash included in the entry is hashed, using the 2nd cryptographic hash function, to generate the 5th hash.
- the encryption module 126 may hash, at step 356 , the 4th hash H 3 , using the 2nd cryptographic hash function, to generate the 5th hash H 23 , as disclosed in FIG. 2D .
- the restore phase 308 of the method 300 may include a step 358 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 5th hash.
- the deduplication module 118 may search, at step 358 , the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108 ( 2 ) that has a key that matches the 5th hash H 23 , as disclosed in FIG. 2D .
- the restore phase 308 of the method 300 may include a step 360 in which the encrypted block is decrypted, using the encrypt/decrypt function, and using the 4th hash as a decryption password.
- the encryption module 126 may decrypt, at step 360 , the encrypted block, using the encrypt/decrypt function, and using the 4th hash H 3 as a decryption password, resulting in the plain text block from position 112 ( 2 ) of the source storage 112 , as disclosed in FIG. 2D .
- the restore phase 308 of the method 300 may include a step 362 in which the decrypted block is stored in a restore storage at the position included in the entry.
- the encryption module 126 may store, at step 362 , the decrypted block in the source storage 112 , where the source storage 112 is functioning as a restore storage, in the position 112 ( 2 ), as disclosed in FIG. 2D .
- the restore phase 308 of the method 300 may include a step 364 in which it is determined whether all entries have been read from the image map.
- the deduplication module 118 may determine, at step 364 , whether all of the entries have been read from the image map 204 , as disclosed in FIG. 2D . If it is determined at step 364 that all entries have not been read from the image map 204 (No at step 364 ), then the method 300 returns to step 354 where the next entry is read from the image map 204 . Otherwise (Yes at step 364 ), the restore phase 308 of the method 300 is complete.
- at the conclusion of the restore phase 308 , a backup of the source storage 112 that was stored in the deduplication vault storage 108 will have been restored to a restore storage.
- the restoration of the backup of the source storage 112 involves the backup remaining securely encrypted until being decrypted at the source system 106 , thus reducing the potential for an unauthorized user, such as a user from Company A, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
- FIGS. 4A-4D are schematic diagrams illustrating client-side encryption 400 in the deduplication backup system 100 .
- the client-side encryption 400 may be implemented, in at least some embodiments, with similar events occurring prior to the client-side encryption 400 as occurred prior to the client-side encryption 200 discussed above.
- a snapshot is taken of the source storage 110 at time t( 0 ) and allocated plain text blocks at positions 110 ( 1 ), 110 ( 2 ), 110 ( 4 ), 110 ( 6 ), and 110 ( 7 ) are targeted to be included in the base backup A of the source storage 110 .
- Each of these blocks is then read from the source storage 110 , hashed, using the 1st cryptographic hash function, to generate a 1st hash, and then encrypted, using the encrypt/decrypt function, using the 1st hash as an encryption password.
- the encrypted block is then hashed, using the 2nd cryptographic hash function, to generate a 3rd hash.
- it is then determined whether the 3rd hash matches any key in the key-value table of the deduplication vault storage 108 .
- the 3rd hash H 118 matches the key at position 108 ( 4 ) in the key-value table, while the 3rd hashes H 107 , H 103 , and H 109 do not match any key in the key-value table.
- an entry is inserted into an image map 402 corresponding to the base backup A of the source storage 110 that includes the corresponding 1st hash H 18 , the corresponding 3rd hash H 118 , and the position 110 ( 2 ) of the plain text block as stored in the source storage 110 .
- a snapshot is then taken of the source storage 112 at time t( 1 ) and allocated plain text blocks at positions 112 ( 1 ), 112 ( 2 ), 112 ( 3 ), and 112 ( 5 ) are targeted to be included in the base backup B of the source storage 112 .
- Each of these blocks is then read from the source storage 112 , hashed, using the 1st cryptographic hash function, to generate a 1st hash, and then encrypted, using the encrypt/decrypt function, using the 1st hash as an encryption password.
- the encrypted block is then hashed, using the 2nd cryptographic hash function, to generate a 3rd hash.
- it is then determined whether the 3rd hash matches any key in the key-value table of the deduplication vault storage 108 .
- the 3rd hashes H 103 and H 118 match the keys at positions 108 ( 2 ) and 108 ( 4 ), respectively, in the key-value table, while the 3rd hashes H 147 and H 151 do not match any key in the key-value table.
- entries are inserted into an image map 404 corresponding to the base backup B of the source storage 112 that each includes the corresponding 1st hash, the corresponding 3rd hash, and the position of the plain text block as stored in the source storage 112 .
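In the variant of FIGS. 4A-4D, the block is encrypted before the table lookup. The key is then a hash of the encrypted block (the 3rd hash) rather than a hash of the 1st hash, and each image map entry carries the 1st hash, the 3rd hash, and the position. The minimal sketch below assumes a dict-based source storage and key-value table, SHA-256 for both hash functions, and a toy XOR keystream cipher. The function names `backup_encrypt_then_hash` and `restore_encrypt_then_hash` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

def keystream(password: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_decrypt(data: bytes, password: bytes) -> bytes:
    # Toy deterministic encrypt/decrypt function (illustration only, not a vetted cipher).
    return bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))

Entry = Tuple[bytes, bytes, int]  # (1st hash, 3rd hash, position)

def backup_encrypt_then_hash(source: Dict[int, bytes],
                             table: Dict[bytes, bytes]) -> List[Entry]:
    image_map: List[Entry] = []
    for position, block in sorted(source.items()):
        first_hash = hashlib.sha256(block).digest()          # 1st hash of the plain text
        encrypted = encrypt_decrypt(block, first_hash)       # encrypt with 1st hash as password
        third_hash = hashlib.sha256(encrypted).digest()      # 3rd hash of the encrypted block
        if third_hash not in table:                          # search the key-value table
            table[third_hash] = encrypted
        image_map.append((first_hash, third_hash, position))
    return image_map

def restore_encrypt_then_hash(image_map: List[Entry],
                              table: Dict[bytes, bytes]) -> Dict[int, bytes]:
    # The 3rd hash in each entry is the table key, so no rehashing is needed.
    return {pos: encrypt_decrypt(table[h3], h1) for h1, h3, pos in image_map}

vault: Dict[bytes, bytes] = {}
map_402 = backup_encrypt_then_hash(
    {1: b"AAAA", 2: b"BBBB", 4: b"CCCC", 6: b"DDDD", 7: b"CCCC"}, vault)
map_404 = backup_encrypt_then_hash(
    {1: b"EEEE", 2: b"CCCC", 3: b"FFFF", 5: b"BBBB"}, vault)
```

Deduplication still works only because the encryption is deterministic: identical plain text blocks encrypt to identical encrypted blocks and therefore to identical 3rd hashes.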
- plain-text blocks of the source storages 110 and 112 may be encrypted at the source system 104 of Company A and at the source system 106 of Company B prior to sending the blocks to the deduplication vault storage 108 , which may result in benefits similar to those discussed above in connection with the client-side encryption 200 of FIGS. 2A-2D .
- the client-side encryption 400 may additionally include the added benefit of preventing the key-value table of the deduplication vault storage 108 from being “poisoned” by the malicious or inadvertent insertion of an encrypted block as a value that does not match the hash inserted as its corresponding key.
- any “poisoning” of the key-value table may be prevented in the client-side encryption 400 because each 3rd hash inserted into the key-value table can be verified to match its corresponding encrypted block by rehashing the encrypted block using the 2nd cryptographic hash function, and comparing the results of the rehash operation with the 3rd hash, where if the comparison is not identical then the insert is deemed to be a poisoning attempt and is therefore rejected.
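The verification described above runs on the vault side and needs nothing but the submitted pair. A minimal sketch follows. It assumes SHA-256 as the 2nd cryptographic hash function and a dict as the key-value table; the names `verified_insert` and `PoisoningAttempt` are hypothetical.

```python
import hashlib
from typing import Dict

class PoisoningAttempt(ValueError):
    """Raised when a submitted key does not match the hash of its encrypted block."""

def verified_insert(table: Dict[bytes, bytes], key: bytes, encrypted_block: bytes) -> bool:
    """Vault-side insert for the FIGS. 4A-4D variant.

    Rehashes the encrypted block with the 2nd cryptographic hash function and
    rejects the pair if the result differs from the submitted 3rd hash.
    Returns True if a new pair was stored, False if the key was already present.
    """
    if hashlib.sha256(encrypted_block).digest() != key:
        raise PoisoningAttempt("3rd hash does not match the encrypted block")
    if key in table:
        return False
    table[key] = encrypted_block
    return True

table: Dict[bytes, bytes] = {}
encrypted = b"\x8f\x02 example encrypted block"
stored = verified_insert(table, hashlib.sha256(encrypted).digest(), encrypted)
try:
    verified_insert(table, hashlib.sha256(encrypted).digest(), b"forged block")
    rejected = False
except PoisoningAttempt:
    rejected = True
```

The same check is not available to the vault in the method 300. There the key is a hash of the 1st hash, and recomputing it requires the plain text block, which the vault never receives.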
- FIGS. 5A-5B together form a schematic flowchart illustrating a second example method 500 for client-side encryption in the deduplication backup system 100 .
- the method 500 may be implemented, in at least some embodiments, in a similar manner as the method 300 discussed above.
- the method 500 will now be discussed with reference to FIGS. 1, 4A-4D , and 5A-5B .
- the method 500 may include a backup phase 502 for Company A, a restore phase 504 for Company A, a backup phase 506 for Company B, and a restore phase 508 for Company B.
- the backup phase 502 of the method 500 may include a step 510 in which an allocated plain text block is read from the source storage.
- the encryption module 124 may read, at step 510 , the plain text block at position 110 ( 1 ) or 110 ( 2 ) from the source storage 110 , as disclosed in FIG. 4A .
- the backup phase 502 of the method 500 may include a step 512 in which the plain text block is hashed, using a 1st cryptographic hash function, to generate a 1st hash.
- the encryption module 124 may hash, at step 512 , the plain text block from position 110 ( 1 ) or 110 ( 2 ) using the 1st cryptographic hash function to generate the 1st hash H 7 or the 1st hash H 18 , as disclosed in FIG. 4A .
- the backup phase 502 of the method 500 may include a step 514 in which the plain text block is encrypted, using the encrypt/decrypt function, using the 1st hash as an encryption password.
- the encryption module 124 may encrypt, at step 514 , the plain text block from position 110 ( 1 ) using the encrypt/decrypt function, using the 1st hash H 7 as an encryption password, resulting in an encrypted version of the plain text block from position 110 ( 1 ), as disclosed in FIG. 4A .
- the encryption module 124 may encrypt, at step 514 , the plain text block from position 110 ( 2 ) using the encrypt/decrypt function, using the 1st hash H 18 as an encryption password, resulting in an encrypted version of the plain text block from position 110 ( 2 ), as disclosed in FIG. 4A .
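Steps 510 through 514 amount to what is often called convergent encryption. Because the password is derived from the block's own contents, identical plain text blocks always produce identical encrypted blocks. The sketch below assumes SHA-256 as the 1st cryptographic hash function and substitutes a toy XOR keystream cipher for the unspecified encrypt/decrypt function; that cipher is not suitable for production use. `encrypt_block` is a hypothetical name.

```python
import hashlib
from typing import Tuple


def hash1(data: bytes) -> bytes:
    # Assumed 1st cryptographic hash function (SHA-256 chosen for illustration).
    return hashlib.sha256(data).digest()


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Toy stand-in for the encrypt/decrypt function: XOR the block with a
    # keystream of SHA-256(password || offset) digests. XOR is its own inverse,
    # so the same call both encrypts and decrypts. Not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


def encrypt_block(plain: bytes) -> Tuple[bytes, bytes]:
    """Steps 512-514: return (1st hash, encrypted block) for one plain text block."""
    first_hash = hash1(plain)
    return first_hash, encrypt_decrypt(plain, first_hash)


block = b"identical system file data".ljust(512, b"\0")
hash_a, enc_a = encrypt_block(block)
hash_b, enc_b = encrypt_block(block)        # same block read from another position
print(enc_a == enc_b)                       # True: identical ciphertexts can be deduplicated
print(encrypt_decrypt(enc_a, hash_a) == block)  # True: the 1st hash decrypts the block
```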
- the backup phase 502 of the method 500 may include a step 516 in which the encrypted block is hashed, using the 2nd cryptographic hash function, to generate a 3rd hash.
- the encryption module 124 may hash, at step 516 , the encrypted block corresponding to the plain text block at position 110 ( 1 ) or position 110 ( 2 ) using the 2nd cryptographic hash function to generate the 3rd hash H 107 or the 3rd hash H 118 , as disclosed in FIG. 4A .
- the backup phase 502 of the method 500 may include a step 518 in which a key-value table of a deduplication vault is searched to determine whether the 3rd hash matches any key in the key-value table.
- the deduplication module 118 may search, at step 518 , the key-value table of the deduplication vault storage 108 to determine that the 3rd hash H 107 does not match any key in the key-value table, or to determine that the 3rd hash H 118 does match a key at position 108 ( 4 ) in the key-value table, as disclosed in FIG. 4A .
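- if it is determined at step 518 that the 3rd hash does not match any key in the key-value table (No at step 518 ), the backup phase 502 of the method 500 may proceed to step 520 . Otherwise (Yes at step 518 ), the backup phase 502 of the method 500 may proceed directly to step 522 .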
- the backup phase 502 of the method 500 may include a step 520 in which a key-value pair is inserted into the key-value table with the key being the 3rd hash and the value being the encrypted block.
- the deduplication module 118 may insert, at step 520 , a key-value pair into the key-value table at position 108 ( 1 ) with the key being the 3rd hash H 107 and the value being the encrypted version of the plain text block at position 110 ( 1 ), as disclosed in FIG. 4B .
- the backup phase 502 of the method 500 may include a step 522 in which an entry is inserted into an image map corresponding to the source storage that includes the 1st hash, the 3rd hash, and a position of the plain text block as stored in the source storage.
- the deduplication module 118 may insert, at step 522 , an entry into the image map 402 corresponding to the source storage 110 that includes the 1st hash H 18 , the 3rd hash H 118 , and position 110 ( 2 ) of the plain text block as stored in the source storage 110 , as disclosed in FIG. 4A , or that includes the 1st hash H 7 , the 3rd hash H 107 , and position 110 ( 1 ) of the plain text block as stored in the source storage 110 , as disclosed in FIG. 4B .
- the backup phase 502 of the method 500 may include a step 524 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage.
- the deduplication module 118 may determine, at step 524 , whether all of the allocated blocks at positions 110 ( 1 ), 110 ( 2 ), 110 ( 4 ), 110 ( 6 ), and 110 ( 7 ) have been read from the source storage 110 , as disclosed in FIG. 2B . If it is determined at step 524 that not all allocated blocks have been read from the source storage 110 (No at step 524 ), then the method 500 returns to step 510 where the next allocated block is read from the source storage 110 . Otherwise (Yes at step 524 ), the backup phase 502 of the method 500 is complete, and the method 500 proceeds to step 526 of the restore phase 504 .
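The backup phase 502 (steps 510 through 524) can be summarized as a short loop. This sketch makes several assumptions: SHA-256 and SHA-512 serve as the 1st and 2nd cryptographic hash functions, the allocated blocks of the source storage are modeled as a dict from position to block, the vault's key-value table is a dict, and a toy XOR keystream cipher stands in for the encrypt/decrypt function. `MapEntry` and `backup_phase` are hypothetical names, and the block contents are made up.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


def hash1(data: bytes) -> bytes:  # assumed 1st cryptographic hash function
    return hashlib.sha256(data).digest()


def hash2(data: bytes) -> bytes:  # assumed 2nd cryptographic hash function
    return hashlib.sha512(data).digest()


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Toy self-inverse XOR keystream cipher; not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


@dataclass(frozen=True)
class MapEntry:
    first_hash: bytes  # password needed to decrypt the block on restore
    third_hash: bytes  # key of the block's key-value pair in the vault
    position: int      # position of the plain text block in the source storage


def backup_phase(allocated: Dict[int, bytes], table: Dict[bytes, bytes]) -> List[MapEntry]:
    image_map = []
    for position, plain in sorted(allocated.items()):  # step 510, repeated via step 524
        first_hash = hash1(plain)                        # step 512
        encrypted = encrypt_decrypt(plain, first_hash)   # step 514
        third_hash = hash2(encrypted)                    # step 516
        if third_hash not in table:                      # step 518
            table[third_hash] = encrypted                # step 520
        image_map.append(MapEntry(first_hash, third_hash, position))  # step 522
    return image_map


# Hypothetical contents: the blocks at positions 2 and 7 are duplicates.
source_110 = {1: b"A" * 512, 2: b"B" * 512, 4: b"C" * 512, 6: b"D" * 512, 7: b"B" * 512}
vault: Dict[bytes, bytes] = {}
image_map_402 = backup_phase(source_110, vault)
print(len(image_map_402), len(vault))  # 5 4
```

Every allocated block gets an image map entry, but the duplicate is stored in the vault only once.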
- a backup of the source storage 110 will have been stored in the deduplication vault storage 108 .
- the backup of the source storage 110 as stored in the deduplication vault storage 108 has been reduced in size due to not storing multiple copies of the duplicate blocks from positions 110 ( 2 ) and 110 ( 7 ), as disclosed in FIG. 4B .
- the total size of the backups will likely be reduced due to the elimination of duplicate blocks across the backups.
- the deduplication vault storage 108 is configured to store each of the plain text blocks of the source storage 110 included in the backup as encrypted blocks, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
- the restore phase 504 of the method 500 may include a step 526 in which an entry is read from the image map.
- the deduplication module 118 may read, at step 526 , the first entry in the image map 402 which includes the 1st hash H 18 , the 3rd hash H 118 , and source position 110 ( 2 ), as disclosed in FIG. 4B .
- the restore phase 504 of the method 500 may include a step 528 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 3rd hash.
- the deduplication module 118 may search, at step 528 , the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108 ( 4 ) that has a key that matches the 3rd hash H 118 , as disclosed in FIG. 4B .
- the restore phase 504 of the method 500 may include a step 530 in which the encrypted block is decrypted, using the encrypt/decrypt function, and using the 1st hash as a decryption password.
- the encryption module 124 may decrypt, at step 530 , the encrypted block, using the encrypt/decrypt function, and using the 1st hash H 18 as a decryption password, resulting in the plain text block from position 110 ( 2 ) of the source storage 110 , as disclosed in FIG. 4B .
- the restore phase 504 of the method 500 may include a step 532 in which the decrypted block is stored in a restore storage at the position included in the entry.
- the encryption module 124 may store, at step 532 , the decrypted block in the source storage 110 , where the source storage 110 is functioning as a restore storage, in the position 110 ( 2 ), as disclosed in FIG. 4B .
- the restore phase 504 of the method 500 may include a step 534 in which it is determined whether all entries have been read from the image map.
- the deduplication module 118 may determine, at step 534 , whether all of the entries have been read from the image map 402 , as disclosed in FIG. 4B . If it is determined at step 534 that not all entries have been read from the image map 402 (No at step 534 ), then the method 500 returns to step 526 where the next entry is read from the image map 402 . Otherwise (Yes at step 534 ), the restore phase 504 of the method 500 is complete, and the method 500 proceeds to step 536 of the backup phase 506 .
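The restore phase 504 (steps 526 through 534) reverses the process: each image map entry supplies the key used to fetch the encrypted block and the 1st hash used to decrypt it. The sketch below uses the same toy assumptions: SHA-256 and SHA-512 as the two hash functions, a self-inverse XOR keystream cipher in place of the encrypt/decrypt function, and dicts for the key-value table and restore storage. `restore_phase` and the (1st hash, 3rd hash, position) tuple layout of the entries are hypothetical.

```python
import hashlib


def hash1(data: bytes) -> bytes:  # assumed 1st cryptographic hash function
    return hashlib.sha256(data).digest()


def hash2(data: bytes) -> bytes:  # assumed 2nd cryptographic hash function
    return hashlib.sha512(data).digest()


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Toy self-inverse XOR keystream cipher; not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


def restore_phase(image_map, table, restore_storage):
    # Each entry is assumed to be a (1st hash, 3rd hash, position) tuple.
    for first_hash, third_hash, position in image_map:  # steps 526 and 534
        encrypted = table[third_hash]                   # step 528
        plain = encrypt_decrypt(encrypted, first_hash)  # step 530
        restore_storage[position] = plain               # step 532


# Build a one-block vault by hand, then restore it into an empty storage.
original = b"block at position 2".ljust(512, b"\0")
first = hash1(original)
encrypted_block = encrypt_decrypt(original, first)
table = {hash2(encrypted_block): encrypted_block}
restored = {}
restore_phase([(first, hash2(encrypted_block), 2)], table, restored)
print(restored[2] == original)  # True
```

Decryption happens only in `restore_phase` on the client side; the table never holds the 1st hash.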
- a backup of the source storage 110 that was stored in the deduplication vault storage 108 will have been restored to a restore storage.
- the restoration of the backup of the source storage 110 involves the backup remaining securely encrypted until being decrypted at the source system 104 , thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
- the backup phase 506 and the restore phase 508 of the method 500 are similar in many respects to the backup phase 502 and the restore phase 504 of the method 500 , the main difference being that the backup phase 506 and the restore phase 508 are performed on the source system 106 of Company B instead of on the source system 104 of Company A.
- the backup phase 506 of the method 500 may include a step 536 in which an allocated plain text block is read from the source storage.
- the encryption module 126 may read, at step 536 , the plain text block at position 112 ( 1 ) or 112 ( 2 ) from the source storage 112 , as disclosed in FIG. 4C .
- the backup phase 506 of the method 500 may include a step 538 in which the plain text block is hashed, using the same 1st cryptographic hash function used in the step 512 , to generate a 4th hash.
- the encryption module 126 may hash, at step 538 , the plain text block from position 112 ( 1 ) or 112 ( 2 ) using the 1st cryptographic hash function to generate the 4th hash H 47 or the 4th hash H 3 , as disclosed in FIG. 4C .
- the backup phase 506 of the method 500 may include a step 540 in which the plain text block is encrypted, using the encrypt/decrypt function, using the 4th hash as an encryption password.
- the encryption module 126 may encrypt, at step 540 , the plain text block from position 112 ( 1 ) using the encrypt/decrypt function, using the 4th hash H 47 as an encryption password, resulting in an encrypted version of the plain text block from position 112 ( 1 ), as disclosed in FIG. 4C .
- the encryption module 126 may encrypt, at step 540 , the plain text block from position 112 ( 2 ) using the encrypt/decrypt function, using the 4th hash H 3 as an encryption password, resulting in an encrypted version of the plain text block from position 112 ( 2 ), as disclosed in FIG. 4C .
- the backup phase 506 of the method 500 may include a step 542 in which the encrypted block is hashed, using the same 2nd cryptographic hash function used in step 516 , to generate a 6th hash.
- the encryption module 126 may hash, at step 542 , the encrypted block corresponding to the plain text block at position 112 ( 1 ) or position 112 ( 2 ) using the 2nd cryptographic hash function to generate the 6th hash H 147 or the 6th hash H 103 , respectively, as disclosed in FIG. 4C .
- the backup phase 506 of the method 500 may include a step 544 in which a key-value table of a deduplication vault is searched to determine whether the 6th hash matches any key in the key-value table.
- the deduplication module 118 may search, at step 544 , the key-value table of the deduplication vault storage 108 to determine that the 6th hash H 147 does not match any key in the key-value table, or to determine that the 6th hash H 103 does match a key at position 108 ( 2 ) in the key-value table, as disclosed in FIG. 4C .
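- if it is determined at step 544 that the 6th hash does not match any key in the key-value table (No at step 544 ), the backup phase 506 of the method 500 may proceed to step 546 . Otherwise (Yes at step 544 ), the backup phase 506 of the method 500 may proceed directly to step 548 .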
- the backup phase 506 of the method 500 may include a step 546 in which a key-value pair is inserted into the key-value table with the key being the 6th hash and the value being the encrypted block.
- the deduplication module 118 may insert, at step 546 , a key-value pair into the key-value table at position 108 ( 9 ) with the key being the 6th hash H 147 and the value being the encrypted version of the plain text block at position 112 ( 1 ), as disclosed in FIG. 4D .
- the backup phase 506 of the method 500 may include a step 548 in which an entry is inserted into an image map corresponding to the source storage that includes the 4th hash, the 6th hash, and a position of the plain text block as stored in the source storage.
- the deduplication module 118 may insert, at step 548 , an entry into the image map 404 corresponding to the source storage 112 that includes the 4th hash H 3 , the 6th hash H 103 , and position 112 ( 2 ) of the plain text block as stored in the source storage 112 , as disclosed in FIG. 4C , or that includes the 4th hash H 47 , the 6th hash H 147 , and position 112 ( 1 ) of the plain text block as stored in the source storage 112 , as disclosed in FIG. 4D .
- the backup phase 506 of the method 500 may include a step 550 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage.
- the deduplication module 118 may determine, at step 550 , whether all of the allocated blocks at positions 112 ( 1 ), 112 ( 2 ), 112 ( 3 ), and 112 ( 5 ) have been read from the source storage 112 , as disclosed in FIG. 4D . If it is determined at step 550 that not all allocated blocks have been read from the source storage 112 (No at step 550 ), then the method 500 returns to step 536 where the next allocated block is read from the source storage 112 . Otherwise (Yes at step 550 ), the backup phase 506 of the method 500 is complete, and the method 500 proceeds to step 552 of the restore phase 508 .
- a backup of the source storage 112 will have been stored in the deduplication vault storage 108 , along with the backup of the source storage 110 .
- the backup of the source storage 112 as stored in the deduplication vault storage 108 has been reduced in size due to not storing multiple copies of the duplicate blocks from positions 112 ( 2 ) and 112 ( 5 ), as disclosed in FIG. 4D .
- the method 500 is employed to encrypt the duplicate plain-text block from position 110 ( 4 ) and position 112 ( 2 ) in such a way that only a single encrypted block is stored in position 108 ( 2 ) in the deduplication vault storage 108 for this duplicate block.
- the method 500 is employed to encrypt the duplicate plain-text block from position 110 ( 2 ) and position 112 ( 5 ) in such a way that only a single encrypted block is stored in position 108 ( 4 ) in the deduplication vault storage 108 for this duplicate block. Therefore, unlike standard deduplication vaults which store a single plain-text deduplicated block, or store a single plain-text block in two different encrypted forms, the method 500 disclosed herein employs client-side encryption with deduplication which enables sensitive blocks to remain secure within the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated.
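The cross-client effect described above follows from the password being derived from block contents rather than from a per-client secret. In the hypothetical sketch below, two clients each back up one shared block and one private block into a common table. Only one encrypted copy of the shared block is stored, and each client can restore it because its own image map already holds the corresponding 1st hash. The same toy assumptions apply: SHA-256 and SHA-512 as the two hash functions and an XOR keystream stand-in for the encrypt/decrypt function. The block contents and the `backup` helper are made up for illustration.

```python
import hashlib


def hash1(data: bytes) -> bytes:  # assumed 1st cryptographic hash function
    return hashlib.sha256(data).digest()


def hash2(data: bytes) -> bytes:  # assumed 2nd cryptographic hash function
    return hashlib.sha512(data).digest()


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Toy self-inverse XOR keystream cipher; not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


def backup(blocks: dict, table: dict) -> list:
    # Returns image map entries as (1st hash, key, position) tuples.
    image_map = []
    for position, plain in sorted(blocks.items()):
        first_hash = hash1(plain)
        encrypted = encrypt_decrypt(plain, first_hash)
        key = hash2(encrypted)
        table.setdefault(key, encrypted)  # store only if not already present
        image_map.append((first_hash, key, position))
    return image_map


shared = b"common OS file".ljust(512, b"\0")
vault = {}
map_a = backup({2: shared, 3: b"private A".ljust(512, b"\0")}, vault)  # Company A
map_b = backup({5: shared, 1: b"private B".ljust(512, b"\0")}, vault)  # Company B
print(len(vault))  # 3: the shared block is stored once for both companies
```

Company B holds no 1st hash for Company A's private block, so it cannot decrypt that block, while it can decrypt the shared block it backed up itself.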
- the restore phase 508 of the method 500 may include a step 552 in which an entry is read from the image map.
- the deduplication module 118 may read, at step 552 , the first entry in the image map 404 which includes the 4th hash H 3 , the 6th hash H 103 , and source position 112 ( 2 ), as disclosed in FIG. 4D .
- the restore phase 508 of the method 500 may include a step 554 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 6th hash.
- the deduplication module 118 may search, at step 554 , the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108 ( 2 ) that has a key that matches the 6th hash H 103 , as disclosed in FIG. 4D .
- the restore phase 508 of the method 500 may include a step 556 in which the encrypted block is decrypted, using the encrypt/decrypt function, and using the 4th hash as a decryption password.
- the encryption module 126 may decrypt, at step 556 , the encrypted block, using the encrypt/decrypt function, and using the 4th hash H 3 as a decryption password, resulting in the plain text block from position 112 ( 2 ) of the source storage 112 , as disclosed in FIG. 4D .
- the restore phase 508 of the method 500 may include a step 558 in which the decrypted block is stored in a restore storage at the position included in the entry.
- the encryption module 126 may store, at step 558 , the decrypted block in the source storage 112 , where the source storage 112 is functioning as a restore storage, in the position 112 ( 2 ), as disclosed in FIG. 4D .
- the restore phase 508 of the method 500 may include a step 560 in which it is determined whether all entries have been read from the image map.
- the deduplication module 118 may determine, at step 560 , whether all of the entries have been read from the image map 404 , as disclosed in FIG. 4D . If it is determined at step 560 that not all entries have been read from the image map 404 (No at step 560 ), then the method 500 returns to step 552 where the next entry is read from the image map 404 . Otherwise (Yes at step 560 ), the restore phase 508 of the method 500 is complete.
- a backup of the source storage 112 that was stored in the deduplication vault storage 108 will have been restored to a restore storage.
- the restoration of the backup of the source storage 112 involves the backup remaining securely encrypted until being decrypted at the source system 106 , thus reducing the potential for an unauthorized user, such as a user from Company A, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user.
- the methods 300 and 500 discussed above are but two possible implementations of client-side encryption in a deduplication backup system, and various modifications are possible and contemplated. For example, these methods may be modified to remove the steps or portions of steps that involve restoring a backup to a restore storage. Further, although the methods 300 and 500 are discussed above as being performed by the deduplication module 118 , the encryption module 124 , and the encryption module 126 , it is understood that the methods 300 and 500 may alternatively be performed exclusively by any one of the deduplication module 118 , the encryption module 124 , or the encryption module 126 , or by some other module or combination of modules.
- inventions described herein may include the use of a special-purpose or general-purpose computer, including various computer hardware or software modules, as discussed in greater detail below.
- Embodiments described herein may be implemented using non-transitory computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
- Such computer-readable media may be any available media that may be accessed by a general-purpose or special-purpose computer.
- Such computer-readable media may include non-transitory computer-readable storage media including RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other storage medium which may be used to carry or store one or more desired programs having program code in the form of computer-executable instructions or data structures and which may be accessed and executed by a general-purpose computer, special-purpose computer, or virtual computer such as a virtual machine. Combinations of the above may also be included within the scope of computer-readable media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed by one or more processors, cause a general-purpose computer, special-purpose computer, or virtual computer such as a virtual machine to perform a certain method, function, or group of methods or functions.
- the term “module” may refer to software objects or routines that execute on a computing system.
- the different modules or filters described herein may be implemented as objects or processes that execute on a computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated.
Description
- This application is a continuation of U.S. patent application Ser. No. 14/508,654, filed Oct. 7, 2014, and titled “CLIENT-SIDE ENCRYPTION IN A DEDUPLICATION BACKUP SYSTEM,” which is incorporated herein by reference in its entirety.
- The embodiments disclosed herein relate to client-side encryption in a deduplication backup system.
- A storage is computer-readable media capable of storing data in blocks. Storages face a myriad of threats to the data they store and to their smooth and continuous operation. In order to mitigate these threats, a backup of the data in a storage may be created at a particular point in time to enable the restoration of the data at some future time. Such a restoration may become desirable, for example, if the storage experiences corruption of its stored data, if the storage becomes unavailable, or if a user wishes to create a second identical storage.
- A storage is typically logically divided into a finite number of fixed-length blocks. A storage also typically includes a file system which tracks the locations of the blocks that are allocated to each file that is stored in the storage. The file system also tracks the blocks that are not allocated to any file. The file system generally tracks allocated and free blocks using specialized data structures, referred to as file system metadata. File system metadata is also stored in designated blocks in the storage.
- Various techniques exist for backing up a source storage. One common technique involves backing up individual files stored in the source storage on a per-file basis. This technique is often referred to as file backup. File backup uses the file system of the source storage as a starting point and performs a backup by writing the files to a destination storage. Using this approach, individual files are backed up if they have been modified since the previous backup. File backup may be useful for finding and restoring a few lost or corrupted files. However, file backup may also include significant overhead in the form of bandwidth and logical overhead because file backup requires the tracking and storing of information about where each file exists within the file system of the source storage and the destination storage.
- Another common technique for backing up a source storage ignores the locations of individual files stored in the source storage and instead simply backs up all allocated blocks stored in the source storage. This technique is often referred to as image backup because the backup generally contains or represents an image, or copy, of the entire allocated contents of the source storage. Using this approach, individual allocated blocks are backed up if they have been modified since the previous backup. Because image backup backs up all allocated blocks of the source storage, image backup backs up both the blocks that make up the files stored in the source storage as well as the blocks that make up the file system metadata. Also, because image backup backs up all allocated blocks rather than individual files, this approach does not necessarily need to be aware of the file system metadata or the files stored in the source storage, beyond utilizing minimal knowledge of the file system metadata in order to only back up allocated blocks, since free blocks are not generally backed up.
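An image backup, as described above, reads blocks by position and skips free blocks. The sketch below simplifies real file system metadata into an assumed per-block allocation bitmap and uses an assumed fixed block size of 512 bytes. `read_allocated_blocks` is a hypothetical helper, not part of any particular backup product.

```python
from typing import Iterator, List, Tuple

BLOCK_SIZE = 512  # assumed fixed block length in bytes


def read_allocated_blocks(image: bytes, allocated: List[bool]) -> Iterator[Tuple[int, bytes]]:
    """Yield (position, block) for each allocated block, reading sequentially."""
    for position, is_allocated in enumerate(allocated):
        if is_allocated:  # free blocks are not backed up
            start = position * BLOCK_SIZE
            yield position, image[start:start + BLOCK_SIZE]


storage = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))  # four blocks
bitmap = [True, False, True, True]  # block 1 is free
print([pos for pos, _ in read_allocated_blocks(storage, bitmap)])  # [0, 2, 3]
```

The only file system knowledge used is the allocation state, which matches the minimal reliance on file system metadata described above.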
- Image backup can be relatively fast compared to file backup because reliance on the file system is minimized. An image backup can also be relatively fast compared to a file backup because seeking during image backup may be reduced. In particular, during image backup, blocks are generally read sequentially with relatively limited seeking. In contrast, during file backup, blocks that make up individual files may be scattered in the source storage, resulting in relatively extensive seeking.
- One common problem encountered when backing up multiple similar source storages to the same backup storage using image backup is the potential for redundancy within the backed-up data. For example, if multiple source storages utilize the same commercial operating system, such as WINDOWS® 8 Professional, they may store a common set of system files which will have identical blocks. If these source storages are backed up to the same backup storage, these identical blocks will be stored in the backup storage multiple times, resulting in redundant blocks. Redundancy in a backup storage may increase the overall size requirements of backup storage and increase the bandwidth overhead of transporting blocks to the backup storage.
- While this redundancy problem can be mitigated to a certain extent through the use of a deduplication vault, a standard deduplication vault, in order to deduplicate the blocks of a storage, must first receive the blocks from the computer system of the storage in unencrypted form, after which the deduplication vault will store the block if it is unique, or if the vault supports encryption it will encrypt and store the encrypted block if it is unique. In this way the standard deduplication vault will support deduplication of blocks from multiple systems. However, as the standard deduplication vault requires, at least temporarily, access to the unencrypted blocks, this provides an opportunity for these blocks to be compromised should the security of the deduplication vault be compromised or faulty. For this reason, encrypted deduplication vaults have been developed in which each block is encrypted by the source computer system prior to backing up the block into the encrypted deduplication vault, such that the deduplication vault, without being provided the decryption key, is unable to decrypt the encrypted blocks.
- While encrypted deduplication vaults have alleviated the concerns regarding unauthorized access to sensitive blocks, a common problem encountered during backup into an encrypted deduplication vault is that encrypted blocks may not be capable of deduplication across different clients. In particular, while the blocks that make up a commercial operating system or a standard application may be identical in their plain text form, encryption of two identical plain text blocks can result in differences in the encrypted versions of the blocks, as each client is likely to use its own unique encryption password. Thus, even if an identical plain text block is backed up across different source storages, the encrypted block that is actually stored in the deduplication vault may be different for each source storage, resulting in the identical plain text block being stored multiple times in different encrypted forms. As a result, the benefits of deduplication may be lost even when identical blocks are being backed up because different source systems may encrypt identical blocks differently, particularly if different encryption passwords are used on the different source systems.
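The loss of deduplication can be shown with a toy cipher. When each client encrypts with its own password, one identical plain text block yields two different encrypted blocks. When the password is derived from the block itself, both clients yield the same one. The XOR keystream cipher and the password strings below are illustrative assumptions only.

```python
import hashlib


def encrypt(block: bytes, password: bytes) -> bytes:
    # Toy XOR keystream cipher built from SHA-256; not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


block = b"shared system file block".ljust(512, b"\0")

# Per-client passwords: the vault receives two distinct ciphertexts.
per_client = {encrypt(block, b"company-A-secret"), encrypt(block, b"company-B-secret")}
# Content-derived password: both clients produce the same ciphertext.
derived = hashlib.sha256(block).digest()
convergent = {encrypt(block, derived), encrypt(block, derived)}
print(len(per_client), len(convergent))  # 2 1
```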
- The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described herein may be practiced.
- In general, example embodiments described herein relate to client-side encryption in a deduplication backup system. The example methods disclosed herein may be employed to encrypt plain-text blocks at a source system (i.e., a client) prior to sending the blocks to a deduplication vault system. This client-side encryption reduces the potential for an unauthorized user to access the original plain-text blocks even where the unauthorized user has access to the deduplication vault system. Further, the example methods disclosed herein may also be employed to encrypt plain-text blocks in such a way that only a single encrypted block is stored in the deduplication vault storage for each unique plain-text block that is backed-up across multiple source storages of multiple clients. Thus, the example methods disclosed herein employ client-side encryption with deduplication which enables sensitive blocks to remain secure within the deduplication vault storage even while redundancy within and across multiple source storages is reduced or eliminated. This may increase the number of blocks from a source storage that are already duplicated in the deduplication vault storage at the time that a backup of the source storage is created in the deduplication vault storage, thereby decreasing the number of blocks that must be copied from the source storage to the deduplication vault storage. Decreasing the number of blocks that must be copied from the source storage to the deduplication vault storage during the creation of a backup may result in decreased bandwidth overhead of transporting blocks to the deduplication vault storage and increased efficiency and speed during the creation of each backup.
- In one example embodiment, a method for client-side encryption in a deduplication backup system includes a backup phase in which various steps are performed for each allocated plain text block stored in a source storage at a point in time. One step includes hashing, using a first cryptographic hash function, the plain text block to generate a first hash. Another step includes hashing, using a second cryptographic hash function, the first hash to generate a second hash. Another step includes searching a key-value table of a deduplication storage to determine whether the second hash matches any key in the key-value table. In this step, each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block. Another step includes, upon determining that the second hash does not match any key in the key-value table, encrypting, using an encrypt/decrypt function, the plain text block using the first hash as an encryption password and inserting a key-value pair into the key-value table with the key being the second hash and the value being the encrypted block. Another step includes inserting an entry into an image map corresponding to the source storage that includes the first hash and a position of the plain text block as stored in the source storage.
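In this first embodiment the vault key is a hash of the 1st hash rather than of the encrypted block. The client can therefore look a block up before encrypting it and only needs to encrypt blocks the vault does not already hold. A minimal sketch under the same illustrative assumptions: SHA-256 and SHA-512 as the two hash functions, a toy XOR keystream cipher for the encrypt/decrypt function, and a dict as the key-value table. `backup_block_method_300` is a hypothetical name.

```python
import hashlib


def hash1(data: bytes) -> bytes:  # assumed first cryptographic hash function
    return hashlib.sha256(data).digest()


def hash2(data: bytes) -> bytes:  # assumed second cryptographic hash function
    return hashlib.sha512(data).digest()


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Toy self-inverse XOR keystream cipher; not a production cipher.
    out = bytearray()
    for offset in range(0, len(block), 32):
        pad = hashlib.sha256(password + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(block[offset:offset + 32], pad))
    return bytes(out)


def backup_block_method_300(plain: bytes, position: int, table: dict, image_map: list) -> None:
    first_hash = hash1(plain)
    second_hash = hash2(first_hash)  # key derived from the first hash, not the ciphertext
    if second_hash not in table:
        # Encryption is performed only for blocks the vault does not already hold.
        table[second_hash] = encrypt_decrypt(plain, first_hash)
    # The image map keeps the first hash; the key can be recomputed from it on restore.
    image_map.append((first_hash, position))


table, image_map = {}, []
for pos, blk in enumerate([b"x" * 512, b"y" * 512, b"x" * 512]):
    backup_block_method_300(blk, pos, table, image_map)
print(len(table), len(image_map))  # 2 3
```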
- In another example embodiment, a method for client-side encryption in a deduplication backup system includes a backup phase in which various steps are performed for each allocated plain text block stored in a source storage at a point in time. One step includes hashing, using a first cryptographic hash function, the plain text block to generate a first hash. Another step includes encrypting, using an encrypt/decrypt function, the plain text block using the first hash as an encryption password. Another step includes hashing, using a second cryptographic hash function, the encrypted block to generate a third hash. Another step includes searching a key-value table of a deduplication storage to determine whether the third hash matches any key in the key-value table. In this step, each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block. Another step includes, upon determining that the third hash does not match any key in the key-value table, inserting a key-value pair into the key-value table with the key being the third hash and the value being the encrypted block. Another step includes inserting an entry into an image map corresponding to the source storage that includes the first hash, the third hash, and a position of the plain text block as stored in the source storage.
- It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.
- Example embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
- FIG. 1 is a schematic block diagram illustrating an example deduplication backup system;
- FIGS. 2A-2D are schematic diagrams illustrating client-side encryption in a deduplication backup system;
- FIGS. 3A-3B are a schematic flowchart illustrating a first example method for client-side encryption in a deduplication backup system;
- FIGS. 4A-4D are schematic diagrams illustrating client-side encryption in a deduplication backup system; and
- FIGS. 5A-5B are a schematic flowchart illustrating a second example method for client-side encryption in a deduplication backup system.
- The term “storage” as used herein refers to computer-readable media, or some logical portion thereof such as a volume, capable of storing data in blocks. The term “block” as used herein refers to a fixed-length discrete sequence of bits. In some example embodiments, the size of each block may be configured to match the standard sector size of a file system of a storage on which the block is stored. For example, the size of each block may be 512 bytes (4096 bits) where 512 bytes is the size of a standard sector. The term “allocated block” as used herein refers to a block in a storage that is currently tracked as storing data by a file system of the storage. The term “free block” as used herein refers to a block in a storage that is not currently employed nor tracked as storing data by a file system of the storage. The term “backup,” when used herein as a noun, refers to a copy or copies of one or more blocks from a storage. The term “base backup” as used herein refers to a base backup of a storage that includes at least a copy of each unique allocated block of the storage at a point in time such that the base backup can be restored on its own to recreate the state of the storage at the point in time, without being dependent on any other backup. A “base backup” may also include nonunique allocated blocks and free blocks of the storage at the point in time.
The term “incremental backup” as used herein refers to an at least partial backup of a storage that includes at least a copy of each unique allocated block of the storage that was modified between a previous point in time of a previous backup of the storage and the subsequent point in time of the incremental backup, such that the incremental backup, along with all previous backups of the storage, including an initial base backup of the storage, can be restored together to recreate the state of desired blocks of the storage at the subsequent point in time. The term “modified block” as used herein refers to a block that was modified either because the block was previously-allocated and changed or because the block was modified by being newly-allocated. An “incremental backup” may also include nonunique allocated blocks and free blocks of the storage that were modified between the previous point in time and the subsequent point in time. Only “unique allocated blocks” may be included in a “base backup” or an “incremental backup” where only a single copy of multiple duplicate allocated blocks (i.e., nonunique allocated blocks) is backed up to reduce the size of the backup. A “base backup” or an “incremental backup” may exclude certain undesired allocated blocks such as blocks belonging to files whose contents are not necessary for restoration purposes, such as virtual memory pagination files and machine hibernation state files.
- FIG. 1 is a schematic block diagram illustrating an example deduplication backup system 100. As disclosed in FIG. 1, the example deduplication backup system 100 includes a deduplication vault system 102, a source system 104 of Company A, and a source system 106 of Company B. Company A may be a competitor of Company B, such that users of the source system 104 of Company A would not be authorized to access sensitive data stored in the source system 106 of Company B, and vice-versa. The systems 102, 104, and 106 include storages 108, 110, and 112, respectively. - The
deduplication vault storage 108 stores a base backup A and multiple incremental backups A that have been created of the source storage 110 to represent the states of the source storage 110 at various points in time. For example, the base backup A represents the state of the source storage 110 at time t(0), the 1st incremental backup A represents the state of the source storage 110 at time t(2), the 2nd incremental backup A represents the state of the source storage 110 at time t(4), and the nth incremental backup A represents the state of the source storage 110 at time t(2n). Similarly, the deduplication vault storage 108 stores a base backup B and multiple incremental backups B that have been created of the source storage 112 to represent the states of the source storage 112 at various points in time. For example, the base backup B represents the state of the source storage 112 at time t(1), the 1st incremental backup B represents the state of the source storage 112 at time t(3), the 2nd incremental backup B represents the state of the source storage 112 at time t(5), and the nth incremental backup B represents the state of the source storage 112 at time t(2n+1). The deduplication vault system 102 also includes a database 114, metadata 116, and a deduplication module 118. The source systems 104 and 106 also include encryption modules 124 and 126, respectively. The source systems 104 and 106 may communicate with the deduplication vault system 102 over a network 120. - Each of the
systems network 120 may be any wired or wireless communication network including, for example, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a Wireless Application Protocol (WAP) network, a Bluetooth network, an Internet Protocol (IP) network such as the internet, or some combination thereof. - The image backups stored in the
deduplication vault storage 108 may be created by the deduplication module 118. For example, the deduplication module 118 may be configured to execute computer instructions to perform image backup operations of creating a base backup and multiple incremental backups of the source storage 110 and of the source storage 112. It is noted that these image backups may initially be created on the source systems 104 and 106 and then copied to the deduplication vault system 102. - For example, the base backup A may be created to capture the state of the
source storage 110 at time t(0). This image backup operation may include the deduplication module 118 copying all allocated blocks of the source storage 110 as allocated at time t(0) and storing the allocated blocks in the deduplication vault storage 108. The state of the source storage 110 at time t(0) may be captured using snapshot technology in order to capture the blocks stored in the source storage 110 at time t(0) without interrupting other processes, thus avoiding downtime of the source storage 110. The base backup A may be very large depending on the size of the source storage 110 and the number of allocated blocks at time t(0). As a result, the base backup A may take a relatively long time to create and consume a relatively large amount of space in the deduplication vault storage 108, depending on how many of the blocks included in the base backup A were already duplicated in the deduplication vault storage 108 prior to the creation of the base backup A. - Next, the 1st and 2nd incremental backups A may be created to capture the states of the
source storage 110 at times t(2) and t(4), respectively. This may include copying only modified allocated blocks of the source storage 110 present at time t(2) and storing the modified allocated blocks in the deduplication vault storage 108, then later copying only modified allocated blocks of the source storage 110 present at time t(4) and storing the modified allocated blocks in the deduplication vault storage 108. The states of the source storage 110 at times t(2) and t(4) may also be captured using snapshot technology, thus avoiding downtime of the source storage 110. Each incremental backup A may include only those allocated blocks from the source storage 110 that were modified after the time of the previous backup. Thus, the 1st incremental backup A may include only those allocated blocks from the source storage 110 that were modified between time t(0) and time t(2), and the 2nd incremental backup A may include only those allocated blocks from the source storage 110 that were modified between time t(2) and time t(4). In general, as compared to the base backup A, each incremental backup A may take a relatively short time to create and consume a relatively small amount of space in the deduplication vault storage 108, depending on how many of the blocks included in the 1st and 2nd incremental backups A were already duplicated in the deduplication vault storage 108 prior to the creation of those incremental backups. - Finally, an nth incremental backup A may be created to capture the state of the
source storage 110 at time t(2n). This may include copying only modified allocated blocks of the source storage 110 present at time t(2n), using snapshot technology, and storing the modified allocated blocks in the deduplication vault storage 108. The nth incremental backup A may include only those allocated blocks from the source storage 110 that were modified between the point in time of the backup of the source storage 110 that immediately preceded the nth incremental backup A and time t(2n). - The base backup B and the 1st, 2nd, and nth incremental backups B may be created in a similar manner to the base backup A and the 1st, 2nd, and nth incremental backups A, except that the base backup B and the 1st, 2nd, and nth incremental backups B are created to represent the states at times t(1), t(3), t(5), and t(2n+1) instead of at times t(0), t(2), t(4), and t(2n). As disclosed herein, a time with a label t(x) is at least as late in time as a time with a label t(x−1).
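The difference between a base backup and the incremental backups that follow it can be modeled in a few lines of Python. This is a simplified sketch under assumptions that are not in the text:

- Allocated blocks are held in a dictionary keyed by position.
- Modification tracking is a hypothetical `modified` set.
- Block deallocation, snapshots, and the deduplication vault are not modeled.

`SourceStorage`, `take_backup`, and `restore_chain` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStorage:
    """Hypothetical model of a source storage that tracks modified blocks between backups."""
    blocks: dict[int, bytes]                         # allocated blocks: position -> contents
    modified: set[int] = field(default_factory=set)  # positions modified since the last backup

    def write(self, position: int, data: bytes) -> None:
        """Change a previously-allocated block or allocate a new one."""
        self.blocks[position] = data
        self.modified.add(position)


def take_backup(storage: SourceStorage, base: bool) -> dict[int, bytes]:
    """Copy all allocated blocks (base) or only modified allocated blocks (incremental)."""
    if base:
        positions = sorted(storage.blocks)
    else:
        positions = sorted(p for p in storage.modified if p in storage.blocks)
    storage.modified.clear()  # the next incremental backup tracks changes from this point
    # A real system would read these blocks from a snapshot so the storage stays online.
    return {p: storage.blocks[p] for p in positions}


def restore_chain(backups: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Apply the base backup, then each incremental backup, from oldest to newest."""
    state: dict[int, bytes] = {}
    for backup in backups:
        state.update(backup)  # later backups overwrite blocks modified after earlier ones
    return state
```

Usage follows the timeline above:

1. The base backup at t(0) copies every allocated block.
2. The incremental backup at t(2) copies only the blocks written since t(0).
3. Restoring the base backup together with every later incremental backup, in order, recreates the latest state.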
- Therefore, incremental backups may be created on an ongoing basis. The frequency of creating new incremental backups may be altered as desired in order to adjust the amount of data that will be lost should the
source storage source storage source storage source storage - Although only allocated blocks are included in the example base and incremental backups discussed above, it is understood that in alternative implementations both allocated and free blocks may be backed up during the creation of a base backup or an incremental backup. This is typically done for forensic purposes, because the contents of free blocks can be interesting where the free blocks contain data from a previous point in time when the blocks were in use and allocated. Therefore, the creation of base backups and incremental backups as disclosed herein is not limited to allocated blocks but may also include free blocks.
- Further, although only base backups and incremental backups are discussed above, it is understood that the
source storage source storage - The
database 114 and the metadata 116 may be employed to track information related to the source storages 110 and 112, the deduplication vault storage 108, and the backups of the source storages 110 and 112 that are stored in the deduplication vault storage 108. For example, the database 114 and the metadata 116 may be identical or similar in structure and function to the database 500 and the metadata 700 disclosed in related U.S. patent application Ser. No. 13/782,549, titled “MULTIPHASE DEDUPLICATION,” which was filed on Mar. 1, 2013 and is expressly incorporated herein by reference in its entirety. Subsequently, the deduplication module 118 and/or another module may restore each block that was stored in the source storage - In one example embodiment, the
deduplication vault system 102 may be a file server, the source system 104 may be a first desktop computer, the source system 106 may be a second desktop computer, and the network 120 may include the internet. In this example embodiment, the file server may be configured to periodically back up the storages of the first and second desktop computers over the internet as part of backup jobs by creating base backups and multiple incremental backups and storing them in the storage of the file server. The first and second desktop computers may also be configured to track modifications to their storages between backups in order to easily and quickly identify only those blocks that were modified for use in the creation of an incremental backup. The file server may also be configured to restore one or more of the backups to a storage of a restore computer over the internet if the first or second desktop computer experiences corruption of its storage or if the first or second desktop computer's storage becomes unavailable. - Although only a single storage is disclosed in each of the
systems FIG. 1, it is understood that any of the systems systems FIG. 1 as communicating over the network 120, it is understood that the systems storage storage storage - Further, although the
deduplication module 118, the encryption module 124, and the encryption module 126 are the only modules disclosed in the example deduplication backup system 100 of FIG. 1, it is understood that the functionality of the modules systems source storages deduplication backup system 100 of FIG. 1, it is understood that the deduplication vault system 102 of FIG. 1 may be configured to simultaneously back up many more source storages and/or to simultaneously restore many more restore storages. For example, the greater the number of source storages that are backed up to the deduplication vault storage 108, the greater the likelihood for reducing redundancy and for reducing the overall number of blocks being backed up, resulting in corresponding decreases in the bandwidth overhead of transporting blocks to the deduplication vault storage 108. - Having described one specific environment with respect to
FIG. 1, it is understood that the specific environment of FIG. 1 is only one of countless environments in which the example methods disclosed herein may be practiced. The scope of the example embodiments is not intended to be limited to any particular environment.
FIGS. 2A-2D are schematic diagrams illustrating client-side encryption 200 in the deduplication backup system 100. Prior to the client-side encryption 200, the deduplication vault storage 108 may have been seeded with common blocks and/or various image backup operations of one or more backup jobs may have transpired, which will have resulted in the insertions of various blocks into the deduplication vault storage 108, such as the blocks at positions 108(4)-108(8). Further, prior to the client-side encryption 200, allocated blocks in the source storages 110 and 112 are identified as being appropriate for being backed up. In the case of a base backup, all allocated blocks may be identified, and in the case of an incremental backup, only allocated blocks that have potentially been modified may be identified. The client-side encryption 200 illustrates the creation of the base backup A of the source storage 110 to represent the state of the source storage 110 at time t(0) in FIGS. 2A-2B, and illustrates the creation of the base backup B of the source storage 112 to represent the state of the source storage 112 at time t(1) in FIGS. 2C-2D. Although the source storages 110 and 112 are each depicted with only eight blocks and the deduplication vault storage 108 is depicted with only sixteen blocks, it is understood that the storages - As disclosed in
FIG. 2A, a snapshot is taken of the source storage 110 at time t(0) and allocated plain text blocks at positions 110(1), 110(2), 110(4), 110(6), and 110(7) are targeted to be included in the base backup A of the source storage 110. Each of these blocks is then read from the source storage 110, hashed, using a 1st cryptographic hash function, to generate a 1st hash, and then the 1st hash is hashed, using a 2nd cryptographic hash function, to generate a 2nd hash. Next, it is determined whether the 2nd hash matches any key in the key-value table of the deduplication vault storage 108, where each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block. As disclosed in FIG. 2A, only the 2nd hash H38 matches the key at position 108(4) in the key-value table, while the 2nd hashes H27, H23, and H29 do not match any key in the key-value table. Next, an entry is inserted into an image map 202 corresponding to the base backup A of the source storage 110 that includes the corresponding 1st hash H18 and the position 110(2) of the plain text block as stored in the source storage 110. Where multiple items of data are included in the same entry in an image map, it is understood that the items are associated with one another and that this association is stored in the entry. Therefore, the inclusion of the 1st hash H18 and the position 110(2) in the same entry in the image map 202 in this example associates the 1st hash H18 with the position 110(2). The image maps disclosed in the drawings may be implemented in the metadata 116 of the deduplication vault system 102 of FIG. 1. Further, the image maps disclosed in the drawings may be stored in plain text or may themselves be encrypted. Also, the image maps disclosed in the drawings may each be stored locally in the source storage of the corresponding source system or may each be stored remotely in the deduplication vault storage 108 of the deduplication vault system 102.
When the image map is encrypted, it may be encrypted after the backup phases disclosed herein, and then decrypted prior to the restore phases disclosed herein. - As disclosed in
FIG. 2B, since the 2nd hashes H27, H23, and H29 do not match any key in the key-value table, the following occurs for each of their corresponding plain text blocks. The plain text block is encrypted, using an encrypt/decrypt function, with the 1st hash as an encryption password. A key-value pair is then inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block. Finally, an entry is inserted into the image map 202 corresponding to the source storage 110 that includes the 1st hash and a position of the plain text block as stored in the source storage 110. It is noted that since the block at position 110(4) and the block at position 110(7) are duplicates, only the first instance of this duplicate block is encrypted and inserted into the key-value table, but entries for both of the duplicate blocks are inserted into the image map 202. It is further noted that an “encrypt/decrypt function” may actually be two separate functions, one for encrypting and another for decrypting, in which case the “encrypt/decrypt function” is the combination of an encrypt function and a decrypt function. It is also noted that each block may be processed individually through each of the steps disclosed in FIGS. 2A and 2B, and below in FIGS. 2C and 2D, instead of a step being performed concurrently on all relevant blocks. - As disclosed in
FIG. 2C, a snapshot is then taken of the source storage 112 at time t(1) and allocated plain text blocks at positions 112(1), 112(2), 112(3), and 112(5) are targeted to be included in the base backup B of the source storage 112. Each of these blocks is then read from the source storage 112, hashed, using the 1st cryptographic hash function, to generate a 1st hash, and then the 1st hash is hashed, using the 2nd cryptographic hash function, to generate a 2nd hash. Next, it is determined whether the 2nd hash matches any key in the key-value table of the deduplication vault storage 108. As disclosed in FIG. 2C, only the 2nd hashes H23 and H38 match the keys at positions 108(2) and 108(4), respectively, in the key-value table, while the 2nd hashes H67 and H71 do not match any key in the key-value table. Next, entries are inserted into an image map 204 corresponding to the base backup B of the source storage 112 that each include the corresponding 1st hash and the position of the plain text block as stored in the source storage 112. - As disclosed in
FIG. 2D, since the 2nd hashes H67 and H71 do not match any key in the key-value table, the following occurs for each of their corresponding plain text blocks. The plain text block is encrypted, using the encrypt/decrypt function, with the 1st hash as an encryption password. A key-value pair is then inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block. Finally, an entry is inserted into the image map 204 corresponding to the source storage 112 that includes the 1st hash and a position of the plain text block as stored in the source storage 112.
- Therefore, during the client-side encryption 200 of FIGS. 2A-2D, plain-text blocks of the source storages 110 and 112 may be encrypted at the source system 104 of Company A and at the source system 106 of Company B prior to sending the blocks to the deduplication vault storage 108. This client-side encryption 200 reduces the potential for an unauthorized user to access the original plain-text blocks. Further, the client-side encryption 200 encrypts plain-text blocks in such a way that only a single encrypted block is stored in the deduplication vault storage 108 for each unique plain-text block that is backed up across the source storages 110 and 112. For example, only a single encrypted block is stored at position 108(4) of the key-value table for the duplicate blocks at positions 110(2) and 112(5), and only a single encrypted block is stored at position 108(2) of the key-value table for the duplicate blocks at positions 110(4), 110(7), and 112(2). Thus, the client-side encryption 200 combines encryption with deduplication, which enables sensitive blocks to remain secure within the key-value table of the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated. As disclosed in FIGS. 2A-2D, the blocks at positions 110(2), 110(7), 112(2), and 112(5) are already duplicated in the deduplication vault storage 108 at the time that the base backups A and B of the source storages 110 and 112 are created in the deduplication vault storage 108. These blocks therefore do not need to be copied from the source storages 110 and 112 to the deduplication vault storage 108. This decreases the bandwidth overhead of transporting blocks to the deduplication vault storage 108 and increases efficiency and speed during the creation of the base backups A and B.
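The cross-storage effect described above can be checked with a small simulation. In this sketch, two hypothetical source storages for Company A and Company B share some blocks, and a single shared key-value table plays the role of the deduplication vault storage. Each company restores its own storage from its own image map, following the restore phase of the method 300 (re-hash the 1st hash, look up the encrypted block, decrypt with the 1st hash).

The following are assumptions, not details from the text:

- SHA-256 and SHA3-256 stand in for the 1st and 2nd hash functions.
- A toy SHA-256 keystream XOR stands in for the encrypt/decrypt function.
- `DedupVault`, its `blocks_received` counter, `back_up`, and `restore` are hypothetical names.

```python
import hashlib

# Assumptions (hypothetical): SHA-256 / SHA3-256 as the 1st / 2nd hash functions and a toy
# SHA-256 keystream XOR standing in for the encrypt/decrypt function.


def keystream_xor(password: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher; the same call encrypts and decrypts."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class DedupVault:
    """Hypothetical shared key-value table that counts the encrypted blocks it receives."""

    def __init__(self) -> None:
        self.table: dict[bytes, bytes] = {}
        self.blocks_received = 0

    def has(self, key: bytes) -> bool:
        return key in self.table

    def put(self, key: bytes, encrypted: bytes) -> None:
        self.table[key] = encrypted
        self.blocks_received += 1

    def get(self, key: bytes) -> bytes:
        return self.table[key]


def back_up(source: dict[int, bytes], vault: DedupVault) -> list[tuple[bytes, int]]:
    """Client side: only blocks whose 2nd hash is unknown to the vault are encrypted and sent."""
    image_map = []
    for position, block in sorted(source.items()):
        h1 = hashlib.sha256(block).digest()
        h2 = hashlib.sha3_256(h1).digest()
        if not vault.has(h2):
            vault.put(h2, keystream_xor(h1, block))
        image_map.append((h1, position))
    return image_map


def restore(image_map: list[tuple[bytes, int]], vault: DedupVault) -> dict[int, bytes]:
    """Client side: re-derive each 2nd hash, fetch the encrypted block, decrypt with the 1st hash."""
    restored = {}
    for h1, position in image_map:
        h2 = hashlib.sha3_256(h1).digest()
        restored[position] = keystream_xor(h1, vault.get(h2))
    return restored
```

In a run where the two storages together hold seven allocated blocks but only four distinct ones, the vault receives exactly four encrypted blocks. The plain text never leaves either source system.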
FIGS. 3A-3B are a schematic flowchart illustrating a first example method 300 for client-side encryption in the deduplication backup system 100. The method 300 may be implemented, in at least some embodiments, by the deduplication module 118 of the deduplication vault system 102, by the encryption module 124 of the source system 104, and by the encryption module 126 of the source system 106 of FIG. 1. For example, these modules may be configured to execute computer instructions to perform operations of client-side encryption of the source storages 110 and 112 prior to their being backed up into the deduplication vault storage 108, as represented by one or more of phases 302-308, which are made up of the steps 310-364 of the method 300. Although illustrated as discrete phases and steps, various phases/steps may be divided into additional phases/steps, combined into fewer phases/steps, reordered, or eliminated, depending on the desired implementation. The method 300 will now be discussed with reference to FIGS. 1, 2A-2D, and 3A-3B. - The
method 300 may include a backup phase 302 for Company A, a restore phase 304 for Company A, a backup phase 306 for Company B, and a restore phase 308 for Company B. - The
backup phase 302 of the method 300 may include a step 310 in which an allocated plain text block is read from the source storage. For example, the encryption module 124 may read, at step 310, the plain text block at position 110(1) or 110(2) from the source storage 110, as disclosed in FIG. 2A. - The
backup phase 302 of the method 300 may include a step 312 in which the plain text block is hashed, using a 1st cryptographic hash function, to generate a 1st hash. Continuing with the above example, the encryption module 124 may hash, at step 312, the plain text block from position 110(1) or 110(2) using the 1st cryptographic hash function to generate a 1st hash, such as the 1st hash H7 or the 1st hash H18, as disclosed in FIG. 2A. The 1st cryptographic hash function may be a SHA-1, SHA-2, SHA-3, MD5, or other cryptographic hash function, for example. - The
backup phase 302 of the method 300 may include a step 314 in which the 1st hash is hashed, using a 2nd cryptographic hash function, to generate a 2nd hash. Continuing with the above example, the encryption module 124 may hash, at step 314, the 1st hash H7 or the 1st hash H18 using the 2nd cryptographic hash function to generate the 2nd hash H27 or the 2nd hash H38, as disclosed in FIG. 2A. The 2nd cryptographic hash function may be a SHA-1, SHA-2, SHA-3, MD5, or other cryptographic hash function, for example, and may be the same as, or different from, the 1st cryptographic hash function. - The
backup phase 302 of the method 300 may include a step 316 in which a key-value table of a deduplication vault is searched to determine whether the 2nd hash matches any key in the key-value table, where each key-value pair in the key-value table includes a key that is a hash and a value that is an encrypted block. Continuing with the above example, the deduplication module 118 may search, at step 316, the key-value table of the deduplication vault storage 108 to determine that the 2nd hash H27 does not match any key in the key-value table, or to determine that the 2nd hash H38 does match a key at position 108(4) in the key-value table, as disclosed in FIG. 2B. Upon determining that the 2nd hash does not match any key in the key-value table (No at step 316), the backup phase 302 of the method 300 may include steps 318 and 320. Otherwise (Yes at step 316), the backup phase 302 of the method 300 may proceed directly to the step 322. - The
backup phase 302 of the method 300 may include a step 318 in which the plain text block is encrypted, using an encrypt/decrypt function, using the 1st hash as an encryption password. Continuing with the above example, the encryption module 124 may encrypt, at step 318, the plain text block from position 110(1) using an encrypt/decrypt function, using the 1st hash H7 as an encryption password, resulting in an encrypted version of the plain text block from position 110(1), as disclosed in FIG. 2B. - The
backup phase 302 of the method 300 may include a step 320 in which a key-value pair is inserted into the key-value table with the key being the 2nd hash and the value being the encrypted block. Continuing with the above example, the deduplication module 118 may insert, at step 320, a key-value pair into the key-value table at position 108(1) with the key being the 2nd hash H27 and the value being the encrypted version of the plain text block at position 110(1), as disclosed in FIG. 2B. - The
backup phase 302 of the method 300 may include a step 322 in which an entry is inserted into an image map corresponding to the source storage that includes the 1st hash and a position of the plain text block as stored in the source storage. Continuing with the above example, the deduplication module 118 may insert, at step 322, an entry into the image map 202 corresponding to the source storage 110 that includes the 1st hash H18 and position 110(2), as disclosed in FIG. 2A, or that includes the 1st hash H7 and position 110(1), as disclosed in FIG. 2B. - The
backup phase 302 of the method 300 may include a step 324 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage. In the case of a base backup, all unique allocated blocks may be identified, and in the case of an incremental backup, only unique allocated blocks that have potentially been modified may be identified. Continuing with the above example, the deduplication module 118 may determine, at step 324, whether all of the allocated blocks at positions 110(1), 110(2), 110(4), 110(6), and 110(7) have been read from the source storage 110, as disclosed in FIG. 2B. If it is determined at step 324 that not all allocated blocks have been read from the source storage 110 (No at step 324), then the method 300 returns to step 310 where the next allocated block is read from the source storage 110. Otherwise (Yes at step 324), the backup phase 302 of the method 300 is complete, and the method 300 proceeds to step 326 of the restore phase 304. - By the conclusion of the
backup phase 302, a backup of the source storage 110 will have been stored in the deduplication vault storage 108. Unlike a standard backup image, however, the backup of the source storage 110 as stored in the deduplication vault storage 108 has been reduced in size due to not storing multiple copies of the blocks from positions 110(2) and 110(7), as disclosed in FIG. 2B. In addition, where multiple storages are backed up into the deduplication vault storage 108, the total overall size of the backups will likely be reduced due to the elimination of duplicate blocks across the backups. Finally, unlike standard deduplication vault storages, the deduplication vault storage 108 is configured to store each of the plain text blocks of the source storage 110 included in the backup as encrypted blocks, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. - The restore
phase 304 of the method 300 may include a step 326 in which an entry is read from the image map. For example, the deduplication module 118 may read, at step 326, the first entry in the image map 202, which includes the 1st hash H18 and source position 110(2), as disclosed in FIG. 2B. - The restore
phase 304 of the method 300 may include a step 328 in which the 1st hash included in the entry is hashed, using the 2nd cryptographic hash function, to generate the 2nd hash. Continuing with the above example, the encryption module 124 may hash, at step 328, the 1st hash H18, using the 2nd cryptographic hash function, to generate the 2nd hash H38, as disclosed in FIG. 2B. - The restore
phase 304 of the method 300 may include a step 330 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 2nd hash. Continuing with the above example, the deduplication module 118 may search, at step 330, the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108(4) that has a key that matches the 2nd hash H38, as disclosed in FIG. 2B. - The restore
phase 304 of the method 300 may include a step 332 in which the encrypted block is decrypted, using the encrypt/decrypt function, and using the 1st hash as a decryption password. Continuing with the above example, the encryption module 124 may decrypt, at step 332, the encrypted block, using the encrypt/decrypt function, and using the 1st hash H18 as a decryption password, resulting in the plain text block from position 110(2) of the source storage 110, as disclosed in FIG. 2B. - The restore
phase 304 of the method 300 may include a step 334 in which the decrypted block is stored in a restore storage at the position included in the entry. Continuing with the above example, the encryption module 124 may store, at step 334, the decrypted block in the source storage 110, where the source storage 110 is functioning as a restore storage, at the position 110(2), as disclosed in FIG. 2B. - The restore
phase 304 of the method 300 may include a step 336 in which it is determined whether all entries have been read from the image map. Continuing with the above example, the deduplication module 118 may determine, at step 336, whether all of the entries have been read from the image map 202, as disclosed in FIG. 2B. If it is determined at step 336 that not all entries have been read from the image map 202 (No at step 336), then the method 300 returns to step 326 where the next entry is read from the image map 202. Otherwise (Yes at step 336), the restore phase 304 of the method 300 is complete, and the method 300 proceeds to step 338 of the backup phase 306. - By the conclusion of the restore
phase 304, a backup of the source storage 110 that was stored in the deduplication vault storage 108 will have been restored to a restore storage. Unlike a standard restoration, however, the restoration of the backup of the source storage 110 involves the backup remaining securely encrypted until it is decrypted at the source system 104, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. - The
backup phase 306 and the restore phase 308 of the method 300 are similar in many respects to the backup phase 302 and the restore phase 304 of the method 300, the main difference being that the backup phase 306 and the restore phase 308 are performed on the source system 106 of Company B instead of on the source system 104 of Company A. - The
backup phase 306 of the method 300 may include a step 338 in which an allocated plain text block is read from the source storage. For example, the encryption module 126 may read, at step 338, the plain text block at position 112(1) or 112(2) from the source storage 112, as disclosed in FIG. 2C. - The
backup phase 306 of the method 300 may include a step 340 in which the plain text block is hashed, using the same 1st cryptographic hash function used in step 312, to generate a 4th hash. Continuing with the above example, the encryption module 126 may hash, at step 340, the plain text block from position 112(1) or 112(2) using the 1st cryptographic hash function to generate a 4th hash, such as the 4th hash H47 or the 4th hash H3, respectively, as disclosed in FIG. 2C. - The
backup phase 306 of the method 300 may include a step 342 in which the 4th hash is hashed, using the same 2nd cryptographic hash function used in step 314, to generate a 5th hash. Continuing with the above example, the encryption module 126 may hash, at step 342, the 4th hash H47 or the 4th hash H3 using the 2nd cryptographic hash function to generate the 5th hash H67 or the 5th hash H23, respectively, as disclosed in FIG. 2C. - The
backup phase 306 of the method 300 may include a step 344 in which a key-value table of a deduplication vault is searched to determine whether the 5th hash matches any key in the key-value table. Continuing with the above example, the deduplication module 118 may search, at step 344, the key-value table of the deduplication vault storage 108 to determine that the 5th hash H67 does not match any key in the key-value table, or that the 5th hash H23 does match a key at position 108(2) in the key-value table, as disclosed in FIG. 2C. Upon determining that the 5th hash does not match any key in the key-value table (No at step 344), the backup phase 306 of the method 300 may include steps 346 and 348. Otherwise (Yes at step 344), the backup phase 306 of the method 300 may proceed directly to step 350. - The
backup phase 306 of the method 300 may include a step 346 in which the plain text block is encrypted using the encrypt/decrypt function, with the 4th hash as the encryption password. Continuing with the above example, the encryption module 126 may encrypt, at step 346, the plain text block from position 112(1) using the encrypt/decrypt function, with the 4th hash H47 as the encryption password, resulting in an encrypted version of the plain text block from position 112(1), as disclosed in FIG. 2D. - The
backup phase 306 of the method 300 may include a step 348 in which a key-value pair is inserted into the key-value table, with the key being the 5th hash and the value being the encrypted block. Continuing with the above example, the deduplication module 118 may insert, at step 348, a key-value pair into the key-value table at position 108(9), with the key being the 5th hash H67 and the value being the encrypted version of the plain text block from position 112(1), as disclosed in FIG. 2D. - The
backup phase 306 of the method 300 may include a step 350 in which an entry is inserted into an image map corresponding to the source storage, the entry including the 4th hash and the position of the plain text block as stored in the source storage. Continuing with the above example, the deduplication module 118 may insert, at step 350, an entry into the image map 204 corresponding to the source storage 112 that includes the 4th hash H3 and position 112(2), as disclosed in FIG. 2C, or that includes the 4th hash H47 and position 112(1), as disclosed in FIG. 2D. - The
backup phase 306 of the method 300 may include a step 352 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage. Continuing with the above example, the deduplication module 118 may determine, at step 352, whether all of the allocated blocks at positions 112(1), 112(2), 112(3), and 112(5) have been read from the source storage 112, as disclosed in FIG. 2D. If not all allocated blocks have been read from the source storage 112 (No at step 352), the method 300 returns to step 338, where the next allocated block is read from the source storage 112. Otherwise (Yes at step 352), the backup phase 306 of the method 300 is complete, and the method 300 proceeds to step 354 of the restore phase 308. - By the conclusion of the
backup phase 306, a backup of the source storage 112 will have been stored in the deduplication vault storage 108, along with the backup of the source storage 110. Unlike a standard backup image, however, the backup of the source storage 112 as stored in the deduplication vault storage 108 has been reduced in size by not storing additional copies of the duplicate blocks from positions 112(2) and 112(5), as disclosed in FIG. 2D. Further, the method 300 encrypts the duplicate plain-text block from positions 110(4) and 112(2) in such a way that only a single encrypted block is stored at position 108(2) in the deduplication vault storage 108 for this duplicate block. Similarly, the method 300 encrypts the duplicate plain-text block from positions 110(2) and 112(5) in such a way that only a single encrypted block is stored at position 108(4) in the deduplication vault storage 108 for this duplicate block. This works because each key is derived solely from the content of its block (as a hash of the block's 1st or 4th hash), so identical plain-text blocks from different source storages produce the same key and are stored only once. As a result, unlike standard deduplication vaults, which either store a single plain-text deduplicated block or store a single plain-text block in two different encrypted forms, the method 300 disclosed herein employs client-side encryption with deduplication, which enables sensitive blocks to remain secure within the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated. - The restore
phase 308 of the method 300 may include a step 354 in which an entry is read from the image map. For example, the deduplication module 118 may read, at step 354, the first entry in the image map 204, which includes the 4th hash H3 and source position 112(2), as disclosed in FIG. 2D. - The restore
phase 308 of the method 300 may include a step 356 in which the 4th hash included in the entry is hashed, using the 2nd cryptographic hash function, to generate the 5th hash. Continuing with the above example, the encryption module 126 may hash, at step 356, the 4th hash H3, using the 2nd cryptographic hash function, to generate the 5th hash H23, as disclosed in FIG. 2D. - The restore
phase 308 of the method 300 may include a step 358 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 5th hash. Continuing with the above example, the deduplication module 118 may search, at step 358, the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108(2) that has a key that matches the 5th hash H23, as disclosed in FIG. 2D. - The restore
phase 308 of the method 300 may include a step 360 in which the encrypted block is decrypted using the encrypt/decrypt function, with the 4th hash as the decryption password. Continuing with the above example, the encryption module 126 may decrypt, at step 360, the encrypted block using the encrypt/decrypt function, with the 4th hash H3 as the decryption password, resulting in the plain text block from position 112(2) of the source storage 112, as disclosed in FIG. 2D. - The restore
phase 308 of the method 300 may include a step 362 in which the decrypted block is stored in a restore storage at the position included in the entry. Continuing with the above example, the encryption module 126 may store, at step 362, the decrypted block in the source storage 112, which is functioning as a restore storage, at position 112(2), as disclosed in FIG. 2D. - The restore
phase 308 of the method 300 may include a step 364 in which it is determined whether all entries have been read from the image map. Continuing with the above example, the deduplication module 118 may determine, at step 364, whether all of the entries have been read from the image map 204, as disclosed in FIG. 2D. If not all entries have been read from the image map 204 (No at step 364), the method 300 returns to step 354, where the next entry is read from the image map 204. Otherwise (Yes at step 364), the restore phase 308 of the method 300 is complete. - By the conclusion of the restore
phase 308, a backup of the source storage 112 that was stored in the deduplication vault storage 108 will have been restored to a restore storage. Unlike a standard restoration, however, the restoration of the backup of the source storage 112 involves the backup remaining securely encrypted until it is decrypted at the source system 106, thus reducing the potential for an unauthorized user, such as a user from Company A, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. -
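The backup and restore phases of the method 300 can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the patented implementation:

- The 1st and 2nd cryptographic hash functions are assumed to be SHA-256 and SHA3-256.
- The vault's key-value table is modeled as a `dict`, and each image map as a list of (1st hash, position) pairs.
- The unspecified encrypt/decrypt function is replaced by a hypothetical toy cipher, `xor_cipher`. It XORs each block with a SHAKE-256 keystream and is not secure for real use.

The sketch shows that the vault key is a hash of the 1st hash rather than the 1st hash itself, so the key-value table alone does not hold the decryption password. Identical blocks from different source storages still map to the same key.

```python
import hashlib
from typing import Dict, List, Tuple


def first_hash(block: bytes) -> bytes:
    """Assumed 1st cryptographic hash function (SHA-256); used as the password."""
    return hashlib.sha256(block).digest()


def second_hash(data: bytes) -> bytes:
    """Assumed 2nd cryptographic hash function (SHA3-256)."""
    return hashlib.sha3_256(data).digest()


def xor_cipher(password: bytes, data: bytes) -> bytes:
    """Hypothetical toy stand-in for the encrypt/decrypt function.

    XORs the data with a SHAKE-256 keystream derived from the password, so the
    same call both encrypts and decrypts. NOT secure; illustration only.
    """
    keystream = hashlib.shake_256(password).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


def backup(source: Dict[int, bytes], vault: Dict[bytes, bytes]) -> List[Tuple[bytes, int]]:
    """Backup phase: returns an image map of (1st hash, position) entries."""
    image_map = []
    for position, block in source.items():        # read each allocated block
        h1 = first_hash(block)                     # 1st (or 4th) hash: the password
        key = second_hash(h1)                      # 2nd (or 5th) hash: the vault key
        if key not in vault:                       # store new content only
            vault[key] = xor_cipher(h1, block)     # encrypt, insert key-value pair
        image_map.append((h1, position))           # image map entry
    return image_map


def restore(image_map: List[Tuple[bytes, int]], vault: Dict[bytes, bytes]) -> Dict[int, bytes]:
    """Restore phase: rehash the stored hash, look up the block, and decrypt it."""
    restored = {}
    for h1, position in image_map:
        encrypted = vault[second_hash(h1)]              # retrieve by the vault key
        restored[position] = xor_cipher(h1, encrypted)  # decrypt with the 1st hash
    return restored


if __name__ == "__main__":
    vault: Dict[bytes, bytes] = {}
    company_a = {1: b"alpha", 2: b"shared", 4: b"beta", 7: b"beta"}
    company_b = {1: b"gamma", 2: b"shared"}
    map_a = backup(company_a, vault)
    map_b = backup(company_b, vault)
    print(len(vault))
    print(restore(map_a, vault) == company_a, restore(map_b, vault) == company_b)
```

Running the demo prints `4` and then `True True`. The block shared by both companies and the duplicate within Company A are each stored once, yet each image map restores its own source storage.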
FIGS. 4A-4D are schematic diagrams illustrating client-side encryption 400 in the deduplication backup system 100. The client-side encryption 400 may be implemented, in at least some embodiments, following events similar to those that preceded the client-side encryption 200 discussed above. - As disclosed in
FIG. 4A, a snapshot is taken of the source storage 110 at time t(0), and the allocated plain text blocks at positions 110(1), 110(2), 110(4), 110(6), and 110(7) are targeted for inclusion in the base backup A of the source storage 110. Each of these blocks is then read from the source storage 110, hashed using the 1st cryptographic hash function to generate a 1st hash, and then encrypted using the encrypt/decrypt function, with the 1st hash as the encryption password. The encrypted block is then hashed, using the 2nd cryptographic hash function, to generate a 3rd hash. Next, it is determined whether the 3rd hash matches any key in the key-value table of the deduplication vault storage 108. As disclosed in FIG. 4A, only the 3rd hash H118 matches a key (the key at position 108(4)) in the key-value table, while the 3rd hashes H107, H103, and H109 do not match any key in the key-value table. Next, an entry is inserted into an image map 402 corresponding to the base backup A of the source storage 110; this entry includes the corresponding 1st hash H18, the corresponding 3rd hash H118, and the position 110(2) of the plain text block as stored in the source storage 110. - As disclosed in
FIG. 4B, since the 3rd hashes H107, H103, and H109 do not match any key in the key-value table, a key-value pair is inserted into the key-value table for each of them, with the key being the 3rd hash and the value being the corresponding encrypted block. Then, entries are inserted into the image map 402 corresponding to the source storage 110, each of which includes the 1st hash, the 3rd hash, and the position of the plain text block as stored in the source storage 110. It is noted that since the block at position 110(4) and the block at position 110(7) are duplicates, they yield the same 3rd hash H103, so only the first instance of this duplicate block is inserted into the key-value table, but entries for both of the duplicate blocks are inserted into the image map 402. - As disclosed in
FIG. 4C, a snapshot is then taken of the source storage 112 at time t(1), and the allocated plain text blocks at positions 112(1), 112(2), 112(3), and 112(5) are targeted for inclusion in the base backup B of the source storage 112. Each of these blocks is then read from the source storage 112, hashed using the 1st cryptographic hash function to generate a 1st hash, and then encrypted using the encrypt/decrypt function, with the 1st hash as the encryption password. The encrypted block is then hashed, using the 2nd cryptographic hash function, to generate a 3rd hash. Next, it is determined whether the 3rd hash matches any key in the key-value table of the deduplication vault storage 108. As disclosed in FIG. 4C, only the 3rd hashes H103 and H118 match keys (the keys at positions 108(2) and 108(4), respectively) in the key-value table, while the 3rd hashes H147 and H151 do not match any key in the key-value table. Next, entries are inserted into an image map 404 corresponding to the base backup B of the source storage 112, each of which includes the corresponding 1st hash, the corresponding 3rd hash, and the position of the plain text block as stored in the source storage 112. - As disclosed in
FIG. 4D, since the 3rd hashes H147 and H151 do not match any key in the key-value table, a key-value pair is inserted into the key-value table for each of them, with the key being the 3rd hash and the value being the corresponding encrypted block. Then, entries are inserted into the image map 404 corresponding to the source storage 112, each of which includes the 1st hash, the 3rd hash, and the position of the plain text block as stored in the source storage 112.
- Therefore, during the client-side encryption 400 of FIGS. 4A-4D, plain-text blocks of the source storages 110 and 112 may be encrypted at the source system 104 of Company A and at the source system 106 of Company B before the blocks are sent to the deduplication vault storage 108, which may yield benefits similar to those discussed above in connection with the client-side encryption 200 of FIGS. 2A-2D. The client-side encryption 400 may also have the added benefit of preventing the key-value table of the deduplication vault storage 108 from being “poisoned” by the malicious or inadvertent insertion of an encrypted block as a value that does not match the hash inserted as its corresponding key. Any “poisoning” of the key-value table may be prevented in the client-side encryption 400 because each 3rd hash inserted into the key-value table can be verified against its corresponding encrypted block: the encrypted block is rehashed using the 2nd cryptographic hash function, and the result is compared with the 3rd hash. If the two are not identical, the insert is deemed to be a poisoning attempt and is rejected. -
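The FIG. 4 variant, which the method 500 walks through step by step, keys each vault entry by the hash of the encrypted block rather than by a hash of the 1st hash. The vault can therefore rehash any submitted value and reject a mismatched key.

Below is a minimal Python sketch of that idea under stated assumptions, not the patented implementation:

- SHA-256 and SHA3-256 are assumed for the 1st and 2nd cryptographic hash functions.
- `xor_cipher` is a hypothetical toy, deterministic encrypt/decrypt stand-in. It is not secure.
- `VerifyingVault`, `backup`, and `restore` are illustrative names.

Deterministic encryption matters here: deduplication only works if identical plain-text blocks produce identical encrypted blocks, and therefore identical 3rd hashes.

```python
import hashlib
from typing import Dict, List, Tuple

# Image map entry: (1st hash, 3rd hash, position in the source storage)
Entry = Tuple[bytes, bytes, int]


def first_hash(block: bytes) -> bytes:
    """Assumed 1st cryptographic hash function (SHA-256); used as the password."""
    return hashlib.sha256(block).digest()


def second_hash(data: bytes) -> bytes:
    """Assumed 2nd cryptographic hash function (SHA3-256); applied to ciphertext."""
    return hashlib.sha3_256(data).digest()


def xor_cipher(password: bytes, data: bytes) -> bytes:
    """Hypothetical toy, deterministic encrypt/decrypt stand-in. NOT secure."""
    keystream = hashlib.shake_256(password).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


class VerifyingVault:
    """Key-value table that rejects inserts whose key is not the hash of the value."""

    def __init__(self) -> None:
        self.table: Dict[bytes, bytes] = {}

    def insert(self, key: bytes, encrypted_block: bytes) -> bool:
        if second_hash(encrypted_block) != key:    # rehash the value and compare
            return False                           # treated as a poisoning attempt
        self.table.setdefault(key, encrypted_block)
        return True


def backup(source: Dict[int, bytes], vault: VerifyingVault) -> List[Entry]:
    image_map: List[Entry] = []
    for position, block in source.items():
        h1 = first_hash(block)                     # 1st (or 4th) hash of the plain text
        encrypted = xor_cipher(h1, block)          # encrypt with that hash as password
        h3 = second_hash(encrypted)                # 3rd (or 6th) hash of the ciphertext
        if h3 not in vault.table:                  # store new content only
            vault.insert(h3, encrypted)
        image_map.append((h1, h3, position))
    return image_map


def restore(image_map: List[Entry], vault: VerifyingVault) -> Dict[int, bytes]:
    # Look up by the 3rd hash, then decrypt with the 1st hash.
    return {pos: xor_cipher(h1, vault.table[h3]) for h1, h3, pos in image_map}


if __name__ == "__main__":
    vault = VerifyingVault()
    map_a = backup({1: b"alpha", 4: b"beta", 7: b"beta"}, vault)
    map_b = backup({2: b"beta", 3: b"gamma"}, vault)
    print(len(vault.table))
    print(vault.insert(second_hash(b"real"), b"forged"))
```

The demo prints `3` and then `False`. The forged pair is rejected because its key is not the hash of its value.

By contrast, in the FIG. 2 scheme the key is derived from the 1st hash rather than from the encrypted block. Rehashing a stored value there would not reproduce its key, which is why this check applies to the FIG. 4 variant.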
FIGS. 5A-5B show a schematic flowchart illustrating a second example method 500 for client-side encryption in the deduplication backup system 100. The method 500 may be implemented, in at least some embodiments, in a manner similar to the method 300 discussed above. The method 500 will now be discussed with reference to FIGS. 1, 4A-4D, and 5A-5B. - The
method 500 may include a backup phase 502 for Company A, a restore phase 504 for Company A, a backup phase 506 for Company B, and a restore phase 508 for Company B. - The
backup phase 502 of the method 500 may include a step 510 in which an allocated plain text block is read from the source storage. For example, the encryption module 124 may read, at step 510, the plain text block at position 110(1) or 110(2) from the source storage 110, as disclosed in FIG. 4A. - The
backup phase 502 of the method 500 may include a step 512 in which the plain text block is hashed, using a 1st cryptographic hash function, to generate a 1st hash. Continuing with the above example, the encryption module 124 may hash, at step 512, the plain text block from position 110(1) or 110(2) using the 1st cryptographic hash function to generate the 1st hash H7 or the 1st hash H18, respectively, as disclosed in FIG. 4A. - The
backup phase 502 of the method 500 may include a step 514 in which the plain text block is encrypted using the encrypt/decrypt function, with the 1st hash as the encryption password. Continuing with the above example, the encryption module 124 may encrypt, at step 514, the plain text block from position 110(1) using the encrypt/decrypt function, with the 1st hash H7 as the encryption password, resulting in an encrypted version of the plain text block from position 110(1), as disclosed in FIG. 4A. Similarly, the encryption module 124 may encrypt, at step 514, the plain text block from position 110(2) using the encrypt/decrypt function, with the 1st hash H18 as the encryption password, resulting in an encrypted version of the plain text block from position 110(2), as disclosed in FIG. 4A. - The
backup phase 502 of the method 500 may include a step 516 in which the encrypted block is hashed, using the 2nd cryptographic hash function, to generate a 3rd hash. Continuing with the above example, the encryption module 124 may hash, at step 516, the encrypted block corresponding to the plain text block at position 110(1) or position 110(2) using the 2nd cryptographic hash function to generate the 3rd hash H107 or the 3rd hash H118, respectively, as disclosed in FIG. 4A. - The
backup phase 502 of the method 500 may include a step 518 in which a key-value table of a deduplication vault is searched to determine whether the 3rd hash matches any key in the key-value table. Continuing with the above example, the deduplication module 118 may search, at step 518, the key-value table of the deduplication vault storage 108 to determine that the 3rd hash H107 does not match any key in the key-value table, or that the 3rd hash H118 does match a key at position 108(4) in the key-value table, as disclosed in FIG. 4A. Upon determining that the 3rd hash does not match any key in the key-value table (No at step 518), the backup phase 502 of the method 500 may include step 520. Otherwise (Yes at step 518), the backup phase 502 of the method 500 may proceed directly to step 522. - The
backup phase 502 of the method 500 may include a step 520 in which a key-value pair is inserted into the key-value table, with the key being the 3rd hash and the value being the encrypted block. Continuing with the above example, the deduplication module 118 may insert, at step 520, a key-value pair into the key-value table at position 108(1), with the key being the 3rd hash H107 and the value being the encrypted version of the plain text block at position 110(1), as disclosed in FIG. 4B. - The
backup phase 502 of the method 500 may include a step 522 in which an entry is inserted into an image map corresponding to the source storage, the entry including the 1st hash, the 3rd hash, and the position of the plain text block as stored in the source storage. Continuing with the above example, the deduplication module 118 may insert, at step 522, an entry into the image map 402 corresponding to the source storage 110 that includes the 1st hash H18, the 3rd hash H118, and position 110(2) of the plain text block as stored in the source storage 110, as disclosed in FIG. 4A, or that includes the 1st hash H7, the 3rd hash H107, and position 110(1) of the plain text block as stored in the source storage 110, as disclosed in FIG. 4B. - The
backup phase 502 of the method 500 may include a step 524 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage. Continuing with the above example, the deduplication module 118 may determine, at step 524, whether all of the allocated blocks at positions 110(1), 110(2), 110(4), 110(6), and 110(7) have been read from the source storage 110, as disclosed in FIG. 4B. If not all allocated blocks have been read from the source storage 110 (No at step 524), the method 500 returns to step 510, where the next allocated block is read from the source storage 110. Otherwise (Yes at step 524), the backup phase 502 of the method 500 is complete, and the method 500 proceeds to step 526 of the restore phase 504. - By the conclusion of the
backup phase 502, a backup of the source storage 110 will have been stored in the deduplication vault storage 108. Unlike a standard backup image, however, the backup of the source storage 110 as stored in the deduplication vault storage 108 has been reduced in size by not storing additional copies of the duplicate blocks from positions 110(2) and 110(7), as disclosed in FIG. 4B. In addition, where multiple storages are backed up into the deduplication vault storage 108, the total size of the backups will likely be reduced further by the elimination of duplicate blocks across the backups. Finally, unlike standard deduplication vault storages, the deduplication vault storage 108 is configured to store each of the plain text blocks of the source storage 110 included in the backup as encrypted blocks, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. - The restore
phase 504 of the method 500 may include a step 526 in which an entry is read from the image map. For example, the deduplication module 118 may read, at step 526, the first entry in the image map 402, which includes the 1st hash H18, the 3rd hash H118, and source position 110(2), as disclosed in FIG. 4B. - The restore
phase 504 of the method 500 may include a step 528 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 3rd hash. Continuing with the above example, the deduplication module 118 may search, at step 528, the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108(4) that has a key that matches the 3rd hash H118, as disclosed in FIG. 4B. - The restore
phase 504 of the method 500 may include a step 530 in which the encrypted block is decrypted using the encrypt/decrypt function, with the 1st hash as the decryption password. Continuing with the above example, the encryption module 124 may decrypt, at step 530, the encrypted block using the encrypt/decrypt function, with the 1st hash H18 as the decryption password, resulting in the plain text block from position 110(2) of the source storage 110, as disclosed in FIG. 4B. - The restore
phase 504 of the method 500 may include a step 532 in which the decrypted block is stored in a restore storage at the position included in the entry. Continuing with the above example, the encryption module 124 may store, at step 532, the decrypted block in the source storage 110, which is functioning as a restore storage, at position 110(2), as disclosed in FIG. 4B. - The restore
phase 504 of the method 500 may include a step 534 in which it is determined whether all entries have been read from the image map. Continuing with the above example, the deduplication module 118 may determine, at step 534, whether all of the entries have been read from the image map 402, as disclosed in FIG. 4B. If not all entries have been read from the image map 402 (No at step 534), the method 500 returns to step 526, where the next entry is read from the image map 402. Otherwise (Yes at step 534), the restore phase 504 of the method 500 is complete, and the method 500 proceeds to step 536 of the backup phase 506. - By the conclusion of the restore
phase 504, a backup of the source storage 110 that was stored in the deduplication vault storage 108 will have been restored to a restore storage. Unlike a standard restoration, however, the restoration of the backup of the source storage 110 involves the backup remaining securely encrypted until it is decrypted at the source system 104, thus reducing the potential for an unauthorized user, such as a user from Company B, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. - The
backup phase 506 and the restore phase 508 of the method 500 are similar in many respects to the backup phase 502 and the restore phase 504 of the method 500, the main difference being that the backup phase 506 and the restore phase 508 are performed on the source system 106 of Company B instead of on the source system 104 of Company A. - The
backup phase 506 of the method 500 may include a step 536 in which an allocated plain text block is read from the source storage. For example, the encryption module 126 may read, at step 536, the plain text block at position 112(1) or 112(2) from the source storage 112, as disclosed in FIG. 4C. - The
backup phase 506 of the method 500 may include a step 538 in which the plain text block is hashed, using the same 1st cryptographic hash function used in step 512, to generate a 4th hash. Continuing with the above example, the encryption module 126 may hash, at step 538, the plain text block from position 112(1) or 112(2) using the 1st cryptographic hash function to generate the 4th hash H47 or the 4th hash H3, respectively, as disclosed in FIG. 4C. - The
backup phase 506 of the method 500 may include a step 540 in which the plain text block is encrypted using the encrypt/decrypt function, with the 4th hash as the encryption password. Continuing with the above example, the encryption module 126 may encrypt, at step 540, the plain text block from position 112(1) using the encrypt/decrypt function, with the 4th hash H47 as the encryption password, resulting in an encrypted version of the plain text block from position 112(1), as disclosed in FIG. 4C. Similarly, the encryption module 126 may encrypt, at step 540, the plain text block from position 112(2) using the encrypt/decrypt function, with the 4th hash H3 as the encryption password, resulting in an encrypted version of the plain text block from position 112(2), as disclosed in FIG. 4C. - The
backup phase 506 of the method 500 may include a step 542 in which the encrypted block is hashed, using the same 2nd cryptographic hash function used in step 516, to generate a 6th hash. Continuing with the above example, the encryption module 126 may hash, at step 542, the encrypted block corresponding to the plain text block at position 112(1) or position 112(2) using the 2nd cryptographic hash function to generate the 6th hash H147 or the 6th hash H103, respectively, as disclosed in FIG. 4C. - The
backup phase 506 of the method 500 may include a step 544 in which a key-value table of a deduplication vault is searched to determine whether the 6th hash matches any key in the key-value table. Continuing with the above example, the deduplication module 118 may search, at step 544, the key-value table of the deduplication vault storage 108 to determine that the 6th hash H147 does not match any key in the key-value table, or that the 6th hash H103 does match a key at position 108(2) in the key-value table, as disclosed in FIG. 4C. Upon determining that the 6th hash does not match any key in the key-value table (No at step 544), the backup phase 506 of the method 500 may include step 546. Otherwise (Yes at step 544), the backup phase 506 of the method 500 may proceed directly to step 548. - The
backup phase 506 of the method 500 may include a step 546 in which a key-value pair is inserted into the key-value table, with the key being the 6th hash and the value being the encrypted block. Continuing with the above example, the deduplication module 118 may insert, at step 546, a key-value pair into the key-value table at position 108(9), with the key being the 6th hash H147 and the value being the encrypted version of the plain text block from position 112(1), as disclosed in FIG. 4D. - The
backup phase 506 of the method 500 may include a step 548 in which an entry is inserted into an image map corresponding to the source storage, the entry including the 4th hash, the 6th hash, and the position of the plain text block as stored in the source storage. Continuing with the above example, the deduplication module 118 may insert, at step 548, an entry into the image map 404 corresponding to the source storage 112 that includes the 4th hash H3, the 6th hash H103, and position 112(2) of the plain text block as stored in the source storage 112, as disclosed in FIG. 4C, or that includes the 4th hash H47, the 6th hash H147, and position 112(1) of the plain text block as stored in the source storage 112, as disclosed in FIG. 4D. - The
backup phase 506 of the method 500 may include a step 550 in which it is determined whether all appropriate blocks to be included in the backup have been read from the source storage. Continuing with the above example, the deduplication module 118 may determine, at step 550, whether all of the allocated blocks at positions 112(1), 112(2), 112(3), and 112(5) have been read from the source storage 112, as disclosed in FIG. 4D. If not all allocated blocks have been read from the source storage 112 (No at step 550), the method 500 returns to step 536, where the next allocated block is read from the source storage 112. Otherwise (Yes at step 550), the backup phase 506 of the method 500 is complete, and the method 500 proceeds to step 552 of the restore phase 508. - By the conclusion of the
backup phase 506, a backup of the source storage 112 will have been stored in the deduplication vault storage 108, along with the backup of the source storage 110. Unlike a standard backup image, however, the backup of the source storage 112 as stored in the deduplication vault storage 108 has been reduced in size by not storing additional copies of the duplicate blocks from positions 112(2) and 112(5), as disclosed in FIG. 4D. Further, the method 500 encrypts the duplicate plain-text block from positions 110(4) and 112(2) in such a way that only a single encrypted block is stored at position 108(2) in the deduplication vault storage 108 for this duplicate block. Similarly, the method 500 encrypts the duplicate plain-text block from positions 110(2) and 112(5) in such a way that only a single encrypted block is stored at position 108(4) in the deduplication vault storage 108 for this duplicate block. This works because the encryption password is derived from the content of each block and the key is the hash of the resulting encrypted block, so identical plain-text blocks from different source storages produce the same encrypted block and the same key. As a result, unlike standard deduplication vaults, which either store a single plain-text deduplicated block or store a single plain-text block in two different encrypted forms, the method 500 disclosed herein employs client-side encryption with deduplication, which enables sensitive blocks to remain secure within the deduplication vault storage 108 even while redundancy within and across the source storages 110 and 112 is reduced or eliminated. - The restore
phase 508 of the method 500 may include a step 552 in which an entry is read from the image map. For example, the deduplication module 118 may read, at step 552, the first entry in the image map 404, which includes the 4th hash H3, the 6th hash H103, and source position 112(2), as disclosed in FIG. 4D. - The restore
phase 508 of the method 500 may include a step 554 in which the key-value table is searched to retrieve the encrypted block of the key-value pair having a key that matches the 6th hash. Continuing with the above example, the deduplication module 118 may search, at step 554, the key-value table of the deduplication vault storage 108 to retrieve the encrypted block of the key-value pair at position 108(2) that has a key that matches the 6th hash H103, as disclosed in FIG. 4D. - The restore
phase 508 of the method 500 may include a step 556 in which the encrypted block is decrypted using the encrypt/decrypt function, with the 4th hash as the decryption password. Continuing with the above example, the encryption module 126 may decrypt, at step 556, the encrypted block using the encrypt/decrypt function, with the 4th hash H3 as the decryption password, resulting in the plain text block from position 112(2) of the source storage 112, as disclosed in FIG. 4D. - The restore
phase 508 of themethod 500 may include astep 558 in which the decrypted block is stored in a restore storage at the position included in the entry. Continuing with the above example, theencryption module 126 may store, atstep 558, the decrypted block in thesource storage 112, where thesource storage 112 is functioning as a restore storage, in the position 112(2), as disclosed inFIG. 4D . - The restore
phase 508 of themethod 500 may include astep 560 in which it is determined whether all entries have been read from the image map. Continuing with the above example, thededuplication module 118 may determine, atstep 560, whether all of the entries have been read from theimage map 404, as disclosed inFIG. 4D . If it is determined atstep 560 that all entries have not been read from the image map 404 (No at step 560), then themethod 500 returns to step 552 where the next entry is read from theimage map 404. Otherwise (Yes at step 560), the restorephase 508 of themethod 500 is complete. - By the conclusion of the restore
phase 508, a backup of thesource storage 112 that was stored in thededuplication vault storage 108 will have been restored to a restore storage. Unlike a standard restoration, however, the restoration of the backup of thesource storage 112 involves the backup remaining securely encrypted until being decrypted at thesource system 106, thus reducing the potential for an unauthorized user, such as a user from Company A, to access the original plain-text blocks, except for those blocks that are included in a backup of the unauthorized user. - It is understood that the foregoing discussion of the
methods methods deduplication module 118, theencryption module 124, and theencryption module 126, it is understood that themethods deduplication module 118, theencryption module 124, and theencryption module 126 exclusively or by some other module or combination of modules. - The embodiments described herein may include the use of a special-purpose or general-purpose computer, including various computer hardware or software modules, as discussed in greater detail below.
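The backup-side deduplication and the restore phase 508 (steps 552 through 560) described above can be illustrated with a minimal Python sketch. It rests on assumptions this section does not state: the 4th hash (e.g. H3) is taken to be a SHA-256 hash of the plain-text block, the 6th hash (e.g. H103) is taken to be a SHA-256 hash of the encrypted block, and the encrypt/decrypt function is replaced by a hypothetical toy XOR keystream derived from SHA-256, which is not a vetted cipher. The names `back_up`, `restore`, `encrypt_decrypt`, and `ImageMapEntry` are likewise hypothetical. Because each block's password is derived from its own content, identical plain-text blocks from different source storages encrypt to identical encrypted blocks and collapse into a single vault entry.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def _keystream(password: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(password || counter) digests concatenated.
    # Illustration only (assumption); a real system would use a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_decrypt(block: bytes, password: bytes) -> bytes:
    # Hypothetical stand-in for the "encrypt/decrypt function": XOR with a
    # password-derived keystream, so the same call both encrypts and decrypts.
    return bytes(b ^ k for b, k in zip(block, _keystream(password, len(block))))


@dataclass(frozen=True)
class ImageMapEntry:
    password_hash: bytes  # role of the "4th hash" (e.g. H3): decryption password
    lookup_key: bytes     # role of the "6th hash" (e.g. H103): key in the key-value table
    position: int         # block position in the source / restore storage


def back_up(source: dict[int, bytes], vault: dict[bytes, bytes]) -> list[ImageMapEntry]:
    image_map = []
    for position, plain in sorted(source.items()):
        password = hashlib.sha256(plain).digest()     # assumed: hash of the plain-text block
        encrypted = encrypt_decrypt(plain, password)  # same plain text -> same encrypted block
        key = hashlib.sha256(encrypted).digest()      # assumed: hash of the encrypted block
        vault.setdefault(key, encrypted)              # a duplicate block is stored only once
        image_map.append(ImageMapEntry(password, key, position))
    return image_map


def restore(image_map: list[ImageMapEntry], vault: dict[bytes, bytes],
            restore_storage: dict[int, bytes]) -> None:
    for entry in image_map:                                     # steps 552/560: read every entry
        encrypted = vault[entry.lookup_key]                     # step 554: look up by the 6th hash
        plain = encrypt_decrypt(encrypted, entry.password_hash)  # step 556: decrypt with the 4th hash
        restore_storage[entry.position] = plain                 # step 558: write at the entry's position


if __name__ == "__main__":
    vault: dict[bytes, bytes] = {}
    source_110 = {1: b"alpha" * 8, 2: b"bravo" * 8, 3: b"charlie" * 8, 4: b"delta" * 8}
    # Positions 2 and 5 duplicate positions 4 and 2 of source_110, mirroring FIG. 4D.
    source_112 = {1: b"echo" * 8, 2: b"delta" * 8, 3: b"foxtrot" * 8,
                  4: b"golf" * 8, 5: b"bravo" * 8}
    back_up(source_110, vault)
    map_112 = back_up(source_112, vault)
    restored: dict[int, bytes] = {}
    restore(map_112, vault, restored)
    print(len(vault), restored == source_112)
```

In the demo, nine positions are backed up but the vault holds only seven encrypted blocks, since the two duplicated plain-text blocks each map to one vault entry. The toy cipher is only a placeholder; a deterministic authenticated mode would be a more realistic choice, though this section does not specify one.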
- Embodiments described herein may be implemented using non-transitory computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media may be any available media that may be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media may include non-transitory computer-readable storage media including RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other storage medium which may be used to carry or store one or more desired programs having program code in the form of computer-executable instructions or data structures and which may be accessed and executed by a general-purpose computer, special-purpose computer, or virtual computer such as a virtual machine. Combinations of the above may also be included within the scope of computer-readable media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed by one or more processors, cause a general-purpose computer, special-purpose computer, or virtual computer such as a virtual machine to perform a certain method, function, or group of methods or functions. Although the subject matter has been described in language specific to structural features and/or methodological steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or steps described above. Rather, the specific features and steps described above are disclosed as example forms of implementing the claims.
- As used herein, the term “module” may refer to software objects or routines that execute on a computing system. The different modules or filters described herein may be implemented as objects or processes that execute on a computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated.
- All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the example embodiments and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically-recited examples and conditions.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/060,396 US20160191247A1 (en) | 2014-10-07 | 2016-03-03 | Client-side encryption in a deduplication backup system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/508,654 US9372998B2 (en) | 2014-10-07 | 2014-10-07 | Client-side encryption in a deduplication backup system |
US15/060,396 US20160191247A1 (en) | 2014-10-07 | 2016-03-03 | Client-side encryption in a deduplication backup system |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/508,654 Continuation US9372998B2 (en) | 2014-10-07 | 2014-10-07 | Client-side encryption in a deduplication backup system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160191247A1 (en) | 2016-06-30 |
Family ID: 55633009
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/508,654 Active US9372998B2 (en) | 2014-10-07 | 2014-10-07 | Client-side encryption in a deduplication backup system |
US15/060,396 Abandoned US20160191247A1 (en) | 2014-10-07 | 2016-03-03 | Client-side encryption in a deduplication backup system |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/508,654 Active US9372998B2 (en) | 2014-10-07 | 2014-10-07 | Client-side encryption in a deduplication backup system |
Country Status (1)
Country | Link |
---|---|
US (2) | US9372998B2 (en) |
- 2014-10-07: US14/508,654 (US9372998B2), active
- 2016-03-03: US15/060,396 (US20160191247A1), abandoned
Also Published As
Publication number | Publication date |
---|---|
US20160098568A1 (en) | 2016-04-07 |
US9372998B2 (en) | 2016-06-21 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: STORAGECRAFT TECHNOLOGY CORPORATION, UTAH. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BLAIR, JEFFREY DALE; BUSHMAN, NATHAN S.; IRISH, DUDLEY MELVIN; REEL/FRAME: 037950/0778. Effective date: 20141007 |
AS | Assignment | Owner name: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT, VIRG. Free format text: SECURITY AGREEMENT; ASSIGNOR: STORAGECRAFT TECHNOLOGY CORPORATION; REEL/FRAME: 038449/0943. Effective date: 20160415 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: STORAGECRAFT TECHNOLOGY CORPORATION, MINNESOTA. Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT; ASSIGNOR: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT; REEL/FRAME: 055614/0607. Effective date: 20210316 |