EP1428123B1 - Selective data backup - Google Patents
- Publication number
- EP1428123B1 (application EP02757632A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- file
- substantive
- stored
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
Definitions
- the invention relates to backing up data and more particularly to backing up non-deterministic files.
- Safeguarding electronic data by backing the data up is a common event, and an event that consumes an increasing amount of memory and processing power.
- Data files today typically occupy much more memory than with previous software programs and thus backing these files up requires more storage space, and more processing power and communication-link bandwidth to transfer and store the files.
- differential file backup is performed by determining changes that have occurred within a file using a set of hash codes that represents the information within the file, as it previously existed, in fixed-size blocks. These hash codes are matched up against the same file now modified, determining those areas of the file that have changed and those areas of the file that have not changed. This results in significant bandwidth and space savings for sending and storing the portions of the file that have changed.
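The block-hash comparison described above can be sketched as follows. This is a minimal illustration only: the 4-byte block size, the use of SHA-256 as the hash code, and the function names `block_hashes` and `changed_blocks` are assumptions made for the example, not details given in the source.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks for illustration; real blocks are far larger


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one hash code per fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes: list[str], new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new contents for each block whose hash differs or is new."""
    changed = {}
    for index, new_hash in enumerate(block_hashes(new_data, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != new_hash:
            start = index * block_size
            changed[index] = new_data[start:start + block_size]
    return changed


old = b"AAAABBBBCCCC"      # file as it previously existed
new = b"AAAAXXXXCCCCDD"    # same file, now modified
delta = changed_blocks(block_hashes(old), new)
print(delta)  # only the modified and appended blocks need to be sent
```

Because blocks are compared by position, a file whose bytes are rearranged with little substantive change would report most blocks as changed, which is the difficulty with non-deterministic files that the description goes on to address.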
- Common file elimination determines whether a file to be backed up is the same as other files to be backed up (e.g., a file already backed up), and if so, stores only one copy of that file.
- Common file elimination techniques can be applied to data groups other than files.
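As a rough sketch of common file elimination, the example below stores each distinct content once and records which names refer to it. Keying the repository by a SHA-256 digest and the names `backup_with_cfe`, `store`, and `catalog` are assumptions for illustration; the source does not specify how sameness is determined.

```python
import hashlib


def backup_with_cfe(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Store each distinct content once; return a catalog of name -> content digest."""
    catalog = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:       # content not yet in the repository
            store[digest] = content   # keep exactly one copy
        catalog[name] = digest
    return catalog


store: dict[str, bytes] = {}
catalog = backup_with_cfe(
    {"a.txt": b"report", "copy_of_a.txt": b"report", "b.txt": b"notes"},
    store,
)
print(len(store))  # number of copies actually stored
```

The same function works unchanged for data groups other than files, since it only looks at byte contents.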
- Implementations of the invention may include one or more of the following features.
- the processor may be configured to bypass at least some of the associated data, for transfer for storage, that are present in the stored data.
- the processor may be configured to transfer the associated substantive data for storage only if the associated substantive data are absent from the stored data.
- the processor may be configured to determine the associated substantive data by analyzing a structure of the desired data and data associated with portions of the structure.
- the processor may be configured to map the associated substantive data to a change-resistant format and to compare the associated substantive data with the stored data using the change-resistant formatted substantive data.
- the processor may be configured to perform differential backup on the associated substantive data to compare the associated substantive data with the stored data.
- Implementations of the invention may also include one or more of the following features.
- the processor may be further configured to transfer, over the communication link, indicia that substantive data are absent from the stored data and that substantive data in the stored data are absent from the associated substantive data.
- the indicia may include at least one of an add key command, an add value command, a remove key command, a remove value command, and a change value command.
- the processor may be configured to determine the associated substantive data by determining groupings of data within the desired data.
- the processor may be configured to determine the groupings of data by analyzing indexes associated with the desired data.
- the processor may be configured to perform common file elimination on the groupings of data to compare the associated substantive data with the stored data.
- the processor may be further configured to transfer, over the communication link, indicia of relationships of data groupings that exist in the stored data to the associated substantive data.
- implementations of the invention may include one or more of the following features.
- the processor may be configured to identify the associated substantive data by analyzing a data file containing data subgroups and identifying the data subgroups within the data file, compare the associated substantive data with the stored data by comparing the data subgroups with stored potentially-common data sets, and back up data subgroups based upon the comparison. Thus the processor may back up only those data subgroups that are absent from the stored potentially-common data sets.
- the processor may store the data subgroups as separate files for comparison.
- the processor may compare the data subgroups with the stored potentially-common data sets using a common file elimination technique.
- the processor may provide remove indicia indicating that at least one of the potentially-common data sets is associated with the data file.
- the processor may provide remove indicia indicating that at least one of the potentially-common data sets is associated with a particular portion of the data file.
- Various aspects of the invention may provide one or more of the following advantages.
- Data stored in ways that defeat traditional backup techniques can be backed up while achieving advantages of the traditional backup techniques.
- Files with little substantive changes and significant non-substantive changes can be backed up with little storage and/or processing, corresponding to the little substantive changes.
- Data subgroups of larger, aggregated data groups, e.g., files, can be backed up in a non-redundant manner.
- At least some embodiments of the invention provide techniques for determining substantive differential changes in data for storing the substantive changes and/or determining data subsets of larger collections of data that may be similar to other data sets to reduce duplicative storage of similar data sets.
- Files can be analyzed to determine their substance and compared against the substance of stored files. If the substance of the files differs, then the substantive differences can be stored, while non-substantive differences can be ignored and not backed up. Further, files that contain meaningful subsets can be analyzed to determine the subsets and the subsets compared with stored sets of data. Duplicative sets of data can be bypassed for backup while non-duplicative sets can be backed up.
- Some data files also reduce the efficacy of common file elimination backup techniques. These data files treat attachments by embedding the attachments within a larger data set, e.g., a file, containing all mail and attachments. Common file elimination techniques will fail to recognize that a subset of the email file (the attachment) would match an existing data set in a storage repository, unless the entire file containing the attachment matched a stored file.
- a non-deterministic file or data group is a file or data group in which the physical makeup of the file or data group may change from one moment to the next with very little actual change of the substance or content, or the contents of a file or data group containing data subgroups may change dramatically, while data subgroups of the file or data group may match stored groups.
- a system 10 for backing up non-deterministic data includes a computer 12, a communication network 14, and a backup storage 16.
- the computer 12 includes a processor 18 and memory 20 for storing software instructions that can be executed by the processor 18, and for storing data that may be backed up.
- the software instructions are configured to be executed by the processor 18 to perform functions as described below.
- the computer 12 is configured to send data through the network 14 to the storage 16 for backup.
- the network 14, here shown as the packet-switched network commonly known as the Internet, may be a wide area network (WAN) or a local area network (LAN). Further, the network 14 may be replaced with a simple communication line, the network 14 indicating a communication link between the computer 12 and the storage 16, although the form of the link may vary.
- the computer 12 is configured to have the processor 18 assess data, stored in the memory 20, to be backed up and to determine whether and what data to back up.
- the processor 18 is configured, in conjunction with the stored software, to identify a mechanism through which the actual, substantive information within a file can be accessed. This mechanism preferably identifies only the substantive information and ignores garbage data or administrative portions of the file.
- the processor 18 is further configured to divide the file into parts representing the real information.
- the processor 18 is further configured to evaluate the real information for backup, preferably using traditional techniques such as differential backup or common (redundant) file elimination. Evaluation of the real data may be performed with the real data collected in one group (file or memory block) or in the separate parts.
- the computer 12 can also operate on sets of data other than files.
- Registry backup on Windows® NT systems is generally performed by most backup products using a Win32 API (Application Programming Interface) called RegSaveKey, for registry save key.
- the registry is an inverted tree structured database including descriptions of applications (e.g., types of files such as .txt), user information (e.g., desktop settings), and specific settings (e.g., wordprocessing defaults, email defaults, etc.) for applications identified as keys.
- a key is a name of a setting in the registry, and a value as used below indicates the value of the setting.
- RegSaveKey will copy an in-memory version of a registry hive to a storage disk in the location specified by the API user.
- a hive is a logical branch of the registry and is contained within a single file. For example, the following hives exist on Windows NT 4.0:
- a process 30 for backing up registry hives using the system 10 includes the stages shown.
- the process 30, however, is exemplary only and not limiting.
- the process 30 can be altered, e.g., by having stages added, removed, or rearranged.
- the process 30 can be adapted to back up other non-deterministic files or other groups of data whose physical makeup (e.g., bits) changes more significantly than their substantive contents do (i.e., substance represented by the physical makeup).
- the process 30 is preferably implemented in a process running in the background that can perform backup regardless of whether any user is logged on to the system being backed up.
- the process 30 is further preferably implemented by the background process running under a local system account that has sufficient privileges to accomplish the backup (including access to the registry and any appropriate file). This may allow access to most keys for backup and restore processes, including keys that the user running the backup program may not have permission to access (e.g., due to Windows NT security).
- the process 30 steps through a registry to find substantive information and compares the substantive information with stored substantive information to determine what substantive information is new, what substantive information has been previously stored, what substantive information has been changed, and what previously-stored substantive information has been removed. Alternatively, the process could build an organized file to which traditional block differential techniques can be applied.
- the user manipulates the computer 12 (e.g., using a mouse, keyboard, etc.) to have the Win32 API RegSaveKey() store the hive to a disk (e.g., the memory 20) as a file. If this backup is the first backup ever of this hive for the computer 12, then the computer 12 sends this file to the storage 16 and caches it locally as well, and the process 30 ends.
- the file as loaded in memory is the "base" registry file or "OldHive." If this backup is not the first backup of the hive, then the process 30 proceeds to stage 34.
- the computer 12 uses the Win32 API RegLoadKey() to reload the registry hive file into the registry.
- the computer reloads the hive file under a new name, "CurrentHive," for comparison.
- the computer 12 recovers the OldHive from the local disk cache.
- the registry hive is recovered as a file as it had existed at the time of the first backup.
- the computer 12 determines which of the multiple OldHives is the newest OldHive that is at least as old as the hive that is to be backed up.
- the restored hive is loaded into the registry.
- the computer 12 loads the hive restored from cache into the registry under the name "OldHive" for comparison with the CurrentHive.
- at stage 40, the computer 12 performs a comparison between the two loaded registry hives, CurrentHive and OldHive. Each difference is written to a difference file named "HKEY_LOCAL_MACHINE$<HiveName>$CL," where <HiveName> is the name of the hive being backed up.
- the difference file will contain a list of commands to take the originally backed up registry file (base registry file) and add and subtract (and possibly change) information from it so that it is equivalent to the registry hive file as it existed at the time the difference file was produced.
- the computer 12 uses standard differential backup techniques to back up the file produced in stage 40. Differences produced will be against the previous difference file created for this registry hive. Preferably, only substantive data in the CurrentHive but not in the OldHive are sent for storage to the backup storage 16 by the computer 12. Some substantive data in the CurrentHive and in the OldHive may be sent, or re-sent, for storage and the system 10 would still be advantageous over storing everything in the CurrentHive or determining differences in traditional ways instead of by analyzing the substantive data. Thus, sending some duplicative data for storage is also within the scope of the invention although at least some, and preferably all, substantive data that are in the CurrentHive and in the OldHive are bypassed for being sent to the backup storage 16.
- stage 40 of FIG. 2 is shown as a process 50 for comparing loaded registry hives using the system 10 and includes the stages shown.
- the process 50 is exemplary only and not limiting.
- the process 50 can be altered, e.g., by having stages added, removed, or rearranged.
- the computer 12 reads major keys (first-level keys) into corresponding key arrays, one array for the CurrentHive and one array for the OldHive. For each key-array element, the computer 12 stores:
- an ACL may be 2K in length but is attached to 45,000 different keys (90 MB).
- Preferably only one actual copy of the ACL is kept in a map and mapped to a tag, with a tag referenced by each key.
- the Win32 API function RegQueryInfoKey() is used by the computer 12 after opening the key with the RegOpenKey() API.
- the computer 12 uses a call to RegGetKeySecurity().
- the ACL is stored in a separate map and can be looked up using a unique tag. This tag is a unique (for this registry hive) hash code based on the contents of the ACL. If a tag is generated that is identical to an existing tag, but the contents of the ACL differ (hash code collision), then the hash code value is incremented by one until a unique tag is generated.
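The ACL map and its collision rule can be sketched as below. The class name `AclMap`, the use of CRC-32 as the hash code, and the injectable `hash_fn` parameter (present only so a collision can be forced in the example) are assumptions for illustration; the source says only that the tag is a hash code based on the ACL contents, incremented by one on collision.

```python
import zlib


class AclMap:
    """Keeps one copy of each distinct ACL, addressed by a unique integer tag."""

    def __init__(self, hash_fn=zlib.crc32):
        self._hash_fn = hash_fn            # assumption: any bytes -> int hash
        self._by_tag: dict[int, bytes] = {}

    def tag_for(self, acl: bytes) -> int:
        tag = self._hash_fn(acl)
        # On a collision (same tag, different contents), increment the tag
        # until it is free or already holds this exact ACL.
        while tag in self._by_tag and self._by_tag[tag] != acl:
            tag += 1
        self._by_tag[tag] = acl
        return tag

    def lookup(self, tag: int) -> bytes:
        return self._by_tag[tag]

    def __len__(self) -> int:
        return len(self._by_tag)


# A 2K ACL referenced by 45,000 keys is stored once instead of 45,000 times.
acls = AclMap()
shared_acl = b"\x01" * 2048
tags = [acls.tag_for(shared_acl) for _ in range(45_000)]

# Force a collision with a constant hash to show the increment-by-one rule.
colliding = AclMap(hash_fn=lambda acl: 7)
t1, t2, t3 = colliding.tag_for(b"x"), colliding.tag_for(b"y"), colliding.tag_for(b"x")
```

Each key then needs to carry only the small tag, so the 2K × 45,000 (about 90 MB) of repeated ACL data in the example above shrinks to a single 2K entry plus the tags.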
- the computer 12 sorts each of the two key arrays by szKeyName.
- the computer 12 compares each major key in the CurrentHive key array against the keys in the OldHive key array. Comparing the major keys will yield one of the following results:
- the computer 12 compares each value under the key being processed in the CurrentHive value array against each value under the key being processed in the OldHive value array. For each comparison, one of the following results will occur:
- the computer 12 processes the subkeys of the major key of the CurrentHive and the major key of the OldHive as if they themselves are major keys, according to stages 52 and 54.
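The recursive key, value, and subkey comparison can be sketched over an in-memory tree. The nested-dictionary layout (`"values"` and `"subkeys"`), the tuple form of the commands, and the function name `diff_keys` are hypothetical; the possible comparison outcomes are assumed from the add, remove, and change commands named earlier, since the source's own result lists are not reproduced here.

```python
def diff_keys(current: dict, old: dict, path: str = "") -> list[tuple]:
    """Compare two key trees and return commands that turn `old` into `current`."""
    commands = []

    # Compare the values directly under this key.
    cur_vals, old_vals = current.get("values", {}), old.get("values", {})
    for name in sorted(cur_vals.keys() | old_vals.keys()):
        if name not in old_vals:
            commands.append(("ADD VALUE", path, name, cur_vals[name]))
        elif name not in cur_vals:
            commands.append(("REMOVE VALUE", path, name))
        elif cur_vals[name] != old_vals[name]:
            commands.append(("CHANGE VALUE", path, name, cur_vals[name]))

    # Compare subkeys, treating each as if it were itself a major key.
    cur_keys, old_keys = current.get("subkeys", {}), old.get("subkeys", {})
    for name in sorted(cur_keys.keys() | old_keys.keys()):
        child = f"{path}\\{name}" if path else name
        if name not in old_keys:
            commands.append(("ADD KEY", child))
            # Everything beneath a new key is new as well.
            commands.extend(diff_keys(cur_keys[name], {}, child))
        elif name not in cur_keys:
            commands.append(("REMOVE KEY", child))
        else:
            commands.extend(diff_keys(cur_keys[name], old_keys[name], child))
    return commands


old_hive = {"subkeys": {
    "A": {"subkeys": {}},
    "B": {"subkeys": {"BB": {"values": {}}}},
}}
current_hive = {"subkeys": {
    "A": {"subkeys": {"AA": {}}},
    "B": {"subkeys": {"BB": {"values": {"BB1": "data"}}}},
}}
commands = diff_keys(current_hive, old_hive)
```

Sorting the names before comparing mirrors the sorting of the key arrays by name, so the output does not depend on the order in which keys happen to be stored.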
- the computer 12 closes the major keys.
- the computer 12 can close the major keys using an appropriate API such as the Win32 API function RegCloseKey().
- the comparison performed by the process 50 shown in FIG. 3 is performed "on the fly" as the substantive information is obtained.
- the computer 12 does not wait to produce an entire file of the substantive information of the CurrentHive and then compare that with the substantive information (in another file) of the OldHive. Instead, the computer 12 compares the substantive information from the CurrentHive with the substantive information from the OldHive as the CurrentHive information is obtained.
- the computer can perform the process 50 by producing two files of substantive information and comparing the substantive-information files, e.g., using standard differential file backup techniques.
- the substantive information files are change-resistant files in that small changes to the substantive content of the files result in a small change to the physical makeup of the change-resistant files.
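One way to picture a change-resistant file is a sorted, line-oriented dump of the substantive data, so that storage order does not affect the output and one changed value changes one line. The line format and the name `canonical_lines` are assumptions for this sketch; the source does not prescribe a format.

```python
def canonical_lines(key: dict, path: str = "") -> list[str]:
    """Flatten a key tree into text lines whose order ignores storage order."""
    lines = [f"[{path}]"]
    for name, data in sorted(key.get("values", {}).items()):
        lines.append(f"{path}|{name}={data!r}")
    for name, sub in sorted(key.get("subkeys", {}).items()):
        child = f"{path}\\{name}" if path else name
        lines.extend(canonical_lines(sub, child))
    return lines


# hive_b holds the same substance as hive_a, laid out in a different order.
hive_a = {"subkeys": {"App": {"values": {"x": 1, "y": 2}}, "User": {"values": {"z": 3}}}}
hive_b = {"subkeys": {"User": {"values": {"z": 3}}, "App": {"values": {"y": 2, "x": 1}}}}
# hive_c differs from hive_a in exactly one value.
hive_c = {"subkeys": {"App": {"values": {"x": 1, "y": 5}}, "User": {"values": {"z": 3}}}}

same_layout = canonical_lines(hive_a) == canonical_lines(hive_b)
changed = [
    new for old, new in zip(canonical_lines(hive_a), canonical_lines(hive_c))
    if old != new
]
```

Two such dumps can then be handed to a standard differential file backup, which sees only the small substantive change.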
- the "on-the-fly" technique may be preferred in the interests of conserving time and resources.
- ADD KEY, ADD VALUE, REMOVE KEY, and REMOVE VALUE commands are shown illustratively.
- Subkey AA of Key A is in the CurrentHive but not the OldHive, and thus a corresponding ADD KEY command is produced and put into the difference file.
- the ACL of the key is added to the difference file if that particular ACL is not already in the difference file, with the same being true for class names.
- Value BB1 of Subkey BB of Key B is in the CurrentHive but not the OldHive, and thus a corresponding ADD VALUE command is produced and put into the difference file.
- a process 70 for restoring backed up registry hives using the system 10 includes the stages shown.
- the process 70 is exemplary only and not limiting.
- the process 70 can be altered, e.g., by having stages added, removed, or rearranged. Further, the process 70 can be adapted to restore other non-deterministic files or other data groups.
- the computer 12 determines the last full registry hive file that was backed up (a base registry file). Periodically, the difference file may be emptied when it gets larger than a desired size. For example, if the difference file becomes larger than the CurrentHive, then the OldHive can be reset to be the CurrentHive, effectively emptying the difference file. Thus, there will be multiple OldHives. For a backup, the computer 12 determines which of the multiple OldHives is the newest OldHive that is at least as old as the hive that is to be backed up.
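The baseline handling described above amounts to two small decisions, sketched here. Representing each OldHive by an integer timestamp, and the names `should_reset_baseline` and `select_old_hive`, are assumptions for illustration only.

```python
from bisect import bisect_right


def should_reset_baseline(diff_size: int, current_hive_size: int) -> bool:
    """Reset the OldHive once the difference file outgrows the CurrentHive."""
    return diff_size > current_hive_size


def select_old_hive(baseline_times: list[int], target_time: int) -> int:
    """Return the newest baseline that is at least as old as the target hive."""
    ordered = sorted(baseline_times)
    position = bisect_right(ordered, target_time)
    if position == 0:
        raise ValueError("no baseline is old enough for the requested hive")
    return ordered[position - 1]


baselines = [100, 200, 300]   # hypothetical times at which OldHives were taken
chosen = select_old_hive(baselines, 250)
```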
- the computer 12 reconstructs the last full registry hive file from the on-disk cache. If this file is not in the cache (e.g., due to loss of memory from a disk crash, machine loss, etc.), then the computer 12 retrieves the file from the backup storage 16.
- the computer 12 loads the reconstructed file from stage 74 as hive "RestoreHive." To do this, a user uses an appropriate API of the computer 12 such as the RegLoadKey() Win32 API.
- the computer 12 retrieves the appropriate difference file.
- the computer 12 retrieves the HKEY_LOCAL_MACHINE$<HiveName>$CL file that corresponds to the registry hive backup for which a registry hive file is to be restored.
- the computer 12 opens and processes the retrieved difference file. For each command in the file, the computer 12 ADDs or DELetes keys and values (and changes values if change/modify value commands are used) from the loaded "RestoreHive". The computer 12 also applies ACL's as appropriate and uses CLASS ID's to find corresponding classes.
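Replaying a difference file against the base hive can be sketched as follows. The nested-dictionary hive, the tuple command format, and the names `_find_key` and `apply_commands` are hypothetical, and the sketch leaves out the ACL and class handling mentioned above.

```python
import copy


def _find_key(root: dict, path: str) -> dict:
    """Walk a backslash-separated path and return the key dict it names."""
    node = root
    for part in filter(None, path.split("\\")):
        node = node["subkeys"][part]
    return node


def apply_commands(base: dict, commands: list[tuple]) -> dict:
    """Return a copy of `base` with every difference command applied in order."""
    hive = copy.deepcopy(base)  # leave the cached base hive untouched
    for command in commands:
        op = command[0]
        if op in ("ADD KEY", "REMOVE KEY"):
            parent_path, _, name = command[1].rpartition("\\")
            subkeys = _find_key(hive, parent_path).setdefault("subkeys", {})
            if op == "ADD KEY":
                subkeys[name] = {"values": {}, "subkeys": {}}
            else:
                del subkeys[name]
        else:
            values = _find_key(hive, command[1]).setdefault("values", {})
            if op == "REMOVE VALUE":
                del values[command[2]]
            else:  # ADD VALUE or CHANGE VALUE
                values[command[2]] = command[3]
    return hive


base = {"values": {}, "subkeys": {
    "A": {"values": {"old": 1}, "subkeys": {}},
    "B": {"values": {}, "subkeys": {"BB": {"values": {}, "subkeys": {}}}},
}}
commands = [
    ("ADD KEY", "A\\AA"),
    ("REMOVE VALUE", "A", "old"),
    ("ADD VALUE", "B\\BB", "BB1", "data"),
]
restored = apply_commands(base, commands)
```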
- the computer closes the difference file and unloads the "RestoreHive" hive.
- the computer uses an appropriate function such as the Win32 API function RegUnloadKey().
- various techniques may be employed regarding determining and/or storing differences between current files and prior versions of the file.
- the above description focused on storing a baseline version of a file and at each subsequent backup, determining a difference file that represents the differences between the current version and the baseline version, and storing the difference file, with a new baseline file possibly being periodically stored.
- a file to be backed up can be stored locally and at each subsequent backup, a logical difference can be determined between the most-recently backed up version (as opposed to a baseline version) and the current version, and the determined difference stored as a difference file.
- a binary version of a file can be converted to a canonical form that is amenable to differential backup, and this canonical form backed up.
- the current file can be converted to the canonical form, and traditional differential backup processes applied to determine the differences between the two canonical-form files. Still other techniques are possible and within the scope and spirit of the invention and claims. Restoration using these alternative techniques can be performed by focusing on the substantive data of the backed up files.
- the system 10 can also be used to efficiently back up files or other data sets that break the effectiveness of typical common/redundant-file elimination (CFE/RFE) backup techniques.
- CFE-breaking files that the computer 12 is configured to efficiently back up contain aggregations of files or other data groups, such as email attachments, with indexes or other indicia of data subgroups within the larger file, resembling a database.
- data subgroups may be identical to other data subgroups or files to be backed up.
- the computer 12 is configured to determine the individual data subgroups and reduce, and preferably eliminate, redundant backup of the same data subgroup.
- CFE techniques are not limited to application to files, but may be applied to any group of data.
- an exemplary CFE-breaking file 110 is shown logically, as the file 110 may be physically divided among many, non-consecutive memory locations with appropriate pointers.
- the file 110 includes indexes indicating the beginnings 112, 114 and ends 116, 118 of data subgroups 120, 122.
- Other data subgroups may be contained in the file 110, but only the two data subgroups 120, 122 are shown for exemplary purposes. Examples of the data subgroups 120, 122 are an email and an associated attachment, or an email folder and an associated attachment, although these examples are not limiting.
- Data subgroups may be files or other related sets of data that are not files.
- a process 130 for backing up CFE-breaking files using the system 10 includes the stages shown.
- the process 130 is exemplary only and not limiting.
- the process 130 can be altered, e.g., by having stages added, removed, or rearranged.
- the process 130 may be applied to data sets other than files.
- the computer 12 analyzes the file 110 to determine the data subgroups 120, 122.
- the computer 12 finds the indexes 112, 114, 116, 118 in the file 110 to determine the beginnings and ends of the data subgroups 120, 122, and thus the content of the data subgroups 120, 122.
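Locating the data subgroups from begin and end indexes can be sketched as simple slicing. Modelling the indexes as a list of `(begin, end)` byte offsets, the toy file layout, and the name `extract_subgroups` are assumptions; real aggregate formats keep their indexes in format-specific structures.

```python
def extract_subgroups(blob: bytes, index: list[tuple[int, int]]) -> list[bytes]:
    """Slice each data subgroup out of the aggregate using (begin, end) offsets."""
    subgroups = []
    for begin, end in index:
        if not 0 <= begin <= end <= len(blob):
            raise ValueError(f"index entry ({begin}, {end}) is outside the file")
        subgroups.append(blob[begin:end])
    return subgroups


# Hypothetical aggregate: a header, an email, an attachment, and a trailer.
email = b"Hello Bob, see attached."
attachment = b"%PDF-report-bytes"
aggregate = b"HDR|" + email + b"|" + attachment + b"|TRAILER"

begin_email = len(b"HDR|")
begin_attachment = begin_email + len(email) + 1
index = [
    (begin_email, begin_email + len(email)),
    (begin_attachment, begin_attachment + len(attachment)),
]
subgroups = extract_subgroups(aggregate, index)
```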
- the computer 12 stores the data subgroups 120, 122 and applies redundant/common file elimination backup.
- the computer 12 stores the data subgroups 120, 122 in temporary storage, e.g., cache, as the data subgroups 120, 122 are determined.
- the computer 12 also applies standard common file elimination techniques to each stored data subgroup, as it is determined, relative to previously-stored files in the backup storage 16.
- the computer 12 can store the data subgroups 120, 122 into more permanent storage, and perform common file elimination on the stored groups and/or files collectively.
- a cross-referencing database is produced to relate redundant data subgroups with their associated data subgroups. For example, if the data subgroup 120 is an email message and the data subgroup 122 is an attachment that is redundant with a file already stored in the backup storage 16, then the data subgroup 122 will not be backed up in its entirety.
- the computer 12 will insert a reference into the cross-referencing database that relates the data subgroup 120 with the already-stored file that is the same as the data subgroup 122.
- the computer 12 can use the cross-referencing database to determine what data subgroups, e.g., the data subgroup 120, in the file 110 have associated data subgroups, e.g., the data subgroup 122, that were redundant, find the stored redundant data subgroup, and reassemble the file 110 using the stored redundant data subgroup as the data subgroup, here the data subgroup 122, that was not stored in its entirety with the file 110.
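The subgroup backup, cross-referencing, and reassembly steps can be sketched together. Here the cross-referencing database is simplified to a dictionary mapping each aggregate file name to the ordered digests of its subgroups; that layout, the SHA-256 digests, and the names `backup_aggregate` and `restore_subgroups` are assumptions for illustration.

```python
import hashlib


def backup_aggregate(name: str, subgroups: list[bytes],
                     store: dict[str, bytes],
                     cross_refs: dict[str, list[str]]) -> list[str]:
    """Store only subgroups absent from `store`; return digests actually sent."""
    sent, refs = [], []
    for data in subgroups:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in store:   # not redundant: back it up
            store[digest] = data
            sent.append(digest)
        refs.append(digest)       # redundant or not, remember where it belongs
    cross_refs[name] = refs
    return sent


def restore_subgroups(name: str, store: dict[str, bytes],
                      cross_refs: dict[str, list[str]]) -> list[bytes]:
    """Reassemble a file's subgroups, pulling redundant ones from the store."""
    return [store[digest] for digest in cross_refs[name]]


email = b"Hello Bob, see attached."
attachment = b"%PDF-report-bytes"

# The attachment already exists in backup storage as an ordinary file.
store = {hashlib.sha256(attachment).hexdigest(): attachment}
cross_refs: dict[str, list[str]] = {}

sent = backup_aggregate("mail.db", [email, attachment], store, cross_refs)
restored = restore_subgroups("mail.db", store, cross_refs)
```

Only the email is transferred in this run; the attachment is represented by a reference to the copy already held in storage.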
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
- The invention relates to backing up data and more particularly to backing up non-deterministic files.
- Safeguarding electronic data by backing the data up is a common event, and an event that consumes an increasing amount of memory and processing power. Data files today typically occupy much more memory than with previous software programs and thus backing these files up requires more storage space, and more processing power and communication-link bandwidth to transfer and store the files. With enormous amounts of data to back up, it is desirable to avoid backing up data that has not changed, and to back up as few copies (preferably one) of a file as possible.
- A number of techniques have been developed for network-based computer backup systems that greatly reduce the bandwidth and storage needs of the backup system. Two examples are differential file backup and common file elimination (e.g., Cane et al., U.S. Patent No. 5,765,173). Generally, differential file backup is performed by determining changes that have occurred within a file using a set of hash codes that represents the information within the file, as it previously existed, in fixed-size blocks. These hash codes are matched up against the same file now modified, determining those areas of the file that have changed and those areas of the file that have not changed. This results in significant bandwidth and space savings for sending and storing the portions of the file that have changed. Common file elimination determines whether a file to be backed up is the same as other files to be backed up (e.g., a file already backed up), and if so, stores only one copy of that file. Common file elimination techniques can be applied to data groups other than files.
- International Patent Publication No. WO 00/029952
- Aspects and examples of the invention are set out in the appended claims.
- Implementations of the invention may include one or more of the following features. The processor may be configured to bypass at least some of the associated data, for transfer for storage, that are present in the stored data. The processor may be configured to transfer the associated substantive data for storage only if the associated substantive data are absent from the stored data. The processor may be configured to determine the associated substantive data by analyzing a structure of the desired data and data associated with portions of the structure. The processor may be configured to map the associated substantive data to a change-resistant format and to compare the associated substantive data with the stored data using the change-resistant formatted substantive data. The processor may be configured to perform differential backup on the associated substantive data to compare the associated substantive data with the stored data.
- Implementations of the invention may also include one or more of the following features. The processor may be further configured to transfer, over the communication link, indicia that substantive data are absent from the stored data and that substantive data in the stored data are absent from the associated substantive data. The indicia may include at least one of an add key command, an add value command, a remove key command, a remove value command, and a change value command. The processor may be configured to determine the associated substantive data by determining groupings of data within the desired data. The processor may be configured to determine the groupings of data by analyzing indexes associated with the desired data. The processor may be configured to perform common file elimination on the groupings of data to compare the associated substantive data with the stored data. The processor may be further configured to transfer, over the communication link, indicia of relationships of data groupings that exist in the stored data to the associated substantive data.
- In addition, implementations of the invention may include one or more of the following features. The processor may be configured to identify the associated substantive data by analyzing a data file containing data subgroups and identifying the data subgroups within the data file, compare the associated substantive data with the stored data by comparing the data subgroups with stored potentially-common data sets, and back up data subgroups based upon the comparison. Thus the processor may back up only those data subgroups that are absent from the stored potentially-common data sets. The processor may store the data subgroups as separate files for comparison. The processor may compare the data subgroups with the stored potentially-common data sets using a common file elimination technique. The processor may provide remove indicia indicating that at least one of the potentially-common data sets is associated with the data file. The processor may provide remove indicia indicating that at least one of the potentially-common data sets is associated with a particular portion of the data file.
- Various aspects of the invention may provide one or more of the following advantages. Data stored in ways that defeat traditional backup techniques can be backed up while achieving advantages of the traditional backup techniques. Files with few substantive changes but significant non-substantive changes can be backed up with an amount of storage and/or processing that corresponds to the few substantive changes. Data subgroups of larger, aggregated data groups, e.g., files, can be backed up in a non-redundant manner.
- These and other advantages of the invention, along with the invention itself, will be more fully understood after a review of the following figures, detailed description, and claims.
- FIG. 1 is a simplified block diagram of a system for backing up data.
- FIGS. 2-3 are block flow diagrams of a process of backing up a non-deterministic registry data file using the system shown in FIG. 1.
- FIG. 4 is a block diagram illustrating backing up of a registry file.
- FIG. 5 is a block flow diagram of restoring a backed up registry file.
- FIG. 6 is a simplified diagram of a portion of an aggregate file containing data subgroups.
- FIG. 7 is a block flow diagram of a process of backing up the aggregate file shown in FIG. 6.
- At least some embodiments of the invention provide techniques for determining substantive differential changes in data so that the substantive changes can be stored, and/or for determining data subsets of larger collections of data that may be similar to other data sets so that duplicative storage of similar data sets is reduced. Files can be analyzed to determine their substance and compared against the substance of stored files. If the substance of the files differs, then the substantive differences can be stored, while non-substantive differences can be ignored and not backed up. Further, files that contain meaningful subsets can be analyzed to determine the subsets and the subsets compared with stored sets of data. Duplicative sets of data can be bypassed for backup while non-duplicative sets can be backed up.
- It has been discovered that certain files do not conform well to traditional differential file backup techniques. Some files contain calculated indexes and pointers (administration data) or, in many cases, garbage bytes. Garbage bytes do not represent any substantive content/information, but are merely empty space. Such files, even if modified only slightly or not at all substantively, appear nearly completely different from a previous version of the file if a hash code comparison method is used. Because of this "chaotic" behavior, almost all the "data" for these files would have to be backed up in order to reliably reconstruct the files.
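- The effect described above can be illustrated with a small Python sketch. The record layout used here (a length-prefixed record followed by a run of random filler bytes) is purely hypothetical and chosen for illustration; it is not the format of any real file discussed in this document. The sketch shows that two serializations of identical settings hash differently as raw bytes, while a comparison of only the substantive records finds no change.

```python
import hashlib
import os

def serialize(settings, filler_len=8):
    """Hypothetical layout: each record is followed by random filler bytes."""
    out = bytearray()
    for name in sorted(settings):
        record = f"{name}={settings[name]}".encode()
        out += len(record).to_bytes(2, "big") + record
        # Garbage bytes: a length byte followed by whatever "happens to be in memory".
        out += bytes([filler_len]) + os.urandom(filler_len)
    return bytes(out)

def substantive(blob):
    """Walk the layout and keep only the records, skipping the filler."""
    records, pos = [], 0
    while pos < len(blob):
        size = int.from_bytes(blob[pos:pos + 2], "big")
        records.append(blob[pos + 2:pos + 2 + size])
        pos += 2 + size            # now at the filler length byte
        pos += 1 + blob[pos]       # skip the length byte and the filler itself
    return records

settings = {"Wallpaper": "blue.bmp", "Font": "Arial"}
first, second = serialize(settings), serialize(settings)

# Hash comparison of the raw bytes reports a change although nothing changed.
raw_same = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
# Comparing only the substantive records reports no change.
content_same = substantive(first) == substantive(second)
```

- In this sketch a hash-based comparison would treat the second file as entirely new, whereas the substantive comparison would back up nothing.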
- Some data files (e.g., some mail system files such as Microsoft Outlook .pst files) also reduce the efficacy of common file elimination backup techniques. These data files treat attachments by embedding the attachments within a larger data set, e.g., a file, containing all mail and attachments. Common file elimination techniques will fail to recognize that a subset of the email file (the attachment) would match an existing data set in a storage repository, unless the entire file containing the attachment matched a stored file.
- As used herein, a non-deterministic file or data group is a file or data group in which the physical makeup of the file or data group may change from one moment to the next with very little actual change of the substance or content, or the contents of a file or data group containing data subgroups may change dramatically, while data subgroups of the file or data group may match stored groups. These characteristics can elude traditional backup methods, rendering the traditional backup methods less effective.
- Referring to FIG. 1, a system 10 for backing up non-deterministic data includes a computer 12, a communication network 14, and a backup storage 16. The computer 12 includes a processor 18 and memory 20 for storing software instructions that can be executed by the processor 18, and for storing data that may be backed up. The software instructions are configured to be executed by the processor 18 to perform functions as described below. The computer 12 is configured to send data through the network 14 to the storage 16 for backup. The network 14, here shown as the packet-switched network commonly known as the Internet, may be a wide area network (WAN) or a local area network (LAN). Further, the network 14 may be replaced with a simple communication line, the network 14 indicating a communication link between the computer 12 and the storage 16, although the form of the link may vary.
- The computer 12 is configured to have the processor 18 assess data, stored in the memory 20, to be backed up and to determine whether and what data to back up. The processor 18 is configured, in conjunction with the stored software, to identify a mechanism through which the actual, substantive information within a file can be accessed. This mechanism preferably identifies only the substantive information and ignores garbage data or administrative portions of the file. The processor 18 is further configured to divide the file into parts representing the real information. The processor 18 is further configured to evaluate the real information for backup, preferably using traditional techniques such as differential backup or common (redundant) file elimination. Evaluation of the real data may be performed with the real data collected in one group (file or memory block) or in the separate parts. The computer 12 can also operate on sets of data other than files.
- Registry backup on Windows® NT systems is generally performed by most backup products using a Win32 API (Application Programming Interface) called RegSaveKey, for registry save key. The registry is an inverted tree structured database including descriptions of applications (e.g., types of files such as .txt), user information (e.g., desktop settings), and specific settings (e.g., word processing defaults, email defaults, etc.) for applications identified as keys. A key is a name of a setting in the registry, and a value as used below indicates the value of the setting. RegSaveKey will copy an in-memory version of a registry hive to a storage disk in the location specified by the API user. A hive is a logical branch of the registry and is contained within a single file. For example, the following hives exist on Windows NT 4.0:
- SYSTEM
- SAM
- SECURITY
- SOFTWARE
- USER
- There is one USER hive per user that has an account on the system, but only the logged-on user's USER hive is loaded in memory. All other USER hives remain on disk.
- Backing up the registry has traditionally involved calling the Windows RegSaveKey() API for each in-memory hive. This API is called because the corresponding hive files on the disk are in use, and therefore cannot be accessed, while the hives are loaded. RegSaveKey() was traditionally the only mechanism to capture these registry hives as files. The hives would be backed up using a differential backup software engine (e.g., as described in Cane et al., US Patent No. 5,765,173). This resulted in the hive file(s) being backed up in full the first time and then the binary changes to the hive file(s) being sent on each successive backup. This mechanism resulted in 100K or so of data on average per hive being backed up on each backup.
- With Windows 2000, the size of the registry hive files increased. Using the mechanism for backing up a registry described in the preceding paragraph, the resulting amount of data per backup grew to almost 12 MB on average for the SOFTWARE hive, and other hives were generally over 1 MB. For remote users with slow communication connections, sending multiple megabytes (15 MB+) on every backup, usually daily, to a remote Data Center over such slow connections is unacceptable. Also, the backed up data would need to be stored. This means that for a Data Center that supports 10,000 Windows 2000 users, 150 GB of data would typically need to be stored each day, for backing up just the users' registry hives.
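- The storage figure above follows from simple multiplication. The short Python sketch below reproduces it; the function name and the use of decimal units (1 GB taken as 1000 MB) are assumptions made for illustration.

```python
def daily_backup_volume_gb(users, mb_per_user_per_backup, backups_per_day=1):
    """Data arriving at the Data Center per day, in decimal gigabytes."""
    total_mb = users * mb_per_user_per_backup * backups_per_day
    return total_mb / 1000  # assumption: 1 GB = 1000 MB

# 10,000 Windows 2000 users, roughly 15 MB of registry hive data per daily backup.
windows_2000_gb_per_day = daily_backup_volume_gb(10_000, 15)
```

- With the figures given in the text, the result is 150 GB per day for the registry hives alone.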
- It was discovered that the data RegSaveKey() produced on Windows 2000 was non-deterministic. Two consecutive backups using RegSaveKey, with no real changes to the registry hives, produced nearly completely different files. The real data in the files were interspersed with "junk" or filler data, meaningless bytes of whatever happened to be in memory at the time, that varied. Further, the ordering of the data differed even if the "junk" between the real data were ignored.
- Along with the above backup size problem is the corresponding retrieval problem. Multiple MBs of data were typically backed up at every backup using a differential technology. The registry hives would be reconstructed by the Data Center and sent back to the user's machine when they were needed for retrieval. The total size of a reconstructed registry on Windows 2000 would be in excess of 20 MB, and on a slow connection this would take far too long.
- In operation, referring to FIG. 2, with further reference to FIG. 1, a process 30 for backing up registry hives using the system 10 includes the stages shown. The process 30, however, is exemplary only and not limiting. The process 30 can be altered, e.g., by having stages added, removed, or rearranged. Further, the process 30 can be adapted to back up other non-deterministic files or other groups of data whose physical makeup (e.g., bits) changes more significantly than their substantive contents do (i.e., substance represented by the physical makeup). The process 30 is preferably implemented in a process running in the background that can perform backup regardless of whether any user is logged on to the system being backed up. The process 30 is further preferably implemented by the background process running under a local system account that has sufficient privileges to accomplish the backup (including access to the registry and any appropriate file). This may allow access to most keys for backup and restore processes, including keys that the user running the backup program may not have permission to access (e.g., due to Windows NT security). The process 30 steps through a registry to find substantive information and compares the substantive information with stored substantive information to determine what substantive information is new, what substantive information has been previously stored, what substantive information has been changed, and what previously-stored substantive information has been removed. Alternatively, the process could build an organized file to which traditional block differential techniques can be applied.
- At stage 32, the user manipulates the computer 12 (e.g., using a mouse, keyboard, etc.) to have the Win32 API RegSaveKey() store the hive to a disk (e.g., the memory 20) as a file. If this backup is the first backup ever of this hive for the computer 12, then the computer 12 sends this file to the storage 16 and caches it locally as well, and the process 30 ends. The file as loaded in memory is the "base" registry file or "OldHive." If this backup is not the first backup of the hive, then the process 30 proceeds to stage 34.
- At stage 34, the computer 12 uses the Win32 API RegLoadKey() to reload the registry hive file into the registry. The computer reloads the hive file under a new name, "CurrentHive," for comparison.
- At stage 36, the computer 12 recovers the OldHive from the local disk cache. The registry hive is recovered as a file as it had existed at the time of the first backup. As discussed below, there may be multiple OldHives as the OldHive may be periodically reset. For a backup, the computer 12 determines which of the multiple OldHives is the newest OldHive that is at least as old as the hive that is to be backed up.
- At stage 38, the restored hive is loaded into the registry. The computer 12 loads the hive restored from cache into the registry under the name "OldHive" for comparison with the CurrentHive.
- At stage 40, the computer 12 performs a comparison between the two loaded registry hives, CurrentHive and OldHive. Each difference is written to a difference file named "HKEY_LOCAL_MACHINE$<HiveName>$CL," where <HiveName> is the name of the hive being backed up. The difference file will contain a list of commands to take the originally backed up registry file (base registry file) and add and subtract (and possibly change) information from it so that it is equivalent to the registry hive file as it existed at the time the difference file was produced. A more detailed description of stage 40 is presented below with respect to FIG. 3.
- At stage 42, the computer 12 uses standard differential backup techniques to back up the file produced in stage 40. Differences produced will be against the previous difference file created for this registry hive. Preferably, only substantive data in the CurrentHive but not in the OldHive are sent for storage to the backup storage 16 by the computer 12. Some substantive data in the CurrentHive and in the OldHive may be sent, or re-sent, for storage and the system 10 would still be advantageous over storing everything in the CurrentHive or determining differences in traditional ways instead of by analyzing the substantive data. Thus, sending some duplicative data for storage is also within the scope of the invention although at least some, and preferably all, substantive data that are in the CurrentHive and in the OldHive are bypassed for being sent to the backup storage 16.
- Referring to FIG. 3, with further reference to FIG. 1, stage 40 of FIG. 2 is shown as a process 50 for comparing loaded registry hives using the system 10 and includes the stages shown. The process 50, however, is exemplary only and not limiting. The process 50 can be altered, e.g., by having stages added, removed, or rearranged.
- At stage 52, the computer 12 reads major keys (first-level keys) into corresponding key arrays, one array for the CurrentHive and one array for the OldHive. For each key-array element, the computer 12 stores:
- szKeyName
- The Key's Name
- nNumSubKeys
- The Number of Subkeys under this key
- nNumValues
- The Number of Values under this key
- nACLTag
- The unique tag to an Access Control List (ACL) stored in a separate map
- nClassTag
- The unique tag to a Class Name stored in a separate map.
- The actual ACL and Class name are not stored together with the other information for the key as they tend to be identical to other ACLs and classes. By storing a tag to the ACL or Class, there is significant space savings, both in memory during the comparison process 50 and when creating, sending, and storing the difference file that gets backed up. For instance, an ACL may be 2K in length but is attached to 45,000 different keys (90 MB). Preferably only one actual copy of the ACL is kept in a map and mapped to a tag, with a tag referenced by each key.
- To get major-key information, the Win32 API function RegQueryInfoKey() is used by the computer 12 after opening the key with the RegOpenKey() API. To get the key's security information for the ACL, the computer 12 uses a call to RegGetKeySecurity(). The ACL is stored in a separate map and can be looked up using a unique tag. This tag is a unique (for this registry hive) hash code based on the contents of the ACL. If a tag is generated that is identical to an existing tag, but the contents of the ACL differ (hash code collision), then the hash code value is incremented by one until a unique tag is generated.
- At stage 54, the computer 12 sorts each of the two key arrays by szKeyName. The computer 12 compares each major key in the CurrentHive key array against the keys in the OldHive key array. Comparing the major keys will yield one of the following results:
- 1. Key in CurrentHive but not in OldHive: In this case, the computer 12 writes out an "Add Key" command to the file to be backed up. Also, the computer 12 enumerates through all subkeys and values of this major key and adds an "Add Key" and an "Add Value" command for each. Effectively the whole tree under that major subkey is new to the current hive.
- 2. Key not in CurrentHive but in OldHive: In this case, the computer 12 writes out a "Remove Key" command to the file to be backed up.
- 3. Key is in both CurrentHive and OldHive: In this case, the computer 12 determines if a class name or an ACL is different between these two major keys, even though the name is the same. If either is different, then the computer 12 writes the class information and/or ACL information of CurrentHive's key to the file to be backed up (the "difference file") if the corresponding class and/or ACL has not already been written to the difference file. The computer 12 outputs a "Modified Key" command to the file with the tags (the "tag file") for the class and/or ACLs. The computer 12 compares two value arrays as described below with respect to the substages.
- At substage 56, similar to stage 52, value names and values of keys in the CurrentHive and the OldHive that exist under the major keys that matched are read and sorted by the computer 12. These values can be obtained by the computer 12 with an appropriate API such as the RegEnumValue() Win32 API. The computer 12 sorts the values into a CurrentHive value array and an OldHive value array.
- At substage 58, the computer 12 compares each value under the key being processed in the CurrentHive value array against each value under the key being processed in the OldHive value array. For each comparison, one of the following results will occur:
- 1. Value in CurrentHive but not in OldHive: The computer 12 writes out an "Add Value" command to the difference file.
- 2. Value not in CurrentHive but in OldHive: The computer 12 writes out a "Remove Value" command to the difference file.
- 3. Value is in both the CurrentHive and the OldHive and is the same: No output.
- 4. Value is in both the CurrentHive and the OldHive but is different: The computer 12 writes a "Change Value" command (also known as a modify value command, etc.) to the difference file. The change value command is logically equivalent to a remove value command combined with an add value command, and can be implemented as such.
- Also, the computer 12 processes the subkeys of the major key of the CurrentHive and the major key of the OldHive as if they themselves are major keys, according to stages 52 and 54.
- At stage 60, the computer 12 closes the major keys. The computer 12 can close the major keys using an appropriate API such as the Win32 API function RegCloseKey().
- The comparison performed by the process 50 shown in FIG. 3 is performed "on the fly" as the substantive information is obtained. The computer 12 does not wait to produce an entire file of the substantive information of the CurrentHive and then compare that with the substantive information (in another file) of the OldHive. Instead, the computer 12 compares the substantive information from the CurrentHive with the substantive information from the OldHive as the CurrentHive information is obtained. Alternatively, the computer can perform the process 50 by producing two files of substantive information and comparing the substantive-information files, e.g., using standard differential file backup techniques. The substantive information files are change-resistant files in that small changes to the substantive content of the files result in a small change to the physical makeup of the change-resistant files. The "on-the-fly" technique may be preferred in the interests of conserving time and resources.
- Referring to FIG. 4, ADD KEY, ADD VALUE, REMOVE KEY, and REMOVE VALUE commands are shown illustratively. As shown, Subkey AA of Key A is in the CurrentHive but not the OldHive, and thus a corresponding ADD KEY command is produced and put into the difference file. For each ADD KEY command placed in the difference file, the ACL of the key is added to the difference file if that particular ACL is not already in the difference file, with the same being true for class names. Similarly to the new Subkey AA, Value BB1 of Subkey BB of Key B is in the CurrentHive but not the OldHive, and thus a corresponding ADD VALUE command is produced and put into the difference file. For any ADD command (key or value), enough information is placed in the difference file so that the corresponding key or value can be added back to a registry hive later. The Value AB2 of the Subkey AB of the Key A is in the OldHive but not in the CurrentHive and thus a corresponding REMOVE VALUE command is produced and put into the difference file. Similarly, the Subkey AC of the Key A is in the OldHive but not in the CurrentHive and thus a corresponding REMOVE KEY command is produced and put into the difference file. FIG. 4 does not show any MODIFY KEY commands, but these may be produced and put into the difference file if, for any of the matches of keys or values shown, the corresponding ACLs or class names differ.
- Referring to FIG. 5, with further reference to FIG. 1, a process 70 for restoring backed up registry hives using the system 10 includes the stages shown. The process 70, however, is exemplary only and not limiting. The process 70 can be altered, e.g., by having stages added, removed, or rearranged. Further, the process 70 can be adapted to restore other non-deterministic files or other data groups.
- At stage 72, the computer 12 determines the last full registry hive file that was backed up (a base registry file). Periodically, the difference file may be emptied when it gets larger than a desired size. For example, if the difference file becomes larger than the CurrentHive, then the OldHive can be reset to be the CurrentHive, effectively emptying the difference file. Thus, there will be multiple OldHives. For a backup, the computer 12 determines which of the multiple OldHives is the newest OldHive that is at least as old as the hive that is to be backed up.
- At stage 74, the computer 12 reconstructs the last full registry hive file from the on-disk cache. If this file is not in the cache (e.g., due to loss of memory from, e.g., disk crash, machine loss, etc.), then the computer 12 retrieves the file from the backup storage 16.
- At stage 76, the computer 12 loads the reconstructed file from stage 74 as hive "RestoreHive." To do this, a user uses an appropriate API of the computer 12 such as the RegLoadKey() Win32 API.
- At stage 78, the computer 12 retrieves the appropriate difference file. The computer 12 retrieves the HKEY_LOCAL_MACHINE$<HiveName>$CL file that corresponds to the registry hive backup for which a registry hive file is to be restored.
- At stage 80, the computer 12 opens and processes the retrieved difference file. For each command in the file, the computer 12 ADDs or DELetes keys and values (and changes values if change/modify value commands are used) from the loaded "RestoreHive". The computer 12 also applies ACLs as appropriate and uses CLASS IDs to find corresponding classes.
- At stage 82, the computer closes the difference file and unloads the "RestoreHive" hive. To unload the RestoreHive file, the computer uses an appropriate function such as the Win32 API function RegUnloadKey().
- Other embodiments are within the scope and spirit of the invention and the appended claims. For example, although the above description focused on backing up Registries, the techniques described are not limited to backing up Registries or files. The techniques may be applied to any number of other data sets that impede traditional differential file backups. The techniques may be applied using knowledge of the substance or meaning of the contents of the data sets where the data sets contain indices and/or junk data, and/or display chaotic behavior due to modifications.
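- The comparison of FIG. 3 and the restoration of FIG. 5 can be sketched together in a few lines of Python. The sketch is illustrative only and rests on several assumptions: a hive is modeled as nested dictionaries rather than being read through the Win32 registry API, ACL and class-name handling (and thus the modified key command) is omitted, and the helper names (key, add_tree, diff_keys, find, restore) and the tuple encoding of the commands are hypothetical. The example hives mirror the keys and values named in the discussion of FIG. 4.

```python
import copy

def key(values=None, **subkeys):
    """Build a key holding named values and named subkeys."""
    return {"values": dict(values or {}), "subkeys": subkeys}

def add_tree(path, node, out):
    """A key new to the CurrentHive: emit ADD commands for its whole tree."""
    out.append(("ADD_KEY", path))
    for name in sorted(node["values"]):
        out.append(("ADD_VALUE", path, name, node["values"][name]))
    for name in sorted(node["subkeys"]):
        add_tree(path + (name,), node["subkeys"][name], out)

def diff_keys(path, old, cur, out):
    """Compare a key present in both hives: its values first, then its subkeys."""
    for name in sorted(set(old["values"]) | set(cur["values"])):
        if name not in old["values"]:
            out.append(("ADD_VALUE", path, name, cur["values"][name]))
        elif name not in cur["values"]:
            out.append(("REMOVE_VALUE", path, name))
        elif old["values"][name] != cur["values"][name]:
            out.append(("CHANGE_VALUE", path, name, cur["values"][name]))
    for name in sorted(set(old["subkeys"]) | set(cur["subkeys"])):
        child = path + (name,)
        if name not in old["subkeys"]:
            add_tree(child, cur["subkeys"][name], out)
        elif name not in cur["subkeys"]:
            out.append(("REMOVE_KEY", child))
        else:  # subkeys are processed as if they were major keys themselves
            diff_keys(child, old["subkeys"][name], cur["subkeys"][name], out)
    return out

def find(hive, path):
    """Follow a path of subkey names down from the root of the hive."""
    node = hive
    for name in path:
        node = node["subkeys"][name]
    return node

def restore(base, commands):
    """Apply the commands of a difference file to a copy of the base hive."""
    hive = copy.deepcopy(base)
    for cmd, path, *args in commands:
        if cmd == "ADD_KEY":
            find(hive, path[:-1])["subkeys"][path[-1]] = key()
        elif cmd == "REMOVE_KEY":
            del find(hive, path[:-1])["subkeys"][path[-1]]
        elif cmd in ("ADD_VALUE", "CHANGE_VALUE"):
            find(hive, path)["values"][args[0]] = args[1]
        elif cmd == "REMOVE_VALUE":
            del find(hive, path)["values"][args[0]]
    return hive

old_hive = key(A=key(AB=key({"AB1": 1, "AB2": 2}), AC=key()),
               B=key(BB=key()))
current_hive = key(A=key(AA=key(), AB=key({"AB1": 1})),
                   B=key(BB=key({"BB1": "x"})))

difference_file = diff_keys((), old_hive, current_hive, [])
restored_hive = restore(old_hive, difference_file)
```

- In this sketch the difference file holds four commands (an add key for AA, a remove value for AB2, a remove key for AC, and an add value for BB1), and applying them to the OldHive reproduces the CurrentHive.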
- Further, various techniques may be employed regarding determining and/or storing differences between current files and prior versions of the file. The above description focused on storing a baseline version of a file and at each subsequent backup, determining a difference file that represents the differences between the current version and the baseline version, and storing the difference file, with a new baseline file possibly being periodically stored. Alternatively, a file to be backed up can be stored locally and at each subsequent backup, a logical difference can be determined between the most-recently backed up version (as opposed to a baseline version) and the current version, and the determined difference stored as a difference file. Further, a binary version of a file can be converted to a canonical form that is amenable to differential backup, and this canonical form backed up. At each subsequent backup, the current file can be converted to the canonical form, and traditional differential backup processes applied to determine the differences between the two canonical-form files. Still other techniques are possible and within the scope and spirit of the invention and claims. Restoration using these alternative techniques can be performed by focusing on the substantive data of the backed up files.
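- The canonical-form alternative mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: settings are modeled as nested dictionaries, the canonical form is one sorted text line per key or value, the function name canonical_form is hypothetical, and Python's difflib module stands in for a traditional differential backup engine.

```python
import difflib
import json

def canonical_form(settings):
    """One line per key or value, sorted by path, free of filler and ordering noise."""
    lines = []

    def walk(prefix, node):
        for name in sorted(node):
            path = f"{prefix}/{name}" if prefix else name
            if isinstance(node[name], dict):
                lines.append(path + "/")        # a key, possibly empty
                walk(path, node[name])
            else:
                lines.append(f"{path}={json.dumps(node[name])}")  # a value

    walk("", settings)
    return lines

baseline = {"Desktop": {"Wallpaper": "blue.bmp", "Font": "Arial"},
            "Mail": {"Port": 25}}
# Same substance in a different physical order, with one real change.
current = {"Mail": {"Port": 587},
           "Desktop": {"Font": "Arial", "Wallpaper": "blue.bmp"}}

delta = difflib.ndiff(canonical_form(baseline), canonical_form(current))
changed = [line for line in delta if line[:2] in ("+ ", "- ")]
```

- Because both versions are reduced to the same ordering before comparison, only the line for the changed mail port appears in the difference, regardless of how the source data happened to be laid out.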
- Referring again to FIG. 1, the system 10 can also be used to efficiently back up files or other data sets that break the effectiveness of typical common/redundant-file elimination (CFE/RFE) backup techniques. CFE-breaking files that the computer 12 is configured to efficiently back up contain aggregations of files or other data groups, such as email attachments, with indexes or other indicia of data subgroups within the larger file, resembling a database. With such CFE-breaking files, data subgroups may be identical to other data subgroups or files to be backed up. The computer 12 is configured to determine the individual data subgroups and reduce, and preferably eliminate, redundant backup of the same data subgroup. Thus, CFE techniques are not limited to application to files, but may be applied to any group of data.
- Referring to FIG. 6, an exemplary CFE-breaking file 110 is shown logically, as the file 110 may be physically divided among many, non-consecutive memory locations with appropriate pointers. The file 110 includes indexes indicating the beginnings data subgroups file 110, but only the two data subgroups data subgroups
- Referring to FIG. 7, with further reference to FIGS. 1 and 6, a process 130 for backing up CFE-breaking files using the system 10 includes the stages shown. The process 130, however, is exemplary only and not limiting. The process 130 can be altered, e.g., by having stages added, removed, or rearranged. The process 130 may be applied to data sets other than files.
- At stage 132, the computer 12 analyzes the file 110 to determine the data subgroups computer 12 finds the indexes file 110 to determine the beginnings and ends of the data subgroups data subgroups
- At stage 134, the computer 12 stores the data subgroups computer 12 stores the data subgroups data subgroups computer 12 also applies standard common file elimination techniques to each stored data subgroup, as it is determined, relative to previously-stored files in the backup storage 16. Alternatively, the computer 12 can store the data subgroups
- At stage 136, a cross-referencing database is produced to relate redundant data subgroups with their associated data subgroups. For example, if the data subgroup 120 is an email message and the data subgroup 122 is an attachment that is redundant with a file already stored in the backup storage 16, then the data subgroup 122 will not be backed up in its entirety. The computer 12 will insert a reference into the cross-referencing database that relates the data subgroup 120 with the already-stored file that is the same as the data subgroup 122. Thus, the computer 12 can use the cross-referencing database to determine what data subgroups, e.g., the data subgroup 120, in the file 110 have associated data subgroups, e.g., the data subgroup 122, that were redundant, find the stored redundant data subgroup, and reassemble the file 110 using the stored redundant data subgroup as the data subgroup, here the data subgroup 122, that was not stored in its entirety with the file 110.
- Still other embodiments are within the scope of the appended claims. For example, due to the nature of software, functions described above can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations.
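- The subgroup-level elimination and cross-referencing of the process 130 can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the description: the data subgroups have already been extracted from the aggregate file as byte strings, redundancy is detected by comparing SHA-256 digests (one possible common file elimination technique), and the backup storage and the cross-referencing database are modeled as plain dictionaries. The function names are hypothetical.

```python
import hashlib

def back_up_aggregate(file_id, subgroups, store, cross_reference):
    """Send each data subgroup at most once and record how to rebuild the file."""
    sent, refs = [], []
    for data in subgroups:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in store:          # absent from the stored data sets
            store[digest] = data
            sent.append(digest)
        refs.append(digest)              # redundant or not, remember its place
    cross_reference[file_id] = refs
    return sent

def reassemble(file_id, store, cross_reference):
    """Rebuild the subgroups of a file from the stored, de-duplicated data sets."""
    return [store[digest] for digest in cross_reference[file_id]]

store, cross_reference = {}, {}

# A standalone document is backed up first.
attachment = b"quarterly report contents"
back_up_aggregate("report.doc", [attachment], store, cross_reference)

# A mail file later embeds the same document as an attachment.
message = b"Please find the report attached."
sent = back_up_aggregate("mail.pst", [message, attachment], store, cross_reference)
rebuilt = reassemble("mail.pst", store, cross_reference)
```

- Only the message is sent when the mail file is backed up, since the attachment already matches a stored data set, and the cross-reference entries are sufficient to reassemble both subgroups in order.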
Claims (11)
- A system (10) for backing up desired data comprising a file, the system (10) comprising: a communication link (14) configured to transfer information between the system (10) and a backup-storage facility (16) for storing backed up data; a processor (18) coupled to the communication link (14) to transfer selected portions of associated data over the communication link (14) to the backup-storage facility (16) for storage; and a computer program product comprising instructions operable to configure the processor (18) to: identify, within the desired data, substantive data comprising current content of the file by analyzing a data file containing data subgroups and identifying the data subgroups within the data file; store the data subgroups as separate files for comparison; compare the substantive data of the desired data with stored data comprising the content of stored files by comparing the data subgroups with stored potentially-common data sets; and in accordance with that comparison, select the portions of the desired data to be transferred over the communication link (14) for storage and transfer the selected portions of the desired data, thereby backing up data subgroups based upon the comparison.
- The system (10) of claim 1 wherein the processor (18) is configured to bypass at least some of the substantive data, for transfer for storage, that are present in the stored data.
- The system (10) of claim 1 or 2 wherein the processor (18) is configured to transfer the substantive data for storage only if the substantive data are absent from the stored data.
- The system (10) of any preceding claim wherein the processor (18) is configured to perform differential backup on the substantive data to compare the substantive data with the stored data.
- The system (10) of any preceding claim wherein the processor (18) is further configured to transfer, over the communication link (14), indicia that substantive data are absent from the stored substantive data and that substantive data in the stored data are absent from the substantive data.
- The system (10) of claim 5 wherein the indicia include at least one of an add key command, an add value command, a remove key command, a remove value command, and a change value command.
- The system (10) of claim 1 wherein the processor (18) backs up only those data subgroups that are absent from the stored potentially-common data sets.
- The system (10) of claim 1 or 7 wherein the processor (18) compares the data subgroups with the stored potentially-common data sets using a common file elimination technique.
- The system (10) of any of claims 1, 7 or 8 wherein the processor (18) provides remove indicia indicating that at least one of the potentially-common data sets is associated with the data file.
- The system (10) of claim 9 wherein the processor (18) provides remove indicia indicating that at least one of the potentially-common data sets is associated with a particular portion of the data file.
- The computer program product of any of the preceding claims.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US31768401P | 2001-09-06 | 2001-09-06 | |
US317684P | 2001-09-06 | ||
US10/235,304 US7509356B2 (en) | 2001-09-06 | 2002-09-05 | Data backup |
US235304 | 2002-09-05 | ||
PCT/US2002/028406 WO2003023617A2 (en) | 2001-09-06 | 2002-09-06 | Selective data backup |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1428123A2 (en) | 2004-06-16 |
EP1428123B1 (en) | 2011-11-02 |
Family
ID=26928792
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02757632A Expired - Lifetime EP1428123B1 (en) | 2001-09-06 | 2002-09-06 | Selective data backup |
Country Status (6)
Country | Link |
---|---|
US (1) | US7509356B2 (en) |
EP (1) | EP1428123B1 (en) |
JP (2) | JP2005502956A (en) |
AU (1) | AU2002323635A1 (en) |
HK (1) | HK1067197A1 (en) |
WO (1) | WO2003023617A2 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8443045B2 (en) * | 2002-10-01 | 2013-05-14 | Honda Motor Co., Ltd. | Storage of selected e-mails including attachments in document management system |
US7251680B2 (en) * | 2003-10-31 | 2007-07-31 | Veritas Operating Corporation | Single instance backup of email message attachments |
US7913053B1 (en) | 2005-02-15 | 2011-03-22 | Symantec Operating Corporation | System and method for archival of messages in size-limited containers and separate archival of attachments in content addressable storage |
US7370050B2 (en) * | 2005-02-28 | 2008-05-06 | Microsoft Corporation | Discoverability and enumeration mechanisms in a hierarchically secure storage system |
US7974952B1 (en) * | 2005-04-18 | 2011-07-05 | Emc Corporation | Tracking file system changes for backup |
US7624129B2 (en) * | 2006-06-30 | 2009-11-24 | Microsoft Corporation | Dual logging of changes to a user preference in a computer device |
US8458127B1 (en) * | 2007-12-28 | 2013-06-04 | Blue Coat Systems, Inc. | Application data synchronization |
US8527465B1 (en) * | 2008-12-24 | 2013-09-03 | Emc Corporation | System and method for modeling data change over time |
US8166038B2 (en) * | 2009-06-11 | 2012-04-24 | Kaufman Mark A | Intelligent retrieval of digital assets |
US9390088B2 (en) | 2013-04-22 | 2016-07-12 | International Business Machines Corporation | Ensuring access to long-term stored electronic documents |
JP7108784B2 (en) * | 2018-08-21 | 2022-07-28 | 華為技術有限公司 | DATA STORAGE METHOD, DATA ACQUISITION METHOD, AND DEVICE |
CN112328171B (en) * | 2020-10-23 | 2024-04-30 | 苏州元核云技术有限公司 | Data distribution prediction method, data equalization method, device and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5274807A (en) * | 1990-11-01 | 1993-12-28 | At&T Bell Laboratories | Method for reducing magnetic storage volume for computer disk image backup |
Family Cites Families (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5613113A (en) * | 1993-10-08 | 1997-03-18 | International Business Machines Corporation | Consistent recreation of events from activity logs |
US5819020A (en) * | 1995-10-16 | 1998-10-06 | Network Specialists, Inc. | Real time backup system |
US5765173A (en) * | 1996-01-11 | 1998-06-09 | Connected Corporation | High performance backup via selective file saving which can perform incremental backups and exclude files and uses a changed block signature list |
US5819251A (en) * | 1996-02-06 | 1998-10-06 | Oracle Corporation | System and apparatus for storage retrieval and analysis of relational and non-relational data |
US5813008A (en) * | 1996-07-12 | 1998-09-22 | Microsoft Corporation | Single instance storage of information |
US5758359A (en) * | 1996-10-24 | 1998-05-26 | Digital Equipment Corporation | Method and apparatus for performing retroactive backups in a computer system |
US6038665A (en) * | 1996-12-03 | 2000-03-14 | Fairbanks Systems Group | System and method for backing up computer files over a wide area computer network |
US6088693A (en) * | 1996-12-06 | 2000-07-11 | International Business Machines Corporation | Data management system for file and database management |
US6157931A (en) * | 1997-02-11 | 2000-12-05 | Connected Corporation | Database/template driven file selection for backup programs |
US5907848A (en) * | 1997-03-14 | 1999-05-25 | Lakeview Technology, Inc. | Method and system for defining transactions from a database log |
US6199074B1 (en) * | 1997-10-09 | 2001-03-06 | International Business Machines Corporation | Database backup system ensuring consistency between primary and mirrored backup database copies despite backup interruption |
US5991772A (en) * | 1997-10-31 | 1999-11-23 | Oracle Corporation | Method and apparatus for restoring a portion of a database |
US6088694A (en) * | 1998-03-31 | 2000-07-11 | International Business Machines Corporation | Continuous availability and efficient backup for externally referenced objects |
US6032145A (en) * | 1998-04-10 | 2000-02-29 | Requisite Technology, Inc. | Method and system for database manipulation |
US6189016B1 (en) * | 1998-06-12 | 2001-02-13 | Microsoft Corporation | Journaling ordered changes in a storage volume |
US6279011B1 (en) * | 1998-06-19 | 2001-08-21 | Network Appliance, Inc. | Backup and restore for heterogeneous file server environment |
US6269381B1 (en) * | 1998-06-30 | 2001-07-31 | Emc Corporation | Method and apparatus for backing up data before updating the data and for restoring from the backups |
US6141660A (en) * | 1998-07-16 | 2000-10-31 | International Business Machines Corporation | Command line interface for creating business objects for accessing a hierarchical database |
US6115772A (en) * | 1998-09-18 | 2000-09-05 | International Business Machines, Inc. | System and method for host expansion and connection adaptability for a SCSI storage array |
US6385626B1 (en) * | 1998-11-19 | 2002-05-07 | Emc Corporation | Method and apparatus for identifying changes to a logical object based on changes to the logical object at physical level |
JP2000200208A (en) | 1999-01-06 | 2000-07-18 | Fujitsu Ltd | Method and device for file backup, and program recording medium |
US6212512B1 (en) * | 1999-01-06 | 2001-04-03 | Hewlett-Packard Company | Integration of a database into file management software for protecting, tracking and retrieving data |
US6397307B2 (en) * | 1999-02-23 | 2002-05-28 | Legato Systems, Inc. | Method and system for mirroring and archiving mass storage |
US6374265B1 (en) * | 1999-03-29 | 2002-04-16 | Inventec Corp. | Method for backup and recovery of the long filename in computer system |
US6513051B1 (en) * | 1999-07-16 | 2003-01-28 | Microsoft Corporation | Method and system for backing up and restoring files stored in a single instance store |
US6317755B1 (en) * | 1999-07-26 | 2001-11-13 | Motorola, Inc. | Method and apparatus for data backup and restoration in a portable data device |
US6526418B1 (en) * | 1999-12-16 | 2003-02-25 | Livevault Corporation | Systems and methods for backing up data files |
US6460055B1 (en) * | 1999-12-16 | 2002-10-01 | Livevault Corporation | Systems and methods for backing up data files |
WO2001061563A1 (en) * | 2000-02-18 | 2001-08-23 | Avamar Technologies, Inc. | Hash file system and method for use in a commonality factoring system |
US7072916B1 (en) * | 2000-08-18 | 2006-07-04 | Network Appliance, Inc. | Instant snapshot |
US7730213B2 (en) * | 2000-12-18 | 2010-06-01 | Oracle America, Inc. | Object-based storage device with improved reliability and fast crash recovery |
US6868417B2 (en) * | 2000-12-18 | 2005-03-15 | Spinnaker Networks, Inc. | Mechanism for handling file level and block level remote file accesses using the same server |
US6745209B2 (en) * | 2001-08-15 | 2004-06-01 | Iti, Inc. | Synchronization of plural databases in a database replication system |
US6898688B2 (en) * | 2001-12-28 | 2005-05-24 | Storage Technology Corporation | Data management appliance |
US6820098B1 (en) * | 2002-03-15 | 2004-11-16 | Hewlett-Packard Development Company, L.P. | System and method for efficient and trackable asynchronous file replication |
US7302536B2 (en) * | 2003-06-17 | 2007-11-27 | Hitachi, Ltd. | Method and apparatus for managing replication volumes |
- 2002
  - 2002-09-05: US, application US10/235,304 (US7509356B2), not active, Expired - Fee Related
  - 2002-09-06: AU, application AU2002323635A (AU2002323635A1), not active, Abandoned
  - 2002-09-06: EP, application EP02757632A (EP1428123B1), not active, Expired - Lifetime
  - 2002-09-06: WO, application PCT/US2002/028406 (WO2003023617A2), active, Application Filing
  - 2002-09-06: JP, application JP2003527601A (JP2005502956A), not active, Withdrawn
- 2004
  - 2004-12-15: HK, application HK04109950.5A (HK1067197A1), active, IP Right Cessation
- 2009
  - 2009-05-11: JP, application JP2009115038A (JP2009181590A), active, Pending
Also Published As
Publication number | Publication date |
---|---|
JP2009181590A (en) | 2009-08-13 |
US7509356B2 (en) | 2009-03-24 |
HK1067197A1 (en) | 2005-04-01 |
US20030135524A1 (en) | 2003-07-17 |
WO2003023617A3 (en) | 2004-03-18 |
AU2002323635A1 (en) | 2003-03-24 |
WO2003023617A2 (en) | 2003-03-20 |
JP2005502956A (en) | 2005-01-27 |
EP1428123A2 (en) | 2004-06-16 |
Similar Documents
Publication | Title |
---|---|
US11561931B2 (en) | Information source agent systems and methods for distributed data storage and management using content signatures |
EP0706686B1 (en) | System and method for distributed storage management on networked computer systems |
JP2009181590A (en) | Selective data backup |
US6374266B1 (en) | Method and apparatus for storing information in a data processing system |
US5778395A (en) | System for backing up files from disk volumes on multiple nodes of a computer network |
US20120131001A1 (en) | Methods and computer program products for generating search results using file identicality |
US7680998B1 (en) | Journaled data backup during server quiescence or unavailability |
US7203711B2 (en) | Systems and methods for distributed content storage and management |
US6378054B1 (en) | Data backup device and method for use with a computer, and computer-readable recording medium having data backup program recorded thereon |
CA2178213C (en) | Incremental backup system |
US20040167941A1 (en) | System and method for archiving objects in an information store |
US9002800B1 (en) | Archive and backup virtualization |
EP2013974A2 (en) | Data compression and storage techniques |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20040406 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SK TR |
AX | Request for extension of the european patent | Extension state: AL LT LV MK RO SI |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1067197; Country of ref document: HK |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: IRON MOUNTAIN INCORPORATED |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R096; Ref document number: 60241465; Country of ref document: DE; Effective date: 20120202 |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: GR; Ref document number: 1067197; Country of ref document: HK |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20120803 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R097; Ref document number: 60241465; Country of ref document: DE; Effective date: 20120803 |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: 732E; Free format text: REGISTERED BETWEEN 20130808 AND 20130814 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: TP; Owner name: AUTONOMY, INC., US; Effective date: 20130930 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: AUTONOMY, INC., US; Free format text: FORMER OWNER: CONNECTED CORP., FRAMINGHAM, US; Effective date: 20111208; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: AUTONOMY, INC., US; Free format text: FORMER OWNER: IRON MOUNTAIN INC., BOSTON, US; Effective date: 20131024; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: AUTONOMY, INC., SAN FRANCISCO, US; Free format text: FORMER OWNER: CONNECTED CORP., FRAMINGHAM, MASS., US; Effective date: 20111208; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: AUTONOMY, INC., SAN FRANCISCO, US; Free format text: FORMER OWNER: IRON MOUNTAIN INC., BOSTON, MASS., US; Effective date: 20131024; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: ENTIT SOFTWARE LLC, SUNNYVALE, US; Free format text: FORMER OWNER: IRON MOUNTAIN INC., BOSTON, MASS., US; Effective date: 20131024; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: ENTIT SOFTWARE LLC, SUNNYVALE, US; Free format text: FORMER OWNER: CONNECTED CORP., FRAMINGHAM, MASS., US; Effective date: 20111208 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 15 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 16 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: MICRO FOCUS LLC (N.D.GES.D.STAATES DELAWARE), , US; Free format text: FORMER OWNER: AUTONOMY, INC., SAN FRANCISCO, CALIF., US; Ref country code: DE; Ref legal event code: R082; Ref document number: 60241465; Country of ref document: DE; Representative's name: EISENFUEHR SPEISER PATENTANWAELTE RECHTSANWAEL, DE; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: ENTIT SOFTWARE LLC, SUNNYVALE, US; Free format text: FORMER OWNER: AUTONOMY, INC., SAN FRANCISCO, CALIF., US |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 17 |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: 732E; Free format text: REGISTERED BETWEEN 20181105 AND 20181107 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20190820; Year of fee payment: 18; Ref country code: DE; Payment date: 20190820; Year of fee payment: 18 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20190820; Year of fee payment: 18 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R082; Ref document number: 60241465; Country of ref document: DE; Representative's name: EISENFUEHR SPEISER PATENTANWAELTE RECHTSANWAEL, DE; Ref country code: DE; Ref legal event code: R081; Ref document number: 60241465; Country of ref document: DE; Owner name: MICRO FOCUS LLC (N.D.GES.D.STAATES DELAWARE), , US; Free format text: FORMER OWNER: ENTIT SOFTWARE LLC, SUNNYVALE, CALIF., US |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 60241465; Country of ref document: DE |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20200906 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20210401; Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20200930 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20200906 |