EP3646198A1 - Efficient backup of compaction based databases - Google Patents

Efficient backup of compaction based databases

Info

Publication number
EP3646198A1
Authority
EP
European Patent Office
Prior art keywords
files
file
subsequent file
subsequent
compaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP18824074.1A
Other languages
German (de)
French (fr)
Other versions
EP3646198A4 (en)
Inventor
Rajath Subramanyam
Pin ZHOU
Prasenjit Sarkar
Rohit Shekhar
Hyojun Kim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rubrik Inc
Original Assignee
Rubrik Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rubrik Inc
Publication of EP3646198A1
Publication of EP3646198A4

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1451Management of the data involved in backup or backup restore by selection of backup contents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/116Details of conversion of file system types or formats
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/835Timestamp
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/84Using snapshots, i.e. a logical point-in-time copy of the data

Definitions

  • Some database systems use Sorted String Tables (SSTables) to store data entries therein. Examples of such systems include Cassandra, LevelDB, RocksDB, and BigTable database systems. SSTables were
  • An SSTable is an immutable data file that is sorted by database keys. As such, any edits or writes that need to be made to a particular SSTable instead create a new SSTable with those writes or edits reflected in the new SSTable.
  • Embodiments disclosed herein provide systems, methods, and computer readable media for efficiently performing incremental backups of compaction based databases.
  • the method includes backing up two or more files from a first database system and, after backing up the two or more files, identifying a subsequent file from the first database system for backup.
  • the method further provides determining that the subsequent file comprises a compaction of the two or more files and, responsively, refraining from backing up the subsequent file.
  • backing up the two or more files comprises at a first time, backing up one or more first files and, after the first time, identifying a second file for backup.
  • Those embodiments further include determining that the second file does not comprise a compaction of the one or more first files and, responsively, backing up the second file.
  • the subsequent file comprises a compaction of the two or more files when the subsequent file contains only information included in the two or more files.
  • determining that the subsequent file comprises a compaction of the two or more files comprises accessing ancestor information for the subsequent file and determining that the ancestor information indicates that the subsequent file was created by compacting the two or more files.
  • the method may further include adding ancestry information to an ancestry map and, after determining that the subsequent file was created by compacting the two or more files, using the ancestry information to determine that the two or more files have already been backed up.
  • the method may include adding subsequent ancestry information about the subsequent file to the ancestry map. The subsequent ancestry information indicates that the two or more files are ancestors of the subsequent file.
  • determining that the subsequent file comprises a compaction of the two or more files comprises determining a first maximum timestamp for data entries in the subsequent file and determining that the first maximum timestamp is not greater than a maximum timestamp for data entries in the two or more files.
  • the method includes identifying the subsequent file for restoration to the first database system, compacting the two or more files to recreate the subsequent file, and restoring the subsequent file to the first database system.
  • the method includes identifying the subsequent file for restoration to the first database system, determining that the two or more files are ancestors of the subsequent file, and restoring the two or more files to the first database system.
  • the two or more files and the subsequent file comprise
  • a system having one or more computer readable storage media and a processing system operatively coupled with the one or more computer readable storage media.
  • Program instructions stored on the one or more computer readable storage media when read and executed by the processing system, direct the processing system to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup.
  • the program instructions further direct the processing system to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.
  • a computer readable storage medium having program instructions stored thereon.
  • the program instructions when executed by a processing system, direct the processing system to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup.
  • the program instructions further direct the processing system to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.
  • Figure 1 illustrates an implementation for efficiently backing up a compaction based database system.
  • Figure 2 illustrates a scenario for the implementation to efficiently back up a compaction based database system.
  • Figure 3 illustrates another implementation for efficiently backing up a compaction based database system.
  • Figure 4 illustrates a scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 5 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 6 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 7 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 8 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 9 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 10 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
  • Figure 11 illustrates a computing architecture for efficiently backing up a compaction based database system.
  • the incremental backup technology described herein mitigates the storage space inefficiencies caused when incrementally backing up a compaction based database system.
  • the examples herein determine whether a file (e.g., an SSTable) being backed up is a compaction of two or more files that were backed up in a previous incremental backup. If the file is a compaction, then the file is not copied since all the entries within the file are already stored in backups of other files (i.e., the files from which the compacted file was created). Instead, a record, called an ancestry map herein, is kept to indicate the files that were compacted to create the compacted file. Should the compacted file ever need to be recovered, the record is used to identify the files that were copied to retrieve the data entries in the compacted file.
  • Figure 1 illustrates implementation 100 for efficiently backing up a compaction based database system.
  • Implementation 100 includes backup system 101, database system 102, data repository 121, and data repository 122.
  • Backup system 101 and database system 102 communicate over communication link 111.
  • database system 102 hosts a database that stores data entries in immutable files that may be compacted to conserve storage space in data repository 122.
  • database system 102 is a compaction based database system.
  • Backup system 101 is tasked with incrementally backing up the database files stored in data repository 122. To that end, backup system 101 identifies any file changes that have occurred since the most recent backup and copies those file changes, if necessary, to data repository 121. Specifically, backup system 101 only copies file changes (i.e., new files in the case of the immutable database files described herein) that include data entries not already stored in data repository 121.
  • If backup system 101 identifies a new file and that file is a compaction of files already stored in data repository 121, then backup system 101 refrains from copying that new file to data repository 121. If, however, the new file is not a compaction of other files or is a compaction of at least one file that was not already copied in a previous backup, then backup system 101 will copy the file, as not all data entries in the new file have been copied to data repository 121.
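  • The copy-or-skip rule described above can be sketched in a few lines. This is only an illustrative sketch: the function name `should_copy`, the use of file names as identifiers, and the representation of already backed up files as a set are assumptions made for the example and do not appear in the application.

```python
def should_copy(ancestors, backed_up):
    """Return True when a new file must be copied to the backup repository.

    ancestors: names of the files that were compacted to create the new file
               (empty when the new file is not a compaction).
    backed_up: set of file names whose data is already in the backup repository.
    """
    if not ancestors:
        # Not a compaction, so the file holds data that was never backed up.
        return True
    # A compaction must still be copied if any ancestor was never backed up.
    return any(name not in backed_up for name in ancestors)
```

  • Under these assumptions, a compaction of files that were all backed up earlier yields `False`, while a file with no ancestors, or with at least one ancestor missing from the backup, yields `True`.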
  • FIG. 2 illustrates a scenario 200 for implementation 100 to efficiently back up a compaction based database system.
  • backup system 101 backs up two or more files from database system 102 (201).
  • Each of the two or more files comprises a file in data repository 122 that includes at least one data entry that was not previously copied in an ancestor of the file already stored in data repository 121 by backup system 101.
  • the files may be SSTables, which is a common format for compaction based database systems, but may be some other type of immutable database file.
  • the files may be backed up during the same incremental backup of database system 102, may be backed up during different incremental backups of database system 102, or may be backed up during an initial backup that includes all contents of data repository 122.
  • After backing up the two or more files, backup system 101 identifies a subsequent file from database system 102 to back up (202).
  • the subsequent file may be identified during an incremental backup subsequent to one or more incremental backups that copied the two or more files to data repository 121.
  • the subsequent file comprises a file that is new to data repository 122 since the previous incremental backup was performed.
  • backup system 101 first determines that the subsequent file comprises a compaction of the two or more files (203). As such, the data entries in the subsequent file have already been stored in data repository 121 by virtue of the two or more files having already been stored in data repository 121 along with data entries of the two or more files that are no longer valid and were discarded during compaction.
  • backup system 101 may reference ancestry information about the subsequent file that is maintained by database system 102. For instance, after identifying the subsequent file, backup system 101 may query database system 102 for the subsequent file's ancestry information that indicates whether the subsequent file is a compaction of other files and, if so, which files are the ancestors of the subsequent file. In this case, the ancestry information indicates that the two or more files backed up at step 201 are the ancestors of the subsequent file and backup system 101 can reference its own records to determine that the two or more files have already been copied to data repository 121.
  • backup system 101 refrains from backing up the subsequent file (204).
  • backup system 101 further maintains a map of ancestry information that indicates that the subsequent file was not copied and indicates that the two or more files are the ancestors of the subsequent file. This map maintains a list of all files backed up to data repository 121 and indicates to backup system 101 whether a file itself has already been copied or data that makes up the file has already been copied previously in ancestors of the file. As such, backup system 101 may use the ancestry map when
  • the ancestry map may also be used when accessing the file at a future time. That is, should the subsequent file ever need to be accessed (e.g., for restoration to data repository 122), backup system 101 can reference the ancestry map to identify the two or more files so that the data entries of the subsequent file can be reproduced. In some cases, backup system 101 may compact the two or more files itself to provide the subsequent data file for access or backup system 101 may provide the two or more files to database system 102 so that database system 102 can compact or otherwise handle the two or more files.
  • Backup system 101 is therefore more efficient while still managing to incrementally back up compacted database files.
  • backup system 101 comprises a computer system and communication interface.
  • Backup system 101 may also include other components such as a router, server, data storage system, and power supply.
  • Backup system 101 may reside in a single device or may be distributed across multiple devices.
  • Backup system 101 could be an application server(s), a personal workstation, or some other network capable computing system - including combinations thereof. While shown separately, all or portions of backup system 101 could be integrated with the components of database system 102.
  • Database system 102 comprises a computer system and communication interface.
  • Database system 102 may also include other components such as a router, server, data storage system, and power supply.
  • Database system 102 may reside in a single device or may be distributed across multiple devices, as is common with nodes of distributed databases.
  • Database system 102 could be an application server(s), a personal workstation, or some other network capable computing system - including combinations thereof. While shown separately, all or portions of database system 102 could be integrated with the components of backup system 101. Either or both of backup system 101 and database system 102 may be implemented as cloud based systems.
  • Data repositories 121 and 122 each comprise one or more data storage systems having one or more non-transitory storage media, such as a disk drive, flash drive, magnetic tape, data storage circuitry, or some other memory apparatus.
  • the data storage systems may also include other components such as processing circuitry, a network communication interface, a router, server, data storage system, and power supply.
  • the data storage systems may reside in a single device or may be distributed across multiple devices. For instance, in the case of database system 102 being a distributed database, the storage media may be distributed across nodes of the distributed database.
  • data repositories 121 and 122 may be physically incorporated into backup system 101 and database system 102, respectively. It should be understood that in no case are the storage media a propagated signal. Either or both of data repositories 121 and 122 may be implemented as cloud based storage systems.
  • Communication link 111 could be internal system busses or use various communication protocols, such as Time Division Multiplex (TDM), Internet Protocol (IP), Ethernet, communication signaling, Code Division Multiple Access (CDMA), Evolution Data Only (EVDO), Worldwide Interoperability for Microwave Access (WIMAX), Global System for Mobile Communication (GSM), Long Term Evolution (LTE), Wireless Fidelity (WIFI), High Speed Packet Access (HSPA), or some other communication format - including combinations thereof.
  • Communication link 111 could be a direct link or may include intermediate networks, systems, or devices.
  • Figure 3 illustrates implementation 300 for efficiently backing up a compaction based database system.
  • Implementation 300 includes backup system 301, database system 302, and communication network 303.
  • Backup system 301 includes data repository 321 storing SSTables 331 and ancestry map 341. While ancestry map 341 is shown outside of data repository 321, ancestry map 341 may be stored in data repository 321 along with SSTables 331.
  • Database system 302 includes data repository 322 storing SSTables 332.
  • backup system 301 incrementally backs up SSTables 332 and stores any SSTables copied in each incremental backup within SSTables 331.
  • Backup system 301 identifies and retrieves new SSTables at each backup increment via communications exchanged over communication network 303, which may include one or more Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, or some other type of communication network over which data may be transferred - including combinations thereof.
  • Database system 302 is a compaction based database system due to its use of immutable SSTables to store data entries.
  • backup system 301 uses ancestry map 341 to track SSTable ancestry information so that SSTables with ancestors that have all previously been copied to data repository 321 are not copied again.
  • FIG. 4 illustrates scenario 400 for implementation 300 to efficiently back up a compaction based database system.
  • backup system 301 creates a snapshot of SSTables 332 (401).
  • the snapshot may be created in the same manner as a conventional incremental backup system would create a snapshot of SSTables in a database.
  • backup system 301 determines whether the data entries in the changed SSTables have already been copied to data repository 321. To that end, backup system 301 obtains ancestor information about the SSTables in the snapshot from database system 302 (402). The ancestor information is generated and maintained by database system 302.
  • an SSTable that has no ancestors will not have ancestor information or the ancestor information for that SSTable may indicate that no ancestors exist (e.g., have a NULL value).
  • the ancestor information may be included as metadata with each SSTable or may be maintained elsewhere by database system 302 either in data repository 322 or in some other storage component accessible by database system 302.
  • backup system 301 determines whether the
  • Backup system 301 may reference the ancestry map 341 to determine whether the SSTable has already been stored or may determine whether the SSTable is already stored in some other manner. If the SSTable has already been stored in data repository 321, then backup system 301 does not copy the SSTable to data repository 321 because the SSTable is not a change to SSTables 332 since the previous backup (404). In contrast, if the SSTable has not been stored in data repository 321, then backup system 301 determines whether all ancestors of the SSTable are already stored in data repository 321 (405). Backup system 301 makes the aforementioned determination of step 405 rather than simply copying the SSTable to data repository 321.
  • To determine whether the ancestors of the SSTable are already stored in data repository 321, backup system 301 references ancestry map 341 with the ancestry
  • ancestry map 341 comprises a record of each SSTable that has been backed up, but not necessarily copied to data repository 321, from database system 302. That is, ancestry map 341 indicates whether a particular SSTable has already been copied to data repository 321 or, if the SSTable has not been copied, which SSTables of SSTables 331 are ancestors of the non-copied SSTable (i.e., the SSTables that were compacted to create the non-copied SSTable).
  • Backup system 301 cross references the ancestor information of the SSTable currently being processed with the information in ancestry map 341.
  • backup system 301 determines that the SSTable does not need to be copied to data repository 321 since the data entries therein are already copied by virtue of their inclusion in the ancestor SSTables. Thus, rather than copying the SSTable, backup system 301 adds another entry to ancestry map 341 for the SSTable and lists the ancestors of the SSTable in that entry (406). If necessary, the ancestry of the SSTable can then be traced at a later time based on that entry in ancestry map 341.
  • Ancestry map 341 may include pointers to the entries within ancestry map 341 of the ancestors.
  • backup system 301 copies the SSTable to data repository 321 for inclusion in SSTables 331 (407). Additionally, backup system 301 adds an entry to ancestry map 341 for the SSTable (408). In this case, since the SSTable is copied to data repository 321, rather than including ancestry information for the SSTable, backup system 301 indicates in the entry that the SSTable has been copied and is therefore stored in data repository 321.
  • the entry for the SSTable in ancestry map 341 may include a pointer to the SSTable in data repository 321.
  • Processing each SSTable from the snapshot in the manner described above using steps 403-408 conserves storage space in data repository 321 by not copying (i.e., storing) any SSTable that can be reproduced using SSTables already stored in data repository 321 (i.e., that SSTable's ancestors).
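  • The per-SSTable processing of steps 403-408 can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the function name `process_snapshot`, the dictionary layouts, and the `"COPIED"` marker are all hypothetical choices made for illustration, and the ancestor information is assumed to arrive with each SSTable in the snapshot.

```python
def process_snapshot(snapshot, ancestry_map, repository):
    """Apply the decisions of steps 403-408 to every SSTable in one snapshot.

    snapshot:     dict of SSTable name -> (data, list of ancestor names or None)
    ancestry_map: dict of SSTable name -> "COPIED" or a list of ancestor names
    repository:   dict of SSTable name -> data, standing in for the backup store
    """
    for name, (data, ancestors) in snapshot.items():
        if name in ancestry_map:
            # Already handled by an earlier backup, so nothing changed (404).
            continue
        if ancestors and all(a in ancestry_map for a in ancestors):
            # Every ancestor is accounted for: record ancestry only (405-406).
            ancestry_map[name] = list(ancestors)
        else:
            # New data: copy the SSTable and record that it is stored (407-408).
            repository[name] = data
            ancestry_map[name] = "COPIED"
```

  • In this sketch a compacted SSTable consumes no space in `repository`; only its list of ancestors is kept in `ancestry_map`.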
  • database system 302 may not maintain ancestor information, and backup system 301 may instead rely on timestamps to filter out (and not back up) SSTables compacted from ancestors already backed up.
  • backup system 301 notes the max timestamp among all of the SSTables already backed up.
  • the max timestamp of a particular SSTable comprises the newest (which is also the highest/max) timestamp from all data entries inside the SSTable.
  • the backup system compares the max timestamp of each SSTable with the max timestamp determined by backup system 301 during the previous backup interval.
  • Since a compacted SSTable comprises only data entries also included in SSTables compacted to form the compacted SSTable, it logically follows that the compacted SSTable will not have a max timestamp greater than those ancestor SSTables.
  • If the max timestamp of an SSTable from the snapshot is greater than the max timestamp of the previous interval, backup system 301 knows that the SSTable includes data entries not already backed up in other SSTables and copies and processes the SSTable from the snapshot. Otherwise, backup system 301 will refrain from backing up the SSTable from the snapshot.
  • the above example will continue to work properly as long as the time used by database system 302 when creating timestamps does not drift too much one way or another.
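  • The timestamp-based alternative can be sketched as below. The sketch assumes each SSTable is a non-empty list of `(key, timestamp, value)` tuples and that timestamps are comparable numbers; the function names and data layout are hypothetical and are not taken from the application.

```python
def max_timestamp(sstable):
    """Return the newest timestamp among the data entries of one SSTable."""
    return max(ts for _key, ts, _value in sstable)


def select_for_backup(snapshot, previous_max):
    """Return (names to copy, max timestamp to remember for the next interval).

    snapshot:     dict of SSTable name -> list of (key, timestamp, value)
    previous_max: max timestamp noted during the previous backup interval
    """
    # Only SSTables holding an entry newer than anything seen before are copied.
    to_copy = [name for name, table in snapshot.items()
               if max_timestamp(table) > previous_max]
    # Carry forward the newest timestamp observed so far.
    new_max = max([previous_max] + [max_timestamp(t) for t in snapshot.values()])
    return to_copy, new_max
```

  • A compacted SSTable contains only entries from its ancestors, so its max timestamp cannot exceed `previous_max` once those ancestors have been backed up, and it is filtered out.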
  • FIG. 5 illustrates scenario 500 for implementation 300 to efficiently back up a compaction based database system.
  • Scenario 500 is an example of how scenario 400 may be applied to an example set of SSTables 332.
  • SSTables 332 include SSTables 501-503 at time T1.
  • the three SSTables used in scenario 500 are merely exemplary and any number of SSTables may exist in SSTables 332.
  • backup system 301 creates a snapshot of SSTables 332 at time T1. For each of SSTables 501-503 in the snapshot, backup system 301 references ancestry map 341 at step 2 with respective ancestor information 511-513.
  • the ancestor information for each SSTable 501-503 comprises metadata associated with, and received with, each SSTable 501-503, although other manners of maintaining ancestry information may be used by database system 302.
  • Ancestry map 341 is shown as empty for the purposes of this example because no entries therein indicate that all ancestors of each SSTable 501-503 are accounted for therein (SSTables also may not be compactions and, therefore, may not have any ancestors).
  • scenario 500 may be an initial backup of SSTables 332 and, therefore, ancestry map 341 may actually be empty.
  • backup system 301 determines that SSTables 501-503 should be copied to SSTables 331 in data repository 321 and adds entries for SSTables 501-503 to ancestry map 341 at step 3.
  • the entries indicate that SSTables 501-503 have been copied to data repository 321 and are, therefore, included in SSTables 331.
  • Backup system 301 also copies SSTables 501-503 to data repository 321 at step 4.
  • After steps 1-4, an incremental backup of SSTables 332, as they were at time T1, is complete. It should be understood that adding entries to ancestry map 341 (step 3) and copying SSTables to data repository 321 (step 4) may occur in any order or at substantially the same time.
  • FIG. 6 illustrates scenario 600 for implementation 300 to efficiently back up a compaction based database system.
  • Scenario 600 is a continuation of scenario 500. Specifically, at time T2, which is after time T1, SSTables 332 include SSTables 501-503 and SSTable 601.
  • backup system 301 creates a snapshot of SSTables 332 at time T2. The snapshot includes SSTables 501-503 and SSTable 601.
  • Backup system 301 references ancestry map 341 at step 2 to determine whether any of SSTables 501-503 and SSTable 601 should be copied to data repository 321.
  • ancestor information 511-513 for SSTables 501-503 may not even be used during step 2 of scenario 600. Rather, backup system 301 references ancestry map 341 to determine that SSTables 501-503 have already been copied without needing to consider their ancestry. However, ancestor information 611 of SSTable 601 is used by backup system 301 to determine whether SSTable 601 should be copied. In this case, SSTable 601 is not a compaction of other SSTables and ancestor information 611 indicates that fact. Therefore, backup system 301 determines that SSTable 601 should be copied to data repository 321.
  • backup system 301 adds an entry to ancestry map 341 for SSTable 601 at step 3.
  • the entry indicates that SSTable 601 has been copied to data repository 321.
  • backup system 301 copies SSTable 601 to data repository 321 at step 4 to join SSTables 501-503 in SSTables 331, which were copied previously.
  • After steps 1-4, an incremental backup of SSTables 332, as they were at time T2, is complete. It should be understood that adding entries to ancestry map 341 (step 3) and copying SSTables to data repository 321 (step 4) may occur in any order or at substantially the same time.
  • FIG. 7 illustrates scenario 700 for implementation 300 to efficiently back up a compaction based database system.
  • Scenario 700 is a continuation of scenario 600.
  • database system 302 compacts SSTables 501-503 at step 1 into SSTable 701.
  • Database system 302 may use any method of data compaction that results in the still valid data entries of SSTables 501-503 being replicated in SSTable 701 while not replicating those entries that are no longer valid.
  • the newly created SSTable 701 is stored in data repository 322 at step 2 along with corresponding ancestor information 711, which indicates that SSTables 501-503 are the ancestors of SSTable 701 (i.e., the SSTables that were compacted to create SSTable 701).
  • SSTables 332 now include SSTable 601 and SSTable 701 in place of SSTables 501-503.
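  • One simple way to picture the compaction in scenario 700 is a merge that keeps only the newest entry for each key. The application leaves the compaction method open ("any method of data compaction"), so the following is only an assumed illustration: the tuple layout, the use of `None` as a deletion marker, and the choice to drop deleted entries immediately are simplifications and do not describe any particular database system.

```python
def compact(tables):
    """Merge several SSTables into one, keeping the newest entry for each key.

    Each table is a list of (key, timestamp, value) tuples. A value of None
    marks a deleted entry (an assumed tombstone convention).
    """
    newest = {}
    for table in tables:
        for key, ts, value in table:
            # Keep an entry only if it is newer than what was seen for the key.
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Emit the surviving entries sorted by key, discarding deleted entries.
    return [(key, ts, value)
            for key, (ts, value) in sorted(newest.items())
            if value is not None]
```

  • Every entry in the result also exists in one of the inputs, which is the property that lets a backup system skip copying the compacted table.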
  • FIG. 8 illustrates scenario 800 for implementation 300 to efficiently back up a compaction based database system.
  • Scenario 800 is a continuation of scenario 700.
  • SSTables 332 include SSTable 701 and SSTable 601.
  • backup system 301 creates a snapshot of SSTables 332 at time T4.
  • the snapshot includes SSTable 701 and SSTable 601.
  • Backup system 301 references ancestry map 341 at step 2 to determine whether any of SSTable 701 and SSTable 601 should be copied to data repository 321.
  • ancestor information 611 for SSTable 601 may not even be used during step 2 of scenario 800. Rather, backup system 301 references ancestry map 341 to determine that SSTable 601 has already been copied without needing to consider SSTable 601's ancestry.
  • ancestor information 711 of SSTable 701 is used by backup system 301 to determine whether SSTable 701 should be copied.
  • SSTable 701 is a compaction of SSTables 501-503, as described above, and ancestor information 711 indicates that fact to backup system 301. Therefore, backup system 301 determines whether all of SSTables 501-503 have entries in ancestry map 341. Since ancestry map 341 does include entries for SSTables 501-503, backup system 301 determines that SSTable 701 does not need to be copied to data repository 321 and, therefore, does not copy SSTable 701 for inclusion in SSTables 331. Instead, backup system 301 adds an entry at step 3 for SSTable 701 to ancestry map 341.
  • SSTables 501-503 are the ancestors of SSTable 701 so that SSTable 701 does not need to be copied at step 4.
  • After steps 1-4, an incremental backup of SSTables 332, as they were at time T4, is complete even though SSTable 701 was not copied to data repository 321. Rather, SSTable 701's entry in ancestry map 341 provides enough information to replicate SSTable 701 from its ancestors.
  • the ancestors of SSTable 701 (i.e., SSTables 501-503) all have entries in ancestry map 341 that indicate that the ancestors themselves were copied to data repository 321.
  • one or more of the entries may indicate further ancestors. For instance, if SSTable 701 is compacted with one or more other SSTables at some later time, then the SSTable created by that compaction, when backed up, would refer to SSTable 701's entry in ancestry map 341.
  • While backup system 301 may trace such ancestry information all the way back to SSTables that were actually stored in data repository 321, the process for adding entries to ancestry map 341 (i.e., scenario 400) allows backup system 301 to assume that ancestors actually copied to data repository 321 will eventually be found if traced back through ancestry entries. Such an assumption prevents backup system 301 from having to perform such an ancestry trace until an SSTable needs to be accessed (e.g., for restoration to database system 302).
  • Figure 9 illustrates scenario 900 for implementation 300 to efficiently back up a compaction based database system.
  • scenario 900 provides an example of how a non-copied SSTable may be restored to SSTables 332 of data repository 322.
  • Scenario 900 occurs at a time T5, which is after time T4, and data repository 322 is being restored to its state at time T4.
  • the state of data repository 322 at time T4 comprised SSTables 332 with SSTable 701 and SSTable 601.
  • backup system 301, like most incremental backup systems, maintains information about the state of data repository 322 at each incremental backup so that the data repository 322 can be restored to any one of the incremental states.
  • backup system 301 uses that information to determine that SSTable 701 and SSTable 601 should be restored to data repository 322.
  • Backup system 301 references ancestry map 341 at step 1 to determine whether SSTable 701 and SSTable 601 are stored in SSTables 331 of data repository 321.
  • Data repository 321 indicates that SSTable 601 is stored in data repository 321 but that SSTable 701 is not.
  • ancestry map 341 indicates that SSTable 701 is a compaction of ancestor SSTables 501-503.
  • Backup system 301 therefore references entries in ancestry map 341 for SSTables 501-503, which indicate that SSTables 501-503 are stored in data repository 321.
  • backup system 301 compacts SSTables 501-503 to create SSTable 701 at step 2 in a manner similar to that used by database system 302 to create SSTable 701 initially.
  • the reproduced SSTable 701 is then restored at step 3, along with SSTable 601, to data repository 321.
  • SSTables 332 include the same SSTables that were included in SSTables 332 at time T4.
  • backup system 301 performs the compaction to reproduce SSTable 701 in the above example
  • other examples may restore SSTable 701 in different manners.
  • backup system 301 may provide the ancestor SSTables (SSTables 501-503 in the above example) to database system 302 and database system 302 may compact the ancestor SSTables itself (or otherwise handle the ancestor SSTables as it sees fit).
  • FIG. 10 illustrates scenario 1000 for implementation 300 to efficiently back up a compaction based database system.
  • backup system 301 uses timestamps for data entries within SSTables rather than ancestor information to determine whether an SSTable should be stored to data repository 321.
  • SSTables 1001-1003 comprise SSTables already stored within SSTables 331 of data repository 321.
  • SSTables 1001-1003 may have been stored in data repository 321 in response to previous iterations of scenario 1000.
  • Scenario 1000 provides backup system 301 determining the maximum timestamp of the data entries within SSTables 1001-1003 at step 1. The actual content of each data entry is not relevant to any of the determinations in scenario 1000.
  • backup system 301 determines the maximum timestamp of each SSTable 1001-1003 when storing each to data repository 321 and stores those maximum timestamps for later reference, although the maximum timestamps may be determined at some other time.
  • the maximum timestamps of SSTables 1001-1003 are T3, T5, and T6, respectively. It should be understood that timestamps T with higher numbers represent later timestamps (i.e., more recent entries) than those with lower numbers. Accordingly, backup system 301 determines that the maximum timestamp of SSTables 1001-1003, which are the SSTables already stored in data repository 321, is T6.
  • backup system 301 captures SSTable 1004 in a snapshot of SSTables 332. Backup system 301 determines the maximum timestamp of data entries within SSTable 1004 at step 2.
  • the maximum timestamp in SSTable 1004 is T6, which is then compared at step 3 to the maximum timestamp of the other SSTables (i.e., SSTables 1001-1003 in this example) already stored in data repository 321. Specifically, it is determined whether T6 is less than or equal to the maximum of already stored timestamps. In this case, T6 from SSTable 1004 is less than or equal to T6 from the previously stored SSTables 1001-1003.
  • Since SSTable 1004 includes no entries with timestamps later than T6 and data repository 321 already stores data entries up to and including T6, backup system 301 assumes that SSTable 1004 does not include data entries not already included in SSTables 331 of data repository 321 and refrains from storing SSTable 1004 at step 4.
  • T7 is determined to be greater than the previously stored maximum of T6. Accordingly, SSTable 1004 in that case includes at least one data entry newer than any of the data entries already stored in SSTables 331 of data repository 321 and SSTable 1004 will need to be stored to data repository 321 by backup system 301 to back up any newer entries therein.
  • Index system 1100 is an example of index system 101, although system 101 may use alternative configurations.
  • Index system 1100 comprises communication interface 1101, user interface 1102, and processing system 1103.
  • Processing system 1103 is linked to communication interface 1101 and user interface 1102.
  • Processing system 1103 includes processing circuitry 1105 and memory device 1106 that stores operating software 1107.
  • Communication interface 1101 comprises components that communicate over communication links, such as network cards, ports, RF transceivers, processing circuitry and software, or some other communication devices.
  • Communication interface 1101 may be configured to communicate over metallic, wireless, or optical links.
  • Communication interface 1101 may be configured to use TDM, IP, Ethernet, optical networking, wireless protocols, communication signaling, or some other communication format - including combinations thereof.
  • User interface 1102 comprises components that interact with a user.
  • User interface 1102 may include a keyboard, display screen, mouse, touch pad, or some other user input/output apparatus.
  • User interface 1102 may be omitted in some examples.
  • Processing circuitry 1105 comprises microprocessor and other circuitry that retrieves and executes operating software 1107 from memory device 1106.
  • Memory device 1106 comprises a non-transitory storage medium, such as a disk drive, flash drive, data storage circuitry, or some other memory apparatus.
  • Operating software 1107 comprises computer programs, firmware, or some other form of machine-readable processing instructions.
  • Operating software 1107 includes data copying module 1108 and ancestry module 1109. Operating software 1107 may further include an operating system, utilities, drivers, network interfaces, applications, or some other type of software.
  • operating software 1107 directs processing system 1103 to operate index system 1100 as described herein.
  • data copying module 1108 directs processing system 1103 to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup.
  • Ancestry module 1109 directs processing system 1103 to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments disclosed herein provide systems, methods, and computer readable media for sub-cluster recovery in a data storage environment having a plurality of storage nodes. In a particular embodiment, the method provides scanning data items in the plurality of nodes. While scanning, the method further provides indexing the data items into an index of a plurality of partition groups. Each partition group includes data items owned by a particular one of the plurality of storage nodes. The method then provides storing the index.

Description

EFFICIENT BACKUP OF COMPACTION BASED DATABASES
TECHNICAL FIELD
[0001] Many modern database systems are based on architectures that use Sorted String Tables (SSTables) to store data entries therein. Examples of such systems include Cassandra, LevelDB, RocksDB, and BigTable database systems. SSTables were implemented, in part, to overcome relatively slow writes allowed by B-Tree based database systems. An SSTable is an immutable data file that is sorted by database keys. As such, any edits or writes that need to be made to a particular SSTable instead create a new SSTable with those writes or edits reflected in the new SSTable.
[0002] Since a new SSTable is created every time an SSTable is changed, much of the data in a new SSTable is also contained in a previously created SSTable. Most, if not all, databases that use SSTables, therefore, perform compactions to conserve storage space for newly created SSTables. Compaction combines multiple SSTables into a single new SSTable by only copying still valid data entries into the new SSTable and then deletes the source SSTables. Thus, after compaction, the storage space previously taken up by no-longer valid data entries, and duplicates of valid data entries, can now be used for other purposes (i.e., storing additional SSTables).
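As a non-limiting illustration of the compaction described above, the following Python sketch merges several in-memory tables into one. The representation is an assumption made only for this example: each table maps a key to a (timestamp, value) pair, a value of None marks a deleted entry, and the name `compact` is hypothetical. Actual database systems may compact in other ways.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model for illustration: key -> (timestamp, value).
# A value of None stands for an entry that has been deleted.
Entry = Tuple[int, Optional[str]]
SSTable = Dict[str, Entry]


def compact(tables: List[SSTable]) -> SSTable:
    """Merge tables, keeping only the still valid entry for each key."""
    newest: Dict[str, Entry] = {}
    for table in tables:
        for key, entry in table.items():
            # A later timestamp supersedes an earlier entry for the same key.
            if key not in newest or entry[0] > newest[key][0]:
                newest[key] = entry
    # Emit entries in key order and drop keys whose newest entry is a deletion.
    return {k: newest[k] for k in sorted(newest) if newest[k][1] is not None}
```

In this sketch the compacted table never contains a timestamp later than the latest timestamp found in its source tables, since every surviving entry is taken unchanged from one of them.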
[0003] Typically, when incrementally backing up data of any kind, only changes from a previous backup are copied since unchanged data will have already been copied in that previous backup. That procedure typically does not change when backing up a compaction based database. If new SSTables exist since a previous incremental backup was made, only those new SSTables are copied in the current backup. However, one or more of the new SSTables may be compactions of previously backed up SSTables. As such, the compacted SSTables, while seemingly new to the backup system, include data entries already backed up in previous SSTables. Hence, while copying a compacted SSTable does not store a duplicate file, copying a compacted SSTable does store duplicate data entries, which constitutes an inefficient use of storage space.
OVERVIEW
[0004] Embodiments disclosed herein provide systems, methods, and computer readable media for efficiently performing incremental backups of compaction based databases. In a particular embodiment, the method includes backing up two or more files from a first database system and, after backing up the two or more files, identifying a subsequent file from the first database system for backup. The method further provides determining that the subsequent file comprises a compaction of the two or more files and, responsively, refraining from backing up the subsequent file.
[0005] In some embodiments, backing up the two or more files comprises at a first time, backing up one or more first files and, after the first time, identifying a second file for backup. Those embodiments further include determining that the second file does not comprise a compaction of the one or more first files and, responsively, backing up the second file.
[0006] In some embodiments, the subsequent file comprises a compaction of the two or more files when the subsequent file contains only information included in the two or more files.
[0007] In some embodiments, determining that the subsequent file comprises a compaction of the two or more files comprises accessing ancestor information for the subsequent file and determining that the ancestor information indicates that the subsequent file was created by compacting the two or more files. In those embodiments, when backing up the two or more files, the method may further include adding ancestry information to an ancestry map and, after determining that the subsequent file was created by compacting the two or more files, using the ancestry information to determine that the two or more files have already been backed up. Also in those embodiments, the method may include adding subsequent ancestry information about the subsequent file to the ancestry map. The subsequent ancestry information indicates that the two or more files are ancestors of the subsequent file.
[0008] In some embodiments, determining that the subsequent file comprises a compaction of the two or more files comprises determining a first maximum timestamp for data entries in the subsequent file and determining that the first maximum timestamp is not greater than a maximum timestamp for data entries in the two or more files.
[0009] In some embodiments, the method includes identifying the subsequent file for restoration to the first database system, compacting the two or more files to recreate the subsequent file, and restoring the subsequent file to the first database system.
[0010] In some embodiments, the method includes identifying the subsequent file for restoration to the first database system, determining that the two or more files are ancestors of the subsequent file, and restoring the two or more files to the first database system.
[0011] In some embodiments, the two or more files and the subsequent file comprise Sorted String Tables (SSTables).
[0012] In another embodiment, a system is provided having one or more computer readable storage media and a processing system operatively coupled with the one or more computer readable storage media. Program instructions stored on the one or more computer readable storage media, when read and executed by the processing system, direct the processing system to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup. The program instructions further direct the processing system to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.
[0013] In yet another embodiment, a computer readable storage medium is provided having program instructions stored thereon. The program instructions, when executed by a processing system, direct the processing system to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup. The program instructions further direct the processing system to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 illustrates an implementation for efficiently backing up a compaction based database system.
[0015] Figure 2 illustrates a scenario for the implementation to efficiently back up a compaction based database system.
[0016] Figure 3 illustrates another implementation for efficiently backing up a compaction based database system.
[0017] Figure 4 illustrates a scenario for the other implementation to efficiently back up a compaction based database system.
[0018] Figure 5 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0019] Figure 6 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0020] Figure 7 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0021] Figure 8 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0022] Figure 9 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0023] Figure 10 illustrates another scenario for the other implementation to efficiently back up a compaction based database system.
[0024] Figure 11 illustrates a computing architecture for efficiently backing up a compaction based database system.
DETAILED DESCRIPTION
[0025] The incremental backup technology described herein mitigates the storage space inefficiencies caused when incrementally backing up a compaction based database system. In particular, when backing up a compaction based database, the examples herein determine whether a file (e.g., an SSTable) being backed up is a compaction of two or more files that were backed up in a previous incremental backup. If the file is a compaction, then the file is not copied since all the entries within the file are already stored in backups of other files (i.e., the files from which the compacted file was created). Instead, a record, called an ancestry map herein, is kept to indicate the files that were compacted to create the compacted file. Should the compacted file ever need to be recovered, the record is used to identify the files that were copied to retrieve the data entries in the compacted file.
[0026] Figure 1 illustrates implementation 100 for efficiently backing up a compaction based database system. Implementation 100 includes backup system 101, database system 102, data repository 121, and data repository 122. Backup system 101 and database system 102 communicate over communication link 111.
[0027] In operation, database system 102 hosts a database that stores data entries in immutable files that may be compacted to conserve storage space in data repository 122. Hence, database system 102 is a compaction based database system. Backup system 101 is tasked with incrementally backing up the database files stored in data repository 122. To that end, backup system 101 identifies any file changes that have occurred since the most recent backup and copies those file changes, if necessary, to data repository 121. Specifically, backup system 101 only copies file changes (i.e., new files in the case of the immutable database files described herein) that include data entries not already stored in data repository 121. Thus, if backup system 101 identifies a new file and that file is a compaction of files already stored in data repository 121, then backup system 101 refrains from copying that new file to data repository 121. If, however, the new file is not a compaction of other files or is a compaction of at least one file that was not already copied in a previous backup, then backup system 101 will copy the file as not all data entries in the new file have been copied to data repository 121.
[0028] Figure 2 illustrates a scenario 200 for implementation 100 to efficiently back up a compaction based database system. In scenario 200, backup system 101 backs up two or more files from database system 102 (201). Each of the two or more files comprises a file in data repository 122 that includes at least one data entry that was not previously copied in an ancestor of the file already stored in data repository 121 by backup system 101. The files may be SSTables, which is a common format for compaction based database systems, but may be some other type of immutable database file. The files may be backed up during the same incremental backup of database system 102, may be backed up during different incremental backups of database system 102, or may be backed up during an initial backup that includes all contents of data repository 122.
[0029] After backing up the two or more files, backup system 101 identifies a subsequent file from database system 102 to back up (202). The subsequent file may be identified during an incremental backup subsequent to one or more incremental backups that copied the two or more files to data repository 121. The subsequent file comprises a file that is new to data repository 122 since the previous incremental backup was performed. Rather than blindly copying the subsequent file to data repository 121, backup system 101 first determines that the subsequent file comprises a compaction of the two or more files (203). As such, the data entries in the subsequent file have already been stored in data repository 121 by virtue of the two or more files having already been stored in data repository 121 along with data entries of the two or more files that are no longer valid and were discarded during compaction. To determine that the subsequent file comprises a compaction of the two or more files, backup system 101 may reference ancestry information about the subsequent file that is maintained by database system 102. For instance, after identifying the subsequent file, backup system 101 may query database system 102 for the subsequent file's ancestry information that indicates whether the subsequent file is a compaction of other files and, if so, which files are the ancestors of the subsequent file. In this case, the ancestry information indicates that the two or more files backed up at step 201 are the ancestors of the subsequent file and backup system 101 can reference its own records to determine that the two or more files have already been copied to data repository 121.
[0030] Therefore, in response to determining that the subsequent file is a compaction of the two or more files, backup system 101 refrains from backing up the subsequent file (204). In some examples, backup system 101 further maintains a map of ancestry information that indicates that the subsequent file was not copied and indicates that the two or more files are the ancestors of the subsequent file. This map maintains a list of all files backed up to data repository 121 and indicates to backup system 101 whether a file itself has already been copied or data that makes up the file has already been copied previously in ancestors of the file. As such, backup system 101 may use the ancestry map when performing step 203 above. In addition to using the ancestry map to determine whether backup system 101 should refrain from backing up a particular file due to that file's ancestors already being backed up, the ancestry map may also be used when accessing the file at a future time. That is, should the subsequent file ever need to be accessed (e.g., for restoration to data repository 122), backup system 101 can reference the ancestry map to identify the two or more files so that the data entries of the subsequent file can be reproduced. In some cases, backup system 101 may compact the two or more files itself to provide the subsequent data file for access or backup system 101 may provide the two or more files to database system 102 so that database system 102 can compact or otherwise handle the two or more files.
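By way of a hedged example, the lookup performed when a non-copied file needs to be accessed might resemble the following Python sketch. The `MapEntry` structure and the function name `stored_sources` are hypothetical and are not taken from any particular backup product. The sketch assumes each map entry records either that the file itself was copied or which files are its ancestors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MapEntry:
    """Assumed shape of one ancestry map record (illustrative only)."""
    copied: bool  # True when the file itself is stored in the backup repository
    ancestors: List[str] = field(default_factory=list)  # used when not copied


def stored_sources(name: str, ancestry_map: Dict[str, MapEntry]) -> List[str]:
    """Trace a file back to the stored files needed to reproduce its entries."""
    entry = ancestry_map[name]
    if entry.copied:
        return [name]
    sources: List[str] = []
    for ancestor in entry.ancestors:
        # An ancestor may itself be a non-copied compaction, so recurse.
        for src in stored_sources(ancestor, ancestry_map):
            if src not in sources:
                sources.append(src)
    return sources
```

The files returned could then be compacted by the backup system, or handed to the database system for compaction, to reproduce the requested file.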
[0031] Advantageously, refraining from copying the subsequent file to data repository 121 conserves storage space in data repository 121 along with any other resources needed to copy the subsequent file to data repository 121, such as the processing resources used to process the subsequent file before the subsequent file is stored in data repository 121. Backup system 101 is therefore more efficient while still managing to incrementally back up compacted database files.
[0032] Referring back to Figure 1, backup system 101 comprises a computer system and communication interface. Backup system 101 may also include other components such as a router, server, data storage system, and power supply. Backup system 101 may reside in a single device or may be distributed across multiple devices. Backup system 101 could be an application server(s), a personal workstation, or some other network capable computing system - including combinations thereof. While shown separately, all or portions of backup system 101 could be integrated with the components of database system 102.
[0033] Database system 102 comprises a computer system and communication interface. Database system 102 may also include other components such as a router, server, data storage system, and power supply. Database system 102 may reside in a single device or may be distributed across multiple devices, as is common with nodes of distributed databases. Database system 102 could be an application server(s), a personal workstation, or some other network capable computing system - including combinations thereof. While shown separately, all or portions of database system 102 could be integrated with the components of backup system 101. Either or both of backup system 101 and database system 102 may be implemented as cloud based systems.
[0034] Data repositories 121 and 122 each comprise one or more data storage systems having one or more non-transitory storage media, such as a disk drive, flash drive, magnetic tape, data storage circuitry, or some other memory apparatus. The data storage systems may also include other components such as processing circuitry, a network communication interface, a router, server, data storage system, and power supply. The data storage systems may reside in a single device or may be distributed across multiple devices. For instance, in the case of database system 102 being a distributed database, the storage media may be distributed across nodes of the distributed database. In some cases, data repositories 121 and 122 may be physically incorporated into backup system 101 and database system 102, respectively. It should be understood that in no case is the storage media a propagated signal. Either or both of data repositories 121 and 122 may be implemented as cloud based storage systems.
[0035] Communication link 111 could be internal system busses or use various communication protocols, such as Time Division Multiplex (TDM), Internet Protocol (IP), Ethernet, communication signaling, Code Division Multiple Access (CDMA), Evolution Data Only (EVDO), Worldwide Interoperability for Microwave Access (WIMAX), Global System for Mobile Communication (GSM), Long Term Evolution (LTE), Wireless Fidelity (WIFI), High Speed Packet Access (HSPA), or some other communication format - including combinations thereof. Communication link 111 could be a direct link or may include intermediate networks, systems, or devices.
[0036] Figure 3 illustrates implementation 300 for efficiently backing up a compaction based database system. Implementation 300 includes backup system 301, database system 302, and communication network 303. Backup system 301 includes data repository 321 storing SSTables 331 and ancestry map 341. While ancestry map 341 is shown outside of data repository 321, ancestry map 341 may be stored in data repository 321 along with SSTables 331. Database system 302 includes data repository 322 storing SSTables 332.
[0037] In operation, backup system 301 incrementally backs up SSTables 332 and stores any SSTables copied in each incremental backup within SSTables 331. Backup system 301 identifies and retrieves new SSTables at each backup increment via communications exchanged over communication network 303, which may include one or more Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, or some other type of communication network over which data may be transferred - including combinations thereof. Database system 302 is a compaction based database system due to its use of immutable SSTables to store data entries. Thus, backup system 301 uses ancestry map 341 to track SSTable ancestry information so that SSTables with ancestors that have all previously been copied to data repository 321 are not copied again.
[0038] Figure 4 illustrates scenario 400 for implementation 300 to efficiently back up a compaction based database system. In scenario 400, backup system 301 creates a snapshot of SSTables 332 (401). At this point, the snapshot may be created in the same manner as a conventional incremental backup system would create a snapshot of SSTables in a database. However, while a conventional incremental backup system would simply copy and store SSTables that have changed since the previous backup snapshot, backup system 301 determines whether the data entries in the changed SSTables have already been copied to data repository 321. To that end, backup system 301 obtains ancestor information about the SSTables in the snapshot from database system 302 (402). The ancestor information is generated and maintained by database system 302. In some cases, an SSTable that has no ancestors (i.e., is not a compaction of other SSTables) will not have ancestor information or the ancestor information for that SSTable may indicate that no ancestors exist (e.g., have a NULL value). The ancestor information may be included as metadata with each SSTable or may be maintained elsewhere by database system 302 either in data repository 322 or in some other storage component accessible by database system 302.
[0039] For each SSTable of the snapshot, backup system 301 determines whether the SSTable is already stored in data repository 321 (403). Backup system 301 may reference the ancestry map 341 to determine whether the SSTable has already been stored or may determine whether the SSTable is already stored in some other manner. If the SSTable has already been stored in data repository 321, then backup system 301 does not copy the SSTable to data repository 321 because the SSTable is not a change to SSTables 332 since the previous backup (404). In contrast, if the SSTable has not been stored in data repository 321, then backup system 301 determines whether all ancestors of the SSTable are already stored in data repository 321 (405). Backup system 301 makes the aforementioned determination of step 405 rather than simply copying the SSTable to data repository 321.
[0040] To determine whether the ancestors of the SSTable are already stored in data repository 321, backup system 301 references ancestry map 341 with the ancestry information for the SSTable. In particular, ancestry map 341 comprises a record of each SSTable that has been backed up, but not necessarily copied to data repository 321, from database system 302. That is, ancestry map 341 indicates whether a particular SSTable has already been copied to data repository 321 or, if the SSTable has not been copied, which SSTables of SSTables 331 are ancestors of the non-copied SSTable (i.e., the SSTables that were compacted to create the non-copied SSTable). Backup system 301 cross references the ancestor information of the SSTable currently being processed with the information in ancestry map 341. If all ancestors of the SSTable, as indicated by the SSTable's ancestry information, have entries in ancestry map 341, then backup system 301 determines that the SSTable does not need to be copied to data repository 321 since the data entries therein are already copied by virtue of their inclusion in the ancestor SSTables. Thus, rather than copying the SSTable, backup system 301 adds another entry to ancestry map 341 for the SSTable and lists the ancestors of the SSTable in that entry (406). If necessary, the ancestry of the SSTable can then be traced at a later time based on that entry in ancestry map 341. Ancestry map 341 may include pointers to the entries within ancestry map 341 of the ancestors.
[0041] If one or more of the ancestors of the SSTable, as indicated by the SSTable's ancestry information, are not included in ancestry map 341 or, if the SSTable has no ancestors, then backup system 301 copies the SSTable to data repository 321 for inclusion in SSTables 331 (407). Additionally, backup system 301 adds an entry to ancestry map 341 for the SSTable (408). In this case, since the SSTable is copied to data repository 321, rather than including ancestry information for the SSTable, backup system 301 indicates in the entry that the SSTable has been copied and is therefore stored in data repository 321. The entry for the SSTable in ancestry map 341 may include a pointer to the SSTable in data repository 321.
[0042] Processing each SSTable from the snapshot in the manner described above using steps 403-408 conserves storage space in data repository 321 by not copying (i.e., storing) any SSTable that can be reproduced using SSTables already stored in data repository 321 (i.e., that SSTable's ancestors).
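The decision made for each SSTable in steps 403-408 can be illustrated with a minimal Python sketch. It is not taken from the application: the names `MapEntry`, `BackupSystem`, and `process` are hypothetical, the repository is modeled as an in-memory dictionary, and step 403 is approximated by checking whether the SSTable already has an entry in the ancestry map.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MapEntry:
    """One ancestry-map record: the file was copied, or its ancestors are listed."""
    copied: bool
    ancestors: tuple[str, ...] = ()


@dataclass
class BackupSystem:
    # Hypothetical in-memory stand-ins for ancestry map 341 and data repository 321.
    ancestry_map: dict[str, MapEntry] = field(default_factory=dict)
    repository: dict[str, bytes] = field(default_factory=dict)

    def process(self, name: str, data: bytes, ancestors: tuple[str, ...] = ()) -> str:
        # Steps 403/404: already accounted for by an earlier backup, so do nothing.
        if name in self.ancestry_map:
            return "skipped"
        # Steps 405/406: every ancestor is known, so record lineage instead of copying.
        if ancestors and all(a in self.ancestry_map for a in ancestors):
            self.ancestry_map[name] = MapEntry(copied=False, ancestors=ancestors)
            return "recorded"
        # Steps 407/408: no ancestors, or an unknown ancestor: copy and mark as copied.
        self.repository[name] = data
        self.ancestry_map[name] = MapEntry(copied=True)
        return "copied"
```

Under these assumptions, processing three SSTables without ancestors copies all three, and a later SSTable that lists those three as its ancestors is only recorded in the map.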
[0043] In some examples, database system 302 may not maintain ancestor information and backup system 301 may instead rely on timestamps, instead of ancestor information provided by database system 302, to filter out (and not back up) SSTables compacted from ancestors already backed up. For each backup interval, backup system 301 notes the max timestamp among all of the SSTables already backed up. The max timestamp of a particular SSTable comprises the newest (which is also the highest/max) timestamp from all data entries inside the SSTable. For the backup interval, after creating a snapshot (step 401), the backup system compares the max timestamp of each SSTable with the max timestamp determined by backup system 301 during the previous backup interval. Since a compacted SSTable comprises only data entries also included in SSTables compacted to form the compacted SSTable, it logically follows that the compacted SSTable will not have a max timestamp greater than those ancestor SSTables. Thus, if the max timestamp of a SSTable from the snapshot is greater than the max timestamp of the previous interval, backup system 301 knows that the SSTable includes data entries not already backed up in other SSTables and copies and processes the SSTable from the snapshot. Otherwise, backup system 301 will refrain from backing up the SSTable from the snapshot. The above example will continue to work properly as long as the time used by database system 302 when creating timestamps does not drift too much one way or another.
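As a rough illustration of the timestamp-based filter, the following Python sketch processes one backup interval. The function name `select_for_backup`, the representation of a snapshot as a dictionary of entry timestamps, and the use of integers for timestamps are all assumptions made for the example; every SSTable is assumed to hold at least one entry.

```python
from typing import Dict, List, Optional, Tuple


def select_for_backup(
    snapshot: Dict[str, List[int]], previous_max: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """Return the SSTables to copy and the max timestamp to note for the next interval.

    snapshot maps a (hypothetical) SSTable name to the timestamps of its entries.
    previous_max is the max timestamp noted in the previous interval,
    or None when nothing has been backed up yet.
    """
    to_copy: List[str] = []
    new_max = previous_max
    for name, timestamps in snapshot.items():
        table_max = max(timestamps)
        # A table whose newest entry is not newer than the previous interval's
        # maximum is assumed to contain only entries that were already backed up.
        if previous_max is None or table_max > previous_max:
            to_copy.append(name)
            new_max = table_max if new_max is None else max(new_max, table_max)
    return to_copy, new_max
```

Every SSTable in the snapshot is compared against the maximum from the previous interval, not against a value updated part-way through the loop, which matches the comparison described above.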
[0044] Figure 5 illustrates scenario 500 for implementation 300 to efficiently back up a compaction based database system. Scenario 500 is an example of how scenario 400 may be applied to an example set of SSTables 332. In particular, SSTables 332 include SSTables 501-503 at time T1. It should be understood that the three SSTables used in scenario 500 are merely exemplary and any number of SSTables may exist in SSTables 332. At step 1, backup system 301 creates a snapshot of SSTables 332 at time T1. For each of SSTables 501-503 in the snapshot, backup system 301 references ancestry map 341 at step 2 with respective ancestor information 511-513. In this example, the ancestor information for each SSTable 501-503 comprises metadata associated with, and received with, each SSTable 501-503, although other manners of maintaining ancestry information may be used by database system 302. Ancestry map 341 is shown as empty for the purposes of this example because no entries therein indicate that all ancestors of each SSTable 501-503 are accounted for therein (SSTables also may not be compactions and, therefore, may not have any ancestors). In some cases, scenario 500 may be an initial backup of SSTables 332 and, therefore, ancestry map 341 may actually be empty.
[0045] As a result of step 2, backup system 301 determines that SSTables 501-503 should be copied to SSTables 331 in data repository 321 and adds entries for SSTables 501-503 to ancestry map 341 at step 3. The entries indicate that SSTables 501-503 have been copied to data repository 321 and are, therefore, included in SSTables 331. Backup system 301 also copies SSTables 501-503 to data repository 321 at step 4. Upon completion of steps 1-4, an incremental backup of SSTables 332, as they were at time T1, is complete. It should be understood that adding entries to ancestry map 341 (step 3) and copying SSTables to data repository 321 (step 4) may occur in any order or at substantially the same time.
[0046] Figure 6 illustrates scenario 600 for implementation 300 to efficiently back up a compaction based database system. Scenario 600 is a continuation of scenario 500. Specifically, at time T2, which is after time T1, SSTables 332 include SSTables 501-503 and SSTable 601. At step 1, backup system 301 creates a snapshot of SSTables 332 at time T2. The snapshot includes SSTables 501-503 and SSTable 601. Backup system 301 references ancestry map 341 at step 2 to determine whether any of SSTables 501-503 and SSTable 601 should be copied to data repository 321. In view of step 403 of scenario 400, ancestor information 511-513 for SSTables 501-503 may not even be used during step 2 of scenario 600. Rather, backup system 301 references ancestry map 341 to determine that SSTables 501-503 have already been copied without needing to consider their ancestry. However, ancestor information 611 of SSTable 601 is used by backup system 301 to determine whether SSTable 601 should be copied. In this case, SSTable 601 is not a compaction of other SSTables and ancestor information 611 indicates that fact. Therefore, backup system 301 determines that SSTable 601 should be copied to data repository 321.
[0047] In response to the above determination, backup system 301 adds an entry to ancestry map 341 for SSTable 601 at step 3. The entry indicates that SSTable 601 has been copied to data repository 321. Accordingly, backup system 301 copies SSTable 601 to data repository 321 at step 4 to join SSTables 501-503 in SSTables 331, which were copied previously. Upon completion of steps 1-4, an incremental backup of SSTables 332, as they were at time T2, is complete. It should be understood that adding entries to ancestry map 341 (step 3) and copying SSTables to data repository 321 (step 4) may occur in any order or at substantially the same time.
[0048] Figure 7 illustrates scenario 700 for implementation 300 to efficiently back up a compaction based database system. Scenario 700 is a continuation of scenario 600. At time T3, which is after time T2, database system 302 compacts SSTables 501-503 at step 1 into SSTable 701. Database system 302 may use any method of data compaction that results in the still valid data entries of SSTables 501-503 being replicated in SSTable 701 while not replicating those entries that are no longer valid. The newly created SSTable 701 is stored in data repository 322 at step 2 along with corresponding ancestor information 711, which indicates that SSTables 501-503 are the ancestors of SSTable 701 (i.e., the SSTables that were compacted to create SSTable 701). Thus, after the compaction, SSTables 332 now include SSTable 601 and SSTable 701 in place of SSTables 501-503.
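The application leaves the compaction method open. Purely as an illustration, the Python sketch below shows one simplified way a compaction could keep only the still valid entries. It assumes each SSTable is a dictionary from key to a (timestamp, value) pair, that the newest timestamp for a key wins, and that a value of `None` marks a deleted entry; the name `compact` and this data layout are hypothetical and do not describe any particular database.

```python
from typing import Dict, Iterable, Optional, Tuple

# key -> (timestamp, value); a value of None marks a deleted entry.
SSTable = Dict[str, Tuple[int, Optional[str]]]


def compact(tables: Iterable[SSTable]) -> SSTable:
    """Merge several SSTables into one, keeping only the still valid entries."""
    merged: SSTable = {}
    for table in tables:
        for key, (timestamp, value) in table.items():
            # Keep the newest version of each key across all input tables.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Entries whose newest version is a deletion are no longer valid; drop them.
    return {key: entry for key, entry in merged.items() if entry[1] is not None}
```

Because the output contains only entries taken from the inputs, its newest timestamp can never exceed the newest timestamp of its ancestors.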
[0049] Figure 8 illustrates scenario 800 for implementation 300 to efficiently back up a compaction based database system. Scenario 800 is a continuation of scenario 700. Specifically, at time T4, which is after time T3, SSTables 332 include SSTable 701 and SSTable 601. At step 1, backup system 301 creates a snapshot of SSTables 332 at time T4. The snapshot includes SSTable 701 and SSTable 601. Backup system 301 references ancestry map 341 at step 2 to determine whether any of SSTable 701 and SSTable 601 should be copied to data repository 321. In view of step 403 of scenario 400, ancestor information 611 for SSTable 601 may not even be used during step 2 of scenario 800. Rather, backup system 301 references ancestry map 341 to determine that SSTable 601 has already been copied without needing to consider SSTable 601's ancestry. However, ancestor information 711 of SSTable 701 is used by backup system 301 to determine whether SSTable 701 should be copied.
[0050] In this case, SSTable 701 is a compaction of SSTables 501-503, as described above, and ancestor information 711 indicates that fact to backup system 301. Therefore, backup system 301 determines whether all of SSTables 501-503 have entries in ancestry map 341. Since ancestry map 341 does include entries for SSTables 501-503, backup system 301 determines that SSTable 701 does not need to be copied to data repository 321 and, therefore, does not copy SSTable 701 for inclusion in SSTables 331. Instead, backup system 301 adds an entry at step 3 for SSTable 701 to ancestry map 341. The entry indicates that SSTables 501-503 are the ancestors of SSTable 701 so that SSTable 701 does not need to be copied at step 4. Upon completion of steps 1-4, an incremental backup of SSTables 332, as they were at time T4, is complete even though SSTable 701 was not copied to data repository 321. Rather, SSTable 701's entry in ancestry map 341 provides enough information to replicate SSTable 701 from its ancestors.
[0051] In scenario 800, the ancestors of SSTable 701 (i.e., SSTables 501-503) all have entries in ancestry map 341 that indicate that the ancestors themselves were copied to data repository 321. However, in other examples, one or more of the entries may indicate further ancestors. For instance, if SSTable 701 is compacted with one or more other SSTables at some later time, then the SSTable created by that compaction, when backed up, would refer to SSTable 701's entry in ancestry map 341. While backup system 301 may trace such ancestry information all the way back to SSTables that were actually stored in data repository 321, the process for adding entries to ancestry map 341 (i.e., scenario 400) allows backup system 301 to assume that ancestors actually copied to data repository 321 will eventually be found if traced back through ancestry entries. Such an assumption prevents backup system 301 from having to perform such an ancestry trace until an SSTable needs to be accessed (e.g., for restoration to database system 302).
[0052] Figure 9 illustrates scenario 900 for implementation 300 to efficiently back up a compaction based database system. Specifically, scenario 900 provides an example of how a non-copied SSTable may be restored to SSTables 332 of data repository 322. Scenario 900 occurs at a time T5, which is after time T4, and data repository 322 is being restored to its state at time T4. As shown in Figure 8, the state of data repository 322 at time T4 comprised SSTables 332 with SSTable 701 and SSTable 601. While not described above, backup system 301, like most incremental backup systems, maintains information about the state of data repository 322 at each incremental backup so that the data repository 322 can be restored to any one of the incremental states. Thus, backup system 301 uses that information to determine that SSTable 701 and SSTable 601 should be restored to data repository 322.
[0053] Backup system 301 references ancestry map 341 at step 1 to determine whether SSTable 701 and SSTable 601 are stored in SSTables 331 of data repository 321. Ancestry map 341 indicates that SSTable 601 is stored in data repository 321 but that SSTable 701 is not. However, ancestry map 341 indicates that SSTable 701 is a compaction of ancestor SSTables 501-503. Backup system 301 therefore references entries in ancestry map 341 for SSTables 501-503, which indicate that SSTables 501-503 are stored in data repository 321. Thus, to replicate SSTable 701, backup system 301 compacts SSTables 501-503 to create SSTable 701 at step 2 in a manner similar to that used by database system 302 to create SSTable 701 initially. The reproduced SSTable 701 is then restored at step 3, along with SSTable 601, to data repository 322. After restoration, SSTables 332 include the same SSTables that were included in SSTables 332 at time T4.
[0054] While backup system 301 performs the compaction to reproduce SSTable 701 in the above example, other examples may restore SSTable 701 in different manners. For instance, backup system 301 may provide the ancestor SSTables (SSTables 501-503 in the above example) to database system 302 and database system 302 may compact the ancestor SSTables itself (or otherwise handle the ancestor SSTables as it sees fit).
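The restoration of a non-copied SSTable can be sketched in Python as follows. This is an illustrative outline only: `restore_sstable` is a hypothetical name, the ancestry map is modeled as a dictionary in which an empty tuple means the SSTable itself was copied, and the compaction routine is passed in as a callable because the application does not prescribe one.

```python
from typing import Callable, Dict, List, Tuple

Table = Dict[str, str]


def restore_sstable(
    name: str,
    ancestry_map: Dict[str, Tuple[str, ...]],
    repository: Dict[str, Table],
    compact: Callable[[List[Table]], Table],
) -> Table:
    """Rebuild one SSTable from the backup repository.

    ancestry_map[name] is () when the SSTable itself was copied; otherwise it
    lists the ancestors that were compacted to create it.
    """
    ancestors = ancestry_map[name]
    if not ancestors:
        # The SSTable was copied, so it can be read back directly.
        return repository[name]
    # Ancestors may themselves be non-copied compactions, so trace back recursively.
    rebuilt = [restore_sstable(a, ancestry_map, repository, compact) for a in ancestors]
    return compact(rebuilt)
```

In the alternative described above, where the database system performs the compaction itself, the same trace could return the list of copied ancestors instead of calling `compact`.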
[0055] Figure 10 illustrates scenario 1000 for implementation 300 to efficiently back up a compaction based database system. In scenario 1000, backup system 301 uses timestamps for data entries within SSTables rather than ancestor information to determine whether an SSTable should be stored to data repository 321. SSTables 1001-1003 comprise SSTables already stored within SSTables 331 of data repository 321. SSTables 1001-1003 may have been stored in data repository 321 in response to previous iterations of scenario 1000. In scenario 1000, backup system 301 determines the maximum timestamp of the data entries within SSTables 1001-1003 at step 1. The actual content of each data entry is not relevant to any of the determinations in scenario 1000. Preferably, backup system 301 determines the maximum timestamp of each SSTable 1001-1003 when storing each to data repository 321 and stores those maximum timestamps for later reference, although the maximum timestamps may be determined at some other time. Regardless, the maximum timestamps of SSTables 1001-1003 are T3, T5, and T6, respectively. It should be understood that timestamps T with higher numbers represent later timestamps (i.e., more recent entries) than those with lower numbers. Accordingly, backup system 301 determines that the maximum timestamp of SSTables 1001-1003, which are the SSTables already stored in data repository 321, is T6.
[0056] At a time after SSTables 1001-1003 have been stored to data repository 321, backup system 301 captures SSTable 1004 in a snapshot of SSTables 332. Backup system 301 determines the maximum timestamp of data entries within SSTable 1004 at step 2. The maximum timestamp in SSTable 1004 is T6, which is then compared at step 3 to the maximum timestamp of the other SSTables (i.e., SSTables 1001-1003 in this example) already stored in data repository 321. Specifically, it is determined whether T6 is less than or equal to the maximum of already stored timestamps. In this case, T6 from SSTable 1004 is less than or equal to T6 from the previously stored SSTables 1001-1003. Since SSTable 1004 includes no entries with timestamps later than T6 and data repository 321 already stores data entries up to and including T6, backup system 301 assumes that SSTable 1004 does not include data entries not already included in SSTables 331 of data repository 321 and refrains from storing SSTable 1004 at step 4. In an alternative example, if SSTable 1004 had a maximum timestamp of T7, then T7 is determined to be greater than the previously stored maximum of T6. Accordingly, SSTable 1004 in that case includes at least one data entry newer than any of the data entries already stored in SSTables 331 of data repository 321 and SSTable 1004 will need to be stored to data repository 321 by backup system 301 to back up any newer entries therein.
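Scenario 1000 can be replayed with a small Python sketch that records each SSTable's maximum timestamp at storage time, as the preferred approach above suggests. The class name `TimestampIndex` and its methods are hypothetical, timestamps T3, T5, T6, and T7 are represented by the integers 3, 5, 6, and 7, and the individual entry timestamps other than the stated maxima are invented for illustration.

```python
from typing import Dict, Iterable, Optional


class TimestampIndex:
    """Remembers the maximum entry timestamp of every SSTable already stored."""

    def __init__(self) -> None:
        self.max_by_table: Dict[str, int] = {}

    def record_stored(self, name: str, timestamps: Iterable[int]) -> None:
        # Computed once, when the SSTable is stored, and kept for later reference.
        self.max_by_table[name] = max(timestamps)

    def stored_max(self) -> Optional[int]:
        # Maximum over all stored SSTables, or None if nothing is stored yet.
        return max(self.max_by_table.values(), default=None)

    def should_store(self, timestamps: Iterable[int]) -> bool:
        newest = max(timestamps)
        limit = self.stored_max()
        # Store only if the candidate holds an entry newer than anything stored.
        return limit is None or newest > limit


index = TimestampIndex()
index.record_stored("1001", [1, 3])  # maximum T3
index.record_stored("1002", [2, 5])  # maximum T5
index.record_stored("1003", [4, 6])  # maximum T6

print(index.should_store([3, 5, 6]))  # SSTable 1004 with maximum T6 -> False
print(index.should_store([5, 7]))     # alternative with maximum T7 -> True
```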
[0057] Figure 11 illustrates data backup system 1100. Index system 1100 is an example of index system 101, although system 101 may use alternative configurations. Index system 1100 comprises communication interface 1101, user interface 1102, and processing system 1103. Processing system 1103 is linked to communication interface 1101 and user interface 1102. Processing system 1103 includes processing circuitry 1105 and memory device 1106 that stores operating software 1107.
[0058] Communication interface 1101 comprises components that communicate over communication links, such as network cards, ports, RF transceivers, processing circuitry and software, or some other communication devices. Communication interface 1101 may be configured to communicate over metallic, wireless, or optical links. Communication interface 1101 may be configured to use TDM, IP, Ethernet, optical networking, wireless protocols, communication signaling, or some other communication format, including combinations thereof.
[0059] User interface 1102 comprises components that interact with a user. User interface 1102 may include a keyboard, display screen, mouse, touch pad, or some other user input/output apparatus. User interface 1102 may be omitted in some examples.
[0060] Processing circuitry 1105 comprises a microprocessor and other circuitry that retrieves and executes operating software 1107 from memory device 1106. Memory device 1106 comprises a non-transitory storage medium, such as a disk drive, flash drive, data storage circuitry, or some other memory apparatus. Operating software 1107 comprises computer programs, firmware, or some other form of machine-readable processing instructions. Operating software 1107 includes data copying module 1108 and ancestry module 1109. Operating software 1107 may further include an operating system, utilities, drivers, network interfaces, applications, or some other type of software. When executed by circuitry 1105, operating software 1107 directs processing system 1103 to operate index system 1100 as described herein.
[0061] In particular, data copying module 1108 directs processing system 1103 to back up two or more files from a first database system and, after backing up the two or more files, identify a subsequent file from the first database system for backup. Ancestry module 1109 directs processing system 1103 to determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain from backing up the subsequent file.
[0062] The above description and associated figures teach the best mode of the invention. The following claims specify the scope of the invention. Note that some aspects of the best mode may not fall within the scope of the invention as specified by the claims. Those skilled in the art will appreciate that the features described above can be combined in various ways to form multiple variations of the invention. As a result, the invention is not limited to the specific embodiments described above, but only by the following claims and their equivalents.

Claims

What is claimed is:
1. A method of incrementally backing up compaction based database systems, the method comprising:
backing up two or more files from a first database system;
after backing up the two or more files, identifying a subsequent file from the first database system for backup;
determining that the subsequent file comprises a compaction of the two or more files and, responsively, refraining to back up the subsequent file.
2. The method of claim 1, wherein backing up the two or more files comprises:
at a first time, backing up one or more first files;
after the first time, identifying a second file for backup; and
determining that the second file does not comprise a compaction of the one or more first files and, responsively, backing up the second file.
3. The method of claim 1, wherein the subsequent file comprises a compaction of the two or more files when the subsequent file contains only information included in the two or more files.
4. The method of claim 1, wherein determining that the subsequent file comprises a compaction of the two or more files comprises:
accessing ancestor information for the subsequent file; and
determining that the ancestor information indicates that the subsequent file was created by compacting the two or more files.
5. The method of claim 4, further comprising:
when backing up the two or more files, adding ancestry information to an ancestry map; and
after determining that the subsequent file was created by compacting the two or more files, using the ancestry information to determine that the two or more files have already been backed up.
6. The method of claim 5, further comprising:
adding subsequent ancestry information about the subsequent file to the ancestry map, wherein the subsequent ancestry information indicates that the two or more files are ancestors of the subsequent file.
7. The method of claim 1, wherein determining that the subsequent file comprises a compaction of the two or more files comprises:
determining a first maximum timestamp for data entries in the subsequent file; and
determining that the first maximum timestamp is not greater than a maximum timestamp for data entries in the two or more files.
8. The method of claim 1, further comprising:
identifying the subsequent file for restoration to the first database system;
compacting the two or more files to recreate the subsequent file; and
restoring the subsequent file to the first database system.
9. The method of claim 1, further comprising:
identifying the subsequent file for restoration to the first database system;
determining that the two or more files are ancestors of the subsequent file; and
restoring the two or more files to the first database system.
10. A system for incrementally backing up compaction based database systems, the system comprising:
one or more computer readable storage media;
a processing system operatively coupled with the one or more computer readable storage media; and
program instructions stored on the one or more computer readable storage media that, when read and executed by the processing system, direct the processing system to:
back up two or more files from a first database system;
after backing up the two or more files, identify a subsequent file from the first database system for backup;
determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain to back up the subsequent file.
11. The system of claim 10, wherein to back up the two or more files, the program instructions direct the processing system to:
at a first time, back up one or more first files;
after the first time, identify a second file for backup; and
determine that the second file does not comprise a compaction of the one or more first files and, responsively, back up the second file.
12. The system of claim 10, wherein the subsequent file comprises a compaction of the two or more files when the subsequent file contains only information included in the two or more files.
13. The system of claim 10, wherein to determine that the subsequent file comprises a compaction of the two or more files, the program instructions direct the processing system to:
access ancestor information for the subsequent file; and
determine that the ancestor information indicates that the subsequent file was created by compacting the two or more files.
14. The system of claim 13, wherein the program instructions further direct the processing system to:
when backing up the two or more files, add ancestry information to an ancestry map; and
after determining that the subsequent file was created by compacting the two or more files, use the ancestry information to determine that the two or more files have already been backed up.
15. The system of claim 14, wherein the program instructions further direct the processing system to:
add subsequent ancestry information about the subsequent file to the ancestry map, wherein the subsequent ancestry information indicates that the two or more files are ancestors of the subsequent file.
16. The system of claim 10, wherein to determine that the subsequent file comprises a compaction of the two or more files, the program instructions direct the processing system to:
determine a first maximum timestamp for data entries in the subsequent file; and
determine that the first maximum timestamp is not greater than a maximum timestamp for data entries in the two or more files.
17. The system of claim 10, wherein the program instructions further direct the processing system to:
identify the subsequent file for restoration to the first database system;
compact the two or more files to recreate the subsequent file; and
restore the subsequent file to the first database system.
18. The system of claim 10, wherein the program instructions further direct the processing system to:
identify the subsequent file for restoration to the first database system;
determine that the two or more files are ancestors of the subsequent file; and
restore the two or more files to the first database system.
19. A computer readable storage medium having program instructions stored thereon for incrementally backing up compaction based database systems, the program instructions, when executed by a processing system, direct the processing system to:
back up two or more files from a first database system;
after backing up the two or more files, identify a subsequent file from the first database system for backup; and
determine that the subsequent file comprises a compaction of the two or more files and, responsively, refrain to back up the subsequent file.
20. The system of claim 19, wherein to back up the two or more files, the program instructions direct the processing system to:
at a first time, back up one or more first files;
after the first time, identify a second file for backup; and
determine that the second file does not comprise a compaction of the one or more first files and, responsively, back up the second file.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/637,374 US10877934B2 (en) 2017-06-29 2017-06-29 Efficient backup of compaction based databases
PCT/US2018/039866 WO2019006036A1 (en) 2017-06-29 2018-06-27 Efficient backup of compaction based databases

Publications (2)

Publication Number Publication Date
EP3646198A1 (en) 2020-05-06
EP3646198A4 EP3646198A4 (en) 2020-07-01

Family

ID=64734820

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18824074.1A Withdrawn EP3646198A4 (en) 2017-06-29 2018-06-27 Efficient backup of compaction based databases

Country Status (3)

Country Link
US (1) US10877934B2 (en)
EP (1) EP3646198A4 (en)
WO (1) WO2019006036A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412909B2 (en) * 2009-04-08 2013-04-02 Samsung Electronics Co., Ltd. Defining and changing spare space and user space in a storage apparatus
US8321644B2 (en) 2009-10-01 2012-11-27 Hewlett-Packard Development Company, L.P. Backing up filesystems to a storage device
WO2015066081A1 (en) 2013-10-28 2015-05-07 Ramnarayanan Jagannathan Compacting data history files
US20150339314A1 (en) * 2014-05-25 2015-11-26 Brian James Collins Compaction mechanism for file system
US10303667B2 (en) 2015-01-26 2019-05-28 Rubrik, Inc. Infinite versioning by automatic coalescing
US11036590B2 (en) * 2017-03-02 2021-06-15 Salesforce.Com, Inc. Reducing granularity of backup data over time

Also Published As

Publication number Publication date
US20190005059A1 (en) 2019-01-03
EP3646198A4 (en) 2020-07-01
US10877934B2 (en) 2020-12-29
WO2019006036A1 (en) 2019-01-03

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20191230

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

A4 Supplementary search report drawn up and despatched

Effective date: 20200602

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 11/14 20060101AFI20200526BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20201231