WO2020108604A1 - Data recovery method, device, server, and computer-readable storage medium - Google Patents

Data recovery method, device, server, and computer-readable storage medium

Info

Publication number
WO2020108604A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
backup
recovery
file
package
Prior art date
Application number
PCT/CN2019/121916
Other languages
English (en)
French (fr)
Inventor
李海翔
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP19889862.9A (EP3822793A4)
Priority to JP2021506468A (JP7108782B2)
Publication of WO2020108604A1
Priority to US17/175,139 (US11531594B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1469: Backup restoration techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1474: Saving, restoring, recovering or retrying in transactions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/219: Managing data history or versioning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273: Asynchronous replication or reconciliation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00: Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80: Database-specific techniques

Definitions

  • This application relates to the field of database technology, and in particular, to a data recovery method, device, server, and computer-readable storage medium.
  • a data recovery method, device, server, and storage medium are provided.
  • a data recovery method is executed by a server.
  • the method includes:
  • the hybrid backup refers to a backup process including a physical backup and a logical backup
  • identifying the backup type of the backup data package includes:
  • a data recovery device includes:
  • Identification module used to identify the backup type of the backup data package
  • the data recovery module is used to perform data recovery based on the data of the physical backup in the backup data package when the identified backup type is a hybrid backup.
  • the hybrid backup refers to a backup process including a physical backup and a logical backup;
  • the data recovery module is further configured to perform data recovery on the data logically backed up in the backup data package after data recovery based on the physical backup data is completed.
  • a server includes a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operation performed by the data recovery method described above.
  • a computer-readable storage medium having at least one instruction stored in the storage medium, the at least one instruction being loaded and executed by a processor to implement the operations performed by the data recovery method described above.
  • FIG. 1 is a schematic diagram of an implementation environment of a data recovery method provided by an embodiment of the present application
  • FIG. 3 is a page comparison diagram provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a data recovery device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the database involved in the embodiments of the present application stores multiple data tables, and each data table may be used to store data items, and the data items may have one or more versions.
  • the database may be any type of database based on MVCC (Multi-Version Concurrency Control). In the embodiment of the present application, the type of the database is not specifically limited.
  • the data in the above database carries state attributes and can be in one of three states: the current state, the transition state, and the historical state. These three states are collectively referred to as the "full state" of the data, and such data is referred to as full-state data for short.
  • the different state attributes in the data can be used to identify the state of the data in its life cycle trajectory.
  • the latest version of the data item is the data at the current stage.
  • the state of the data at the current stage is called the current state.
  • Transition state: a version that is neither the latest version of the data item nor a historical-state version, but is in the process of moving from the current state to the historical state. Data in the transition state is also called half-life data.
  • Historical state: a state in the history of a data item whose value is an old value, not the current value.
  • the state of the data in the historical stage is called the historical state.
  • the data may only exist in the historical state and the current state.
  • the new value of the data after the transaction is submitted is in the current state.
  • the data generated by the transaction before the smallest transaction in the current active transaction list is in a historical state.
  • the value of the data before submission becomes the value of the historical state, that is, the old value of the data item is in the historical state.
  • the version being read is still in use by active transactions (transactions other than the latest related transaction). Because the latest related transaction has modified the value of the data item, the latest value is already in the current state, and the value being read is already historical relative to the current state. Its data state therefore lies between the current state and the historical state, and it is called a transition state.
  • for example, the balance of account A in the User table is topped up from 10 yuan to 20 yuan, and then 15 yuan is spent, leaving 5 yuan.
  • at this point, financial institution B starts a transaction that reads the data for checking, and that transaction is still in progress. If account A is then topped up by 20 yuan so that the balance becomes 25 yuan, then 25 yuan is the current-state data, the 5 yuan that B is reading is in the transition state, and the remaining two values, 20 and 10, are states that existed in history and are historical-state data.
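The account example above can be sketched in Python. This is only an illustration, assuming the versions of a data item are kept in commit order and that we know which older versions an active reader is still using; the function name `classify_versions` and its parameters are hypothetical and do not come from the application.

```python
def classify_versions(versions, versions_in_use):
    """Label each version of a data item as current, transition, or historical.

    versions: values in commit order, oldest first (the last one is the latest).
    versions_in_use: indexes of non-latest versions that active transactions
    are still reading (an assumption made for this sketch).
    """
    states = {}
    latest = len(versions) - 1
    for index, value in enumerate(versions):
        if index == latest:
            states[value] = "current"
        elif index in versions_in_use:
            states[value] = "transition"
        else:
            states[value] = "historical"
    return states


# Account A: 10 -> 20 -> 5 -> 25, while institution B is still reading 5.
balances = [10, 20, 5, 25]
states = classify_versions(balances, versions_in_use={2})
print(states)
```

Once B's transaction ends, calling the function again with an empty `versions_in_use` set would label 5 as historical as well.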
  • the above data recovery method can be applied to a hardware environment including multiple servers 101 as shown in FIG. 1.
  • multiple servers 101 are connected through a network, or the multiple servers 101 may also be data isolated.
  • the foregoing networks include but are not limited to: wide area network, metropolitan area network, or local area network.
  • the data recovery method in this embodiment of the present application may be executed by any one server 101 or multiple servers 101.
  • a database system may be running on the server 101 to provide data services, such as data storage, data query, and so on.
  • the database system can work through the database engine running on the server.
  • FIG. 2 is a flowchart of a data recovery method provided by an embodiment of the present application. Referring to FIG. 2, this embodiment specifically includes the following steps:
  • the backup type of the backup data package can be identified from the file names in the backup data package, and the data recovery process corresponding to the identified backup type is then performed.
  • the identification of the backup type may also be identified by the type identifier included in the backup data package, that is, the corresponding backup type is obtained according to the type identifier carried in the backup data package.
  • the type identifier may be included in a specified file of the backup data package, or may be carried in the package name of the backup data package, which is not specifically limited in this embodiment of the present application. It should be noted that the above type identifier may be a field used to indicate the type of backup, such as a number, a character string, and so on.
  • the data contained in the backup data package obtained by the backup process of different backup types is different, and the file name is also different.
  • the examples are as follows:
  • the backup data package obtained by physical backup only has current state data, and no history state data or transition state data.
  • the file names of the files in the backup data package obtained by the physical backup carry a field representing the physical backup.
  • the file name may be: PhMeta_00000001, PhData_00000001, Log_00000001, etc.
  • the backup data package obtained by the logical backup may include full state data, that is, include current state data, historical state data, and transition state data.
  • the file name of the file in the backup data package obtained by the logical backup will carry fields for representing the logical backup.
  • the file name may be: LoMeta_00000001, LoData_00000001, HMeta_00000001, HData_00000001, etc.
  • Hybrid backup: the backup data package obtained by mixing logical backup and physical backup can also include full-state data, that is, current-state data, historical-state data, and transition-state data.
  • the file names of the files in the backup data package obtained by the hybrid backup carry a field representing the hybrid backup.
  • the file name may be: Meta_00000001, Data_00000001, Log_00000001, HMeta_00000001, HData_00000001, etc.
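The file-name convention listed above could be turned into a simple classifier. The following is a minimal sketch, assuming the prefixes shown in the examples are the only ones in use; the function name `identify_backup_type` is hypothetical. Because `Log_`, `HMeta_`, and `HData_` appear in more than one backup type, the sketch relies only on the prefixes that are unique to one type.

```python
def identify_backup_type(file_names):
    """Guess the backup type from the file-name prefixes in a backup package."""
    # The prefix is the part of the name before the first underscore.
    prefixes = {name.split("_", 1)[0] for name in file_names}
    if prefixes & {"PhMeta", "PhData"}:
        return "physical"
    if prefixes & {"LoMeta", "LoData"}:
        return "logical"
    if prefixes & {"Meta", "Data"}:
        return "hybrid"
    # Shared prefixes alone (Log_, HMeta_, HData_) are not enough to decide.
    return "unknown"
```

A type identifier carried in a specified file or in the package name, as described earlier, would be a more direct signal when it is available.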
  • when the identified backup type is a logical backup, data is restored by creating a table in the destination database.
  • the data in the backup data package can be the current state time point value, the historical state time point value and the full state time period value, which are obtained based on the backup process of different snapshots:
  • the backup data package is obtained based on the logical backup of the conventional transaction snapshot.
  • the recovery process may include: reading the meta information of the table file from the meta information file of the backup data package; creating a table in the destination database based on the meta information of the table file (if a table with the same name already exists in the destination database, the recovery fails); and then performing an insert operation (such as an INSERT operation) to recover the data. It should be noted that this kind of data recovery only supports "running-state logical recovery".
  • the backup data package is obtained by performing a logical backup based on a snapshot of historical transactions.
  • the backup data package includes backup data in the historical state and the transition state.
  • the recovery process may include: reading the meta information of the table file from the meta information file of the backup data package; if there is no table with the same name in the destination database, creating a table in the destination database based on the meta information of the table file and then performing an insert operation (such as an INSERT operation) for data recovery. If a table with the same name exists in the destination database, the table is not created and only the data is recovered.
  • if the table with the same name in the destination database already contains the same data as the data to be restored, the restoration fails; if it does not, data is restored based on the data to be restored (for example, a forced recovery is performed).
  • whether the same data exists can be checked through the primary key index: if the table with the same name has a primary key index that is the same as the primary key index of the data to be restored, the same data exists;
  • otherwise, the restoration is based on the data to be restored. It should be noted that this kind of data recovery only supports "running-state logical recovery". Further, in some database systems with differentiated permissions, this data recovery may be restricted to the superuser (Super User, the user with the highest local permissions).
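The recovery of a historical-snapshot logical backup described above can be sketched with Python's built-in `sqlite3` module standing in for the destination database. This is an assumption-laden illustration rather than the application's implementation: the names `restore_rows` and `RestoreError` are hypothetical, the table definition is passed in directly instead of being read from a meta information file, and the primary-key conflict raised by the database plays the role of the primary key index check.

```python
import sqlite3


class RestoreError(Exception):
    """Raised when the destination already holds the data to be restored."""


def restore_rows(conn, create_sql, table, rows):
    """Restore rows into `table`, creating the table only if it is missing."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if not exists:
        conn.execute(create_sql)
    if not rows:
        return
    placeholders = ", ".join("?" for _ in rows[0])
    try:
        # The table name is assumed to come from trusted backup metadata.
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    except sqlite3.IntegrityError as exc:
        # A primary-key clash means the same data already exists: fail and undo.
        conn.rollback()
        raise RestoreError(f"same data already exists in {table}") from exc
    conn.commit()


conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE account_history (id INTEGER PRIMARY KEY, balance INTEGER)"
restore_rows(conn, ddl, "account_history", [(1, 10), (2, 20)])
# The table now exists, so the second call only restores data.
restore_rows(conn, ddl, "account_history", [(3, 5)])
```

For the regular-snapshot case, where an existing table with the same name is itself a failure, the `exists` check would raise instead of skipping the table creation.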
  • the backup data package is obtained by performing logical backup based on historical transaction snapshots and regular transaction snapshots.
  • the backup data package includes backup data in the current state, historical state, and transition state.
  • the recovery process may include the first and second types of recovery process described above, that is, a data recovery method that combines the regular transaction snapshot and the historical transaction snapshot. During data recovery, to improve recovery efficiency, the check for a table with the same name does not need to be repeated and only needs to be performed once.
  • the data recovery method that combines the regular transaction snapshot and the historical transaction snapshot includes: reading the meta information of the table file from the meta information file of the backup data package; creating a table in the destination database based on the meta information of the table file; and performing an insert operation on the table to restore the logically backed up data in the backup data package; or,
  • creating a table is an optional step.
  • whether to perform the table creation step can be specified through a parameter. For example, “CREATE TABLE Y/N” can be specified in the RECOVERY command, where Y means create and N means do not create.
  • the above technical process can be used to restore separately.
  • the data backed up by the logical backup will also be included.
  • the recovery of such data is implicit logical data recovery, and the recovery principle is the same as above.
  • recovery can be performed by multiple threads in parallel. Specifically, multiple threads read the files in the backup data package in parallel and execute the recovery commands they read in parallel. For example, the data files LoData_00000001, LoData_00000002, etc. can be read in parallel, and the SQL commands in each data file can be executed in parallel.
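A thread pool is one way to read and replay several data files at once. The sketch below assumes each `LoData_*` file holds one recovery command per line and that the caller supplies a thread-safe `execute` callable; both the file layout and the function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def replay_file(path, execute):
    """Pass every non-empty line of one data file to `execute`."""
    count = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        statement = line.strip()
        if statement:
            execute(statement)
            count += 1
    return count


def replay_in_parallel(package_dir, execute, workers=4):
    """Replay all LoData_* files of a package using a pool of threads."""
    files = sorted(Path(package_dir).glob("LoData_*"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda path: replay_file(path, execute), files)
        # Map each file name to the number of commands replayed from it.
        return dict(zip((f.name for f in files), counts))
```

Commands from different files may interleave in any order, so this only fits data whose files are independent of one another.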
  • the server may prohibit various consistency check operations of the transaction, but directly store the data in the data page of the history table.
  • the recovery command of the logical backup may be a SQL statement or a CLI format command.
  • the following command can be used to restore data from ‘/usr/bak/my_first_backup_02’ backup data package to ‘/data/my_data_02’:
  • the server executes the CHECKPOINT operation, flushes the recovered data from the memory, and completes the data recovery operation.
  • if the backup data package is obtained in a purely logical way, it can be restored while the database engine is running, which is called "running-state logical recovery"; that is, the above logical backup can be restored while the database engine is running, and there is no need to restart the database engine after the recovery is completed.
  • the database engine can also be restarted; that is, the server can restart the database system on the destination database obtained by the logical-backup recovery operation to provide data services, which is not limited in the embodiments of the present application.
  • recovery of a physical backup can usually only be performed when the database engine is not running, which is called "non-running-state physical recovery".
  • backup data obtained by the hybrid method can be restored in the running state or in a semi-offline situation;
  • the semi-offline situation means that the physically backed up data is restored in the offline state, the server is then started, and recovery then continues with the logs and the logically backed up data.
  • the recovery command of the physical backup may be a command in CLI (command-line interface) format.
  • the server executes the log file in the backup data package.
  • for the backed-up current-state data, the log file also needs to be backed up to ensure data consistency; that is, data recovery needs to be performed based on the log file.
  • the process of executing the log file may use the principle of the ARIES algorithm to perform recovery work based on the log file (for example, REDO log) in order to achieve data consistency at the time of the backup point.
  • after the server executes the log file, it executes the CHECKPOINT operation to flush the recovered data from memory and complete the data recovery operation.
  • the server starts the database engine based on the new data directory, starts to provide data services, and ends.
  • the recovery process in the above steps 204 to 207 is to restore the backup data package to the blank data directory.
  • the server can start the database engine on the new data directory obtained based on the recovery operation of the physical backup to start providing data services.
  • the server performs data recovery based on the physical backup data in the backup data package.
  • the server checks whether the destination directory is empty, and if it is not empty, it reports an error and exits.
  • it is also necessary to perform data recovery based on the control file in the backup data package to recover to the environment at the time of backup. For example, you can use the ARIES algorithm principle to perform necessary data recovery such as control files.
  • the file copy method is used for the physically backed up data. Because this type of physical backup uses block copying during backup and forms independent data files afterwards, file copying can also be used directly during recovery.
  • the data recovery process based on file copying includes: using the file copying method, copying the data physically backed up in the backup data package to a location corresponding to the destination directory according to the file name and table name.
  • the physically backed up data in the backup data package is processed in parallel: multiple data files are copied at the same time to the corresponding locations in the specified recovery destination directory. Because this type of physical backup forms independent data files after the backup, data can be restored directly by parallel copying.
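The empty-destination check and the parallel file copy described above can be sketched as follows. This is an illustration under assumptions: the function name `restore_physical_files` is hypothetical, and the sketch places each file at the same relative path under the destination directory, whereas the text maps files by file name and table name.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def restore_physical_files(package_dir, destination_dir, workers=4):
    """Copy backed-up data files into an empty destination directory."""
    source = Path(package_dir)
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    # Report an error and stop if the destination directory is not empty.
    if any(destination.iterdir()):
        raise FileExistsError(f"destination directory is not empty: {destination}")

    files = [path for path in source.rglob("*") if path.is_file()]

    def copy_one(path):
        # Assumption: keep the file's relative location inside the package.
        target = destination / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        return target

    # The data files are independent, so they can be copied concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Replaying the log file and running CHECKPOINT would follow this copy step and are outside the scope of the sketch.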
  • the server executes the log file in the backup data package.
  • after the server completes the execution of the log file, it performs a CHECKPOINT operation to flush the recovered data from memory and complete the data recovery operation.
  • the server starts the database engine based on the destination directory obtained by restoring the physical backup and starts to provide data services.
  • after starting the database engine on the destination directory, the server performs data recovery on the logically backed up data in the backup data package.
  • after the data recovery based on the physically backed up data is completed, the server constructs a logical recovery command and executes it to trigger data recovery of the logically backed up data in the backup data package.
  • the implicit logical recovery SQL statement RECOVERY can be constructed, which is the same as the recovery of the above logical backup, which will not be repeated here.
  • the backup data package may be composed of multiple tar packages. In this case, the multiple tar packages can first be extracted to the same temporary directory, and data recovery is then performed based on the temporary directory.
  • the temporary directory is used as the content of the recovery data source clause in the recovery command.
  • the temporary directory is used as the content of the FROM clause in the RECOVERY command, and the RECOVERY command is executed to perform data recovery.
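Extracting several tar packages into one temporary directory can be sketched with Python's standard `tarfile` and `tempfile` modules. The function name `extract_packages` is hypothetical, and the sketch stops at returning the directory, since the exact syntax of the RECOVERY command is not given in the text.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_packages(tar_paths):
    """Extract every tar package into one temporary directory and return it."""
    temp_dir = Path(tempfile.mkdtemp(prefix="restore_"))
    for tar_path in tar_paths:
        with tarfile.open(tar_path) as archive:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions support safer extraction filters.
                archive.extractall(temp_dir, filter="data")
            else:
                archive.extractall(temp_dir)
    return temp_dir
```

The returned directory is what would be supplied as the recovery data source, for example as the content of the FROM clause mentioned above.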
  • recovery of the full amount of data in the database system is generally performed offline, so it can only be performed with the CLI BACKUP command, whereas recovery of a non-full amount of data can be performed either offline or online.
  • the executor of a recovery command must have the operating-system permissions needed to start the database engine. In some possible implementations, the objects in the database may also differ in permissions. To ensure the completeness of the recovered data, user permissions are not checked when logical data is recovered, and the recovery operation is allowed.
  • the visibility of the data can also be judged to determine which data can be read and displayed to the user. For example, if the data is backed up in blocks, some data in a page is valid data (it meets the backup conditions specified during the backup, such as the WHERE condition), and some is invalid data (it does not meet the backup conditions specified during the backup, but was backed up redundantly because of the block backup mode) and should not be read.
  • the data file copy process is not simply block copy, but can be divided into several cases:
  • the backup object of the backup data package is the full table space, and the file copy and/or data block copy can be directly performed without additional operations.
  • if backup conditions were specified during backup (for example, a WHERE condition) and the backup conditions can cover all data files, file copy and/or data block copy can be performed directly without additional operations.
  • if backup conditions were specified during backup (such as a WHERE condition), the backup conditions cannot cover all data files, and the backup was taken by file copy or block copy, the backup is as shown in the figure
  • the cross-page flag is set.
  • WHERE conditions can be stored in a meta-information file, such as the HMeta_00000001 file
  • each data block is identified and filtered: for each version that does not meet the backup conditions, the corresponding bit in the "data valid bit" is set to 1.
  • when a version is read, if the data valid bit indicates that the version is not visible, the version is not returned; if the data valid bit indicates that the version is visible, the version is returned. For example, if the corresponding "data valid bit" has the value 1, the version is not returned to the upper layer; if it is not 1, the version can be returned to the upper layer, so that the version is visible to the user. Since data blocks are not directly related to one another, the data identification bits can be set in parallel, block by block, during data recovery to improve recovery efficiency. Usually, data backup mainly backs up entire tables, which avoids the efficiency problems that the identification and filtering process might cause during recovery.
  • the data valid bits are divided into 2 parts.
  • the data valid bits may be represented by 2 bits: one bit is the data identification bit, used to indicate whether the version is visible; the remaining bit is the cross-page identification bit, used to indicate whether the data on this page spans pages, where 0 means the data does not span pages.
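The two-bit "data valid bit" layout can be sketched with plain bit operations. Which of the two bits holds which flag is not stated in the text, so the bit positions below are assumptions, as are all of the names; the only behavior taken from the text is that a data identification bit of 1 hides the version and a cross-page bit of 0 means the data does not span pages.

```python
INVISIBLE_BIT = 0b01   # data identification bit: 1 means the version is not returned
CROSS_PAGE_BIT = 0b10  # cross-page identification bit: 0 means no page spanning


def mark_invisible(valid_bits):
    """Set the data identification bit for a version that fails the backup condition."""
    return valid_bits | INVISIBLE_BIT


def is_visible(valid_bits):
    """A version is visible when its data identification bit is not 1."""
    return (valid_bits & INVISIBLE_BIT) == 0


def spans_pages(valid_bits):
    """Report whether the cross-page identification bit is set."""
    return (valid_bits & CROSS_PAGE_BIT) != 0


def read_versions(versions):
    """Return only the values whose valid bits mark them as visible."""
    return [value for value, bits in versions if is_visible(bits)]
```

Because each block's bits are independent of every other block's, a caller could apply `mark_invisible` to different blocks from different threads.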
  • the embodiment of the present application proposes, on the basis of the temporal database, a recovery method based on temporal data backup, so that after any state data in the full-state data has been backed up, whether by logical backup, physical backup, or hybrid backup, the data can be recovered, which ensures effective storage and security of temporal data and provides effective protection.
  • although the steps in the flowchart of FIG. 2 are displayed in sequence as indicated by the arrows, the steps are not necessarily executed in that order. Unless clearly stated in this document, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily executed and completed at the same moment but may be executed at different moments. Their order of execution is not necessarily sequential; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • FIG. 4 is a schematic structural diagram of a data recovery device provided by an embodiment of the present application.
  • the device includes:
  • the identification module 401 is used to identify the backup type of the backup data package
  • the data recovery module 402 is used to perform data recovery based on the data of the physical backup in the backup data package when the identified backup type is a hybrid backup.
  • the hybrid backup means that the backup process includes physical backup and logical backup;
  • the data recovery module 402 is also used to perform data recovery on the data logically backed up in the backup data package after data recovery based on the physical backup data is completed.
  • the data recovery module 402 is configured to copy the data physically backed up in the backup data package to a location corresponding to the destination directory according to the file name and table name in a file copy mode.
  • the device further includes:
  • the trigger module is used to construct a logical recovery command after the data recovery based on the physical backup data is completed, execute the logical recovery command, and trigger data recovery of the logically backed up data in the backup data package.
  • the data recovery module 402 is also used to perform data recovery by creating a table in the destination database according to the metadata information file of the backup data package when the identified backup type is a logical backup .
  • the data recovery module 402 is configured to, when the backup data package is obtained based on a snapshot of a conventional transaction, read the meta information of the table file from the meta information file of the backup data package, create a table in the destination database based on the meta information of the table file, and perform an insert operation to restore the data. If a table with the same name exists in the destination database, the restoration fails.
  • the data recovery module 402 is further used to, when the backup data package is obtained based on a snapshot of historical transactions, read the meta information of the table file from the meta information file of the backup data package; if there is no table with the same name in the destination database, create a table in the destination database based on the meta information of the table file and perform an insert operation to restore the data; if a table with the same name exists in the destination database, the table is not created and only the data is restored.
  • the data recovery module 402 is further used to, when the backup data package is obtained based on the regular transaction snapshot and the historical transaction snapshot, perform data recovery using the data recovery methods corresponding to the regular transaction snapshot and the historical transaction snapshot.
  • the data recovery module 402 is also used to fail the restoration if the table with the same name in the destination database contains the same data as the data to be restored, and to restore data based on the data to be restored if the table with the same name does not contain such data.
  • the data recovery module 402 is also used to check, through the primary key index, whether the table with the same name in the destination database contains the same data as the data to be restored.
  • the data recovery module 402 is also used to perform data recovery based on the backup data package and the newly created data directory when the identified backup type is a physical backup.
  • the data recovery module 402 is also used to perform data recovery based on the data physically backed up in the backup data packet in a multi-threaded parallel manner.
  • the data recovery module 402 is also used to perform data recovery on the data logically backed up in the backup data package in a multi-threaded parallel manner.
  • the data recovery module 402 is also used to perform file copy or block copy during the data recovery process if there was no backup condition during backup;
  • or, if there was a backup condition during backup and the backup condition can cover all data files, to perform file copy or block copy.
  • the data recovery module 402 is also used to, during data recovery, perform data recovery based on the invalid data marked in the metadata file in the backup data package if the backup was taken by file copy or block copy with a backup condition and the backup condition cannot cover all data files.
  • the identification module 401 is used to obtain the backup type corresponding to the file name according to the file name in the backup data package; or,
  • according to the type identifier carried in the backup data package, obtain the backup type corresponding to the type identifier.
  • the device further includes:
  • the reading module is used to read any version after the data recovery operation is completed. If the data valid bit of the version indicates that the version is not visible, the version is not returned. If the data valid bit of the version indicates that the version is visible, the version is returned.
  • the data recovery device provided in the above embodiment only uses the division of the above functional modules as an example to illustrate the data recovery.
  • the above functions can be allocated by different functional modules according to needs.
  • the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the data recovery device and the data recovery method embodiment provided in the foregoing embodiments belong to the same concept. For the specific implementation process, see the method embodiments, and details are not described here.
  • FIG. 5 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 500 may have a relatively large difference due to different configurations or performances, and may include one or more than one central processing unit (CPU) 501 and one Or more than one memory 502, wherein at least one instruction is stored in the memory 502, and the at least one instruction is loaded and executed by the processor 501 to implement the methods provided by the foregoing method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input-output interface for input and output.
  • the server may also include other components for implementing device functions, which will not be repeated here.
  • An embodiment of the present application further provides a computer-readable storage medium, which is applied to a server, and the computer-readable storage medium stores at least one instruction, at least one program, code set, or instruction set, the instruction , The program, the code set, or the instruction set is loaded and executed by the processor to implement the operations performed by the server in the data recovery method of the foregoing embodiment.
  • whether the version involved in the embodiment of the present application is visible refers to whether the version can be read by the firm at the moment of the transaction snapshot corresponding to the backup task.
  • it is determined whether the version is visible according to the transaction snapshot and the creation time, deletion time, and commit time of the version.
  • Each time the server reads a tuple from the data table it can read the life cycle information of the tuple, that is, the version creation time, deletion time, and the submission time of the version, etc., based on historical time Examples of segment visibility judgment:
  • the program may be stored in a computer-readable storage medium.
  • the mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

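A data recovery method, apparatus, server, and storage medium, belonging to the field of database technology. The method includes: identifying the backup type of a backup data package; when the identified backup type is hybrid backup, performing data recovery based on the physically backed-up data in the backup data package, where hybrid backup means that the backup process includes both physical backup and logical backup; and, after the data recovery based on the physically backed-up data is complete, performing data recovery on the logically backed-up data in the backup data package.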

Description

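Data recovery method, apparatus, server, and computer-readable storage medium
This application claims priority to Chinese patent application No. 2018114571961, filed with the Chinese Patent Office on November 30, 2018 and entitled "Data recovery method, apparatus, server, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of database technology, and in particular to a data recovery method, apparatus, server, and computer-readable storage medium.
Background
In data processing systems, especially in scenarios such as OLAP (Online Analytical Processing) systems, data warehouses, and big data analysis, large amounts of data are stored in databases. Because services are continually updated, a data item logically has version data corresponding to multiple states. Storing the full-state data of a data item (current state, transitional state, and historical state) lets the system trace historical-state data and fully exploit the value of the data (all data has value, and historical-state data must not be lost). A database may, however, also need to be recovered from backup data, and in backup data that contains these multiple states the relationships among the data are complex. How to recover full-state data is therefore a challenge.
Summary
According to various embodiments of this application, a data recovery method, apparatus, server, and storage medium are provided.
A data recovery method, performed by a server, the method including:
identifying the backup type of a backup data package;
when the identified backup type is hybrid backup, performing data recovery based on the physically backed-up data in the backup data package, where hybrid backup means that the backup process includes physical backup and logical backup; and
after the data recovery based on the physically backed-up data is complete, performing data recovery on the logically backed-up data in the backup data package.
Optionally, identifying the backup type of the backup data package includes:
obtaining, according to a file name in the backup data package, the backup type corresponding to the file name; or obtaining, according to a type identifier carried by the backup data package, the backup type corresponding to the type identifier.
A data recovery apparatus, the apparatus including:
an identification module, configured to identify the backup type of a backup data package; and
a data recovery module, configured to, when the identified backup type is hybrid backup, perform data recovery based on the physically backed-up data in the backup data package, where hybrid backup means that the backup process includes physical backup and logical backup;
the data recovery module being further configured to, after the data recovery based on the physically backed-up data is complete, perform data recovery on the logically backed-up data in the backup data package.
A server, including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed by the above data recovery method.
A computer-readable storage medium storing at least one instruction that is loaded and executed by a processor to implement the operations performed by the above data recovery method.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.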
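Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment of a data recovery method provided by an embodiment of this application;
FIG. 2 is a flowchart of a data recovery method provided by an embodiment of this application;
FIG. 3 is a page comparison diagram provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a data recovery apparatus provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a server provided by an embodiment of this application.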
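Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
The database involved in the embodiments of this application stores multiple data tables. Each data table can store data items, and a data item can have one or more versions. The database may be any type of database based on MVCC (Multi-Version Concurrency Control); the embodiments of this application do not specifically limit the type of the database. It should be noted that, based on state attributes, the data in the database can be in one of three states: current state, transitional state, and historical state. The three states are collectively called the "full state of data", or full-state data for short. The different state attributes in full-state data identify the state of the data along its life-cycle trajectory.
Current state: the data of the latest version of a data item, that is, data at the current stage. The state of data at the current stage is called the current state.
Transitional state: neither the latest version of a data item nor a historical-state version, but a version in the process of changing from the current state to the historical state. Data in the transitional state is called half-decayed data.
Historical state: a state of a data item in the past, whose value is an old value rather than the current value. The state of data at a historical stage is called the historical state. A data item can have multiple historical states, which reflect how the state of the data changed over time. Data in the historical state can only be read; it cannot be modified or deleted.
It should be noted that under the MVCC mechanism all three states exist, whereas under a non-MVCC mechanism data may exist only in the historical state and the current state. Under MVCC or lock-based concurrency control, the new value of data after a transaction commits is in the current state. Taking MVCC as an example, data generated by transactions earlier than the smallest transaction in the current active transaction list is in the historical state. Under lock-based concurrency control, after a transaction commits, the value the data had before the commit becomes a historical-state value; that is, the old value of the data item is in the historical state. When a version being read is still in use by an active transaction (not the latest related transaction), and the latest related transaction has modified the value of the data item so that its latest value is already in the current state, the value being read is already historical relative to the current state. Its state therefore lies between the current state and the historical state, and is called the transitional state.
For example, under MVCC, the balance of account A in the User table is topped up from 10 yuan to 20 yuan, and then 15 yuan is spent, leaving 5 yuan. At this point financial institution B starts a checking transaction that reads the data and remains in progress. A then tops up another 20 yuan, making the balance 25 yuan. The 25 yuan is then current-state data, the 5 yuan that B is reading is transitional-state data, and the other two values, 20 and 10, are states that existed in the past and are historical-state data.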
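Optionally, in this embodiment, the data recovery method can be applied in a hardware environment that includes multiple servers 101, as shown in FIG. 1. The servers 101 are connected through a network, or they may be data-isolated from one another; the network includes but is not limited to a wide area network, a metropolitan area network, or a local area network. The data recovery method of the embodiments of this application can be performed by any one server 101 or by multiple servers 101. A database system can run on a server 101 to provide data services such as data storage and data query. The database system works through the database engine running on the server.
FIG. 2 is a flowchart of a data recovery method provided by an embodiment of this application. Referring to FIG. 2, the embodiment includes the following steps:
201. Identify the backup type according to the file names in the backup data package.
Because the files in backup data packages produced by different backup types have different file names, the backup type of a backup data package can be identified from the file names it contains, and the recovery procedure corresponding to that backup type can then be carried out. In another possible implementation, the backup type can be identified from a type identifier included in the backup data package; that is, the backup type is obtained according to the type identifier carried by the package. The type identifier may be included in a designated file of the backup data package or carried in the package name; the embodiments of this application do not specifically limit this. The type identifier can be a field that indicates the backup type, such as a number or a character string.
Taking type identification based on file names as an example: backup data packages produced by different backup types contain different data and use different file names, as follows:
A backup data package produced by physical backup contains only current-state data, with no historical-state or transitional-state data. The file names in such a package carry a field indicating physical backup; for example, the file names can be PhMeta_00000001, PhData_00000001, Log_00000001, and so on.
A backup data package produced by logical backup can include full-state data, that is, current-state, historical-state, and transitional-state data. The file names in such a package carry a field indicating logical backup; for example, the file names can be LoMeta_00000001, LoData_00000001, HMeta_00000001, HData_00000001, and so on.
A backup data package produced by hybrid backup, that is, by combining logical backup and physical backup, can also include full-state data, that is, current-state, historical-state, and transitional-state data. The file names in such a package carry a field indicating hybrid backup; for example, the file names can be Meta_00000001, Data_00000001, Log_00000001, HMeta_00000001, HData_00000001, and so on.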
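As an illustration only, file-name-based identification could be sketched as follows. The prefixes come from the example file names above; the function name `identify_backup_type`, the `BackupType` enum, and the rule of ignoring the shared prefixes (Log_, HMeta_, HData_) are assumptions made for this sketch, not part of the described method.

```python
from enum import Enum


class BackupType(Enum):
    PHYSICAL = "physical"
    LOGICAL = "logical"
    HYBRID = "hybrid"


# Prefixes taken from the example file names in the text. Log_, HMeta_ and
# HData_ occur in more than one backup type, so they are not used to decide.
_PREFIX_TO_TYPE = {
    "PhMeta_": BackupType.PHYSICAL,
    "PhData_": BackupType.PHYSICAL,
    "LoMeta_": BackupType.LOGICAL,
    "LoData_": BackupType.LOGICAL,
    "Meta_": BackupType.HYBRID,
    "Data_": BackupType.HYBRID,
}


def identify_backup_type(file_names):
    """Return the single backup type implied by the file names in a package."""
    found = set()
    for name in file_names:
        for prefix, backup_type in _PREFIX_TO_TYPE.items():
            if name.startswith(prefix):
                found.add(backup_type)
    if len(found) != 1:
        # Either no distinguishing file was present or the names conflict.
        raise ValueError(f"cannot determine backup type from {list(file_names)!r}")
    return found.pop()
```

For example, `identify_backup_type(["Meta_00000001", "Data_00000001", "Log_00000001"])` yields `BackupType.HYBRID`, while a package that holds only `Log_00000001` is rejected as ambiguous.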
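202. When the identified backup type is logical backup, perform data recovery by creating tables in the destination database according to the metadata file of the backup data package.
For logical backup, the data in the backup data package may be current-state point-in-time values, historical-state point-in-time values, or full-state time-period values, obtained by backup processes based on different snapshots:
Type 1: the backup data package was obtained by logical backup based on a regular transaction snapshot. Recovery of this kind of backup can include: reading the metadata of a table file from the metadata file of the backup data package; creating a table in the destination database based on that metadata, where the recovery fails if a table with the same name already exists in the destination database; and then executing insert operations (such as INSERT) to recover the data. This kind of data recovery supports only "running-state logical recovery".
Type 2: the backup data package was obtained by logical backup based on a historical transaction snapshot, and the package includes historical-state and transitional-state backup data. Recovery of this kind of backup can include: reading the metadata of a table file from the metadata file of the backup data package; if no table with the same name exists in the destination database, creating a table in the destination database based on that metadata and then executing insert operations (such as INSERT) to recover the data; if a table with the same name exists in the destination database, skipping table creation and only recovering the data. In a possible implementation, if the same-named table in the destination database contains data identical to the data to be recovered, the recovery fails; if it contains no such data, recovery is performed based on the data to be recovered (for example, the recovery is forced). A primary key index can be used to check whether the same-named table contains data identical to the data to be recovered: if a primary key index entry identical to that of the data to be recovered exists, identical data exists and the recovery fails; if the same-named table has no such primary key index entry, recovery is performed based on the data to be recovered. This kind of data recovery supports only "running-state logical recovery". Further, in database systems that distinguish permissions, this data recovery can be restricted to the super user (the local user with the highest permissions).
Type 3: the backup data package was obtained by logical backup based on both a historical transaction snapshot and a regular transaction snapshot, and the package includes current-state, historical-state, and transitional-state backup data. Recovery of this kind of backup can include the recovery processes of Type 1 and Type 2; that is, data is recovered by combining the recovery methods corresponding to the regular transaction snapshot and the historical transaction snapshot. To improve recovery efficiency, the check for same-named tables need not be repeated; performing it once is enough.
Recovering data by combining the recovery methods corresponding to the regular transaction snapshot and the historical transaction snapshot includes: reading the metadata of a table file from the metadata file of the backup data package; creating a table in the destination database based on the metadata of the table file; and executing insert operations on the table to recover the logically backed-up data in the backup data package; or,
reading the metadata of a table file from the metadata file of the backup data package; if no table with the same name exists in the destination database, creating a table in the destination database based on the metadata of the table file and executing insert operations to recover the data; if a table with the same name exists in the destination database, executing insert operations on that table to recover the data.
It should be noted that table creation is an optional step in the above process. A parameter of the recovery command can specify whether to create tables; for example, the RECOVERY command takes the parameter "CREATE TABLE=Y/N", where Y means create and N means do not create.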
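The first two kinds of logical recovery described above (regular snapshot and historical snapshot) can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module purely as a stand-in destination database; the names `restore_table`, `RestoreError`, and `snapshot_kind`, and the representation of rows as dictionaries, are assumptions for illustration and do not reflect any real engine's interface.

```python
import sqlite3


class RestoreError(Exception):
    """Raised when the restore must be aborted."""


def table_exists(conn, table):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def restore_table(conn, table, create_sql, pk_column, rows, snapshot_kind):
    """Restore `rows` (a list of dicts) into `table`.

    snapshot_kind == "regular":    a same-named table makes the restore fail.
    snapshot_kind == "historical": a same-named table is reused, but a row
                                   whose primary key already exists fails.
    """
    if snapshot_kind not in ("regular", "historical"):
        raise ValueError(f"unknown snapshot kind: {snapshot_kind!r}")
    exists = table_exists(conn, table)
    if exists and snapshot_kind == "regular":
        raise RestoreError(f"table {table!r} already exists in the destination")
    if not exists:
        conn.execute(create_sql)  # DDL taken from the package's metadata
    with conn:  # the inserts are committed together or rolled back together
        for row in rows:
            duplicate = conn.execute(
                f'SELECT 1 FROM "{table}" WHERE "{pk_column}" = ?',
                (row[pk_column],),
            ).fetchone()
            if duplicate is not None:
                raise RestoreError(f"primary key {row[pk_column]!r} already present")
            columns = ", ".join(f'"{name}"' for name in row)
            marks = ", ".join("?" for _ in row)
            conn.execute(
                f'INSERT INTO "{table}" ({columns}) VALUES ({marks})',
                tuple(row.values()),
            )
```

Table and column names are interpolated into the SQL text here only because identifiers cannot be bound as parameters; a real tool would validate them against the metadata file first.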
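A backup data package obtained by pure logical backup can be recovered on its own using the above process. A hybrid backup data package, however, also includes data that was backed up logically; recovery of that data is an implicit logical data recovery, and its principle is the same as above.
Optionally, to improve recovery efficiency, recovery of multiple tables can be performed by multiple threads in parallel. Specifically, multiple threads read the files in the backup data package in parallel and execute the recovery commands they read in parallel. For example, data files such as LoData_00000001 and LoData_00000002 can be read in parallel, and the SQL commands in each data file executed in parallel.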
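The multi-threaded reading and execution of data files might look like the sketch below. It assumes, for illustration only, that each data file holds one SQL command per line and that `execute` is a thread-safe callable supplied by the caller; neither the file layout nor the function names are specified by the text.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def replay_file(path, execute):
    """Run every non-empty line of one data file through `execute`."""
    count = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        statement = line.strip()
        if statement:
            execute(statement)
            count += 1
    return count


def replay_files_in_parallel(paths, execute, max_workers=4):
    """Replay several data files concurrently, one worker thread per file.

    Returns a mapping from file path to the number of statements executed.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(lambda p: replay_file(p, execute), paths))
    return {str(p): n for p, n in zip(paths, counts)}
```

Statements within one file keep their order, while statements from different files may interleave, which matches the per-file parallelism described above only if the files are independent of one another.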
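Optionally, to improve recovery efficiency for historical-state data, the server can disable the various transaction consistency checks during recovery and store the data directly in the data pages of the history table.
In one example, the recovery command for a logical backup can be an SQL statement or a command in CLI format. For example, the following commands can be used to recover data from the backup data package '/usr/bak/my_first_backup_02' into '/data/my_data_02':
RECOVERY FROM '/usr/bak/my_first_backup_02' TO '/data/my_data_02'; // recovery of a physical backup: the CLI RECOVERY command restores into an empty directory
RECOVERY FROM '/usr/bak/my_first_backup_02' INCLUDE my_table01,my_table02; // recovery of a logical backup: performed on a running system
203. The server performs a CHECKPOINT operation to flush the recovered data from memory, completing the data recovery operation.
It should be noted that if the backup data package was obtained purely logically, recovery can be performed while the database engine is running; this is called "running-state logical recovery". After such a recovery completes, the database engine does not need to be restarted. The engine can nevertheless be restarted to improve running speed; that is, the server can restart the database system on the destination database obtained from the logical recovery and provide data services. The embodiments of this application do not limit this. Recovery of a physical backup, by contrast, can usually be performed only when the database engine is not running; this is called "non-running-state physical recovery". Backup data obtained in hybrid mode can be recovered in a semi-offline manner: the physically backed-up data is first recovered offline, the server is then started, and recovery continues with the log and the logically backed-up data.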
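204. When the identified backup type is physical backup, recover the backup data package into a new data directory.
In one example, the recovery command for a physical backup can be a command in CLI (command-line interface) format.
205. After the data is recovered, the server executes the log file in the backup data package.
For backed-up current-state data, the log file must also be backed up to guarantee data consistency; accordingly, recovery must be based on the log file. In a possible implementation, executing the log file can follow the principles of the ARIES algorithm to perform log-based recovery (for example, using the REDO log), with the goal of reaching a state consistent with the data at the backup point.
206. After the server finishes executing the log file, it performs a CHECKPOINT operation to flush the recovered data from memory, completing the data recovery operation.
207. The server starts the database engine on the new data directory and begins providing data services. The procedure ends.
The recovery process in steps 204 to 207 recovers the backup data package into a blank data directory. The server can then start the database engine on the new data directory obtained by recovering the physical backup and begin providing data services.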
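208. When the identified backup type is hybrid backup, the server performs data recovery based on the physically backed-up data in the backup data package.
Before starting data recovery, the server checks whether the destination directory is empty; if it is not, the server reports an error and exits. Optionally, for some database systems, data recovery based on the control file in the backup data package must be performed first, to restore the environment as it was at backup time. For example, the principles of the ARIES algorithm can be used to recover the control file and other necessary data.
Physically backed-up data is recovered by file copy. Because this data was backed up by block copy and forms independent data files after backup, it can also be recovered directly by file copy. Specifically, the file-copy-based recovery process includes: copying the physically backed-up data in the backup data package, by file copy, to the corresponding locations in the destination directory according to file name and table name. For example, if a data file was in the subdirectory mydata of the original system, the corresponding subdirectory mydata is created in the destination directory and the data file is copied into it. Because the files are named according to the Meta-series files, that is, with the names they had in the original system at backup time, data consistency is preserved on recovery.
To improve recovery efficiency, the physically backed-up data in the backup data package is copied in parallel: multiple data files are copied at the same time to their corresponding locations in the specified destination directory. Because this physically backed-up data forms independent data files after backup, it can be recovered directly by parallel copying.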
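A minimal sketch of the empty-directory check and the parallel file copy follows. It assumes that the relative path of each data file in the original system has already been read from the package's metadata and is passed in as `relative_paths`; that parameter and the function name `restore_physical_files` are hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def restore_physical_files(package_dir, target_dir, relative_paths, max_workers=4):
    """Copy data files from the package into an empty target directory.

    Each entry of `relative_paths` is the file's location relative to the
    data directory of the original system, e.g. "mydata/Data_00000001".
    """
    package_dir, target_dir = Path(package_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    if any(target_dir.iterdir()):
        # Mirrors the "report an error and exit" rule for a non-empty target.
        raise RuntimeError(f"target directory {target_dir} is not empty")

    def copy_one(relative_path):
        destination = target_dir / relative_path
        # Recreate the original subdirectory (such as mydata) before copying.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(package_dir / relative_path, destination)
        return destination

    # The files are independent of one another, so they can be copied concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_one, relative_paths))
```

Threads are used here because file copying is I/O-bound; whether this is faster than a sequential copy depends on the storage device.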
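209. After the data is recovered, the server executes the log file in the backup data package.
210. After the server finishes executing the log file, it performs a CHECKPOINT operation to flush the recovered data from memory, completing the data recovery operation.
211. The server starts the database engine on the destination directory obtained by recovering the physical backup and begins providing data services.
It should be noted that while the server is starting the database engine, execution of the "system failure recovery" process can be disabled.
For the recovery of the physical backup in steps 208 to 211, reference can also be made to the technical content on physical backup recovery described in the above embodiment; details are not repeated here.
212. After starting the database engine on the destination directory, the server performs data recovery on the logically backed-up data in the backup data package.
After the data recovery based on the physically backed-up data is complete, a logical recovery command is constructed and executed, which triggers data recovery of the logically backed-up data in the backup data package. For example, an implicit logical recovery SQL statement RECOVERY can be constructed; the process is the same as the logical backup recovery described above and is not repeated here.
A backup data package can consist of multiple tar packages. In that case, the tar packages can first be extracted into the same temporary directory, and data recovery is then performed from that temporary directory. In one implementation, the temporary directory is used as the content of the recovery-source clause of the recovery command; for example, the temporary directory is used as the content of the FROM clause of the RECOVERY command, and the RECOVERY command is executed to recover the data.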
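Extracting several tar packages into one temporary directory and using that directory in the FROM clause could be sketched as follows. The helper names `unpack_backup_parts` and `recovery_command` are hypothetical, and the command text simply follows the form of the RECOVERY examples given earlier; the exact grammar accepted by a real engine is not specified in the text.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def unpack_backup_parts(tar_paths):
    """Extract the regular files of every tar package into one new temporary directory."""
    staging = Path(tempfile.mkdtemp(prefix="restore_")).resolve()
    for tar_path in tar_paths:
        with tarfile.open(tar_path) as archive:
            for member in archive.getmembers():
                if not member.isfile():
                    continue  # directories are created on demand below
                destination = (staging / member.name).resolve()
                if staging not in destination.parents:
                    raise ValueError(f"unsafe member path: {member.name!r}")
                destination.parent.mkdir(parents=True, exist_ok=True)
                with archive.extractfile(member) as src, open(destination, "wb") as out:
                    shutil.copyfileobj(src, out)
    return staging


def recovery_command(source_dir, tables=None):
    """Build a RECOVERY statement whose FROM clause names the staging directory."""
    command = f"RECOVERY FROM '{source_dir}'"
    if tables:
        command += " INCLUDE " + ",".join(tables)
    return command + ";"
```

Files are written out member by member so that an archive entry cannot escape the staging directory; the caller is responsible for deleting the directory after recovery.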
213. After the server completes data recovery of the logically backed-up data, it performs the CHECKPOINT operation again.
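The ordering of the hybrid recovery steps, starting at step 208, can be summarized in a short sketch. Every method called on `engine` is a hypothetical placeholder rather than a real database API; the `RecordingEngine` stand-in only records call names so that the sequence can be inspected.

```python
class RecordingEngine:
    """Stand-in for a database engine: each call is only recorded by name."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the step methods.
        def record(*args, **kwargs):
            self.calls.append(name)
        return record


def restore_hybrid(package, target_dir, engine):
    """Run the hybrid recovery steps in the order given in the text."""
    engine.ensure_empty(target_dir)                   # precondition of step 208
    engine.copy_physical_files(package, target_dir)   # step 208
    engine.replay_log(package)                        # step 209
    engine.checkpoint()                               # step 210
    engine.start(target_dir, crash_recovery=False)    # step 211
    engine.restore_logical(package)                   # step 212
    engine.checkpoint()                               # final checkpoint after logical recovery
```

The point of the sketch is the semi-offline ordering: the physical data and the log are recovered before the engine is started, and the logical data is recovered only afterwards.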
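It should be noted that recovery of the full data of a database system can generally be performed offline, so only the CLI BACKUP command may be used to perform that recovery; recovery of non-full data can be supported both offline and online.
In addition, in some possible implementations, the operating system on which a database engine runs distinguishes permissions among users, so the user permissions for the recovery operation can be restricted; for example, the executor of the recovery command must have the operating-system permission to start the database engine. In some possible implementations, users inside the database can also have different permissions. To ensure that the recovered data is complete, user permissions are not checked when logical data is recovered, and the recovery operation is allowed.
Optionally, to guarantee data consistency, the visibility of data can be judged after recovery to determine which data may be read and shown to users. For example, when data was backed up in block mode, within a single page some data is valid (it meets the backup condition specified at backup time, such as a WHERE condition) and some is invalid (it does not meet the backup condition but was redundantly backed up because of the block backup mode) and should not be read.
Given these factors, copying data files during recovery is not always a simple block copy; it can be handled in several cases:
1. The backup was taken without a backup condition (for example, without a WHERE condition). The backup object of the package is then the entire tablespace, and file copy and/or data block copy can be performed directly with no extra operations.
2. The backup was taken with a backup condition (for example, with a WHERE condition), and the backup condition covers all data files. File copy and/or data block copy can be performed directly with no extra operations.
3. The backup was taken with a backup condition (for example, a WHERE condition) that cannot cover all data files, the backup was made by file copy or block copy, and a cross-page flag bit was set at backup time, as shown in FIG. 3. In this case, during data recovery each data block is identified and filtered according to the backup condition (taking hybrid backup as an example, the WHERE condition can be stored in a metadata file such as HMeta_00000001). For each version that does not meet the backup condition, its corresponding bit in the "data valid bits" is set to 1. After data recovery is complete, for each version read by a read operation: if the data valid bit indicates that the version is invisible, the version is not returned; if it indicates that the version is visible, the version is returned. For example, if the value of the corresponding bit in the "data valid bits" is 1, the version is not returned to the upper layer; if the value is not 1, the version can be returned to the upper layer, making it visible to the user. Because data blocks are not directly related to one another, the data flag bits can be set in parallel, block by block, during recovery to improve efficiency. Usually, backups are taken mainly of whole tables, in which case recovery avoids the efficiency cost that identification and filtering might otherwise incur.
After data recovery is complete, in the read process described above and as shown in FIG. 3, the data valid bits consist of two parts. In one implementation, the data valid bits can be represented with 2 bits. One part is the data flag bit, which indicates whether the version is visible and can be represented by one bit. The other part is the cross-page flag bit, which indicates whether the data of the page spans pages; it uses the remaining bit, with 1 meaning the data spans pages and 0 meaning it does not. When the cross-page flag is set, the search for data must not stop; the next page is read and parsed.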
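The marking and filtering of versions through the data valid bits can be illustrated with a small sketch. The in-memory layout is an assumption made for this example: each page keeps an integer bitmap with one data flag bit per version (1 means the version must not be returned) plus a separate cross-page flag. The names `Page`, `mark_invalid_versions`, and `read_visible_versions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    versions: list
    invalid_bitmap: int = 0   # bit i set to 1: versions[i] is not returned to readers
    cross_page: bool = False  # True: the data continues on the next page


def mark_invalid_versions(page, matches_backup_condition):
    """During recovery, flag every version that fails the backup condition."""
    for index, version in enumerate(page.versions):
        if not matches_backup_condition(version):
            page.invalid_bitmap |= 1 << index


def read_visible_versions(page):
    """After recovery, return only versions whose data flag bit is not 1."""
    return [
        version
        for index, version in enumerate(page.versions)
        if not (page.invalid_bitmap >> index) & 1
    ]


def scan(pages):
    """Read pages in order, continuing while the cross-page flag is set."""
    result = []
    for page in pages:
        result.extend(read_visible_versions(page))
        if not page.cross_page:
            break
    return result
```

Because each page carries its own bitmap, `mark_invalid_versions` can be applied to different pages independently, which is what allows the block-by-block parallel marking mentioned above.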
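Building on a temporal database, the embodiments of this application propose a recovery method based on temporal data backup, so that data in any state of the full-state data can be recovered after being backed up logically, physically, or in hybrid mode. This provides effective storage, security, and reliability for temporal data.
It should be understood that although the steps in the flowchart of FIG. 2 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be performed in other orders. Moreover, at least some of the steps in FIG. 2 can include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but can be performed at different times, and they are not necessarily performed sequentially but can be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.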
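FIG. 4 is a schematic structural diagram of a data recovery apparatus provided by an embodiment of this application. Referring to FIG. 4, the apparatus includes:
an identification module 401, configured to identify the backup type of a backup data package; and
a data recovery module 402, configured to, when the identified backup type is hybrid backup, perform data recovery based on the physically backed-up data in the backup data package, where hybrid backup means that the backup process includes physical backup and logical backup.
The data recovery module 402 is further configured to, after the data recovery based on the physically backed-up data is complete, perform data recovery on the logically backed-up data in the backup data package.
In a possible implementation, the data recovery module 402 is configured to copy the physically backed-up data in the backup data package, by file copy, to the corresponding locations in the destination directory according to file name and table name.
In a possible implementation, the physically backed-up data in the backup data package is copied in parallel.
In a possible implementation, the apparatus further includes:
a trigger module, configured to, after the data recovery based on the physically backed-up data is complete, construct a logical recovery command and execute it, which triggers data recovery of the logically backed-up data in the backup data package.
In a possible implementation, the data recovery module 402 is further configured to, when the identified backup type is logical backup, perform data recovery by creating tables in the destination database according to the metadata file of the backup data package.
In a possible implementation, the data recovery module 402 is configured to, when the backup data package was obtained based on a regular transaction snapshot, read the metadata of a table file from the metadata file of the backup data package, create a table in the destination database based on that metadata, and execute insert operations to recover the data; if a table with the same name exists in the destination database, the recovery fails.
In a possible implementation, the data recovery module 402 is further configured to, when the backup data package was obtained based on a historical transaction snapshot, read the metadata of a table file from the metadata file of the backup data package; if no table with the same name exists in the destination database, create a table in the destination database based on that metadata and execute insert operations to recover the data; if a table with the same name exists in the destination database, skip table creation and only recover the data.
In a possible implementation, the data recovery module 402 is further configured to, when the backup data package was obtained based on a regular transaction snapshot and a historical transaction snapshot, recover data by combining the recovery methods corresponding to the regular transaction snapshot and the historical transaction snapshot.
In a possible implementation, the data recovery module 402 is further configured to: fail the recovery if the same-named table in the destination database contains data identical to the data to be recovered; and perform recovery based on the data to be recovered if the same-named table contains no such data.
In a possible implementation, the data recovery module 402 is further configured to check, through the primary key index, whether the same-named table in the destination database contains data identical to the data to be recovered.
In a possible implementation, the data recovery module 402 is further configured to, when the identified backup type is physical backup, perform data recovery based on the backup data package and a newly created data directory.
In a possible implementation, the data recovery module 402 is further configured to perform data recovery based on the physically backed-up data in the backup data package in a multi-threaded parallel manner.
In a possible implementation, the data recovery module 402 is further configured to perform data recovery on the logically backed-up data in the backup data package in a multi-threaded parallel manner.
In a possible implementation, the data recovery module 402 is further configured to, during data recovery, perform file copy or block copy if the backup was taken without a backup condition;
and perform file copy or block copy if the backup was taken with a backup condition and the backup condition covers all data files.
In a possible implementation, the data recovery module 402 is further configured to, during data recovery, if the backup was taken by file copy or block copy with a backup condition that cannot cover all data files, perform data recovery based on the invalid data marked in the metadata file of the backup data package.
In a possible implementation, the identification module 401 is configured to obtain, according to a file name in the backup data package, the backup type corresponding to the file name; or,
to obtain, according to the type identifier carried by the backup data package, the backup type corresponding to the type identifier.
In a possible implementation, the apparatus further includes:
a reading module, configured to, after data recovery is complete and when a read operation reads any version: not return the version if the data valid bit of the version indicates that the version is invisible, and return the version if the data valid bit indicates that the version is visible.
It should be noted that the data recovery apparatus provided in the above embodiment is illustrated using only the above division of functional modules as an example. In practice, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the data recovery apparatus and the data recovery method embodiments provided above belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.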
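FIG. 5 is a schematic structural diagram of a server provided by an embodiment of this application. The server 500 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501 and one or more memories 502, where at least one instruction is stored in the memory 502 and is loaded and executed by the processor 501 to implement the methods provided by the foregoing method embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.
An embodiment of this application further provides a computer-readable storage medium, applied to a server. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the instruction, the program, the code set, or the instruction set is loaded and executed by a processor to implement the operations performed by the server in the data recovery method of the foregoing embodiment.
It should be noted that whether a version is visible, as referred to in the embodiments of this application, means whether the version could be read by a transaction at the moment of the transaction snapshot corresponding to the backup task. For any version of any tuple in a data table, whether the version is visible is determined according to the transaction snapshot and the creation time, deletion time, and commit time of the version. Each time the server reads a tuple from a data table, it can read the life-cycle information of the tuple, that is, the creation time, deletion time, and commit time of the version. Taking visibility judgment based on a historical time period as an example:
(1) When the version was generated by an insert operation: if the creation time is before the start of the historical time period and the commit time is within the historical time period, the version is determined to be visible; or, if both the creation time and the commit time are within the historical time period, the version is determined to be visible.
(2) When the version was generated by a delete operation: if the deletion time is before the start of the historical time period and the commit time is within the historical time period, the version is determined to be visible; or, if both the deletion time and the commit time are within the historical time period, the version is determined to be visible.
(3) When the version was generated by an update operation: if the creation time is before the start of the historical time period and the commit time is within the historical time period, the version is determined to be visible; or, if the creation time is after the start of the historical time period and the commit time is within the historical time period, the version is determined to be visible.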
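The three visibility rules above can be written as one small function. This is a sketch under stated assumptions: times are comparable numbers, the historical time period is treated as a closed interval, and the function and parameter names are chosen only for illustration.

```python
def within(moment, start, end):
    """True if `moment` lies in the closed period [start, end]."""
    return start <= moment <= end


def is_version_visible(operation, start, end, commit_time,
                       create_time=None, delete_time=None):
    """Apply the historical-time-period visibility rules to one version."""
    # Every rule requires the commit time to fall within the period.
    if not within(commit_time, start, end):
        return False
    if operation == "insert":
        return create_time < start or within(create_time, start, end)
    if operation == "delete":
        return delete_time < start or within(delete_time, start, end)
    if operation == "update":
        # The rule accepts a creation time before the start or after it; with
        # the boundary treated as inclusive (an assumption), every creation
        # time qualifies, so only the commit-time check above matters.
        return True
    raise ValueError(f"unknown operation: {operation!r}")
```

For instance, with the period from 10 to 20, an inserted version created at 5 and committed at 15 is visible, while the same version committed at 25 is not.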
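A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disk, or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. The protection scope of this patent application is therefore subject to the appended claims.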

Claims (20)

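  1. A data recovery method, performed by a server, wherein the method comprises:
    identifying the backup type of a backup data package;
    when the identified backup type is hybrid backup, performing data recovery based on the physically backed-up data in the backup data package, wherein hybrid backup means that the backup process includes physical backup and logical backup; and
    after the data recovery based on the physically backed-up data is complete, performing data recovery on the logically backed-up data in the backup data package.
  2. The method according to claim 1, wherein the performing data recovery based on the physically backed-up data in the backup data package when the identified backup type is hybrid backup comprises:
    copying the physically backed-up data in the backup data package, by file copy, to the corresponding locations in a destination directory according to file name and table name.
  3. The method according to claim 1, wherein the method further comprises:
    copying, in parallel, the physically backed-up data in the backup data package to the corresponding locations in a destination directory.
  4. The method according to claim 1, wherein the method further comprises:
    constructing a logical recovery command after the data recovery based on the physically backed-up data is complete; and
    executing the logical recovery command to trigger data recovery of the logically backed-up data in the backup data package.
  5. The method according to claim 1, wherein the method further comprises:
    when the identified backup type is logical backup, creating a table in a destination database according to the metadata file of the backup data package, and performing data recovery.
  6. The method according to claim 5, wherein the creating a table in a destination database according to the metadata file of the backup data package, and performing data recovery comprises:
    when the backup data package was obtained based on a regular transaction snapshot, reading the metadata of a table file from the metadata file of the backup data package;
    creating the table in the destination database based on the metadata of the table file; and
    executing insert operations on the table to recover the logically backed-up data in the backup data package.
  7. The method according to claim 5, wherein the performing data recovery by creating a table in the destination database according to the metadata file of the backup data package comprises:
    when the backup data package was obtained based on a historical transaction snapshot, reading the metadata of a table file from the metadata file of the backup data package;
    if no table with the same name exists in the destination database, creating a table in the destination database based on the metadata of the table file, and executing insert operations to recover the data; and
    if a table with the same name exists in the destination database, executing insert operations on the table with the same name to recover the data.
  8. The method according to claim 5, wherein the creating a table in a destination database according to the metadata file of the backup data package, and performing data recovery comprises:
    when the backup data package was obtained based on a regular transaction snapshot and a historical transaction snapshot, reading the metadata of a table file from the metadata file of the backup data package; creating the table in the destination database based on the metadata of the table file; and executing insert operations on the table to recover the logically backed-up data in the backup data package; or,
    reading the metadata of a table file from the metadata file of the backup data package; if no table with the same name exists in the destination database, creating a table in the destination database based on the metadata of the table file, and executing insert operations to recover the data; and if a table with the same name exists in the destination database, executing insert operations on the table with the same name to recover the data.
  9. The method according to claim 7, wherein the executing insert operations on the table with the same name to recover the data if a table with the same name exists in the destination database comprises:
    if the table with the same name in the destination database contains no data identical to the data to be recovered, performing recovery based on the data to be recovered.
  10. The method according to claim 1, wherein the method further comprises:
    when the identified backup type is physical backup, performing data recovery based on the backup data package and a newly created data directory.
  11. The method according to claim 1, wherein the performing data recovery based on the physically backed-up data in the backup data package comprises:
    performing data recovery based on the physically backed-up data in the backup data package in a multi-threaded parallel manner.
  12. The method according to claim 1, wherein the performing data recovery on the logically backed-up data in the backup data package comprises:
    performing data recovery on the logically backed-up data in the backup data package in a multi-threaded parallel manner.
  13. The method according to claim 1, wherein the method further comprises:
    during data recovery, performing file copy or block copy if the backup was taken without a backup condition; and
    performing file copy or block copy if the backup was taken with a backup condition and the backup condition covers all data files.
  14. The method according to claim 1, wherein the method further comprises:
    during data recovery, if the backup was taken by file copy or block copy with a backup condition that cannot cover all data files, performing data recovery based on the invalid data marked in the metadata file of the backup data package.
  15. The method according to claim 1, wherein the method further comprises:
    after data recovery is complete, when a read operation reads any version, not returning the version if the data valid bit of the version indicates that the version is invisible; and
    returning the version if the data valid bit of the version indicates that the version is visible.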
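  16. A data recovery apparatus, wherein the apparatus comprises:
    an identification module, configured to identify the backup type of a backup data package; and
    a data recovery module, configured to, when the identified backup type is hybrid backup, perform data recovery based on the physically backed-up data in the backup data package, wherein hybrid backup means that the backup process includes physical backup and logical backup;
    the data recovery module being further configured to, after the data recovery based on the physically backed-up data is complete, perform data recovery on the logically backed-up data in the backup data package.
  17. A server, wherein the server comprises a processor and a memory, the memory stores at least one instruction, and the at least one instruction, when loaded and executed by the processor, causes the server to perform:
    identifying the backup type of a backup data package;
    when the identified backup type is hybrid backup, performing data recovery based on the physically backed-up data in the backup data package, wherein hybrid backup means that the backup process includes physical backup and logical backup; and
    after the data recovery based on the physically backed-up data is complete, performing data recovery on the logically backed-up data in the backup data package.
  18. The server according to claim 17, wherein the at least one instruction is loaded and executed by the processor to cause the server to specifically perform the following step:
    copying the physically backed-up data in the backup data package, by file copy, to the corresponding locations in a destination directory according to file name and table name.
  19. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the at least one instruction, when loaded and executed by the processor, causes the server to perform:
    identifying the backup type of a backup data package;
    when the identified backup type is hybrid backup, performing data recovery based on the physically backed-up data in the backup data package, wherein hybrid backup means that the backup process includes physical backup and logical backup; and
    after the data recovery based on the physically backed-up data is complete, performing data recovery on the logically backed-up data in the backup data package.
  20. The computer-readable storage medium according to claim 19, wherein the at least one instruction is loaded and executed by the processor to cause the server to specifically perform the following step:
    copying the physically backed-up data in the backup data package, by file copy, to the corresponding locations in a destination directory according to file name and table name.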
PCT/CN2019/121916 2018-11-30 2019-11-29 Data recovery method, apparatus, server, and computer-readable storage medium WO2020108604A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP19889862.9A EP3822793A4 (en) 2018-11-30 2019-11-29 DATA RECOVERY METHOD AND APPARATUS, SERVER AND COMPUTER READABLE STORAGE MEDIUM
JP2021506468A JP7108782B2 (ja) 2018-11-30 2019-11-29 データリカバリー方法、装置、サーバ及びコンピュータ・プログラム
US17/175,139 US11531594B2 (en) 2018-11-30 2021-02-12 Data recovery method and apparatus, server, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811457196.1A CN110209527B (zh) 2018-11-30 2018-11-30 Data recovery method, apparatus, server, and storage medium
CN201811457196.1 2018-11-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/175,139 Continuation US11531594B2 (en) 2018-11-30 2021-02-12 Data recovery method and apparatus, server, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020108604A1 true WO2020108604A1 (zh) 2020-06-04

Family

ID=67779952

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121916 WO2020108604A1 (zh) 2018-11-30 2019-11-29 Data recovery method, apparatus, server, and computer-readable storage medium

Country Status (5)

Country Link
US (1) US11531594B2 (zh)
EP (1) EP3822793A4 (zh)
JP (1) JP7108782B2 (zh)
CN (1) CN110209527B (zh)
WO (1) WO2020108604A1 (zh)

Also Published As

Publication number Publication date
CN110209527A (zh) 2019-09-06
CN110209527B (zh) 2023-05-05
US11531594B2 (en) 2022-12-20
US20210165716A1 (en) 2021-06-03
JP2021533495A (ja) 2021-12-02
EP3822793A1 (en) 2021-05-19
EP3822793A4 (en) 2022-04-20
JP7108782B2 (ja) 2022-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889862

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021506468

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2019889862

Country of ref document: EP

Effective date: 20210212

NENP Non-entry into the national phase

Ref country code: DE