CN116185711A - Data backup and recovery method and device - Google Patents

Info

Publication number
CN116185711A
Authority
CN
China
Prior art keywords
backup
log
recovery
data
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211686669.1A
Other languages
Chinese (zh)
Inventor
项军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202211686669.1A priority Critical patent/CN116185711A/en
Publication of CN116185711A publication Critical patent/CN116185711A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a data backup and recovery method and device. The method comprises: in response to a data backup instruction, backing up a physical file using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format comprises a backup summary, a backup information list, and target recovery information; storing the backup log file in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, the backup information comprising the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends; and, in response to a data recovery instruction, recovering the physical file from the corresponding backup log file in the backup list based on the target file format. The method and device address the technical problem that data backup and recovery take a long time because the related art cannot optimize the backup and recovery process.

Description

Data backup and recovery method and device
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data backup and recovery method and apparatus.
Background
Database systems are widely used in cloud services and applications, and data backup and recovery are among their important features. To avoid data damage caused by database system faults, a cloud service needs a mechanism that backs up data effectively.
At present, data security is ensured by two kinds of backup: physical and logical. A physical backup directly copies the data files so that the original data can be restored from the copy; a logical backup generates a sequence of query statements that can reproduce the database and stores it in a new file. Although both methods support full and incremental backups of the current database data, both require additional I/O operations at backup time, either to extract the original data from the database system or to execute the copy statements, and both take a long time to restore a backup.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a data backup and recovery method and device, which at least solve the technical problem that the data backup and recovery time is long because the related technology cannot optimize the data backup and recovery process.
According to an aspect of the embodiments of the present application, there is provided a data backup and recovery method, including: in response to a data backup instruction, backing up a physical file using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format includes a backup summary, a backup information list, and target recovery information; storing the backup log file in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, the backup information including the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends; and, in response to a data recovery instruction, recovering the physical file from the corresponding backup log file in the backup list based on the target file format.
Optionally, in the target file format, the backup summary includes: the ID of the backup log file, the log sequence number (LSN) of the backup log file, and the number of valid backup log files; the backup information list includes: the backup information of the write-ahead log (WAL) corresponding to each backup node in the backup list; and the target recovery information includes: the state of the recovery log file, the recovery type of the recovery log file, the ID of the recovery log file, a third LSN at which recovery of the recovery log file starts, a fourth LSN at which it ends, and the recovery start time and recovery end time.
Optionally, backing up the physical file using the write-ahead log (WAL) based on the preset target file format, in response to the data backup instruction, to obtain the backup log file includes: in response to the data backup instruction, recording to the target file the first log sequence number (LSN) at which the WAL backup starts; determining the backup ID and the LSN of the tail backup node in the backup list; creating a target backup node corresponding to the physical file, wherein the identification of the target backup node is a timestamp; recording the backup position of the WAL using the backup-start LSN and the backup-end LSN of the target backup node until the second LSN, that of the last log data in the WAL, is read, wherein the second LSN is both the backup end position of the current backup node and the backup start position of the next backup node; and inserting the target backup node at the tail of the backup list to obtain the backup log file.
Optionally, after the backup log file is obtained, the method includes: recording to the target file the second log sequence number (LSN), at which the write-ahead log (WAL) backup ends, and updating the backup list and the backup summary; and, once all log data in the WAL has been backed up, deleting the WAL log data that has become redundant.
Optionally, recovering the physical file from the corresponding backup log file in the backup list based on the target file format, in response to the data recovery instruction, includes: in response to the data recovery instruction, recording to the target file the third log sequence number (LSN), at which recovery from the write-ahead log (WAL) starts, and the recovery start time; determining a first amount of recovery log data in the backup log file and, when the first amount exceeds a second amount of log data that the buffer pool can hold, dividing the recovery log data into a plurality of recovery range blocks, wherein a third amount of recovery log data in each recovery range block does not exceed the second amount; for each recovery range block, scanning the recovery log data in the block and copying all of it into the buffer pool; and flushing the buffer pool with multiple threads so that the recovery log data is copied into the database tablespace.
Optionally, after the recovery log data is copied into the database tablespace, the method further includes: recording to the target file the fourth log sequence number (LSN), at which recovery from the write-ahead log (WAL) ends, and the recovery end time.
Optionally, the method includes: determining the log length of the write-ahead log (WAL) and dividing the WAL into log segments of a target length; and, if the log length is smaller than the target length, determining the difference between the log length and the target length and filling that difference with a fourth number of all-zero blocks.
Optionally, the method includes: determining the content of each column of each log block in the write-ahead log (WAL); if a column contains identical information, extracting the identical first mark and counting its number of occurrences; if a column contains similar information, retaining the second mark of each log column in the log block and converting it using incremental (delta) encoding; if a column contains a repeated third mark, building a dictionary for the third mark and replacing the mark with its index in the dictionary; and, if a column contains a prefix string, extracting the prefix string, storing the prefix content of each column using a fourth mark, and deleting the prefix string from the content of each column.
According to another aspect of the embodiments of the present application, there is also provided a data backup and recovery apparatus, including: a backup module configured to, in response to a data backup instruction, back up a physical file using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format includes a backup summary, a backup information list, and target recovery information; a storage module configured to store the backup log file in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, the backup information including the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends; and a recovery module configured to, in response to a data recovery instruction, recover the physical file from the corresponding backup log file in the backup list based on the target file format.
According to another aspect of the embodiments of the present application, there is also provided an electronic device including: the system comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the data backup and recovery method through the computer program.
In the embodiments of the present application, in response to a data backup instruction, a physical file is backed up using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format includes a backup summary, a backup information list, and target recovery information; the backup log file is stored in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, including the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends; and, in response to a data recovery instruction, the physical file is recovered from the corresponding backup log file in the backup list based on the target file format. Using the WAL as backup data reduces additional I/O operations and improves backup efficiency, thereby solving the technical problem that data backup and recovery take a long time because the related art cannot optimize the backup and recovery process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of an optional data backup and recovery method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the format of an optional target file according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an optional data backup and recovery device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and the accompanying drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The related database backup and recovery mechanisms fall mainly into application-layer and file-system-layer backup and recovery. Application-layer backup and recovery consists mainly of physical backup and logical backup: physical backup directly copies all the original data files of the database, and logical backup uses the binlog, a binary log containing the event statements that modified the database. Because both methods perform additional I/O operations at backup time, extracting data from the database or copying files during a backup takes a long time.
File-system-layer backup and recovery relies mainly on file versioning and file snapshots, typically using a copy-on-write strategy. However, this approach depends on a specific file system (ext3), and the additional I/O operations incurred by copy-on-write may impair SSD performance.
Therefore, neither of the above backup and recovery mechanisms can optimize the data backup and recovery process, which results in long backup and recovery times.
To solve the above problems, the embodiments of the present application provide a data backup and recovery method. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
FIG. 1 is a flow chart of an optional data backup and recovery method according to an embodiment of the present application. As shown in FIG. 1, the method includes at least steps S102 to S106, wherein:
Step S102, in response to a data backup instruction, backing up a physical file using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format includes: a backup summary, a backup information list, and target recovery information.
In the technical solution provided in step S102, the physical file includes a data file and a log file. The write-ahead log (Write Ahead Log, WAL) is one of the basic components of a database system and holds a record of all data changes in the database. The present application therefore uses the log data of the WAL as backup data, so that no additional I/O operations need to be performed.
As an optional implementation of step S102, the backup summary includes: the ID of the backup log file, the log sequence number (LSN) of the backup log file, and the number of valid backup log files; the backup information list includes: the backup information of the WAL corresponding to each backup node in the backup list; and the target recovery information includes: the state of the recovery log file, the recovery type of the recovery log file, the ID of the recovery log file, a third LSN at which recovery of the recovery log file starts, a fourth LSN at which it ends, and the recovery start time and recovery end time.
Optionally, FIG. 2 is a schematic diagram of the format of an optional target file according to an embodiment of the present application. As can be seen from FIG. 2, the last backup ID, the last backup log sequence number, the number of valid backups, and the last recovery information can be read directly from the target file. In addition, the backup information corresponding to each timestamp can be read from the backup information list in the target file shown in FIG. 2. The log sequence number (Log Sequence Number, LSN) identifies the position at which a specific piece of log data is recorded in the backup log file.
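The three parts of the target file described above can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, and default values (for example the "idle" state) are hypothetical and are not taken from FIG. 2; only the grouping into summary, backup information list, and recovery information follows the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BackupNode:
    """One entry of the backup information list (field names are assumed)."""
    backup_id: int
    timestamp: float   # identification of the node: a timestamp
    start_lsn: int     # first LSN: where this backup starts
    end_lsn: int       # second LSN: where this backup ends


@dataclass
class BackupSummary:
    last_backup_id: int = 0
    last_backup_lsn: int = 0
    valid_backup_count: int = 0


@dataclass
class RecoveryInfo:
    state: str = "idle"               # hypothetical state name
    recovery_type: str = "wal"        # hypothetical type name
    recovery_id: int = 0
    start_lsn: Optional[int] = None   # third LSN: where recovery starts
    end_lsn: Optional[int] = None     # fourth LSN: where recovery ends
    start_time: Optional[float] = None
    end_time: Optional[float] = None


@dataclass
class TargetFile:
    summary: BackupSummary = field(default_factory=BackupSummary)
    backups: List[BackupNode] = field(default_factory=list)
    recovery: RecoveryInfo = field(default_factory=RecoveryInfo)

    def add_backup(self, node: BackupNode) -> None:
        """Append a node and keep the summary consistent with the list."""
        self.backups.append(node)
        self.summary.last_backup_id = node.backup_id
        self.summary.last_backup_lsn = node.end_lsn
        self.summary.valid_backup_count = len(self.backups)
```

Keeping the summary next to the list means the last backup ID and LSN can be read without walking the list, which matches the purpose the text gives for the summary.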
As an optional implementation, in response to a data backup instruction, the first log sequence number (LSN), at which the write-ahead log (WAL) backup starts, is recorded to the target file; the backup ID and the LSN of the tail backup node in the backup list are determined; a target backup node corresponding to the physical file is created, wherein the identification of the target backup node is a timestamp; the backup position of the WAL is recorded using the backup-start LSN and the backup-end LSN of the target backup node until the second LSN, that of the last log data in the WAL, is read, wherein the second LSN is both the backup end position of the current backup node and the backup start position of the next backup node; and the target backup node is inserted at the tail of the backup list to obtain the backup log file.
Further, the second LSN, at which the WAL backup ends, is recorded to the target file, and the backup list and the backup summary are updated; once all log data in the WAL has been backed up, the WAL log data that has become redundant is deleted.
In this embodiment, when the database system receives a data backup instruction, it first obtains the current log position of the write-ahead log (WAL) and records the log sequence number (LSN) of the current log in the target file to indicate that the backup process has started. Second, it loads the last backup node from the backup list and obtains that node's backup ID and LSN. The database system then creates and allocates a new target backup node, whose unique identification is a timestamp and which contains the necessary target backup information. Next, the log data of the WAL is used as backup data, and the position of each piece of backed-up log data is recorded using the start and end LSNs until the LSN of the last page of the WAL is read; this LSN serves as both the end position of the current backup and the start position of the next backup. Finally, the target backup node is inserted at the end of the backup list to obtain the backup log file.
In addition, at the end of each backup the system updates the target file with the updated backup list and backup summary, and after a complete backup it deletes the redundant WAL log data.
As an optional embodiment, the database system may also use a full backup policy to back up all physical files under the data list to the backup list in response to the data backup instruction.
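The backup flow described above can be illustrated with a short sketch. It is not the implementation of the embodiment: the WAL is modelled as an in-memory list of (LSN, payload) records sorted by LSN, backup nodes are plain dictionaries, and the function name and keys are hypothetical. The sketch only shows how the end LSN of one backup becomes the start LSN of the next.

```python
import time
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in: a WAL record is (lsn, payload), sorted by LSN.
WalRecord = Tuple[int, bytes]


def run_backup(wal: List[WalRecord], backup_list: List[Dict],
               now: Optional[float] = None) -> Optional[Dict]:
    """Append one backup node covering the WAL records after the last backup."""
    # Load the tail node to learn the previous backup ID and end LSN.
    tail = backup_list[-1] if backup_list else None
    prev_id = tail["id"] if tail else 0
    start_lsn = tail["end_lsn"] if tail else 0

    # Records not covered by an earlier backup serve as the backup data.
    pending = [rec for rec in wal if rec[0] > start_lsn]
    if not pending:
        return None  # nothing new to back up

    node = {
        "id": prev_id + 1,
        "timestamp": time.time() if now is None else now,  # node identification
        "start_lsn": start_lsn,
        "end_lsn": pending[-1][0],  # LSN of the last WAL record read
        "records": pending,
    }
    backup_list.append(node)  # insert at the end of the backup list
    return node
```

Because each node starts where the previous one ended, consecutive nodes cover the WAL without gaps or overlap, and no data file has to be read during the backup.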
Step S104, storing the backup log file in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends.
In the technical scheme provided in the step S104, in order to facilitate the server to quickly recover the data file from the backup data when the disaster occurs, the backup log file may be stored in the backup list according to the time stamp sequence.
Step S106, responding to the data recovery instruction, and recovering the physical files according to the corresponding backup log files in the backup list based on the target file format.
In the technical solution provided in the above step S106 of the present invention, in order to increase the speed of data recovery and simultaneously ensure the reliability of the recovered data, the target file format in step S102 may be adopted to recover the physical file from the backup log file.
As an optional implementation of step S106, in response to a data recovery instruction, the third log sequence number (LSN), at which recovery from the write-ahead log (WAL) starts, and the recovery start time are recorded to the target file; a first amount of recovery log data in the backup log file is determined and, when the first amount exceeds a second amount of log data that the buffer pool can hold, the recovery log data is divided into a plurality of recovery range blocks, wherein a third amount of recovery log data in each recovery range block does not exceed the second amount; for each recovery range block, the recovery log data in the block is scanned and all of it is copied into the buffer pool; and the buffer pool is flushed with multiple threads so that the recovery log data is copied into the database tablespace.
Further, the fourth log serial number and the recovery end time of the end recovery of the pre-written log are recorded to the target file.
In this embodiment, when the database system receives a data recovery instruction, it first records in the target file the recovery information indicating that recovery from the write-ahead log (WAL) has started. This marks the start of the recovery flow and also allows data consistency to be maintained if the recovery process is interrupted. Second, to keep an excessive amount of log data in the buffer pool from overloading single-threaded recovery, the database system estimates the size of the recovery log data; if it exceeds the size of the buffer pool, the recovery log data as a whole is divided into several recovery range blocks, each no larger than the buffer pool. Then, for each recovery range block, the log data in the block is scanned, read, and copied into the buffer pool until all log data of the current block has been loaded, which completes the recovery of that block. It should be noted that log data in the tablespace of the system being recovered can be skipped at this point, which reduces the amount of data to recover. Finally, the database system enables multithreaded flushing of the buffer pool and flushes the recovery log data into the tablespace.
If other recovery range blocks remain, the recovery process continues with them. Once all log data of the backup log file has been restored, the database system writes the recovery end information to the target file to indicate that the recovery process is complete; otherwise, data consistency is ensured by the recovery mechanism of the database.
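The division into recovery range blocks and the multithreaded flush can be sketched as follows. This is an illustration under assumptions, not the embodiment's code: a log record is modelled as (LSN, page key, payload), the buffer pool and the tablespace are plain dictionaries, capacity is measured in payload bytes, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (lsn, page key, payload).
LogRecord = Tuple[int, str, bytes]


def split_into_range_blocks(records: List[LogRecord],
                            pool_capacity: int) -> Iterator[List[LogRecord]]:
    """Yield consecutive blocks whose payload bytes fit in the buffer pool."""
    block: List[LogRecord] = []
    used = 0
    for rec in records:
        size = len(rec[2])
        if size > pool_capacity:
            raise ValueError("single record larger than the buffer pool")
        if block and used + size > pool_capacity:
            yield block
            block, used = [], 0
        block.append(rec)
        used += size
    if block:
        yield block


def recover(records: List[LogRecord], pool_capacity: int,
            tablespace: Dict[str, bytes], workers: int = 4) -> int:
    """Replay records block by block; return the number of blocks processed."""
    blocks = 0
    for block in split_into_range_blocks(records, pool_capacity):
        # Load the block into the buffer pool; later LSNs overwrite earlier ones.
        buffer_pool: Dict[str, bytes] = {}
        for _lsn, key, payload in block:
            buffer_pool[key] = payload
        # Flush the pool with several threads; each key is written by one task.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda kv: tablespace.__setitem__(*kv),
                          buffer_pool.items()))
        blocks += 1
    return blocks
```

Blocks are processed strictly in LSN order and each block is fully flushed before the next one is loaded, so parallelism inside a block does not reorder updates to the same page.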
As an alternative embodiment, the database system may also employ a full backup restoration policy to restore the physical files from the backup list to the data list in response to the data restoration instruction.
In addition, to reduce information leakage from the WAL, the embodiment of the present application uses padding to ensure that every WAL write to persistent storage has the same size. An attacker therefore cannot use the size of WAL writes to discover secrets, and the persistence of timed checkpoints is preserved.
As another optional embodiment, the log length of the write-ahead log (WAL) is determined and the WAL is divided into log segments of a target length; if the log length is smaller than the target length, the difference between the log length and the target length is determined, and that difference is filled with a fourth number of all-zero blocks.
Optionally, a WAL write follows the write to the slot buffer in server memory. Each slot write is then divided into fixed-size segments, with the remainder of a segment padded with zeros if necessary, and the segments are written to the log file on disk sequentially, one at a time. Because the lengths of log records and segments are multiples of 4, the padding always consists of one or more all-zero 4-byte blocks.
It should be noted that the segment size is a key parameter. A smaller segment size reduces the space overhead caused by padding each write up to a segment boundary, whereas a larger segment size increases the space overhead but reduces latency: because segments are written sequentially, a larger total number of segments increases the latency of WAL writes. In the embodiment of the present application the default segment size is 100 B; this value is only an example, and the segment size may be set according to the actual scenario and is not specifically limited here.
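The fixed-size segmentation with zero padding can be sketched in a few lines. The 4-byte block size and the 100 B default come from the text; the function name and the error handling are assumptions made for illustration.

```python
from typing import List

BLOCK = 4  # padding granularity in bytes, as stated in the text


def pad_into_segments(record: bytes, segment_size: int = 100) -> List[bytes]:
    """Split one WAL write into equal-size segments, zero-padding the last."""
    if segment_size <= 0 or segment_size % BLOCK != 0:
        raise ValueError("segment size must be a positive multiple of 4")
    if len(record) % BLOCK != 0:
        raise ValueError("record length must be a multiple of 4")
    segments: List[bytes] = []
    for offset in range(0, len(record), segment_size):
        chunk = record[offset:offset + segment_size]
        missing = segment_size - len(chunk)  # always a multiple of 4 here
        chunk += b"\x00" * BLOCK * (missing // BLOCK)
        segments.append(chunk)
    return segments
```

For example, a 120-byte write with the default segment size produces two 100-byte segments, the second carrying 80 bytes of padding, so an observer of the disk writes sees only whole segments.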
On the other hand, the log lines within a small log block of a log file are generated within a short time, so they are likely to contain identical, similar, repeated, or common-prefix information. In the embodiment of the present application, to reduce the size of the backup data, the repetitiveness in the non-content part of the log file is therefore preprocessed.
As another optional embodiment, the content of each column of each log block in the write-ahead log (WAL) is determined. When a column contains identical information, the identical first mark is extracted and its number of occurrences is counted. When a column contains similar information, the second mark of each log column in the log block is retained and converted using incremental (delta) encoding. When a column contains a repeated third mark, a dictionary for the third mark is built, and the mark is replaced by its index in the dictionary. When a column contains a prefix string, the prefix string is extracted, the prefix content of each column is stored using a fourth mark, and the prefix string is deleted from the content of each column.
Optionally, for columns whose marks are identical, the shared mark is extracted and its number of occurrences is counted. For columns with similar marks, the first mark in each column is retained and each subsequent mark is expressed as the increment from the previous mark, so that incremental (delta) encoding replaces longer text marks with a shorter representation. For columns with a small number of repeated marks, a dictionary of the distinct marks is built, and each mark is replaced by its index in the dictionary. For columns with a common prefix string, the prefix string is extracted and stored once under a tag, the redundant prefix is deleted from each entry, and only the rest of the column is stored.
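The four column transformations can be illustrated with small helper functions. This is a sketch under assumptions: the function names are hypothetical, columns are modelled as Python lists, the counting step is written in run-length style, and delta encoding is shown on numeric marks such as timestamps. The text does not specify the exact encodings used in the embodiment.

```python
import os
from typing import Dict, List, Tuple


def run_length(column: List[str]) -> List[Tuple[str, int]]:
    """Identical marks: keep each mark once with its occurrence count."""
    out: List[Tuple[str, int]] = []
    for mark in column:
        if out and out[-1][0] == mark:
            out[-1] = (mark, out[-1][1] + 1)
        else:
            out.append((mark, 1))
    return out


def delta_encode(column: List[int]) -> List[int]:
    """Similar marks: keep the first value, then store successive increments."""
    if not column:
        return []
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Repeated marks: build a dictionary and replace each mark by its index."""
    index: Dict[str, int] = {}
    codes = [index.setdefault(mark, len(index)) for mark in column]
    return list(index), codes


def strip_common_prefix(column: List[str]) -> Tuple[str, List[str]]:
    """Common prefix: store the prefix once and keep only the remainders."""
    prefix = os.path.commonprefix(column)
    return prefix, [s[len(prefix):] for s in column]
```

As a usage example, delta_encode([1000, 1002, 1005]) returns [1000, 2, 3], and strip_common_prefix(["/var/log/a", "/var/log/b"]) returns ("/var/log/", ["a", "b"]). Each transformation is reversible, so the original column can be rebuilt during recovery.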
In the embodiments of the present application, in response to a data backup instruction, a physical file is backed up using a write-ahead log (WAL), based on a preset target file format, to obtain a backup log file, wherein the target file format includes a backup summary, a backup information list, and target recovery information; the backup log file is stored in a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, including the ID of the backup node, the identification of the backup node, a first log sequence number (LSN) at which the backup of the node starts, and a second LSN at which it ends; and, in response to a data recovery instruction, the physical file is recovered from the corresponding backup log file in the backup list based on the target file format. Using the WAL as backup data reduces additional I/O operations and improves backup efficiency, thereby solving the technical problem that data backup and recovery take a long time because the related art cannot optimize the backup and recovery process.
Example 2
According to an embodiment of the present application, there is further provided a data backup and restore device for implementing the foregoing data backup and restore method, and fig. 3 is a schematic structural diagram of an alternative data backup and restore device according to an embodiment of the present application, where the data backup and restore device at least includes a backup module 31, a storage module 32 and a restore module 33, as shown in fig. 3, where:
the backup module 31 is configured to, in response to a data backup instruction, backup a physical file with a pre-written log based on a preset target file format, to obtain a backup log file, where the target file format includes: backup summary, backup information list and target recovery information.
Here, the pre-written log (Write Ahead Log, WAL) is one of the basic components of a database system and holds a record of all data changes in the database. By using the log data of the pre-written log as the backup data, no additional I/O operations need to be performed.
Optionally, the backup summary includes: the ID of the backup log file, the log serial number of the backup log file and the number of the effective backup log files; the backup information list includes: backup information of the pre-written log corresponding to each backup node in the backup list; the target recovery information includes: the state of the recovery log file, the recovery type of the recovery log file, the ID of the recovery log file, the third log serial number corresponding to the recovery log file and used for starting recovery, the fourth log serial number corresponding to the recovery log file and used for ending recovery, and the recovery start time and the recovery end time.
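As an illustration only, the three parts of the target file format listed above could be modelled as follows. All class and field names, the field types and the default values are assumptions made for this sketch; the application does not specify an on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BackupSummary:
    backup_file_id: int        # ID of the backup log file
    lsn: int                   # log serial number of the backup log file
    valid_backup_count: int    # number of effective backup log files


@dataclass
class BackupNode:
    node_id: int               # ID of the backup node
    identifier: float          # identification of the node (a timestamp)
    start_lsn: int             # first log serial number: backup start
    end_lsn: int               # second log serial number: backup end


@dataclass
class RecoveryInfo:
    state: str = "idle"                     # state of the recovery log file (hypothetical values)
    recovery_type: str = "wal"              # recovery type (hypothetical values)
    recovery_file_id: Optional[int] = None  # ID of the recovery log file
    start_lsn: Optional[int] = None         # third log serial number: recovery start
    end_lsn: Optional[int] = None           # fourth log serial number: recovery end
    start_time: Optional[float] = None      # recovery start time
    end_time: Optional[float] = None        # recovery end time


@dataclass
class TargetFile:
    summary: BackupSummary
    backup_list: List[BackupNode] = field(default_factory=list)
    recovery: RecoveryInfo = field(default_factory=RecoveryInfo)
```

Keeping the recovery fields optional reflects that they are only filled in once a recovery has actually been started or finished.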
As an alternative embodiment, the backup module 31, in response to the data backup instruction, records to the target file the first log serial number at which backup of the pre-written log starts; determines a first backup ID and a first log serial number of the end backup node in the backup list; creates a target backup node corresponding to the physical file and inserts the target backup node at the tail of the backup list, where the identification of the target backup node is a timestamp; records the backup position of the pre-written log according to the log serial number of the backup start and the log serial number of the backup end corresponding to the target backup node, until a second log serial number of the tail log data of the pre-written log is read, where the second log serial number is the backup end position of the first backup node and the backup start position of the next backup node; and inserts the target backup node into the backup list to obtain a backup log file.
Further, the second log serial number at which backup of the pre-written log ends is recorded to the target file, and the backup list and the backup abstract are updated; when all log data in the pre-written log has been backed up, the log data of the pre-written log in the backup log file is deleted.
In this embodiment, when the database system receives a data backup instruction, it first obtains the current position of the pre-written log and records the LSN value of the current log in the target file to indicate that the backup process has started; second, it loads the last backup node from the backup list and obtains the backup ID and LSN value of that node; then, it creates and allocates a new target backup node, whose unique identification is a timestamp and which contains the necessary target backup information; next, the log data of the WAL is used as the backup data, and the position of each piece of backed-up log data is recorded using the start and end log sequence values until the LSN value of the last page of the WAL is read, where this LSN value serves as both the end position of the current backup and the start position of the next backup; finally, the target backup node is inserted at the end of the backup list to obtain the backup log file.
In addition, at the end of each backup the system updates the target file with the updated backup list and backup abstract, and after a complete backup it deletes the redundant log data of the pre-written log.
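The backup flow described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the target file is a plain dictionary, WAL records are `(lsn, payload)` tuples supplied in LSN order, and the new backup starts from the end LSN of the last backup node. The function name `run_backup` and all dictionary keys are hypothetical.

```python
import time


def run_backup(target_file, wal_records):
    """Append one backup node covering the WAL records written since the last backup.

    target_file: dict with a 'backup_list' (list of node dicts) and a 'summary'.
    wal_records: list of (lsn, payload) tuples in ascending LSN order.
    """
    backup_list = target_file["backup_list"]

    # Load the last backup node; its end LSN is the start position of this backup.
    last = backup_list[-1] if backup_list else {"id": 0, "end_lsn": 0}
    start_lsn = last["end_lsn"]
    target_file["backup_start_lsn"] = start_lsn  # marks that a backup has started

    # Create the target backup node, identified by a timestamp.
    node = {
        "id": last["id"] + 1,
        "identifier": time.time(),
        "start_lsn": start_lsn,
        "end_lsn": start_lsn,
        "records": [],
    }

    # Use the WAL data itself as the backup data, up to the last record read.
    for lsn, payload in wal_records:
        if lsn <= start_lsn:
            continue  # already covered by an earlier backup node
        node["records"].append((lsn, payload))
        node["end_lsn"] = lsn

    # Insert the node at the tail of the list and update the summary.
    backup_list.append(node)
    target_file["backup_end_lsn"] = node["end_lsn"]
    target_file["summary"] = {
        "last_backup_id": node["id"],
        "lsn": node["end_lsn"],
        "valid_backups": len(backup_list),
    }
    return node
```

Because each node's end LSN becomes the next node's start LSN, consecutive backups in this sketch cover the log without gaps or overlap.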
As an alternative implementation, the backup module 31 may also use a full backup policy to back up all physical files under the data list to the backup list in response to the data backup instruction.
The storage module 32 is configured to store the backup log file into a backup list, where each backup node in the backup list includes backup information of a corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log serial number corresponding to the backup node for starting backup and a second log serial number for ending backup.
To facilitate rapid recovery of data files from backup data in the event of a disaster at the server, the storage module 32 may store backup log files in a backup list in time-stamped order.
The restoring module 33 is configured to restore the physical file according to the corresponding backup log file in the backup list based on the target file format in response to the data restoring instruction.
As an alternative embodiment, the recovery module 33 may, in response to the data recovery instruction, record to the target file the third log serial number at which recovery of the pre-written log starts and the recovery start time; determine a first amount of recovery log data within the backup log file, and when the first amount exceeds a second amount of log data that can be accommodated by the buffer pool, divide the recovery log data into a plurality of recovery range blocks, where a third amount of recovery log data within each recovery range block does not exceed the second amount of log data that can be accommodated by the buffer pool; for each recovery range block, scan the recovery log data in the recovery range block and copy all recovery log data in the recovery range block to the buffer pool; and flush the buffer pool using multiple threads, so that the recovery log data is copied to the database table space.
Further, the fourth log serial number at which recovery of the pre-written log ends and the recovery end time are recorded to the target file.
In this embodiment, when receiving a data recovery instruction, the database system first records in the target file the recovery information indicating that recovery of the pre-written log has started, which marks the start of the recovery flow and also preserves data consistency if the recovery process is interrupted; second, to prevent single-threaded recovery from being overloaded by more log data than the buffer pool can hold, the database system estimates the size of the recovery log data and, if it exceeds the size of the buffer pool, divides the recovery log data into a plurality of recovery range blocks, each no larger than the buffer pool; then, for each recovery range block, the log data in the block is scanned, read and copied into the buffer pool until all log data of the current block has been loaded, which completes the recovery of that block. It should be noted that log data already present in the tablespace of the database system can be skipped at this point, which reduces the amount of data to recover; finally, the database system enables multithreaded flushing of the buffer pool and flushes the recovery log data into the tablespace.
If other recovery range blocks remain, the recovery process continues with them. Once all log data of the backup log file has been recovered, the database system writes the recovery end information into the backup/restore file to indicate that the recovery process is complete; otherwise, data consistency is ensured when the database resumes the recovery.
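The range-block recovery described above can be sketched as follows. The sketch makes several assumptions that are not in the application: the buffer pool capacity is counted in log records, each record is a `(lsn, page_id, data)` tuple in LSN order, the tablespace is a dictionary mapping a page ID to the `(lsn, data)` last applied to it, and a record is skipped when the tablespace already holds an equal or newer LSN for that page.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_range_blocks(records, pool_capacity):
    """Split recovery log records into blocks no larger than the buffer pool."""
    if pool_capacity <= 0:
        raise ValueError("pool_capacity must be positive")
    return [records[i:i + pool_capacity] for i in range(0, len(records), pool_capacity)]


def recover(records, pool_capacity, tablespace, workers=4):
    """Replay records block by block, flushing each block with several threads."""

    def flush_page(item):
        page_id, versioned_data = item
        tablespace[page_id] = versioned_data

    for block in split_into_range_blocks(records, pool_capacity):
        buffer_pool = {}
        for lsn, page_id, data in block:
            applied = tablespace.get(page_id)
            if applied is not None and applied[0] >= lsn:
                continue  # change is already in the tablespace, skip it
            buffer_pool[page_id] = (lsn, data)  # later records overwrite earlier ones

        # Each page ID is flushed by exactly one task, so the writes do not conflict.
        with ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(flush_page, buffer_pool.items()))
    return tablespace
```

Processing one block at a time bounds the memory used by the buffer pool, while the thread pool parallelises only the flush step, which mirrors the order of operations in the description.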
As an alternative embodiment, the restore module 33 may also employ a full backup restore policy to restore the physical files from the backup list to the data list in response to the data restore instruction.
In addition, to reduce information leakage from the WAL, a log filling module is further added in the embodiment of the application. It ensures that every WAL write to persistent storage has the same size, so that an attacker cannot infer secrets from the size of a WAL write, while the persistence of the time checkpoint is still ensured.
As another alternative embodiment, the log filling module may first determine the log length of the pre-written log and divide the pre-written log into log segments of a target length; if the log length is smaller than the target length, the module determines the difference between the log length and the target length and pads the log segment by that difference using a fourth number of all-zero blocks.
Alternatively, a WAL write follows the write of the slot buffer in server memory: each slot write is divided into fixed-size segments, the remainder of a segment is filled with "0" if necessary, and the segments are then written one by one, in order, to the log file on disk. The padding always consists of one or more all-zero blocks of 4 bytes, since the lengths of log records and segments are multiples of 4.
It should be noted that the segment size is a key parameter. Setting it too small reduces the space overhead of padding each write to a segment boundary but increases the total number of segments, and because segments are written sequentially this increases the latency of a WAL write; setting it too large increases the space overhead while reducing latency. In this embodiment of the present application the default segment size is 100 B; this is only an example, and the segment size may be set according to the actual scenario and is not particularly limited herein.
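The padding scheme described above can be sketched as follows, assuming that a WAL write is available as a byte string whose length is a multiple of 4 and using the 100 B default segment size from the description. The function name `pad_into_segments` and the choice to reject lengths that are not multiples of 4 are assumptions of this sketch.

```python
BLOCK_SIZE = 4
ZERO_BLOCK = b"\x00" * BLOCK_SIZE


def pad_into_segments(record: bytes, segment_size: int = 100):
    """Split a WAL write into fixed-size segments, zero-padding the last one."""
    if segment_size <= 0 or segment_size % BLOCK_SIZE:
        raise ValueError("segment_size must be a positive multiple of 4")
    if len(record) % BLOCK_SIZE:
        raise ValueError("record length must be a multiple of 4")
    if not record:
        return []

    segments = [record[i:i + segment_size] for i in range(0, len(record), segment_size)]

    # Fill the last segment up to the segment boundary with all-zero 4-byte blocks.
    missing = segment_size - len(segments[-1])
    segments[-1] += ZERO_BLOCK * (missing // BLOCK_SIZE)
    return segments
```

With a 100 B segment, a 104-byte write produces two segments and 96 bytes of padding, which illustrates the trade-off between space overhead and the number of sequential segment writes noted above.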
On the other hand, since the log lines within a small log block of the log file are generated within a short time, they are likely to contain identical, similar or repeated information, or information with a common prefix; the embodiment of the present application therefore further includes a compression module.
As another alternative embodiment, the compression module may first determine the content of each column of each log block in the pre-written log. When the columns of a log block contain identical information, the identical first mark is extracted and its number of occurrences is counted; when the columns contain similar information, a second mark of each log column in the log block is retained and converted using incremental (delta) encoding; when the columns contain a repeated third mark, a dictionary corresponding to the third mark is constructed and each occurrence of the third mark is replaced by its index in the dictionary; when the columns contain a prefix character string, the prefix character string is extracted, represented in each column by a fourth mark, and deleted from each column.
Alternatively, for columns having identical marks, the compression module extracts the identical marks and counts the number of their occurrences; for columns with similar marks, it retains the first mark in each column and expresses each later mark as the increment between the current mark and the previous mark, so that incremental encoding replaces longer textual marks with a shorter representation; for columns with a small number of repeated marks, it constructs a dictionary of the distinct marks and replaces the marks by their indexes in the dictionary; for columns with a common prefix string, it extracts the prefix string, stores the remainder of each column with a tag, and deletes the redundant part of each column, thereby effectively reducing the size of the backup data.
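The dictionary and common-prefix treatments handled by the compression module can be sketched as follows. The function names are hypothetical, and the sketch reads the prefix case as "store the shared prefix once and keep only the remainder of each column value", which is one interpretation of the description rather than a confirmed detail.

```python
import os


def dictionary_encode(column):
    """Replace each repeated mark with its index in a dictionary of distinct marks."""
    dictionary = []
    index_of = {}
    encoded = []
    for mark in column:
        if mark not in index_of:
            index_of[mark] = len(dictionary)
            dictionary.append(mark)
        encoded.append(index_of[mark])
    return dictionary, encoded


def prefix_encode(column):
    """Extract the common prefix string once and keep only the remainders."""
    prefix = os.path.commonprefix(column)  # character-wise longest common prefix
    return prefix, [value[len(prefix):] for value in column]


def prefix_decode(prefix, remainders):
    """Rebuild the original column values from the prefix and the remainders."""
    return [prefix + remainder for remainder in remainders]
```

Dictionary encoding pays off when a column has few distinct values, while prefix extraction suits columns such as file paths or host names that share a long leading string.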
It should be noted that the modules of the data backup and recovery device in the embodiment of the present application correspond one-to-one to the implementation steps of the data backup and recovery method in embodiment 1. Since embodiment 1 has already been described in detail, for details not shown in this embodiment reference may be made to embodiment 1, and they are not repeated here.
Example 3
According to an embodiment of the present application, there is further provided a nonvolatile storage medium including a stored program, where a device in which the nonvolatile storage medium is located executes the data backup and restore method in embodiment 1 by running the program.
Optionally, the device where the nonvolatile storage medium is located performs the following steps by running the program: in response to a data backup instruction, backing up physical files in a data list using a pre-written log based on a preset target file format to obtain backup log files, where the target file format includes: a backup abstract, a backup information list and target recovery information; storing the backup log files into a backup list, where each backup node in the backup list contains backup information of the corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log serial number at which the backup corresponding to the backup node starts and a second log serial number at which it ends; and, in response to a data recovery instruction, recovering the physical files according to the corresponding backup log files in the backup list based on the target file format, where the physical files are stored in the data list.
According to an embodiment of the present application, there is further provided a processor, which is configured to execute a program, where the data backup and recovery method in embodiment 1 is executed when the program is executed.
Optionally, the program, when executed, implements the following steps: in response to a data backup instruction, backing up physical files in a data list using a pre-written log based on a preset target file format to obtain backup log files, where the target file format includes: a backup abstract, a backup information list and target recovery information; storing the backup log files into a backup list, where each backup node in the backup list contains backup information of the corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log serial number at which the backup corresponding to the backup node starts and a second log serial number at which it ends; and, in response to a data recovery instruction, recovering the physical files according to the corresponding backup log files in the backup list based on the target file format, where the physical files are stored in the data list.
According to an embodiment of the present application, there is also provided an electronic device including: a memory and a processor, wherein the memory stores a computer program, the processor is configured to execute the data backup and restore method in embodiment 1 by the computer program.
Optionally, the processor is configured to implement the following steps by executing the computer program: in response to a data backup instruction, backing up physical files in a data list using a pre-written log based on a preset target file format to obtain backup log files, where the target file format includes: a backup abstract, a backup information list and target recovery information; storing the backup log files into a backup list, where each backup node in the backup list contains backup information of the corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log serial number at which the backup corresponding to the backup node starts and a second log serial number at which it ends; and, in response to a data recovery instruction, recovering the physical files according to the corresponding backup log files in the backup list based on the target file format, where the physical files are stored in the data list.
The foregoing embodiment numbers of the present application are for description only and do not represent the relative merits of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary. For example, the division of units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims (10)

1. A method of data backup and restore, comprising:
in response to a data backup instruction, backing up a physical file using a pre-written log based on a preset target file format to obtain a backup log file, wherein the target file format comprises: a backup abstract, a backup information list and target recovery information;
storing the backup log file into a backup list, wherein each backup node in the backup list contains backup information of the corresponding physical file, and the backup information comprises: the ID of the backup node, the identification of the backup node, a first log serial number corresponding to the backup node for starting backup and a second log serial number for ending backup;
and in response to a data recovery instruction, recovering the physical file according to the corresponding backup log file in the backup list based on the target file format.
2. The method according to claim 1, wherein, in the target file format:
the backup abstract comprises: the ID of the backup log file, the log serial number of the backup log file and the number of the effective backup log files;
the backup information list comprises: backup information of the pre-written log corresponding to each backup node in the backup list;
the target recovery information comprises: the state of the recovery log file, the recovery type of the recovery log file, the ID of the recovery log file, a third log serial number corresponding to the recovery log file at which recovery starts, a fourth log serial number corresponding to the recovery log file at which recovery ends, and the recovery start time and the recovery end time.
3. The method of claim 1, wherein backing up the physical file using the pre-written log based on the preset target file format in response to the data backup instruction to obtain the backup log file comprises:
in response to the data backup instruction, recording to the target file a first log serial number at which backup of the pre-written log starts;
determining a first backup ID and a first log sequence number of an end backup node in the backup list;
creating a target backup node corresponding to the physical file, and inserting the target backup node to the tail end of the backup list, wherein the identification of the target backup node is a timestamp;
recording the backup position of the pre-written log according to the log sequence number of the backup start and the log sequence number of the backup end corresponding to the target backup node until a second log sequence number of the end log data of the pre-written log is read, wherein the second log sequence number is the backup end position of the first backup node and the backup start position of the next backup node;
and inserting the target backup node into the backup list to obtain the backup log file.
4. The method according to claim 3, wherein after obtaining the backup log file, the method further comprises:
recording to the target file a second log serial number at which backup of the pre-written log ends, and updating the backup list and the backup abstract;
and deleting the log data of the pre-written log in the backup log file when the backup of all the log data in the pre-written log is completed.
5. The method of claim 1, wherein recovering the physical file according to the corresponding backup log file in the backup list based on the target file format in response to the data recovery instruction comprises:
in response to the data recovery instruction, recording to the target file a third log serial number at which recovery of the pre-written log starts and a recovery start time;
determining a first amount of recovery log data within the backup log file, and when the first amount exceeds a second amount of log data that can be accommodated by a buffer pool, determining to divide the recovery log data into a plurality of recovery range blocks, wherein a third amount of recovery log data within each recovery range block does not exceed the second amount of log data that can be accommodated by the buffer pool;
for each recovery range block, scanning recovery log data in the recovery range block, and copying all recovery log data in the recovery range block to the buffer pool;
and flushing the buffer pool using multiple threads, so that the recovery log data is copied to a database table space.
6. The method of claim 5, wherein after copying the recovery log data to a database table space, the method further comprises:
recording to the target file the fourth log serial number at which recovery of the pre-written log ends and the recovery end time.
7. The method according to claim 1, wherein the method further comprises:
determining the log length of the pre-written log, and dividing the pre-written log into log segments with target length;
if the log length is smaller than the target length, determining a difference value between the log length and the target length, and filling the log segment corresponding to the difference value by adopting a fourth number of all-zero blocks.
8. The method according to claim 1, further comprising:
determining the contents of each column of each log block in the pre-written log;
if the contents of each column of each log block contain the same information, extracting the same first mark, and counting the occurrence times of the same first mark;
if the content of each column of each log block contains similar information, reserving a second mark of each log column in each log block, and converting the second mark by adopting an incremental coding mode;
if each column of content of each log block contains a third repetition mark, constructing a dictionary corresponding to the third repetition mark, and replacing the third repetition mark by an index corresponding to the third repetition mark in the dictionary;
and if the contents of each column of each log block contain prefix character strings, extracting the prefix character strings, storing the contents of each column except the prefix character strings by adopting a fourth mark, and deleting the contents of each column except the prefix character strings.
9. A data backup and restore apparatus, comprising:
the backup module is configured to, in response to a data backup instruction, back up a physical file using a pre-written log based on a preset target file format to obtain a backup log file, wherein the target file format comprises: a backup abstract, a backup information list and target recovery information;
the storage module is configured to store the backup log file into a backup list, where each backup node in the backup list includes backup information of the corresponding physical file, and the backup information includes: the ID of the backup node, the identification of the backup node, a first log serial number corresponding to the backup node for starting backup and a second log serial number for ending backup;
and the recovery module is configured to, in response to a data recovery instruction, recover the physical file according to the corresponding backup log file in the backup list based on the target file format.
10. An electronic device, comprising: a memory and a processor, wherein the memory has stored therein a computer program, the processor being configured to perform the data backup and restore method of any of claims 1 to 8 by means of the computer program.
CN202211686669.1A 2022-12-27 2022-12-27 Data backup and recovery method and device Pending CN116185711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211686669.1A CN116185711A (en) 2022-12-27 2022-12-27 Data backup and recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211686669.1A CN116185711A (en) 2022-12-27 2022-12-27 Data backup and recovery method and device

Publications (1)

Publication Number Publication Date
CN116185711A true CN116185711A (en) 2023-05-30

Family

ID=86441361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211686669.1A Pending CN116185711A (en) 2022-12-27 2022-12-27 Data backup and recovery method and device

Country Status (1)

Country Link
CN (1) CN116185711A (en)

Similar Documents

Publication Publication Date Title
US8738588B2 (en) Sequential media reclamation and replication
US6311193B1 (en) Computer system
CN108319602B (en) Database management method and database system
US6665815B1 (en) Physical incremental backup using snapshots
US7421551B2 (en) Fast verification of computer backup data
US20070208918A1 (en) Method and apparatus for providing virtual machine backup
US9239761B2 (en) Storage system format for transaction safe file system
US8150851B2 (en) Data processing apparatus and method of processing data
US7266574B1 (en) Identification of updated files for incremental backup
US20060143241A1 (en) System and method for scaleable multiplexed transactional log recovery
CN108009098B (en) Storage tiering with compressed forward map
CN105302665B (en) A kind of improved Copy on write Snapshot Method and system
US8065557B2 (en) Apparatus for managing data backup
US7739464B1 (en) Consistent backups of data using a roll-back log
US10430294B2 (en) Image recovery from volume image files
CN110515543B (en) Object bucket-based snapshot method, device and system
JP2005050073A (en) Data restoration method, and data recorder
US20050262033A1 (en) Data recording apparatus, data recording method, program for implementing the method, and program recording medium
KR100501414B1 (en) Method of and apparatus for logging and restoring the meta data in file system
US10452496B2 (en) System and method for managing storage transaction requests
CN116185711A (en) Data backup and recovery method and device
CN111143116A (en) Method and device for processing bad blocks of disk
KR100775141B1 (en) An implementation method of FAT file system which the journaling is applied method
JPS62245348A (en) Method and device for updating data base
CN111221801A (en) Database migration method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination