CN112269681A - Method, device and equipment for continuously protecting virtual machine data - Google Patents

Info

Publication number
CN112269681A
Authority
CN
China
Prior art keywords
backup
data
log
incremental
recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011112737.4A
Other languages
Chinese (zh)
Inventor
刘海伟
颜秉珩
刘为峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011112737.4A
Publication of CN112269681A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F11/1484 Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device and equipment for continuously protecting virtual machine data and a readable storage medium. The method and the device integrate three backup technologies of continuous data protection, full backup and incremental backup, protect recent data at a fine-grained IO level by the continuous data protection technology, and protect early and coarse-grained historical snapshot data by the full backup and the incremental backup to form a sparse-to-dense data protection system. Based on the backup mode, the recent data can be subjected to IO-level data recovery, zero loss of recent data assets is achieved, and RPO indexes are reduced; the early historical data can be subjected to data recovery at an incremental backup level or a full backup level, so that the occupation of storage space is reduced. And when CDP data recovery is carried out, only IO data in a limited range between backup points need to be recovered, so that the IO quantity of data recovery is reduced, and the RTO index is reduced.

Description

Method, device and equipment for continuously protecting virtual machine data
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for continuously protecting virtual machine data.
Background
With the continuous maturity of virtualization technology, more and more government departments and enterprise units adopt a cloud computing mode to deploy own virtual data centers. In the field of cloud computing, data protection of a virtual machine is particularly important.
Continuous Data Protection (CDP) is a method that continuously captures or tracks every change to the target data without affecting the operation of the primary data, and that can restore the data to any previous point in time. In 2011, the CDP technical group of SNIA (Storage Networking Industry Association) published a technical document on CDP that explicitly states three criteria for CDP: (1) any change to the source data can be captured; (2) the data can be backed up to at least one other location (disaster recovery); (3) the data can be restored to any point in time.
Based on the three criteria defined by SNIA, the industry also uses two metrics to measure CDP data protection and data recovery: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The recovery time objective is the maximum time allowed from the occurrence of a disaster to the recovery of the system; the recovery point objective is the maximum span of time over which data may be lost when a disaster occurs.
Because continuous data protection captures every disk IO, it can in theory reduce the RPO to 0. However, when disk IO is saved continuously for a long time, or the disk IOPS (Input/Output Operations Per Second) is very high, the amount of IO data saved by CDP becomes very large, which causes two problems: (1) IO data retained over a long period occupies too much storage space; (2) recovering from IO data that has accumulated over a long period takes too long, that is, the RTO is too large.
To solve these problems, the prior art proposes reading a time baseline near the recovery point into memory and exploiting the high speed of in-memory data processing to improve data recovery efficiency. However, the basic idea of this scheme is to trade memory space for recovery efficiency (a space-for-time tradeoff). In practice it consumes a large amount of system memory, and it becomes unusable when the amount of IO data to be recovered is large.
In summary, how to protect virtual machine data while avoiding both excessive storage consumption by backup data and long data recovery times is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a method, a device, equipment and a readable storage medium for continuously protecting virtual machine data, which are used for solving the problems that the current virtual machine data protection scheme has overlarge backup data occupation space or long data recovery time. The specific scheme is as follows:
in a first aspect, the present application provides a method for continuously protecting virtual machine data, including:
writing the IO data into a disk whenever the IO data of the virtual machine is detected; generating an IO log according to the IO data by adopting a continuous data protection technology, and storing the IO log to a first backup area;
performing incremental backup on the disk according to the incremental backup frequency, and storing incremental backup data to a second backup area;
performing full backup on the disk according to full backup frequency, and storing full backup data into a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
when the backup duration of the first backup area exceeds the retention duration of the IO log, clearing the IO log on the first backup area;
and when the backup duration of the second backup area exceeds the retention duration of incremental backup data, removing the incremental backup data on the second backup area, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
Preferably, the incremental backup frequency is once every N hours, and performing incremental backup on the disk according to the incremental backup frequency includes:
performing incremental backup on the disk every N hours every day, wherein N is a factor of 24;
correspondingly, the full backup frequency is once a day, and performing full backup on the disk according to the full backup frequency includes:
and performing full backup on the disk at a target time point every day.
Preferably, before the removing the IO log in the first backup area when the backup duration of the first backup area exceeds the retention duration of the IO log, the method further includes:
traversing the IO logs on the first backup area, and determining the earliest time stamp of all the IO logs; and calculating the difference between the current time and the earliest timestamp to serve as the backup time length of the first backup area.
Preferably, the method further comprises the following steps:
setting a backup policy using a policy management component, wherein the backup policy comprises: the incremental backup frequency, the full backup frequency, the retention duration of the IO log, the retention duration of the incremental backup data, the address of the first backup area, the address of the second backup area and the address of the third backup area.
Preferably, the generating an IO log according to the IO data and storing the IO log in a first backup area includes:
and generating an IO log according to the IO data by using an IO filter, and storing the IO log to a first backup area, wherein the IO log comprises SuperBlock and IO metadata.
Preferably, the method further comprises the following steps:
determining a recovery time point according to the data recovery instruction;
if the difference between the current time and the recovery time point is less than or equal to the retention time of the IO log, performing IO-level data recovery according to the IO log on the first backup area;
if the difference between the current time and the recovery time point is larger than the retention time of the IO log and smaller than or equal to the retention time of the incremental backup data, performing data recovery according to the incremental backup data on the second backup area;
and if the difference between the current time and the recovery time point is greater than the retention time of the incremental backup data, performing data recovery according to the full backup data on the third backup area.
Preferably, the performing of the data recovery at the IO level according to the IO log on the first backup area includes:
determining an incremental backup process closest to the recovery time point, determining the actual backup time of the incremental backup process, and acquiring corresponding incremental backup data;
obtaining an IO log generated between the actual backup time and a recovery time point;
and according to the incremental backup data and the IO log, performing IO-level data recovery.
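The IO-level recovery steps above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `plan_io_recovery` is hypothetical, backup points and IO log entries are reduced to bare timestamps, and "closest to the recovery time point" is read here as the latest incremental backup at or before that point.

```python
from bisect import bisect_right
from datetime import datetime


def plan_io_recovery(incremental_times, io_log_times, recovery_point):
    """Return (base_backup_time, io_timestamps_to_replay).

    Assumption: the base is the latest incremental backup taken at or
    before the recovery point; only IO logged after that backup and up
    to the recovery point has to be replayed.
    """
    times = sorted(incremental_times)
    idx = bisect_right(times, recovery_point)
    if idx == 0:
        raise ValueError("no backup point at or before the recovery point")
    base = times[idx - 1]
    replay = [t for t in sorted(io_log_times) if base < t <= recovery_point]
    return base, replay


if __name__ == "__main__":
    backups = [datetime(2020, 10, 16, 2), datetime(2020, 10, 16, 4)]
    io_log = [
        datetime(2020, 10, 16, 3, 30),
        datetime(2020, 10, 16, 4, 10),
        datetime(2020, 10, 16, 4, 50),
    ]
    print(plan_io_recovery(backups, io_log, datetime(2020, 10, 16, 4, 30)))
```

Because the replay list is bounded by the interval between two backup points, its length does not grow with the total age of the IO log.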
In a second aspect, the present application provides an apparatus for continuously protecting virtual machine data, including:
an IO backup module: configured to write IO data into a disk whenever the IO data of a virtual machine is detected, to generate an IO log according to the IO data by adopting a continuous data protection technology, and to store the IO log to a first backup area;
an incremental backup module: configured to perform incremental backup on the disk according to the incremental backup frequency and to store incremental backup data to a second backup area;
a full backup module: configured to perform full backup on the disk according to the full backup frequency and to store full backup data into a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
an IO clearing module: configured to clear the IO log on the first backup area when the backup duration of the first backup area exceeds the retention duration of the IO log;
an incremental clearing module: configured to clear the incremental backup data on the second backup area when the backup duration of the second backup area exceeds the retention duration of the incremental backup data, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
In a third aspect, the present application provides a device for continuously protecting virtual machine data, including:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the method for continuously protecting virtual machine data as described above.
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program for implementing the method of continuously protecting virtual machine data as described above when executed by a processor.
The application provides a method for continuously protecting virtual machine data, which comprises the following steps: writing the IO data into a disk whenever the IO data of the virtual machine is detected; generating an IO log according to the IO data by adopting a continuous data protection technology, and storing the IO log to a first backup area; performing incremental backup on the disk according to the incremental backup frequency, and storing incremental backup data to a second backup area; performing full backup on the disk according to the full backup frequency, and storing full backup data into a third backup area, wherein the full backup frequency is lower than the incremental backup frequency; when the backup duration of the first backup area exceeds the retention duration of the IO log, clearing the IO log on the first backup area; and when the backup time length of the second backup area exceeds the retention time length of the incremental backup data, removing the incremental backup data on the second backup area, wherein the retention time length of the incremental backup data is greater than that of the IO log.
Therefore, the method integrates three backup technologies: continuous data protection, full backup and incremental backup. The continuous data protection technology protects recent data at the fine-grained IO level (the recent time range is defined by the backup policy), while full backup and incremental backup protect earlier, coarse-grained historical snapshot data (the earlier time range is likewise defined by the backup policy), forming a sparse-to-dense data protection system. With this backup mode, recent data can be recovered at the IO level, which achieves zero loss of recent data assets and lowers the RPO; earlier historical data can be recovered at the incremental backup level or the full backup level, which reduces storage consumption. Moreover, CDP data recovery only needs to replay the IO data within the limited range between backup points, which reduces the amount of IO to recover and lowers the RTO.
In addition, the application also provides a device, equipment and a readable storage medium for continuously protecting the virtual machine data, and the technical effect of the device, the equipment and the readable storage medium corresponds to the technical effect of the method, and the details are not repeated here.
Drawings
To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a first embodiment of a method for continuously protecting virtual machine data provided by the present application;
Fig. 2 is a schematic process diagram of a second embodiment of a method for continuously protecting virtual machine data provided by the present application;
Fig. 3 is a diagram of a complete data protection cycle for continuously protecting virtual machine data;
Fig. 4 is a schematic diagram of the structure of an IO log;
Fig. 5 is a schematic flowchart of IO data protection performed by CDP;
Fig. 6 is a comparison of RTO metrics between a data protection scheme using full backup, incremental backup and CDP and a conventional CDP scheme;
Fig. 7 is a functional block diagram of an embodiment of an apparatus for continuously protecting virtual machine data provided by the present application.
Detailed Description
The core of the application is to provide a method, a device, equipment and a readable storage medium for continuously protecting virtual machine data, and combine three backup technologies of continuous data protection, full backup and incremental backup to form a sparse-to-dense data protection system, so that RPO indexes and RTO indexes are effectively reduced.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a first embodiment of a method for continuously protecting virtual machine data provided by the present application is described below, where the first embodiment includes:
S101, writing IO data into a disk whenever the IO data of the virtual machine is detected; generating an IO log according to the IO data by adopting a continuous data protection technology, and storing the IO log to a first backup area;
S102, performing incremental backup on the disk according to the incremental backup frequency, and storing incremental backup data to a second backup area;
S103, performing full backup on the disk according to the full backup frequency, and storing full backup data into a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
S104, when the backup duration of the first backup area exceeds the retention duration of the IO log, clearing the IO log on the first backup area;
and S105, when the backup duration of the second backup area exceeds the retention duration of the incremental backup data, clearing the incremental backup data on the second backup area, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
The embodiment relates to the technical problem of data security in the field of cloud computing, in particular to continuous data protection of virtual machine data by using a continuous data protection technology in a virtualized environment QEMU (Quick Emulator).
Specifically, a backup policy is set before backup is executed; it includes the incremental backup frequency, the full backup frequency, the retention duration of the IO log, and the retention duration of the incremental backup data. The incremental backup frequency is higher than the full backup frequency. For example, the full backup frequency may be once a day, with the full backup performed at a fixed time point every day; the incremental backup frequency may be once every N hours, where, for ease of calculation, N may be a divisor of 24.
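As an illustration of such a policy, the sketch below derives one day's backup points from N. The function name `daily_backup_schedule` is hypothetical, and placing the daily full backup at hour 0 is an assumption made for the example; the embodiment only requires a fixed time point.

```python
def daily_backup_schedule(n_hours):
    """Map each backup hour of a day to "full" or "incremental".

    Assumptions: the full backup runs at hour 0 and an incremental
    backup runs every n_hours after it, where n_hours divides 24.
    """
    if n_hours <= 0 or 24 % n_hours != 0:
        raise ValueError("n_hours must be a positive divisor of 24")
    schedule = {0: "full"}
    for hour in range(n_hours, 24, n_hours):
        schedule[hour] = "incremental"
    return schedule


if __name__ == "__main__":
    for hour, kind in sorted(daily_backup_schedule(2).items()):
        print(f"{hour:02d}:00 {kind}")
```

With N = 2 this yields one full backup point and 11 incremental backup points per day.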
The backup data and the IO log are not usually stored in a production storage pool of the virtual machine, but stored in a backup storage pool having backup and disaster recovery functions, so as to facilitate data recovery. Therefore, the backup policy may further include an address of the first backup area, an address of the second backup area, and an address of the third backup area.
The backup duration of the first backup area is the difference between the current time and the time at which the earliest IO log was written to the first backup area. Specifically, S104 may include: traversing the IO logs on the first backup area and determining the earliest timestamp among all the IO logs; and calculating the difference between the current time and that earliest timestamp as the backup duration of the first backup area.
Similarly, the backup duration of the second backup area refers to: the difference between the current time and the time at which the incremental backup data was written earliest on the second backup area.
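The duration check and cleanup described above might look like the following sketch. The helper names are hypothetical, stored items are reduced to their timestamps, and it is assumed that only entries older than the retention duration are removed while newer ones are kept.

```python
from datetime import datetime, timedelta


def backup_duration(timestamps, now):
    """Difference between the current time and the earliest stored timestamp."""
    if not timestamps:
        return timedelta(0)
    return now - min(timestamps)


def purge_expired(timestamps, now, retention):
    """Return the timestamps that survive cleanup.

    Assumption: when the backup duration exceeds the retention duration,
    only the entries older than the retention window are cleared.
    """
    if backup_duration(timestamps, now) <= retention:
        return list(timestamps)
    cutoff = now - retention
    return [t for t in timestamps if t >= cutoff]


if __name__ == "__main__":
    now = datetime(2020, 10, 16)
    logs = [now - timedelta(days=d) for d in (9, 6, 1)]
    print(backup_duration(logs, now))
    print(purge_expired(logs, now, timedelta(days=7)))
```

The same two helpers apply to the second backup area by passing the incremental backup timestamps and the incremental retention duration.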
Based on the backup mode, the data recovery process comprises the following steps:
S201, determining a recovery time point according to a data recovery instruction;
S202, if the difference between the current time and the recovery time point is less than or equal to the retention duration of the IO log, performing IO-level data recovery according to the IO log on the first backup area;
Specifically, the full backup process and the incremental backup process closest to the recovery time point are determined, and the corresponding full backup data and incremental backup data are acquired. The backup time of that incremental backup process is determined, and the IO logs generated between the backup time and the recovery time point are acquired. Finally, IO-level data recovery is performed according to the acquired full backup data, incremental backup data and IO logs.
S203, if the difference between the current time and the recovery time point is greater than the retention duration of the IO log and less than or equal to the retention duration of the incremental backup data, performing data recovery according to the incremental backup data on the second backup area;
Specifically, the full backup process and the incremental backup process closest to the recovery time point are determined, the corresponding full backup data and incremental backup data are acquired, and data recovery at the incremental backup level is performed according to the full backup data and the incremental backup data.
S204, if the difference between the current time and the recovery time point is greater than the retention duration of the incremental backup data, performing data recovery according to the full backup data on the third backup area.
Specifically, the full backup process closest to the recovery time point is determined, the corresponding full backup data is acquired, and data recovery at the full backup level is performed accordingly.
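The branching in S202 to S204 can be summarized in a short sketch. The function name `select_recovery_level` and the returned labels are illustrative choices, not names from the application.

```python
from datetime import datetime, timedelta


def select_recovery_level(now, recovery_point, io_retention, incremental_retention):
    """Pick the recovery granularity from the age of the recovery point."""
    age = now - recovery_point
    if age < timedelta(0):
        raise ValueError("recovery point lies in the future")
    if age <= io_retention:
        return "io"           # S202: replay IO logs from the first backup area
    if age <= incremental_retention:
        return "incremental"  # S203: restore from the second backup area
    return "full"             # S204: restore from the third backup area


if __name__ == "__main__":
    now = datetime(2020, 10, 16, 12)
    week, fortnight = timedelta(days=7), timedelta(days=14)
    for days in (3, 10, 20):
        point = now - timedelta(days=days)
        print(days, select_recovery_level(now, point, week, fortnight))
```

The boundaries are inclusive on the finer-grained side, matching the "less than or equal to" wording of S202 and S203.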
The embodiment provides a method for continuously protecting virtual machine data, and provides a data protection mode integrating three backup technologies (CDP, incremental backup, full backup). Compared with a common CDP protection scheme, the embodiment can perform IO protection between two backup points based on a full backup point or an incremental backup point, reduce data recovery time and reduce RTO indexes. Compared with a common backup scheme, the embodiment can realize the protection granularity of the IO level between two backup points by adopting a CDP technology, provide the data recovery of the IO level, and achieve zero data loss, namely, the RPO index approaches to 0.
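A rough back-of-the-envelope calculation shows why bounding the replay window lowers the RTO. All numbers below are assumptions chosen for illustration (a steady 500 write IOPS, a conventional CDP log replayed from 3 days back, and a 2-hour gap between backup points); none of them comes from the application.

```python
def ios_to_replay(write_iops, replay_window_seconds):
    """Estimate how many logged IOs must be replayed for a given window."""
    return write_iops * replay_window_seconds


ASSUMED_WRITE_IOPS = 500  # illustrative value, not from the source

# Conventional CDP (assumed): replay everything logged over the last 3 days.
conventional = ios_to_replay(ASSUMED_WRITE_IOPS, 3 * 24 * 3600)

# Hybrid scheme: replay at most the IO logged since the last backup point.
hybrid = ios_to_replay(ASSUMED_WRITE_IOPS, 2 * 3600)

print(conventional, hybrid, conventional // hybrid)
```

Under these assumptions the hybrid scheme replays 36 times fewer IOs, and the gap widens as the conventional log grows older.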
The following begins to describe in detail an embodiment of a method for continuously protecting virtual machine data provided by the present application.
As shown in fig. 2, the second embodiment is implemented with four major components: a policy management component, a data protection component, a data recovery component and a data cleaning component. Each component is introduced first.
First, the policy management component
The component mainly completes management of virtual machine protection strategies, and the strategies mainly comprise:
(1) The incremental backup frequency backup_frequency, which may be configured in hours (1 < backup_frequency < 24).
(2) The full backup frequency, which is set here to once a day, specifically once a day in the early morning.
For example, if the incremental backup frequency is configured as once every 2 hours, a full backup is performed at 00:00 every day and an incremental backup is then performed every 2 hours during that day, forming 1 full backup point and 11 incremental backup points.
(3) The retention duration IO_barrier of the IO log: IO logs recorded within this time range (an IO log is the IO data of a virtual machine captured by the CDP continuous IO protection module in the data protection component) are always retained and can be used for IO-level data recovery. It is configured in days, with 1 <= IO_barrier <= 30. IO logs beyond this time range are cleared by the data cleaning component to save storage space.
As shown in fig. 3, assuming the retention duration of the IO log is configured as 7 days, the IO logs of the last 7 days are always retained, and IO-level data recovery can be performed for any point within the last 7 days. IO logs older than 7 days are cleaned up, and IO-level data recovery is no longer possible for that period.
(4) The retention duration back_barrier of the incremental backup data: incremental backup points (i.e., incremental backup data) recorded within this time range are always retained. It is configured in days by default, and usually back_barrier > IO_barrier. These points can be used for fine-grained backup point recovery; incremental backup points beyond this time range are cleared by the data cleaning component to save storage space.
As shown in fig. 3, assuming the incremental backup retention duration is configured as 14 days, data from 7 to 14 days old can be restored at hour-level granularity. Beyond 14 days, only day-level data recovery is possible.
(5) The storage location backup_store of the backup points and IO logs: when the production storage pool where the original virtual machine resides becomes abnormal, the data of the virtual machine can be recovered from the backup points and IO logs.
For example, if the virtual machine production storage pool is production_store_pool and the configured storage location of the backups and IO logs is back_store_pool, then when production_store_pool fails because of an unexpected event such as a hardware failure, the data of the original virtual machine can be recovered from the backups and logs in back_store_pool (providing a degree of disaster recovery), so that service switchover of the original virtual machine is completed quickly.
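The policy items above could be held in a small validated record, sketched below. The class `BackupPolicy` is hypothetical; the field names follow the policy parameters (with `IO_barrier` lowercased to `io_barrier`), the IO log retention range is read here as 1 to 30 days inclusive, and treating "usually back_barrier > IO_barrier" as a hard rule is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    backup_frequency: int  # hours between incremental backups
    io_barrier: int        # retention of IO logs, in days
    back_barrier: int      # retention of incremental backups, in days
    backup_store: str      # storage location of backups and IO logs

    def __post_init__(self):
        if not 1 < self.backup_frequency < 24:
            raise ValueError("backup_frequency must satisfy 1 < value < 24")
        if not 1 <= self.io_barrier <= 30:
            raise ValueError("io_barrier is assumed to lie in 1..30 days")
        # Assumption: enforce the usual ordering of the two retention windows.
        if self.back_barrier <= self.io_barrier:
            raise ValueError("back_barrier should exceed io_barrier")


if __name__ == "__main__":
    policy = BackupPolicy(
        backup_frequency=2,
        io_barrier=7,
        back_barrier=14,
        backup_store="back_store_pool",
    )
    print(policy)
```

Validating at construction time keeps an inconsistent policy from reaching the data protection and data cleaning components.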
Second, the data protection component
The component performs data protection for the virtual machine, which is divided by nature into full/incremental backup and CDP continuous IO protection:
(1) full/incremental backup: these are discrete backup points. A full backup is always performed at 00:00 each day, and incremental backups are triggered at the backup frequency defined in the policy management component. The backup is based on the drive-backup mechanism in QEMU, so IO write operations can continue in the virtual machine while a backup is in progress, which guarantees service continuity in the virtual machine.
(2) CDP continuous IO protection: this IO protection is continuous and linear. When IO is written in the virtual machine, the IO filter in QEMU backs up the IO to a predefined storage location to form an IO log. The IO log is stored in a defined format so that the data recovery component can replay the IO; the format of the IO log is described in detail below and is not expanded here.
Third, the data recovery component
The component performs data recovery to a user-specified point in time. When the virtual machine fails, for example when it is infected by ransomware, data in it is deleted by mistake, or an irreparable failure occurs in its original production storage pool, data recovery can be performed to the time point specified by the user, which covers the following two cases:
(1) data recovery at IO level: when the time point appointed by the user is within the retention duration range of the IO log defined by the policy management component, the data recovery of the IO level can be carried out, and the zero loss of the data assets is achieved;
(2) data recovery between backup points: when the time point specified by the user is beyond the retention duration of the IO log defined by the policy management component but within the retention duration of the incremental backup, hour-level data recovery can be performed; when the time point specified by the user is beyond the retention duration of the incremental backup defined by the policy management component, day-level data recovery can be performed.
Fourth, the data cleaning component
The component mainly cleans up expired data (expired means older than the retention duration) to save storage space, which covers the following two cases:
(1) when the age of an IO log exceeds the retention duration of the IO log defined by the policy management component, the data cleaning component deletes that IO log from the backup and IO log storage location defined by the policy management component;
(2) when the age of an incremental backup exceeds the retention duration of the incremental backup defined by the policy management component, the data cleaning component deletes that incremental backup from the backup and IO log storage location defined by the policy management component.
The IO log can store and recover IO data, and in order to achieve the storage and recovery of data and the consistency of data, a separate data format needs to be designed for processing, and the IO log format is briefly described here.
As shown in fig. 4, the IO log is mainly composed of SuperBlock and IO metadata:
(1) SuperBlock holds the summary information of the IO log, such as the size of the IO log, the total number of IOs, the information of the first IO (tail block IO block position and tail block IO timestamp) and the information of the last IO (head block IO block position and head block IO timestamp); the size of the SuperBlock is 512 bytes;
(2) the IO metadata consists, in sequence, of a DESCRIPTOR BLOCK, an original IO information block, and a COMMIT BLOCK. The DESCRIPTOR BLOCK holds the descriptor information of each IO metadata entry (it can be understood as the header of the IO metadata), and its size is fixed at 512 bytes. The COMMIT BLOCK holds the commit information of each IO metadata entry (it can be understood as the tail of the IO metadata), and its size is fixed at 512 bytes. Each DESCRIPTOR BLOCK corresponds to a unique COMMIT BLOCK. The original IO information block (DATA between the DESCRIPTOR BLOCK and the COMMIT BLOCK in fig. 4) is the IO information from the source disk of the virtual machine, and is enclosed by the DESCRIPTOR BLOCK as the header and the COMMIT BLOCK as the trailer.
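To make the descriptor/data/commit framing concrete, the sketch below packs and unpacks one IO entry. Only the 512-byte size of the descriptor and commit blocks comes from the description; the magic values, the field layout (sequence number, timestamp, disk offset, data length) and the padding of the data to a 512-byte multiple are assumptions for illustration.

```python
import struct

BLOCK_SIZE = 512                       # descriptor and commit block size
DESC_MAGIC = b"DESC"                   # hypothetical marker values
COMMIT_MAGIC = b"CMMT"
_HEADER = struct.Struct("<4sQQQI")     # magic, seq, timestamp_ns, offset, length


def _padded_len(length):
    """Round a data length up to a multiple of BLOCK_SIZE."""
    return -(-length // BLOCK_SIZE) * BLOCK_SIZE


def pack_record(seq, timestamp_ns, disk_offset, data):
    """Frame one IO as DESCRIPTOR BLOCK + data + COMMIT BLOCK."""
    fields = (seq, timestamp_ns, disk_offset, len(data))
    desc = _HEADER.pack(DESC_MAGIC, *fields).ljust(BLOCK_SIZE, b"\0")
    body = data.ljust(_padded_len(len(data)), b"\0")
    commit = _HEADER.pack(COMMIT_MAGIC, *fields).ljust(BLOCK_SIZE, b"\0")
    return desc + body + commit


def unpack_record(buf):
    """Parse one record; reject it if the matching commit block is absent."""
    magic, seq, ts, offset, length = _HEADER.unpack_from(buf, 0)
    if magic != DESC_MAGIC:
        raise ValueError("not a descriptor block")
    commit_at = BLOCK_SIZE + _padded_len(length)
    if len(buf) < commit_at + BLOCK_SIZE:
        raise ValueError("truncated record")
    c_magic, c_seq = _HEADER.unpack_from(buf, commit_at)[:2]
    if c_magic != COMMIT_MAGIC or c_seq != seq:
        raise ValueError("missing or mismatched commit block")
    return seq, ts, offset, buf[BLOCK_SIZE:BLOCK_SIZE + length]


if __name__ == "__main__":
    record = pack_record(1, 123, 4096, b"hello")
    print(len(record), unpack_record(record))
```

Pairing every descriptor with a commit block lets a reader discard an entry whose write was interrupted, which is one plausible reason for the one-to-one correspondence described above.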
Based on the above, the implementation of the second embodiment will be described.
Firstly, a data protection policy is established by using the policy management component
(1) Configure the incremental backup frequency backup_frequency; assume backup_frequency is 2 hours;
(2) configure the retention time IO_barrier of the IO log; assume IO_barrier is 7 days;
(3) configure the retention time back_barrier of the incremental backup; assume back_barrier is 14 days;
(4) configure the storage location backup_store for backups and the IO log; assume backup_store is back_store_pool.
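As a minimal sketch, the policy above could be represented as a small Python data class. The class name, the lower-case field names, and the `phase` helper are assumptions made for illustration; the validation rules follow the claims (the incremental interval N is a factor of 24, and the incremental backup retention is longer than the IO log retention).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionPolicy:
    backup_frequency: timedelta  # interval between incremental backups
    io_barrier: timedelta        # retention time of the IO log
    back_barrier: timedelta      # retention time of the incremental backups
    backup_store: str            # storage location for backups and IO logs

    def __post_init__(self) -> None:
        hours = self.backup_frequency / timedelta(hours=1)
        if hours <= 0 or hours != int(hours) or 24 % int(hours) != 0:
            raise ValueError("backup_frequency must be a whole number of hours dividing 24")
        if self.back_barrier <= self.io_barrier:
            raise ValueError("back_barrier must be longer than io_barrier")

    def phase(self, age: timedelta) -> str:
        """Return which protection phase a point `age` in the past falls into."""
        if age <= self.io_barrier:
            return "io"           # IO log still retained: IO-level recovery
        if age <= self.back_barrier:
            return "incremental"  # only incremental backup points remain
        return "full"             # only full backup points remain


policy = ProtectionPolicy(
    backup_frequency=timedelta(hours=2),
    io_barrier=timedelta(days=7),
    back_barrier=timedelta(days=14),
    backup_store="back_store_pool",
)
```

With the example values, a point one day in the past falls in the IO phase, a point ten days in the past in the incremental phase, and a point thirty days in the past in the full backup phase.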
It can be seen that, in this embodiment, a data protection mode in which full backup, incremental backup, and CDP are integrated is provided, and three time phases of data protection are defined based on a data protection policy, so as to form a complete data protection period of a virtual machine, as shown in fig. 3.
Secondly, the data protection component carries out virtual machine data protection
Step 1, full backup: check whether the current time is 0:00 (midnight). If not, go to step 2; if so, perform a full backup (drive-backup) and, at the same time, regenerate the IO log so that the IO log is consistent with the backup point.
The full backup parameters are as follows:
Figure BDA0002729123830000121
Step 2, incremental backup: check whether the current time is a configured incremental backup time, i.e., whether the current hour is one of the even hours 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22. If not, continue to step 3; if so, perform an incremental backup and, at the same time, regenerate the IO log.
The incremental backup parameters are as follows:
Figure BDA0002729123830000131
Step 3, continuous IO protection: when an IO is written in the virtual machine, the IO data is first stored in an IO log, such as IO_log_inc_1.raw in the parameters above, and is then written to the disk of the virtual machine as usual. This step is performed by the IO Filter in QEMU; the process is shown in fig. 5:
Step one: capture the IO when it is written to the source disk;
Step two: append the new IO to the IO log image;
Step three: write the new IO to the source disk of the virtual machine.
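The three protection steps can be modelled in a few lines of Python. This is a user-space illustration only, not the QEMU IO Filter itself: the function `backup_action`, the class `ProtectedDisk`, and the in-memory list used as the IO log are all hypothetical, and the 2-hour interval is the example value from the policy.

```python
import io
from typing import BinaryIO, List, Optional, Tuple


def backup_action(hour: int, minute: int, backup_frequency_hours: int = 2) -> Optional[str]:
    """Steps 1 and 2: decide which backup, if any, is due at this time of day."""
    if minute != 0:
        return None
    if hour == 0:
        return "full"         # midnight: full backup
    if hour % backup_frequency_hours == 0:
        return "incremental"  # 2, 4, ..., 22 for a 2-hour frequency
    return None               # otherwise fall through to continuous IO protection


class ProtectedDisk:
    """Step 3: every write is appended to the IO log first, then applied to the disk."""

    def __init__(self, disk: BinaryIO) -> None:
        self.disk = disk
        self.io_log: List[Tuple[float, int, bytes]] = []

    def write(self, offset: int, data: bytes, timestamp: float) -> None:
        self.io_log.append((timestamp, offset, bytes(data)))  # log the IO first
        self.disk.seek(offset)
        self.disk.write(data)                                 # then write the source disk

    def rotate_log(self) -> List[Tuple[float, int, bytes]]:
        """Start a new IO log segment; intended to be called when a backup point is taken."""
        finished, self.io_log = self.io_log, []
        return finished


disk_image = io.BytesIO(bytes(16))
protected = ProtectedDisk(disk_image)
protected.write(4, b"ab", timestamp=100.0)
```

Rotating the log at each backup point mirrors the requirement that the IO log be regenerated together with every full or incremental backup, so that each log segment is based on exactly one backup point.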
Thirdly, the data recovery component carries out data recovery
Given a specified recovery time point T, the data recovery component recovers the data as of time point T, as follows:
(1) IO-level data recovery: when Current_Time - T < IO_barrier, the IO log recorded by the data protection component has not yet been cleaned up, so IO-level data recovery can be performed. The recovery steps are as follows:
Step 1, according to the time point T, find the backup point that T corresponds to, that is, find the backup point backup_Y.img on which the IO log is based, the IO log IO_log_inc_Y.raw, and the corresponding Time_Y;
Step 2, find the corresponding IO_log_inc_Y.raw in the back_store_pool storage pool, and take out the IO data in the range from Time_Y to T;
Step 3, based on backup_Y.img in the back_store_pool storage pool, write the extracted IO data back onto the backup point one by one to obtain the backup_io_time.img data disk;
Step 4, the backup_io_time.img data disk is the recovered data corresponding to the time point T.
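The IO-level recovery steps above can be sketched as follows. The sketch assumes, for illustration only, that the backup image is held as bytes in memory and that the IO log is a time-ordered list of (timestamp, offset, data) tuples; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, Sequence, Tuple


def find_backup_point(backup_times: Sequence[float], t: float) -> int:
    """Step 1: index Y of the latest backup point taken at or before T."""
    i = bisect_right(backup_times, t) - 1
    if i < 0:
        raise ValueError("no backup point at or before T")
    return i


def recover_io_level(backup_image: bytes,
                     io_log: Iterable[Tuple[float, int, bytes]],
                     time_y: float,
                     t: float) -> bytes:
    """Steps 2 to 4: replay logged writes with Time_Y < timestamp <= T onto a copy."""
    disk = bytearray(backup_image)  # work on a copy so the backup point stays intact
    for ts, offset, data in io_log:  # assumed to be in write order
        if time_y < ts <= t:
            if offset + len(data) > len(disk):
                raise ValueError("logged write lies outside the disk image")
            disk[offset:offset + len(data)] = data
    return bytes(disk)


backup_times = [0.0, 7200.0, 14400.0]  # seconds; one backup point every 2 hours
y = find_backup_point(backup_times, 8000.0)
log_y = [(7300.0, 0, b"A"), (7900.0, 1, b"B"), (8100.0, 2, b"C")]
restored = recover_io_level(bytes(8), log_y, backup_times[y], 8000.0)
```

In this example the write logged at 8100 s is later than T = 8000 s and is therefore not replayed, so the restored image reflects the disk exactly as of T.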
(2) Backup-point-level data recovery: when Current_Time - T > IO_barrier, the IO log recorded by the data protection component has already been cleaned up, so only backup-point-level data recovery can be performed. The recovery steps are as follows:
Step 1, according to the time point T, find the backup point that T corresponds to, that is, find the backup point backup_Y.img and the corresponding Time_Y;
Step 2, copy backup_Y.img in the back_store_pool storage pool to backup_time.img, so that the backup point itself is not overwritten;
Step 3, use backup_time.img in the back_store_pool storage pool as the data disk; this is the recovered data for the time point.
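A minimal sketch of the copy step, assuming the storage pool is an ordinary directory (the function name and the refusal to overwrite an existing restore target are illustrative choices, not taken from the source):

```python
import shutil
import tempfile
from pathlib import Path


def recover_backup_point(pool: Path, backup_name: str,
                         restored_name: str = "backup_time.img") -> Path:
    """Copy the chosen backup point so the VM never writes to the backup itself."""
    src = pool / backup_name
    dst = pool / restored_name
    if dst.exists():
        raise FileExistsError(dst)  # do not silently replace an earlier restore
    shutil.copyfile(src, dst)
    return dst


# Demonstration in a temporary directory standing in for the storage pool.
with tempfile.TemporaryDirectory() as tmp:
    pool = Path(tmp)
    (pool / "backup_Y.img").write_bytes(b"disk-at-backup-point")
    restored_path = recover_backup_point(pool, "backup_Y.img")
    restored_bytes = restored_path.read_bytes()
    original_still_present = (pool / "backup_Y.img").exists()
```

Because the virtual machine is attached to the copy, later writes cannot alter backup_Y.img, which remains usable for further restores.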
Therefore, in the time phase closest to the current time, this embodiment can provide IO-level data recovery; in the intermediate time phase, it can provide hour-level data recovery; and in the time phase farthest from the current time, it can provide day-level data recovery. In general, data recovery capabilities of different granularities and levels are provided for different time ranges.
Fourthly, the data cleaning component cleans up the expired data
This mainly comprises cleaning up expired IO logs and cleaning up expired backup points:
(1) cleaning up expired IO logs, comprising the following steps:
Step 1, traverse the IO log data in the backup storage pool back_store_pool;
Step 2, find the latest timestamp IO_Timestamp of the IO log;
Step 3, if Current_Time - IO_Timestamp > IO_barrier at the current time point, delete the IO log data;
Step 4, move on to the next IO log until all IO logs have been traversed.
(2) Cleaning up expired backup points, comprising the following steps:
Step 1, traverse all incremental backup points in the backup storage pool back_store_pool;
Step 2, find the timestamp backup_Timestamp of the incremental backup point;
Step 3, if Current_Time - backup_Timestamp > back_barrier at the current time point, delete the incremental backup point;
Step 4, move on to the next incremental backup point until all incremental backup points have been traversed.
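Both cleanup passes apply the same rule with a different retention time, which the following sketch makes explicit. It assumes, for illustration, that the storage pool is modelled as two dictionaries mapping a file name to its timestamp (the latest IO timestamp for a log, the backup timestamp for an incremental backup point); the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def expired(items: Dict[str, datetime], now: datetime, retention: timedelta) -> List[str]:
    """Names whose timestamp is older than the retention time."""
    return sorted(name for name, ts in items.items() if now - ts > retention)


def clean_pool(io_logs: Dict[str, datetime],
               incrementals: Dict[str, datetime],
               now: datetime,
               io_barrier: timedelta,
               back_barrier: timedelta) -> Tuple[List[str], List[str]]:
    """Delete expired IO logs and expired incremental backup points from the pool."""
    removed_logs = expired(io_logs, now, io_barrier)
    removed_backups = expired(incrementals, now, back_barrier)
    for name in removed_logs:
        del io_logs[name]
    for name in removed_backups:
        del incrementals[name]
    return removed_logs, removed_backups


now = datetime(2020, 10, 16)
io_logs = {"IO_log_inc_1.raw": now - timedelta(days=8),
           "IO_log_inc_2.raw": now - timedelta(days=1)}
incrementals = {"backup_1.img": now - timedelta(days=15),
                "backup_2.img": now - timedelta(days=8)}
removed = clean_pool(io_logs, incrementals, now,
                     io_barrier=timedelta(days=7), back_barrier=timedelta(days=14))
```

With the example retention times of 7 and 14 days, the 8-day-old IO log is removed while the 8-day-old incremental backup point is kept, which is what produces the coarser hour-level recovery phase behind the IO-level phase.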
Therefore, the embodiment provides a data cleaning method, which is used for cleaning out the expired data defined by the data protection policy, and avoiding the waste of storage space in the user environment.
In summary, the method for continuously protecting data of a virtual machine provided in this embodiment forms a complete virtual machine data protection period by way of fusion of full backup/incremental backup/CDP. According to the method, a plurality of base points of data protection are formed by full backup/incremental backup, and when IO data are written in a virtual machine between adjacent base points of data protection, the IO data are recorded as IO logs, so that a linear protection range is formed.
In addition, this embodiment defines three protection phases according to the data protection policy: in the time phase closest to the current time, IO-level data recovery can be provided; in the intermediate time phase, hour-level data recovery can be provided; and in the time phase farthest from the current time, day-level data recovery can be provided. Providing data recovery capabilities of different granularities and levels according to the time range meets the data protection requirements of real production environments (the closer a time range is to the current time, the denser the recoverable time points are expected to be; the farther it is from the current time, the sparser they may be) while saving storage space in the environment.
It should be noted that this embodiment is not a simple combination of backup and CDP schemes. On the one hand, each CDP protection phase depends on the backup point of that phase: when a new backup is generated, the CDP protection phase implements IO-level data protection on the basis of that backup. On the other hand, each backup point delimits the IO protection ranges of two adjacent phases: in the fusion of full backup/incremental backup/CDP, the backups provide the protection base points, and the CDP provides the IO logs between the protection base points. Whenever a data protection base point is generated, the IO log must be regenerated, so that the IO data is stored in segments. If one segment of the IO log is unavailable, the integrity of the next segment is not affected, which improves the availability of data protection in the system.
Specifically, the difference between this embodiment and an ordinary CDP-based IO continuous data protection scheme (hereinafter referred to as the ordinary scheme) is shown in fig. 6. In the ordinary scheme, the IO data on the whole time axis is protected by CDP alone, as shown by the upper time axis in fig. 6; in this embodiment, full backup, incremental backup, and CDP are integrated: disk data is periodically saved through full and incremental backups, and IO protection between backup points is provided by the CDP technology, as shown by the lower time axis in fig. 6.
Therefore, when the recovery time point specified by the user is T1, the ordinary CDP scheme needs to recover the IO data between Base (the time at which the CDP service was started) and T1, whereas this embodiment only needs to recover the IO data between backup point 1 (the incremental or full backup point that precedes T1 and is closest to it) and T1; when the user specifies the recovery time point T2, the ordinary CDP scheme needs to recover the IO data between Base and T2, whereas this embodiment only needs to recover the IO data between backup point 2 (the incremental or full backup point that precedes T2 and is closest to it) and T2.
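The difference in the amount of IO history that must be replayed can be illustrated with a small calculation. The numbers are assumptions chosen for the example: times are in hours, Base is taken as hour 0, and backup points exist every 2 hours as in the example policy; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def replay_span_ordinary(base: float, t: float) -> float:
    """Ordinary CDP: all IO from Base up to the recovery point must be replayed."""
    return t - base


def replay_span_segmented(backup_points: Sequence[float], t: float) -> float:
    """This embodiment: only IO since the nearest preceding backup point is replayed."""
    i = bisect_right(backup_points, t) - 1
    if i < 0:
        raise ValueError("no backup point at or before the recovery time")
    return t - backup_points[i]


backup_points = list(range(0, 24, 2))  # hours 0, 2, 4, ..., 22
t1, t2 = 5.5, 21.25                    # two example recovery points, in hours
spans = {
    "T1": (replay_span_ordinary(0, t1), replay_span_segmented(backup_points, t1)),
    "T2": (replay_span_ordinary(0, t2), replay_span_segmented(backup_points, t2)),
}
```

In the ordinary scheme the replay span grows with the distance from Base, while in the segmented scheme it is bounded by the incremental backup interval (2 hours in this example), regardless of how long protection has been running.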
In the following, an apparatus for continuously protecting virtual machine data according to an embodiment of the present invention is introduced; the apparatus described below and the method for continuously protecting virtual machine data described above may be referred to in correspondence with each other.
As shown in fig. 7, the apparatus for continuously protecting virtual machine data of this embodiment includes:
the IO backup module 701: configured to write IO data to a disk whenever IO data of a virtual machine is detected, to generate an IO log according to the IO data by adopting a continuous data protection technology, and to store the IO log to a first backup area;
the incremental backup module 702: configured to perform incremental backup on the disk according to the incremental backup frequency and to store incremental backup data to a second backup area;
the full backup module 703: configured to perform full backup on the disk according to the full backup frequency and to store full backup data to a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
the IO clearing module 704: configured to clear the IO log on the first backup area when the backup duration of the first backup area exceeds the retention duration of the IO log;
the incremental clearing module 705: configured to clear the incremental backup data on the second backup area when the backup duration of the second backup area exceeds the retention duration of the incremental backup data, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
Therefore, for a specific implementation of the apparatus in this embodiment, reference may be made to the foregoing description of the method for continuously protecting virtual machine data. For example, the IO backup module 701, the incremental backup module 702, the full backup module 703, the IO clearing module 704, and the incremental clearing module 705 are respectively used to implement steps S101, S102, S103, S104, and S105 of the above method. Their specific embodiments may therefore be found in the descriptions of the corresponding parts and are not repeated here.
In addition, since the apparatus for continuously protecting virtual machine data of this embodiment is used to implement the foregoing method for continuously protecting virtual machine data, the role of the apparatus corresponds to that of the foregoing method, and details are not described here.
In addition, the present application further provides an apparatus for continuously protecting virtual machine data, including:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the method for continuously protecting virtual machine data as described above.
Finally, the present application provides a readable storage medium having stored thereon a computer program for implementing the method of continuously protecting virtual machine data as described above when executed by a processor.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The solutions provided in the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for continuously protecting virtual machine data, comprising:
writing the IO data into a disk whenever the IO data of the virtual machine is detected; generating an IO log according to the IO data by adopting a continuous data protection technology, and storing the IO log to a first backup area;
performing incremental backup on the disk according to the incremental backup frequency, and storing incremental backup data to a second backup area;
performing full backup on the disk according to full backup frequency, and storing full backup data into a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
when the backup duration of the first backup area exceeds the retention duration of the IO log, clearing the IO log on the first backup area;
and when the backup duration of the second backup area exceeds the retention duration of incremental backup data, removing the incremental backup data on the second backup area, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
2. The method of claim 1, wherein the incremental backup frequency is once every N hours, and the incrementally backing up the disk according to the incremental backup frequency comprises:
performing incremental backup on the disk every N hours every day, wherein N is a factor of 24;
correspondingly, the full backup frequency is once a day, and performing full backup on the disk according to the full backup frequency includes:
and performing full backup on the disk at a target time point every day.
3. The method of claim 1, wherein before clearing the IO log on the first backup area when the backup duration of the first backup area exceeds the retention duration of the IO log, further comprising:
traversing the IO logs on the first backup area, and determining the earliest time stamp of all the IO logs; and calculating the difference between the current time and the earliest timestamp to serve as the backup time length of the first backup area.
4. The method of claim 1, further comprising:
setting a backup policy using a policy management component, wherein the backup policy comprises: the incremental backup frequency, the full backup frequency, the retention duration of the IO log, the retention duration of the incremental backup data, an address of the first backup area, an address of the second backup area, and an address of the third backup area.
5. The method of claim 1, wherein the generating an IO log according to the IO data and storing the IO log in a first backup area comprises:
and generating an IO log according to the IO data by using an IO filter, and storing the IO log to a first backup area, wherein the IO log comprises SuperBlock and IO metadata.
6. The method of any one of claims 1-5, further comprising:
determining a recovery time point according to the data recovery instruction;
if the difference between the current time and the recovery time point is less than or equal to the retention time of the IO log, performing IO-level data recovery according to the IO log on the first backup area;
if the difference between the current time and the recovery time point is larger than the retention time of the IO log and smaller than or equal to the retention time of the incremental backup data, performing data recovery according to the incremental backup data on the second backup area;
and if the difference between the current time and the recovery time point is greater than the retention time of the incremental backup data, performing data recovery according to the full backup data on the third backup area.
7. The method of claim 6, wherein the performing IO level data recovery from the IO log on the first backup region comprises:
determining an incremental backup process closest to the recovery time point, determining the actual backup time of the incremental backup process, and acquiring corresponding incremental backup data;
obtaining an IO log generated between the actual backup time and a recovery time point;
and according to the incremental backup data and the IO log, performing IO-level data recovery.
8. An apparatus for continuously protecting virtual machine data, comprising:
an IO backup module: configured to write IO data to a disk whenever IO data of a virtual machine is detected, to generate an IO log according to the IO data by adopting a continuous data protection technology, and to store the IO log to a first backup area;
an incremental backup module: configured to perform incremental backup on the disk according to the incremental backup frequency and to store incremental backup data to a second backup area;
a full backup module: configured to perform full backup on the disk according to the full backup frequency and to store full backup data to a third backup area, wherein the full backup frequency is lower than the incremental backup frequency;
an IO clearing module: configured to clear the IO log on the first backup area when the backup duration of the first backup area exceeds the retention duration of the IO log;
an incremental clearing module: configured to clear the incremental backup data on the second backup area when the backup duration of the second backup area exceeds the retention duration of the incremental backup data, wherein the retention duration of the incremental backup data is greater than the retention duration of the IO log.
9. An apparatus for continuously protecting data of a virtual machine, comprising:
a memory: for storing a computer program;
a processor: for executing said computer program for implementing a method for continuously protecting virtual machine data according to any of claims 1 to 7.
10. A readable storage medium, having stored thereon a computer program for implementing a method of continuously protecting virtual machine data according to any one of claims 1 to 7 when executed by a processor.
CN202011112737.4A 2020-10-16 2020-10-16 Method, device and equipment for continuously protecting virtual machine data Pending CN112269681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011112737.4A CN112269681A (en) 2020-10-16 2020-10-16 Method, device and equipment for continuously protecting virtual machine data

Publications (1)

Publication Number Publication Date
CN112269681A true CN112269681A (en) 2021-01-26

Family

ID=74338256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011112737.4A Pending CN112269681A (en) 2020-10-16 2020-10-16 Method, device and equipment for continuously protecting virtual machine data

Country Status (1)

Country Link
CN (1) CN112269681A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210126