CN114416431A - Agent-free continuous data protection method, system and storage medium based on KVM - Google Patents


Publication number
CN114416431A
Authority
CN
China
Prior art keywords
data
function
file
kvm
index table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210309607.2A
Other languages
Chinese (zh)
Other versions
CN114416431B (en)
Inventor
钱禹航
黄传波
龙星澧
谢俊峰
周科
谢卓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Vinchin Science And Technology Co
Original Assignee
Chengdu Vinchin Science And Technology Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Vinchin Science And Technology Co filed Critical Chengdu Vinchin Science And Technology Co
Priority to CN202210309607.2A priority Critical patent/CN114416431B/en
Publication of CN114416431A publication Critical patent/CN114416431A/en
Application granted granted Critical
Publication of CN114416431B publication Critical patent/CN114416431B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an agent-free continuous data protection method, system and storage medium based on KVM (Kernel-based Virtual Machine), belonging to the field of computer disaster recovery backup. The method comprises a data cluster index table acquiring step, a dynamic mapping table acquiring step, an IO data capturing step, a first judging step and a second judging step. The system comprises a data cluster index table acquisition module, a dynamic mapping table acquisition module, an IO data acquisition module, a first judgment module and a second judgment module. By capturing the IO data of the KVM, the method allows data to be recovered to any point in time, realizing continuous data protection; qcow2 format data is dynamically converted into raw format data, so that CDP technology can be applied more widely in virtualization protection.

Description

Agent-free continuous data protection method, system and storage medium based on KVM
Technical Field
The invention belongs to the field of computer disaster recovery backup, and relates to an agent-free continuous data protection method, system and storage medium based on KVM (Kernel-based Virtual Machine).
Background
Continuous Data Protection (CDP) is a concept proposed in the field of disaster recovery backup in recent years. Conventional data protection generally relies on scheduled or manual backups; when data is damaged, it can only be restored to the time point of the last backup, and any data written after that point is lost. With continuous data protection, changes to the data are continuously captured and stored, so when data is damaged the recovery target can be any point in time and no data is lost.
Virtualization refers to creating a virtual version of something, such as a virtual computer hardware platform, a virtual storage device, or a virtual computer network resource. Server virtualization is the core technology of cloud computing. Various virtualization technologies currently exist on the market, among which the Kernel-based Virtual Machine (KVM) is one of the most mainstream server virtualization technologies and is an open source virtualization solution built into Linux.
Cloud computing essentially uses virtualization to integrate a large amount of storage and computing resources, so that data storage and computation are spread across a large number of distributed computers. This characteristic makes the data security problem more prominent, and traditional data protection is difficult to adapt to the current computer security situation.
Currently, some CDP backup technologies install agent backup software in the host for continuous data protection. The agent software captures and stores IO data, and the captured IO data is finally stored in raw format, thereby realizing CDP. However, KVM virtual machines support several disk formats, including the raw format and the qcow2 format. When facing a qcow2 disk, data in raw format cannot be obtained directly, and there are almost no existing technologies capable of capturing and storing IO data in qcow2 format, which limits the wide application of CDP technology in virtualization protection.
Therefore, how to help users back up file data continuously, stably and widely in the current KVM virtualization environment has become a technical problem that needs to be solved.
Disclosure of Invention
In order to solve the technical problems in the background art, embodiments of the present invention provide an agent-free continuous data protection method, system and storage medium based on KVM. The technical scheme is as follows:
in a first aspect, an agent-free continuous data protection method based on KVM is provided, which includes the steps of:
a data cluster index table obtaining step, reading a qcow2 disk, and obtaining a data cluster index table, wherein the data cluster index table comprises: L1 tables and L2 tables;
a dynamic mapping table obtaining step, namely analyzing the data cluster index table to obtain a dynamic mapping table;
an IO data interception step, wherein when an IO request is received, IO data is intercepted at the kernel layer;
a first judgment step, namely judging whether the IO data is target data or not, and if not, issuing the IO data; if so, acquiring the position of the cluster where the target data is located;
a second judgment step, judging whether the target data is metadata or not according to the dynamic mapping table and the position of the cluster where the target data is located; if not, dynamically converting the offset of the IO data according to the dynamic mapping table to obtain raw format data, and backing up the raw format data; if yes, issuing the IO data, and repeating the process from the IO data interception step through the second judgment step.
It is to be understood that the target data refers to data specified by the client that needs to be protected.
In one embodiment, the step of obtaining the data cluster index table includes:
reading the host machine and acquiring a qcow2 file;
analyzing the qcow2 file to obtain file header information of the qcow2 file;
analyzing the header information of the qcow2 file to obtain basic information, wherein the basic information comprises: l1_table_offset and l1_table_size;
acquiring a data cluster index table according to the basic information, wherein the data cluster index table comprises: L1 tables and L2 tables.
In one embodiment, the IO data interception step includes:
establishing an IO filter driver in the kernel layer;
and when receiving IO data, intercepting the IO data through the IO filter driver.
In one embodiment, the step of establishing the IO filtering driver at the kernel layer includes: performing Hook processing on a function in an IO system call, wherein the function comprises: open function, pwrite function, and close function.
In one embodiment, the Hook processing step performed on the function in the IO system call includes:
acquiring a system call table of a Linux kernel;
writing a Hook function, and replacing the corresponding system call function address in the system call table with the Hook function address, wherein the system call functions comprise: open function, pwrite function, close function;
and after the replacement of the system call function is completed, saving the original address of the system call function.
In one embodiment, the step of intercepting the IO data by the IO filter driver when the IO data is received further includes: and performing recording operation on the open function and the close function, and recording the disk file opened by the current process and the corresponding fd.
In a second aspect, there is also provided a KVM-based agent-free continuous data protection system, the system comprising:
a data cluster index table obtaining module, configured to read a qcow2 disk, and obtain a data cluster index table, where the data cluster index table includes: L1 tables and L2 tables;
the dynamic mapping table acquisition module is used for analyzing the data cluster index table to acquire a dynamic mapping table;
the IO data interception module is used for intercepting IO data in the kernel layer when an IO request is received;
the first judgment module is used for judging whether the IO data is target data or not, and if not, the IO data is issued; if so, acquiring the position of the cluster where the target data is located according to the offset of the IO data;
the second judging module is used for judging whether the target data is metadata or not according to the dynamic mapping table and the position of the cluster where the target data is located; if not, performing dynamic conversion on the offset of the IO data according to the dynamic mapping table to obtain raw format data, and backing up the raw format data; if yes, the IO data is issued, and processing is repeated from the IO data interception module through the second judging module.
In one embodiment, the data cluster index table obtaining module includes:
the qcow2 file acquisition unit is used for reading the host machine and acquiring a qcow2 file;
the file header information acquisition unit is used for analyzing the qcow2 file and acquiring file header information of the qcow2 file;
a basic information obtaining unit, configured to parse header information of the qcow2 file, and obtain basic information, where the basic information includes: l1_table_offset and l1_table_size;
a data cluster index table obtaining unit, configured to obtain a data cluster index table according to the basic information, where the data cluster index table includes: L1 tables and L2 tables.
In one embodiment, the IO data interception module includes:
the IO filter driver establishing unit is used for establishing the IO filter driver in the kernel layer;
and the IO data interception unit is used for intercepting the IO data through the IO filtering driver when the IO data is received.
In a third aspect, a computer-readable storage medium is also provided, on which a computer program is stored, which when executed by a processor implements the above-mentioned KVM-based agent-free continuous data protection method.
The invention has the beneficial effects that:
(1) the invention generates a dynamic mapping table through the data cluster index table, helps to dynamically convert qcow2 data into raw format data, and enables the CDP technology to be more widely applied to virtualization protection;
(2) according to the method, the data can be recovered to any time point by capturing the IO data of the KVM, so that continuous data protection is realized;
(3) the filter driver is installed on the host machine, and software does not need to be installed in the virtual machine, so that the agent-free data protection is realized;
(4) in the face of multiple virtual machines, the filter driver only needs to be installed once, so that the occupied amount of resources is reduced, the operation is convenient, and the cost is saved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating a KVM-based agent-free continuous data protection method according to the present invention.
FIG. 2 is a flow chart of function interception by the filter driver of the present invention.
FIG. 3 is a schematic diagram of a KVM-based agent-free continuous data protection system according to the present invention.
FIG. 4 is a schematic structural diagram of a data cluster index table obtaining module according to the present invention.
FIG. 5 is a schematic structural diagram of an IO data capture module according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
In one embodiment, as shown in FIG. 1, a KVM-based agent-free continuous data protection method is provided, the method comprising:
S1, reading a qcow2 disk, and acquiring a data cluster index table, wherein the data cluster index table comprises: L1 tables and L2 tables.
Optionally, the step S1 includes:
S11, reading the host machine to obtain a qcow2 file;
S12, analyzing the qcow2 file to obtain file header information of the qcow2 file;
S13, analyzing file header information of the qcow2 file to obtain basic information, wherein the basic information comprises: l1_table_offset and l1_table_size;
S14, acquiring a data cluster index table according to the basic information, wherein the data cluster index table comprises: L1 tables and L2 tables.
All qcow2 files start with a fixed-format area, usually called the Header area (i.e. the file header area). As shown in Table 1, the Header area contains key metadata describing the qcow2 file, such as the file type (magic), the version used (version), the offset of the backing file name (backing_file_offset), the length of the backing file name (backing_file_size), the cluster size (cluster_bits), the encryption status (crypt_method), the offset of the L1 table (l1_table_offset), and the length of the L1 table (l1_size, referred to in the steps above as l1_table_size). By obtaining the file header information, l1_table_offset and l1_table_size can therefore be obtained. These two values are used to read the L1 table, and the L2 tables are then obtained from the L1 table, which yields the required data cluster index table.
TABLE 1 Header of qcow2
Offset | Name | Description
0-3 bytes | magic | Fixed qcow2 magic string "QFI\xfb"; this field determines whether a file is of the qcow2 type
4-7 bytes | version | Version number; qcow2 currently has two versions, 2 and 3
8-15 bytes | backing_file_offset | Offset in the file at which the backing file name is stored. The backing file is the source file that the current file depends on; a value of 0 indicates that the current file has no backing file
16-19 bytes | backing_file_size | Length of the backing file name
20-23 bytes | cluster_bits | Indicates the cluster size of the current qcow2 file; the minimum value is 9, the cluster size must be between 512 B and 2 MB, and it is calculated as 1 << cluster_bits
32-35 bytes | crypt_method | Indicates whether the current disk uses encryption and which encryption method is used; a non-zero value indicates that the disk is encrypted
36-39 bytes | l1_size | Number of entries in the L1 table of the current disk file
40-47 bytes | l1_table_offset | Starting offset of the L1 table in the file
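The header fields in Table 1 can be decoded with a fixed-layout read. The following is a minimal sketch in Python, not the patented implementation: the function name `parse_qcow2_header` and the returned dictionary keys are hypothetical, and the 8-byte virtual disk size at bytes 24-31 (which Table 1 omits) is included only so that the later fields land at the offsets listed in the table. All header fields are big-endian.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian layout of the first 48 header bytes: magic, version,
# backing_file_offset, backing_file_size, cluster_bits, size (bytes 24-31,
# not listed in Table 1), crypt_method, l1_size, l1_table_offset.
HEADER_FORMAT = ">4sIQIIQIIQ"


def parse_qcow2_header(raw: bytes) -> dict:
    """Decode the header fields needed to locate the L1 table (hypothetical helper)."""
    if len(raw) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("header too short")
    (magic, version, backing_file_offset, backing_file_size, cluster_bits,
     size, crypt_method, l1_size, l1_table_offset) = struct.unpack_from(
        HEADER_FORMAT, raw)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 file")
    return {
        "version": version,
        "backing_file_offset": backing_file_offset,
        "backing_file_size": backing_file_size,
        "cluster_bits": cluster_bits,
        "cluster_size": 1 << cluster_bits,
        "crypt_method": crypt_method,
        "l1_size": l1_size,
        "l1_table_offset": l1_table_offset,
    }
```

In use, the first 48 bytes of the disk file would be read in binary mode and passed to the function; the L1 table then occupies `l1_size` entries of 8 bytes each starting at `l1_table_offset`.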
S2, analyzing the data cluster index table to obtain a dynamic mapping table.
It should be noted that the data cluster index table stores the offset information that maps the raw disk format to the qcow2 disk format. The size of the L1 table is not fixed, and each of its entries is 8 bytes long; the size of an L2 table is fixed at one cluster, and each of its entries is 8 bytes long. One qcow2 file can have only one valid L1 table, but can have multiple L2 tables at the same time.
It should be further noted that, although the logical clusters are not laid out in the qcow2 file in one-to-one order, there is a fixed correspondence among the data clusters, the L1 table and the L2 tables. For example, the data cluster represented by the 5th entry in the L2 table pointed to by the 1st entry in the L1 table is the 5th logical cluster seen by the virtual machine. In general, the data cluster represented by the m-th entry in the L2 table pointed to by the n-th L1 entry is logical cluster (n-1) × x + m seen by the virtual machine, where x is the number of entries in one L2 table (the cluster size divided by the 8-byte entry length). Meanwhile, the order of the clusters pointed to by the L2 tables of the qcow2 disk is the same as the order of the clusters after conversion into a raw format disk file, so the qcow2 data can be dynamically converted using the write offset of the intercepted IO together with the parsed L1 and L2 tables.
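The correspondence above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and x is taken to be the cluster size divided by the 8-byte entry length, which follows from the statement that an L2 table occupies exactly one cluster.

```python
def l2_entries_per_table(cluster_bits: int) -> int:
    """Number of 8-byte entries that fit in one L2 table (one cluster)."""
    return (1 << cluster_bits) // 8


def logical_cluster_number(n: int, m: int, cluster_bits: int) -> int:
    """Map the 1-based n-th L1 entry and m-th L2 entry to the 1-based logical cluster."""
    x = l2_entries_per_table(cluster_bits)
    return (n - 1) * x + m


def split_guest_offset(offset: int, cluster_bits: int) -> tuple:
    """Split a raw-disk byte offset into 0-based (l1_index, l2_index, offset_in_cluster)."""
    cluster_size = 1 << cluster_bits
    x = cluster_size // 8
    cluster_index = offset // cluster_size
    return cluster_index // x, cluster_index % x, offset % cluster_size
```

With 64 KiB clusters (`cluster_bits` of 16), one L2 table holds 8192 entries, so the first entry of the second L2 table corresponds to logical cluster 8193.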
In addition, after the L1 table and the L2 tables are analyzed, the positions of the data area and the metadata area on the disk can be obtained, and this position information is recorded in the dynamic mapping table. Each entry of the dynamic mapping table is 8 bytes long, and the table stores, in sequence, the raw disk cluster corresponding to each cluster in the qcow2 file.
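One way to picture the dynamic mapping table is as a lookup from a cluster position in the qcow2 file to the raw disk cluster it holds. The sketch below makes several assumptions that the text does not state: the table is modelled as a Python dictionary rather than a packed array of 8-byte entries, the L2 tables are supplied as a dictionary keyed by their file offset, only standard (uncompressed) clusters are handled, and `OFFSET_MASK` keeps bits 9 to 55 of each entry as the offset, following the qcow2 on-disk format. The function name is hypothetical.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1/L2 entry hold the file offset


def build_dynamic_mapping(l1_table, l2_tables, cluster_bits):
    """Return {qcow2 file cluster index: raw disk cluster index} (0-based).

    l1_table  -- list of 64-bit L1 entries
    l2_tables -- dict mapping an L2 table's file offset to its list of entries
    """
    entries_per_l2 = (1 << cluster_bits) // 8
    mapping = {}
    for l1_index, l1_entry in enumerate(l1_table):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue  # no L2 table allocated for this range
        for l2_index, l2_entry in enumerate(l2_tables[l2_offset]):
            host_offset = l2_entry & OFFSET_MASK
            if host_offset == 0:
                continue  # data cluster not allocated
            raw_cluster = l1_index * entries_per_l2 + l2_index
            mapping[host_offset >> cluster_bits] = raw_cluster
    return mapping
```

Clusters of the qcow2 file that never appear as a key (the header, the L1 and L2 tables, and other metadata) are exactly the ones that the later metadata check needs to filter out.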
S3, when an IO request is received, the IO data carried by the IO request is intercepted at the kernel layer.
Optionally, the step S3 includes:
S31, establishing an IO filter driver in the kernel layer;
S32, when IO data is received, intercepting the IO data through the IO filter driver.
Still optionally, the step S31 specifically includes: performing Hook processing on a function in an IO system call, wherein the function comprises: open function, pwrite function, and close function.
For ease of understanding, an example of the operation of step S31 is provided, which includes:
S311, using the function kallsyms_lookup_name exported by the Linux kernel to acquire the system call table of the Linux kernel;
S312, writing a Hook function, and replacing the corresponding system call function address in the system call table with the Hook function address, wherein the system call functions comprise: open function, pwrite function, close function;
S313, after the replacement of the system call function is completed, saving the original address of the system call function.
Through the Hook technology, the filter driver installed in the host machine intercepts and captures IO data generated by QEMU, realizes interception of KVM virtual machine data without installing plug-ins, and is simple and convenient.
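The replace-and-save pattern of steps S311 to S313 can be illustrated outside the kernel. The sketch below is only a user-space analogy in Python, not kernel code: the system call table is modelled as a dictionary of callables, and the names `install_hooks`, `remove_hooks` and `make_capture_hook` are hypothetical. A real filter driver would patch function pointers in the kernel's system call table instead.

```python
def make_capture_hook(original, log):
    """Wrap a pwrite-like callable so each call is recorded before being passed on."""
    def hooked(fd, data, offset):
        log.append((fd, offset, bytes(data)))  # capture the IO data
        return original(fd, data, offset)      # issue the IO unchanged
    return hooked


def install_hooks(call_table, hook_factories):
    """Replace entries in call_table with hooks and return the saved originals."""
    saved = {}
    for name, factory in hook_factories.items():
        original = call_table[name]
        saved[name] = original            # keep the original address for later calls
        call_table[name] = factory(original)
    return saved


def remove_hooks(call_table, saved):
    """Restore the original entries, for example when the driver is unloaded."""
    call_table.update(saved)
```

Saving the original entry matters for two reasons: the hook must still forward the request so the virtual machine's write completes, and the table has to be restored when interception stops.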
S4, judging whether the IO data is target data or not, and if not, issuing the IO data; if so, obtaining the position of the cluster where the target data is located according to the offset of the IO data.
Optionally, the step S4 further includes: and performing recording operation on the open function and the close function, and recording the disk file opened by the current process and the corresponding fd.
As shown in FIG. 2, the IO data issued by the QEMU process of the virtual machine in the application layer is intercepted from top to bottom. After the open function is intercepted, the operation steps are, in order: executing the open system call function; verifying the file name; recording the file descriptor. After the pwrite/pwritev function is intercepted, the operation steps are, in order: acquiring the file name; verifying the file name; converting and transmitting the data; executing the pwrite/pwritev system call function. After the close function is intercepted, the operation steps are, in order: acquiring the file name; verifying the file name; executing the close system call function.
After the kernel intercepts the open system call function, it can directly determine which disk file is being operated on, because the incoming parameters include the file path name. For the close, pwrite and pwritev functions, the parameters passed down from the upper layer do not include the path of the file but only a file descriptor fd, so the file name must be determined from fd. Resolving the file name from the context every time write data is intercepted would seriously degrade performance. Therefore, a recording operation is performed in the intercepted open and close functions to record the disk file opened by the current process and the corresponding fd, which effectively improves the efficiency of the interception.
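The recording operation described above amounts to a small table keyed by process and file descriptor. The following is a minimal sketch under stated assumptions: the class name `FdTracker`, its method names, and the use of a `(pid, fd)` key are hypothetical choices for illustration, and the set of protected disk file paths is assumed to be known in advance.

```python
class FdTracker:
    """Remember which protected disk file each (pid, fd) pair refers to."""

    def __init__(self, protected_paths):
        self._protected = set(protected_paths)
        self._open_files = {}  # (pid, fd) -> path of a protected disk file

    def on_open(self, pid, fd, path):
        """Called after an intercepted open succeeds; path is known here."""
        if path in self._protected:
            self._open_files[(pid, fd)] = path

    def on_close(self, pid, fd):
        """Called from the intercepted close; forget the descriptor."""
        self._open_files.pop((pid, fd), None)

    def lookup(self, pid, fd):
        """Called from the intercepted pwrite/pwritev; None means not a target."""
        return self._open_files.get((pid, fd))
```

With this table, the write path needs only one dictionary lookup per intercepted call instead of resolving the file name from the process context each time.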
S5, judging whether the target data is metadata or not according to the dynamic mapping table and the position of the cluster where the target data is located, if not, dynamically converting the offset of the IO data according to the dynamic mapping table to obtain raw format data, and backing up the raw format data; if yes, the IO data is issued, and the steps S3 to S5 are repeated.
In the established dynamic mapping table, the position of the cluster where the target data is located can be looked up quickly and accurately. If an entry is found, the data is actual disk data; if no entry is found, the data is metadata. In this way the metadata is screened out.
When the kernel layer performs IO data interception, two KVM virtual machine disk formats, namely the raw format and the qcow2 format, are generally supported. When the virtual machine uses the qcow2 disk format, the data intercepted by the kernel layer contains not only disk data but also metadata related to the disk format, and this metadata needs to be removed dynamically during interception. However, because the data reaching the kernel layer has already been converted into the qcow2 layout, whether it is metadata cannot be judged directly. Therefore, the dynamic mapping table generated from the data cluster index table is used to convert the qcow2 disk data and obtain raw format data.
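The second judgment and the offset conversion can be sketched as a single lookup. This is an illustration under assumptions, not the patented implementation: the dynamic mapping table is modelled as a dictionary from qcow2 file cluster index to raw disk cluster index, the function name `classify_write` is hypothetical, and each intercepted write is assumed to stay within one cluster.

```python
def classify_write(file_offset, mapping, cluster_bits):
    """Classify one intercepted write by its offset in the qcow2 file.

    Returns ("data", raw_offset) when the cluster holds guest data, or
    ("metadata", None) when the cluster has no entry in the mapping table.
    """
    cluster_size = 1 << cluster_bits
    file_cluster = file_offset >> cluster_bits
    raw_cluster = mapping.get(file_cluster)
    if raw_cluster is None:
        return ("metadata", None)          # issue the IO without backing it up
    raw_offset = raw_cluster * cluster_size + (file_offset % cluster_size)
    return ("data", raw_offset)            # back up the data at the raw offset
```

Because the guest can allocate new clusters while it runs, a lookup table of this kind would have to be kept current as the L2 tables change; how that is done is not specified here.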
According to the technical scheme of the embodiment, the data can be recovered to any time point by capturing the IO data of the KVM; a dynamic mapping table is generated through the data cluster index table, so that the qcow2 data are dynamically converted into raw format data, and the CDP technology is more widely applied to virtualization protection.
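To make the "any time point" property concrete, the sketch below shows one common way such recovery can work: replaying timestamped raw-format writes onto a base image up to a chosen moment. The recovery procedure itself is not described in this document, so everything here is an assumption for illustration, including the function name `restore_to_point_in_time` and the journal layout of `(timestamp, raw_offset, data)` tuples.

```python
def restore_to_point_in_time(base_image: bytes, journal, target_time):
    """Replay journaled writes with timestamp <= target_time onto a copy of base_image.

    journal -- iterable of (timestamp, raw_offset, data) tuples (hypothetical layout)
    """
    image = bytearray(base_image)
    for timestamp, raw_offset, data in sorted(journal, key=lambda rec: rec[0]):
        if timestamp > target_time:
            break  # later writes are excluded from the recovered state
        end = raw_offset + len(data)
        if end > len(image):
            image.extend(bytes(end - len(image)))  # grow the image with zeros
        image[raw_offset:end] = data
    return bytes(image)
```

Choosing a target time just before a destructive operation, such as an accidental format, excludes that operation's writes from the recovered image.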
Example 2
As shown in FIG. 3, in one embodiment, a KVM-based agent-free continuous data protection system is provided, the system comprising:
a data cluster index table obtaining module 1001, configured to read a qcow2 disk, and obtain a data cluster index table, where the data cluster index table includes: L1 tables and L2 tables;
a dynamic mapping table obtaining module 1002, configured to analyze the data cluster index table to obtain a dynamic mapping table;
an IO data interception module 1003, configured to intercept IO data in the kernel layer when an IO request is received;
a first judging module 1004, configured to judge whether the IO data is target data, and if not, issue the IO data; if so, acquiring the position of the cluster where the target data is located according to the offset of the IO data;
a second judging module 1005, configured to judge whether the target data is metadata according to the dynamic mapping table and the position of the cluster where the target data is located; if not, perform dynamic conversion on the offset of the IO data according to the dynamic mapping table to obtain raw format data, and back up the raw format data; if yes, the IO data is issued, and processing is repeated from the IO data interception module through the second judging module.
Optionally, as shown in fig. 4, the data cluster index table obtaining module 1001 includes:
a qcow2 file obtaining unit 10011, configured to read the host machine and obtain a qcow2 file;
a header information acquiring unit 10012, configured to parse the qcow2 file, and acquire header information of the qcow2 file;
a basic information obtaining unit 10013, configured to parse header information of the qcow2 file to obtain basic information, where the basic information includes: l1_table_offset and l1_table_size;
a data cluster index table obtaining unit 10014, configured to obtain a data cluster index table according to the basic information, where the data cluster index table includes: L1 tables and L2 tables.
Optionally, as shown in fig. 5, the IO data interception module 1003 includes:
an IO filter driver establishing unit 10031 is configured to establish an IO filter driver in the kernel layer, and specifically, perform Hook processing on a function in an IO system call, where the function includes: open function, pwrite function, and close function;
the IO data capturing unit 10032 is configured to capture IO data through the IO filter driver when receiving the IO data.
In the following, we provide a set of experiments to further illustrate this example, as follows:
the experiment consists of a test server and a backup server, wherein the test server is used for operating the KVM virtual machine, and the backup server is used for storing the data captured by the data capture module.
The experiment first generates a random data file, calculates the MD5 code of the file, and saves the file. It then starts the KVM virtual machine on the test server with a qcow2 format test disk, writes the generated data file to the test disk, and formats the disk. Finally, a recovery module restores the disk to a point in time before the formatting, and the MD5 code of the data file is calculated again. If the MD5 code is unchanged, the data has been protected correctly. The experimental results are shown in Table 2.
[Table 2: MD5 code of the data file before and after recovery; the original table image is not reproduced here.]
As can be seen from Table 2, the MD5 code of the file before recovery and the MD5 code of the file after recovery are the same, which shows that the KVM-based agent-free continuous data protection system of this embodiment can correctly protect disk data in qcow2 format.
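The verification step of the experiment can be reproduced with a standard hash comparison. The sketch below is an illustration only; the function names `md5_of_file` and `is_recovered_correctly` and the chunk size are hypothetical choices, and the file paths would be supplied by whoever runs the test.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 code of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_recovered_correctly(original_path, recovered_path):
    """True when the recovered file has the same MD5 code as the original."""
    return md5_of_file(original_path) == md5_of_file(recovered_path)
```

Matching MD5 codes indicate that the recovered file is byte-for-byte identical to the original for the purposes of this test.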
In the technical scheme of this embodiment, the data cluster index table obtaining module 1001 is configured to read a qcow2 disk and obtain a data cluster index table; the dynamic mapping table obtaining module 1002 is configured to analyze the data cluster index table to obtain a dynamic mapping table; the IO data interception module 1003 is configured to intercept IO data in the kernel layer when an IO request is received; the first judging module 1004 is configured to judge whether the IO data is target data, and if not, issue the IO data, and if so, acquire the position of the cluster where the target data is located according to the offset of the IO data; the second judging module 1005 is configured to judge whether the target data is metadata according to the dynamic mapping table and the position of the cluster where the target data is located, and if not, perform dynamic conversion on the offset of the IO data according to the dynamic mapping table to obtain raw format data and back up the raw format data, and if yes, issue the IO data and repeat processing from the IO data interception module through the second judging module. While the KVM is running, the kernel layer can continuously capture the data generated by the KVM, realizing continuous data protection without installing plug-ins, and the operation is simple and convenient.
Example 3
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the KVM-based agent-free continuous data protection method of Example 1.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The above embodiments express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An agent-free continuous data protection method based on KVM, characterized by comprising the following steps:
a data cluster index table obtaining step: reading a qcow2 disk and obtaining a data cluster index table, wherein the data cluster index table comprises L1 tables and L2 tables;
a dynamic mapping table obtaining step: parsing the data cluster index table to obtain a dynamic mapping table;
an IO data interception step: when an IO request is received, intercepting IO data at the kernel layer;
a first judgment step: judging whether the IO data is target data; if not, passing the IO data down; if so, acquiring the position of the cluster where the target data is located;
a second judgment step: judging, according to the dynamic mapping table and the position of the cluster where the target data is located, whether the target data is metadata; if not, dynamically converting the offset of the IO data according to the dynamic mapping table to obtain raw-format data, and backing up the raw-format data; if so, passing the IO data down, and repeating the IO data interception step through the second judgment step.
2. The KVM-based agent-free continuous data protection method of claim 1, wherein the data cluster index table obtaining step comprises:
reading the host machine and acquiring a qcow2 file;
parsing the qcow2 file to obtain the file header information of the qcow2 file;
parsing the file header information of the qcow2 file to obtain basic information, wherein the basic information comprises l1_table_offset and l1_table_size;
acquiring a data cluster index table according to the basic information, wherein the data cluster index table comprises L1 tables and L2 tables.
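The steps of claim 2 can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: it assumes the standard big-endian qcow2 header layout, in which the field this document calls l1_table_size is the entry count that the qcow2 specification names `l1_size`, and it ignores compressed clusters, extended L2 entries, backing files and snapshots. The function names are hypothetical.

```python
import struct
from typing import BinaryIO, Dict, List, Tuple

QCOW2_MAGIC = b"QFI\xfb"
# Bits 9..55 of an L1/L2 entry hold the offset; the top bits are flags.
OFFSET_MASK = 0x00FFFFFFFFFFFE00
# magic, version, backing_file_offset, backing_file_size, cluster_bits,
# size, crypt_method, l1_size, l1_table_offset (first 48 header bytes)
HEADER_FMT = ">4sIQIIQIIQ"


def read_header(f: BinaryIO) -> Dict[str, int]:
    """Parse the start of the qcow2 header and return the basic information."""
    f.seek(0)
    raw = f.read(struct.calcsize(HEADER_FMT))
    (magic, version, _bf_off, _bf_size, cluster_bits,
     size, _crypt, l1_size, l1_table_offset) = struct.unpack(HEADER_FMT, raw)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {
        "version": version,
        "cluster_bits": cluster_bits,
        "virtual_size": size,
        "l1_table_size": l1_size,          # number of L1 entries
        "l1_table_offset": l1_table_offset,
    }


def read_table(f: BinaryIO, offset: int, count: int) -> List[int]:
    """Read `count` big-endian 64-bit entries starting at `offset`."""
    f.seek(offset)
    return list(struct.unpack(">%dQ" % count, f.read(8 * count)))


def read_index_tables(f: BinaryIO) -> Tuple[Dict[str, int], List[int], Dict[int, List[int]]]:
    """Return (header, L1 table, {L1 index: L2 table of host cluster offsets})."""
    hdr = read_header(f)
    l2_entries = (1 << hdr["cluster_bits"]) // 8   # one L2 table fills a cluster
    l1 = read_table(f, hdr["l1_table_offset"], hdr["l1_table_size"])
    l2_tables: Dict[int, List[int]] = {}
    for l1_index, entry in enumerate(l1):
        l2_offset = entry & OFFSET_MASK
        if l2_offset:                               # 0 means no L2 table allocated
            raw_l2 = read_table(f, l2_offset, l2_entries)
            l2_tables[l1_index] = [e & OFFSET_MASK for e in raw_l2]
    return hdr, l1, l2_tables
```

Opening the image with `open(path, "rb")` and passing the file object to `read_index_tables` yields the L1 and L2 tables from which a mapping table can then be derived.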
3. The KVM-based agent-free continuous data protection method of claim 1, wherein the IO data interception step comprises:
establishing an IO filter driver in the kernel layer;
when IO data is received, intercepting the IO data through the IO filter driver.
4. The KVM-based agent-free continuous data protection method of claim 3, wherein the step of establishing the IO filter driver in the kernel layer comprises: performing Hook processing on functions in the IO system calls, wherein the functions comprise: the open function, the pwrite function, and the close function.
5. The KVM-based agent-free continuous data protection method of claim 4, wherein the step of performing Hook processing on the functions in the IO system calls comprises:
acquiring the system call table of the Linux kernel;
writing Hook functions, and replacing the addresses of the corresponding system call functions in the system call table with the addresses of the Hook functions, wherein the system call functions comprise: the open function, the pwrite function, and the close function;
after the replacement of the system call functions is completed, saving the addresses of the system call functions.
6. The KVM-based agent-free continuous data protection method of claim 5, wherein the step of intercepting the IO data through the IO filter driver when the IO data is received further comprises: performing a recording operation on the open function and the close function, and recording the disk file opened by the current process and the corresponding fd.
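The Hook processing of claims 4 to 6 takes place inside the Linux kernel and would normally be written in C. The following Python sketch is only a user-space model of the idea, under stated assumptions: the system call table is modeled as a dictionary of callables, the original entries are saved before they are overwritten so that the hooks can forward to them, and the `(pid, fd)` bookkeeping stands in for recording the disk file opened by the current process. The class `HookedSyscalls` and its method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class HookedSyscalls:
    """Toy model of replacing open/pwrite/close entries in a call table."""

    def __init__(self, table: Dict[str, Callable], is_target: Callable[[str], bool]):
        self.table = table                      # stands in for the system call table
        self.is_target = is_target              # decides which files are protected disks
        self.saved: Dict[str, Callable] = {}    # original "addresses"
        self.tracked: Dict[Tuple[int, int], str] = {}       # (pid, fd) -> disk file
        self.captured: List[Tuple[str, int, bytes]] = []    # (file, offset, data)

    def install(self) -> None:
        hooks = {"open": self._open, "pwrite": self._pwrite, "close": self._close}
        for name, hook in hooks.items():
            self.saved[name] = self.table[name]   # keep the original entry
            self.table[name] = hook               # replace it with the hook

    def uninstall(self) -> None:
        for name, original in self.saved.items():
            self.table[name] = original
        self.saved.clear()

    def _open(self, pid: int, path: str) -> int:
        fd = self.saved["open"](pid, path)
        if fd >= 0 and self.is_target(path):
            self.tracked[(pid, fd)] = path        # record disk file and its fd
        return fd

    def _pwrite(self, pid: int, fd: int, data: bytes, offset: int) -> int:
        path = self.tracked.get((pid, fd))
        if path is not None:
            self.captured.append((path, offset, bytes(data)))  # intercept target IO
        return self.saved["pwrite"](pid, fd, data, offset)      # always pass it down

    def _close(self, pid: int, fd: int) -> int:
        self.tracked.pop((pid, fd), None)         # forget the fd once it is closed
        return self.saved["close"](pid, fd)
```

Tracking the fd at open time is what lets the pwrite hook decide cheaply whether a write belongs to a protected disk file; writes to any other descriptor are forwarded without being captured.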
7. A KVM-based agent-free continuous data protection system, the system comprising:
a data cluster index table obtaining module, configured to read a qcow2 disk and obtain a data cluster index table, wherein the data cluster index table comprises L1 tables and L2 tables;
a dynamic mapping table obtaining module, configured to parse the data cluster index table to obtain a dynamic mapping table;
an IO data interception module, configured to intercept IO data in the kernel layer when an IO request is received;
a first judging module, configured to judge whether the IO data is target data; if not, to pass the IO data down; if so, to acquire the position of the cluster where the target data is located according to the offset of the IO data;
a second judging module, configured to judge, according to the dynamic mapping table and the position of the cluster where the target data is located, whether the target data is metadata; if not, to dynamically convert the offset of the IO data according to the dynamic mapping table to obtain raw-format data and back up the raw-format data; if so, to pass the IO data down, after which processing repeats from the IO data interception module through the second judging module.
8. The KVM-based agent-free continuous data protection system of claim 7, wherein the data cluster index table obtaining module comprises:
a qcow2 file obtaining unit, configured to read the host machine and acquire a qcow2 file;
a file header information obtaining unit, configured to parse the qcow2 file and obtain the file header information of the qcow2 file;
a basic information obtaining unit, configured to parse the file header information of the qcow2 file and obtain basic information, wherein the basic information comprises l1_table_offset and l1_table_size;
a data cluster index table obtaining unit, configured to obtain a data cluster index table according to the basic information, wherein the data cluster index table comprises L1 tables and L2 tables.
9. The KVM-based agent-free continuous data protection system of claim 7, wherein the IO data interception module comprises:
an IO filter driver establishing unit, configured to establish the IO filter driver in the kernel layer;
an IO data interception unit, configured to intercept the IO data through the IO filter driver when the IO data is received.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the KVM-based agent-free continuous data protection method according to any one of claims 1 to 6.
CN202210309607.2A 2022-03-28 2022-03-28 Agent-free continuous data protection method, system and storage medium based on KVM Active CN114416431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210309607.2A CN114416431B (en) 2022-03-28 2022-03-28 Agent-free continuous data protection method, system and storage medium based on KVM

Publications (2)

Publication Number Publication Date
CN114416431A true CN114416431A (en) 2022-04-29
CN114416431B CN114416431B (en) 2022-06-07

Family

ID=81264225

Country Status (1)

Country Link
CN (1) CN114416431B (en)

Also Published As

Publication number Publication date
CN114416431B (en) 2022-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant