CN112464044B

CN112464044B - File data block change information monitoring and management system and method thereof

Info

Publication number: CN112464044B
Application number: CN202011433430.4A
Authority: CN
Inventors: 郑忠慧; 高硕�
Original assignee: Shanghai Eisoo Information Technology Co Ltd
Current assignee: Shanghai Eisoo Information Technology Co Ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2023-04-07
Anticipated expiration: 2040-12-09
Also published as: CN112464044A

Abstract

The invention relates to a file data block change information monitoring and management system and a method thereof. The system comprises a client and a drive unit. The client is connected to the drive unit through a CDP manager, the drive unit is connected to an operating system data interface and to a memory, and the CDP manager transmits data information between the client and the drive unit. The drive unit captures file data block change information from the operating system, stores the captured information in the memory, and transmits it to the client through the CDP manager. The client initiates data reading or directory monitoring tasks, receives the information captured by the drive unit from the operating system, and performs data backup processing. Compared with the prior art, the invention captures changes to file data blocks through a file filter driver, can track the specific information of each file change, and works across different database applications, thereby reducing the complexity of adapting to each of them.

Description

File data block change information monitoring and management system and method thereof

Technical Field

The invention relates to the technical field of duplicate data management, in particular to a file data block change information monitoring and management system and a method thereof.

Background

In today's information-driven society, data underlies virtually all activity, and its importance has given rise to a range of technologies built around it, such as traditional scheduled backup protection and copy data management. Of these, copy data management best reflects the value of data: beyond completing traditional scheduled backup and protection, it helps users increase the use value of their data and extract useful information hidden in it. Once a complete data copy has been separated out, it can be used for day-to-day development, testing, and similar work, and queries, tests, and analysis can be moved to a non-production system without affecting the business. Data can thus be put to use quickly, which helps strengthen a user's competitiveness in the big data era.

Current copy data management technology mainly covers two aspects. The first is the protection of application data: for example, data protection is achieved by constructing a database of the business system and applying full and incremental data. The second is data storage: the captured business data is stored so that a complete data copy can be provided for the user to work with.

On the data storage side, the prior art already allows data to be utilized. Capturing application data is harder, especially for database applications: because there are many database vendors, business data can only be acquired by adapting to the interfaces of each database. This increases the complexity of adaptation, so users cannot acquire file data block change information quickly and conveniently, and cannot protect and utilize their data in a timely and reliable way.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a file data block change information monitoring and management system and a method thereof.

The purpose of the invention can be achieved by the following technical scheme: a file data block change information monitoring and management system comprises a client and a drive unit, wherein the client is connected to the drive unit through a Continuous Data Protection (CDP) manager, the drive unit is connected to an operating system data interface and to a memory, and the CDP manager is used for transmitting data information between the client and the drive unit;

the drive unit is used for capturing file data block change information from an operating system, storing the captured information into a memory, and transmitting the captured information to a client through a CDP manager;

the client is used for initiating a data reading or directory monitoring task, receiving the information captured by the drive unit from the operating system, and carrying out data backup processing.

Further, the drive unit includes a memory allocation module and a directory binary tree generation unit, which are respectively connected to the memory, the directory binary tree generation unit is further connected to the data interface of the operating system, the memory allocation module is configured to acquire a space for storing file data block change information from the memory, and the directory binary tree generation unit is configured to capture name, position, and size change data of the file data block from the operating system, and generate a corresponding directory binary tree, which is stored in the memory as the file data block change information.

A file data block change information monitoring and management method comprises the following steps:

S1. The client initiates a directory monitoring task and transmits the directory monitoring task request to the drive unit through the CDP manager;

S2. After receiving the directory monitoring task request, the drive unit captures the corresponding file data block change information from the operating system through the operating system data interface and stores it in the memory;

S3. The client initiates a data reading task and transmits the data reading task request to the drive unit through the CDP manager;

S4. The drive unit transmits the corresponding file data block change information in the memory to the CDP manager, and the CDP manager extracts the first address data of the file data block change information and transmits it to the client;

S5. According to the received first address data, the client completes the backup operation of the corresponding file data.
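
Steps S1 to S5 can be modeled in user space as a rough illustration. The sketch below is not the patented implementation: the class and method names (`DriveUnit`, `CDPManager`, `Client`, `on_io`, and so on) are hypothetical, the kernel-layer drive unit is simulated by an ordinary object, and the "first address data" is assumed to be the index of the first record in the buffer handed over by the drive unit.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One captured change record: file path, start offset, and length."""
    path: str
    offset: int
    length: int


@dataclass
class DriveUnit:
    """Stand-in for the kernel-layer drive unit (hypothetical model)."""
    watched: set = field(default_factory=set)
    memory: list = field(default_factory=list)

    def monitor(self, path: str) -> None:
        # S2 (first half): remember the path to be traced.
        self.watched.add(path.rstrip("/"))

    def on_io(self, path: str, offset: int, length: int) -> None:
        # S2 (second half): called when the operating system reports IO.
        # Only IO at or below a traced path is recorded.
        if any(path == w or path.startswith(w + "/") for w in self.watched):
            self.memory.append(Change(path, offset, length))

    def read(self) -> list:
        # S4: hand the stored change records over and start a new list.
        changes, self.memory = self.memory, []
        return changes


class CDPManager:
    """User-layer library relaying requests between client and drive unit."""

    def __init__(self, drive: DriveUnit) -> None:
        self.drive = drive
        self.buffer = []

    def start_monitoring(self, path: str) -> None:
        # S1: forward the directory monitoring request.
        self.drive.monitor(path)

    def read_changes(self) -> int:
        # S3 + S4: forward the read request, keep the returned records,
        # and return the assumed "first address" (index of the first record).
        self.buffer = self.drive.read()
        return 0


class Client:
    def __init__(self, manager: CDPManager) -> None:
        self.manager = manager
        self.backed_up = []

    def backup(self) -> None:
        # S5: use the first address to locate the records and back them up.
        head = self.manager.read_changes()
        self.backed_up.extend(self.manager.buffer[head:])
```

In this model the client never touches the drive unit directly; everything passes through the manager, which mirrors the role the CDP manager plays between the user layer and the kernel layer.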

Further, the directory monitoring task request includes a file path to be traced.

Further, the step S2 specifically includes the following steps:

S21. After receiving the directory monitoring task request, the drive unit stores the file path to be traced in the memory as a binary tree and then starts the tracing mode for that path;

S22. When an IO operation occurs under the file path to be traced in the operating system, the drive unit calculates the hash value of the file path and stores it in the corresponding bitmap linked list to obtain the file data block change information, which it stores in the memory as a directory binary tree.

Furthermore, the bitmap linked list adopts a skip list structure to insert and acquire bitmaps quickly, and comprises a plurality of bitmap offsets and corresponding bitmap pointers, wherein the bitmap pointers form a file name pointer together.
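
A skip list keyed by bitmap offset could look like the following minimal sketch. It is an assumption-laden illustration rather than the structure of fig. 3: the names `BitmapSkipList`, `MAX_LEVEL`, and the use of a `bytearray` as the bitmap payload are hypothetical, and the "file name pointer" formed by the bitmap pointers is not modeled.

```python
import random


class _Node:
    def __init__(self, key, value, level):
        self.key = key                    # bitmap offset
        self.value = value                # the bitmap stored at that offset
        self.forward = [None] * level     # one forward pointer per level


class BitmapSkipList:
    """Ordered map from bitmap offset to bitmap (hypothetical sketch)."""

    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._head = _Node(-1, None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def get(self, offset):
        node = self._head
        # Descend from the top level, moving right while keys are smaller.
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < offset:
                node = node.forward[i]
        node = node.forward[0]
        if node is not None and node.key == offset:
            return node.value
        return None

    def insert(self, offset, bitmap):
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < offset:
                node = node.forward[i]
            update[i] = node              # last node before `offset` on level i
        nxt = node.forward[0]
        if nxt is not None and nxt.key == offset:
            nxt.value = bitmap            # offset already present: replace
            return
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(offset, bitmap, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def offsets(self):
        # Walk the bottom level, which links every node in key order.
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Insertion and lookup both take expected logarithmic time in the number of stored bitmaps, which is the property the text relies on when it says bitmaps can be inserted and acquired quickly.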

Further, the step S22 specifically includes the following steps:

S221. When an IO operation occurs under the file path to be traced in the operating system, the drive unit takes the full path name of the file to be traced as input and calculates the corresponding hash value according to a preset hash table;

S222. According to the calculated hash value, the drive unit generates or updates the bitmap linked list;

S223. According to the bitmap linked list information, the drive unit updates the current directory binary tree stored in the memory to generate a new directory binary tree;

S224. Using the memory fragmentation processing mode, the drive unit applies to the memory for an allocated memory space and stores the new directory binary tree in that space.

Further, the IO under the file path to be traced includes a file name modification, a start address of the data change, and a length of the data change.

Further, the specific process of memory fragmentation processing is as follows: the method comprises the steps of constructing a corresponding main array according to preset memory allocation space capacity, wherein each array entry node in the main array corresponds to a space block, each space block consists of an addressing head and a data memory, the space blocks jointly form an idle linked list, a pointer used for pointing to a first idle node in the idle linked list is further arranged in the main array, and when a driving unit applies for allocating memory space to the memory, the space blocks are allocated through the pointer.

Further, the specific structure of the binary directory tree includes a parent node, a left child node and a right child node, the left child node points to a child directory or a child file, and the right child node points to a sibling directory or a sibling file of the same level.

Compared with the prior art, the invention has the following advantages:

1. the invention uses the client to initiate the data reading or directory monitoring task, combines with the CDP manager to realize the data interaction between the client and the driving unit, and can capture the file data block change information from the operating system by arranging the driving unit connected with the operating system data interface, so that the subsequent client can directly and quickly obtain the specific information of the file change to complete the corresponding data backup operation, thereby realizing the purpose of adapting to different database applications.

2. When the drive unit stores the captured file data block change information in the memory, it combines a bitmap linked list, memory fragmentation processing, and a directory binary tree. The bitmap linked list allows nodes to be found quickly. Memory fragmentation processing manages small idle blocks of memory effectively and shortens the time needed to request and release memory. The directory binary tree stores modified file path information reliably and conveniently while reducing memory consumption. Together these improve the reliability with which file data block change information is stored and ensure that directory monitoring tasks execute accurately and quickly.

Drawings

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a schematic flow diagram of the process of the present invention;

FIG. 3 is a diagram of a data structure according to the present invention;

FIG. 4 is a diagram of the directory binary tree structure in the embodiment;

FIG. 5 is a diagram illustrating memory fragmentation in an embodiment;

The reference numerals in the figures are: 1, client; 2, drive unit; 3, CDP manager.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

Examples

As shown in fig. 1, a file data block change information monitoring and management system includes a client 1 and a drive unit 2. The client 1 is connected to the drive unit 2 through a CDP manager 3, and the drive unit 2 is further connected to an operating system data interface and a memory. The client 1 and the CDP manager 3 belong to the user layer, and the drive unit 2 belongs to the kernel layer. The drive unit 2 includes a memory allocation module and a directory binary tree generation unit, each connected to the memory; the directory binary tree generation unit is also connected to the operating system data interface. The memory allocation module obtains from the memory the space for storing file data block change information. The directory binary tree generation unit captures the name, position, and size change data of file data blocks from the operating system and generates a corresponding directory binary tree, which is stored in the memory as the file data block change information.

The CDP manager 3 mainly makes data interaction between the user layer and the drive unit 2 simpler and more flexible. It is provided to the user layer as a library and exposes interfaces for reading and setting data, so the application layer does not need to handle the details of reading data and only has to process it;

the drive unit 2 is used for capturing file data block change information from an operating system, storing the captured information into a memory, and transmitting the captured information to the client 1 through the CDP manager 3;

the client 1 is used for initiating a data reading or directory monitoring task, receiving information captured by the driving unit 2 from an operating system, and performing data backup processing.

As shown by (1) to (6) in fig. 1: first, the user layer initiates data reading or directory monitoring. Second, the CDP manager interacts with the drive unit and sends a read data request. Third, after capturing the file data block change information, the drive unit returns it to the CDP manager, which extracts the corresponding first address. Fourth, the CDP manager returns the first address to the user layer, which then completes the backup of the corresponding data. Fifth, the user layer sends the backup completion result to the CDP manager. Sixth, the CDP manager sends the backup completion response to the drive unit; on receiving a correct response, the drive unit continues reading data and transmitting it to the user layer.

When the system is applied in practice, the file data block change information monitoring and management method shown in fig. 2 comprises the following steps:

S1. The client initiates a directory monitoring task and transmits the directory monitoring task request to the drive unit through the CDP manager, wherein the directory monitoring task request comprises a file path to be traced;

S2. After receiving the directory monitoring task request, the drive unit captures the corresponding file data block change information from the operating system through the operating system data interface and stores it in the memory, specifically:

after receiving a directory monitoring task request, a drive unit firstly stores a file path to be tracked in a memory in a binary tree mode, and then starts a tracking mode of the file path to be tracked;

when IO under the path of the file to be tracked is operated in the operating system, the driving unit takes the full path name of the file to be tracked as input according to a preset hash table, and a corresponding hash value is obtained through calculation;

according to the hash value obtained by calculation, the driving unit generates or updates a bitmap linked list;

then according to the bitmap linked list information, the drive unit combines the current directory binary tree stored in the memory to update and generate a new directory binary tree;

finally, a memory fragment processing mode is adopted, the drive unit applies for acquiring the allocated memory space from the memory, and the new directory binary tree is stored into the corresponding memory space;

the bitmap linked list adopts a skip list structure to quickly insert and acquire a bitmap, and comprises a plurality of bitmap offsets and corresponding bitmap pointers, wherein the plurality of bitmap pointers jointly form a file name pointer;

IO under the file path to be tracked comprises file name modification, an initial address of data change and the length of the data change;

the specific process of memory fragmentation processing is as follows: constructing a corresponding main array according to preset memory allocation space capacity, wherein each array item node in the main array corresponds to a space block, one space block consists of an addressing head and a data memory, the space blocks jointly form an idle linked list, a pointer for pointing to a first idle node in the idle linked list is further arranged in the main array, and when a drive unit applies for allocating a memory space to the memory, the space blocks are allocated through the pointer;

the specific structure of the binary directory tree comprises a father node, a left child node and a right child node, wherein the left child node points to a child directory or a child file, and the right child node points to a sibling directory or a sibling file of the same level;

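S3. The client initiates a data reading task and transmits the data reading task request to the drive unit through the CDP manager;

S4. The drive unit transmits the corresponding file data block change information in the memory to the CDP manager, and the CDP manager extracts the first address data of the file data block change information and transmits it to the client;

S5. According to the received first address data, the client completes the backup operation of the corresponding file data.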

In the invention, the user layer sets the monitored directory at the system start-up stage. When data in a monitored file changes, the drive unit captures the change, including the file name, the start address of the data change, the length of the data change, and other information. As shown in fig. 3, the drive unit obtains the file name, converts it into a key value through a hash algorithm, and uses the key value to find the corresponding hash table entry. To handle the case where too many files collide, file items are inserted in skip-list fashion; each file item contains the file's bitmap data, which identifies the positions in the file that have changed.

To store the full paths of files while saving memory, a directory binary tree structure (shown in fig. 3) is adopted. Each node of the directory binary tree represents the name of a directory level or of a file. Each directory node has a left subtree and a right subtree: the left subtree represents the children of the directory, and the right subtree represents its siblings at the same level. To obtain the full path of a file with O(1) time complexity, a parent directory pointer is added to each node and is used to trace back the full path. In this way both the changed positions of a file and its full path are recorded.

As shown in fig. 4, there are four files under the root directory: /x/xx/1.txt, /x/xx/2.txt, /x/xxx/1.txt, and /x/xx/sxx/1.txt, with x located at the root directory. For the first file, xx is a subdirectory of x and is placed on the left subtree of x; 1.txt is a child file of xx and is placed on the left subtree of xx. For the second file, querying the directory binary tree finds that the root directory x already exists and that the subdirectory xx also exists, but searching under xx does not find 2.txt, so 2.txt is inserted into the right subtree of 1.txt. For the third file, the subdirectory xxx cannot be found in the left subtree of x, so xxx is inserted into the left subtree of x, and 1.txt is likewise inserted into the left subtree of xxx. The fourth file is stored in the same way, as shown in fig. 4.
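
The four-file example can be reproduced with a small left-child, right-sibling tree. This is a sketch under assumptions: the names `DirNode`, `DirTree`, and `full_path` are hypothetical, and the root of the file system is represented by a node with an empty name. In this sketch, rebuilding the full path walks the parent pointers, so its cost depends on the depth of the path and not on how many files are tracked.

```python
class DirNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # parent directory, used to rebuild the path
        self.left = None       # first child (subdirectory or file)
        self.right = None      # next sibling at the same level


class DirTree:
    def __init__(self):
        self.root = DirNode("")            # stands for "/"

    def insert(self, path):
        """Insert a full path and return the node of its last component."""
        node = self.root
        for part in path.strip("/").split("/"):
            node = self._find_or_add_child(node, part)
        return node

    @staticmethod
    def _find_or_add_child(parent, name):
        # Children of `parent` are its left child plus that child's
        # chain of right (sibling) pointers.
        child, last = parent.left, None
        while child is not None:
            if child.name == name:
                return child               # already present, reuse it
            last, child = child, child.right
        new = DirNode(name, parent)
        if last is None:
            parent.left = new              # first child goes on the left
        else:
            last.right = new               # later children become siblings
        return new


def full_path(node):
    """Rebuild the full path by following parent pointers to the root."""
    parts = []
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))
```

Inserting /x/xx/1.txt, /x/xx/2.txt, /x/xxx/1.txt, and /x/xx/sxx/1.txt in that order leaves xx as the left child of x, 1.txt as the left child of xx, 2.txt as the right sibling of 1.txt, and xxx as the right sibling of xx. Each directory name is stored once, however many files share it.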

In addition, the drive unit must store file data block change information in the memory as it is captured in real time. If a buffer of only a few tens of bytes were requested from the system each time, a large number of such requests would cause serious memory fragmentation and reduce system efficiency. The fragmentation processing scheme is designed to solve this problem: a small memory request completes with O(1) time complexity, so system efficiency is essentially unaffected. In the embodiment, as shown in fig. 5, a single small space block consists of 8 + 128 bytes: the addressing head occupies 8 bytes, and the remaining 128 bytes are the small memory actually used. A large memory region is thus divided into small blocks that are chained into a manageable linked list. The pointer in the array entry always points to the first idle node of the idle linked list, so when memory needs to be allocated it can be taken directly from the pointer position without a time-consuming search.
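
The fixed-size block scheme can be sketched as a free-list pool. The 8-byte head and 128-byte payload come from the embodiment; everything else is an assumption made for illustration, including the names `BlockPool` and `NIL`, the choice to store the index of the next idle block in the addressing head, and the use of a Python `bytearray` in place of kernel memory.

```python
HEADER_SIZE = 8                    # addressing head, as in the embodiment
PAYLOAD_SIZE = 128                 # usable small memory, as in the embodiment
BLOCK_SIZE = HEADER_SIZE + PAYLOAD_SIZE
NIL = 0xFFFFFFFFFFFFFFFF           # hypothetical "no next block" marker


class BlockPool:
    """One large buffer carved into equal blocks chained into an idle list."""

    def __init__(self, block_count):
        self._mem = bytearray(block_count * BLOCK_SIZE)
        # Chain block i to block i + 1; the last block ends the list.
        for i in range(block_count):
            self._set_next(i, i + 1 if i + 1 < block_count else NIL)
        # Pointer to the first idle block.
        self._free_head = 0 if block_count else NIL

    def _set_next(self, index, nxt):
        start = index * BLOCK_SIZE
        self._mem[start:start + HEADER_SIZE] = nxt.to_bytes(HEADER_SIZE, "little")

    def _get_next(self, index):
        start = index * BLOCK_SIZE
        return int.from_bytes(self._mem[start:start + HEADER_SIZE], "little")

    def alloc(self):
        """Take the first idle block: constant time, no searching."""
        if self._free_head == NIL:
            raise MemoryError("pool exhausted")
        index = self._free_head
        self._free_head = self._get_next(index)
        return index

    def free(self, index):
        """Push the block back onto the front of the idle list."""
        self._set_next(index, self._free_head)
        self._free_head = index

    def payload(self, index):
        """Writable view of the 128 usable bytes of a block."""
        start = index * BLOCK_SIZE + HEADER_SIZE
        return memoryview(self._mem)[start:start + PAYLOAD_SIZE]
```

Both `alloc` and `free` touch only the head pointer and one addressing head, which is why their cost does not grow with the number of blocks.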

In this embodiment, so that transmitted data is used efficiently and not retransmitted unnecessarily, a 16 MB cache space is also provided in the drive unit. When the user layer receives the 16 MB of data, it processes the data itself. Until the user layer confirms, the 16 MB of data cached in the kernel is retained. Once the user layer confirms that all uploaded data has been processed, the 16 MB of cached data can be deleted and the cache refilled with new data. If the user layer exits midway, the drive unit, on detecting that the application layer has exited, recovers the data in the 16 MB cache and resends it when the user layer restarts and reads again, so that an abnormal process exit while the user layer is processing data is handled reliably.
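
The confirm-before-discard behavior of the cache can be illustrated as follows. This is a simplified sketch, not the driver's code: the class `AckedCache` and its methods are hypothetical, captured data is modeled as byte strings, and each record is assumed to be smaller than the cache capacity.

```python
from collections import deque


class AckedCache:
    """Keeps the last batch sent until the reader confirms it."""

    def __init__(self, capacity=16 * 1024 * 1024):
        self.capacity = capacity           # 16 MB in the embodiment
        self._pending = deque()            # captured, not yet sent
        self._in_flight = b""              # sent, awaiting confirmation

    def capture(self, data: bytes) -> None:
        self._pending.append(data)

    def read(self) -> bytes:
        # An unconfirmed batch is resent unchanged, which covers the case
        # where the reader exited before confirming.
        if self._in_flight:
            return self._in_flight
        batch = bytearray()
        while self._pending and len(batch) + len(self._pending[0]) <= self.capacity:
            batch += self._pending.popleft()
        self._in_flight = bytes(batch)
        return self._in_flight

    def confirm(self) -> None:
        # The reader has processed the batch, so the cache can be refilled.
        self._in_flight = b""
```

A reader that crashes between `read` and `confirm` simply calls `read` again after restarting and receives the same batch.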

In summary, in copy data management, protecting each database application normally requires adaptation specific to that application. The invention instead exploits the fact that database applications store their data in files at the bottom layer: changes to database files are captured through a file filter driver, giving a scheme that adapts to different database applications and reduces the complexity of doing so. The file filter driver can track the specific information of a file change, such as a change of file name, the modified position, and the modified size. After capture, the tracked data is sent to the user layer, where the storage performs a copy snapshot, thereby completing the data backup and enabling protection and utilization of the data.

The invention divides the file filter driver into an application layer and a kernel layer. The application layer first sets the database file paths to be traced and the resource configuration information. After receiving the paths, the kernel layer stores them in a tree structure as a binary tree and starts the tracing mode for them. When an IO operation occurs under a traced file path, the file change is tracked automatically: the hash value of the file path is calculated and stored in the corresponding bitmap linked list, and the linked list uses a skip list so that bitmaps can be inserted and retrieved quickly. When the application layer needs the changes under a file path, it obtains the database file changes through interaction between the application layer module and the kernel module.

Therefore, no separate adaptation is needed for any particular database application: the file filter driver tracks the files at the bottom layer of the database, which makes the approach general.

It should be noted that the invention is based on a hash table and is suited to scenarios with a large number of small files: even when the number of files reaches the order of ten million, lookup speed remains adequate. The key value of the hash table is calculated with the full path name of the file as input.
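
A minimal sketch of a hash table keyed on the full path name is shown below. The patent does not name a hash algorithm or a table size, so `zlib.crc32` and `BUCKETS = 1024` are assumptions chosen only for illustration, as are the names `bucket_index` and `ChangeTable`. Collisions are handled here with a plain list per bucket, where the text describes a skip list.

```python
import zlib

BUCKETS = 1024   # hypothetical table size


def bucket_index(full_path: str) -> int:
    """Map a full path name to a bucket (CRC-32 is an assumed hash)."""
    return zlib.crc32(full_path.encode("utf-8")) % BUCKETS


class ChangeTable:
    def __init__(self):
        self._buckets = [[] for _ in range(BUCKETS)]

    def lookup_or_create(self, full_path: str) -> dict:
        """Return the file item for a path, creating it on first use."""
        chain = self._buckets[bucket_index(full_path)]
        for path, item in chain:
            if path == full_path:          # compare paths to resolve collisions
                return item
        item = {"changes": []}
        chain.append((full_path, item))
        return item
```

Because the bucket is computed directly from the path, the cost of a lookup depends on the length of one collision chain rather than on the total number of tracked files.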

The directory binary tree structure stores modified file path information and copes with the large memory consumption that a large number of changed files would otherwise cause. Nodes in the directory binary tree are divided into parent nodes, left child nodes, and right child nodes. Each node stores the directory name of the corresponding directory node; the right pointer of a node points to a sibling node, the left pointer points to a child file, and sibling nodes point to the parent node, so the full path information can be obtained quickly.

The skip list structure is used to find nodes quickly, that is, to quickly find the corresponding file name under a directory. It is used in two places. The first is the collision nodes of the hash table: when there are too many files, searching the collision list must be sped up. The second is a directory that contains a large number of files, where a skip list speeds up file lookup. The bitmap is mainly used to quickly mark the positions in a file where data has changed, and it also avoids excessive memory consumption when a file changes heavily: each bit represents a change within one 4 KB region of the file, and the corresponding bit is set when that region changes.
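
The one-bit-per-4-KB bitmap can be sketched as follows. The 4 KB granularity comes from the text; the function names `mark_changed` and `changed_blocks`, the growable `bytearray`, and the bit ordering within a byte are assumptions for illustration.

```python
BLOCK = 4096   # one bit covers one 4 KB region of the file


def mark_changed(bitmap: bytearray, offset: int, length: int) -> None:
    """Set the bit of every 4 KB region touched by [offset, offset + length)."""
    if length <= 0:
        return
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    needed = last // 8 + 1                      # bytes required to hold bit `last`
    if len(bitmap) < needed:
        bitmap.extend(bytes(needed - len(bitmap)))
    for block in range(first, last + 1):
        bitmap[block // 8] |= 1 << (block % 8)


def changed_blocks(bitmap: bytearray) -> list:
    """Return the indices of all regions whose bit is set, in ascending order."""
    return [
        i * 8 + bit
        for i, byte in enumerate(bitmap)
        for bit in range(8)
        if (byte >> bit) & 1
    ]
```

A write of 200 bytes at offset 4000 crosses the boundary between regions 0 and 1, so both bits are set. With this layout, a 1 GB file needs 32 KB of bitmap no matter how many times it is written.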

The memory fragmentation structure avoids the drop in system performance caused by the kernel requesting a large number of small memory blocks. Memory fragmentation processing manages small idle memory blocks mainly through a linked list, so that a large number of memory requests or releases complete in a very short time, with O(1) time complexity.

In addition, in practical application the embodiment uses multithreading with separate read and write threads, so that write operations are unaffected while the read thread operates on the data table, and data changes are recorded and data is read as fast as possible. A breakpoint resume mode handles abnormal process exits that occur while the user layer is processing data. So that transmitted data is used efficiently and not retransmitted unnecessarily, a 16 MB cache space is provided in the driver. When the user layer receives the 16 MB of data, it processes the data itself; until the user layer confirms, the 16 MB of data cached in the kernel is retained, and once the user layer confirms that the uploaded data has been completely processed, the cached data can be deleted and the cache refilled with new data. If the user layer exits midway, the driver recovers the data in the 16 MB cache on detecting that the application layer has exited and resends it when the user layer restarts and reads again.
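
One way to keep the write thread unaffected by the read thread is to swap tables instead of reading in place. The text does not say how the two threads are coordinated, so the sketch below is only one possible arrangement under that assumption; the class `DoubleBufferedLog` and its methods are hypothetical.

```python
import threading


class DoubleBufferedLog:
    """The writer appends to the active table; the reader swaps it out."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = []

    def record(self, change) -> None:
        # Write path: the lock is held only for a single append.
        with self._lock:
            self._active.append(change)

    def drain(self) -> list:
        # Read path: swap in an empty table under the lock, then let the
        # caller process the old table without blocking the writer.
        with self._lock:
            drained, self._active = self._active, []
        return drained
```

The reader holds the lock only for the swap, so however long it takes to process a drained table, the writer can keep recording changes.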

Claims

1. A file data block change information monitoring and management system is characterized by comprising a client (1) and a driving unit (2), wherein the client (1) is connected with the driving unit (2) through a CDP manager (3), the driving unit (2) is respectively connected with an operating system data interface and a memory, and the CDP manager (3) is used for realizing the transmission of data information between the client (1) and the driving unit (2);

the drive unit (2) is used for capturing file data block change information from an operating system, storing the captured information into a memory, and transmitting the captured information to the client (1) through the CDP manager (3);

the client (1) is used for initiating a data reading or directory monitoring task, receiving information captured by the drive unit (2) from an operating system, and performing data backup processing;

the drive unit (2) comprises a memory allocation module and a directory binary tree generation unit which are respectively connected with the memory, the directory binary tree generation unit is also connected with an operating system data interface, the memory allocation module is used for acquiring a space for storing file data block change information from the memory, and the directory binary tree generation unit is used for capturing the name, position and size change data of the file data block from the operating system and generating a corresponding directory binary tree to be stored in the memory as the file data block change information.

2. A file data block change information monitoring and managing method applying the file data block change information monitoring and managing system according to claim 1, characterized by comprising the steps of:

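S1. The client initiates a directory monitoring task and transmits the directory monitoring task request to the drive unit through the CDP manager;

S2. After receiving the directory monitoring task request, the drive unit captures the corresponding file data block change information from the operating system through the operating system data interface and stores it in the memory;

S3. The client initiates a data reading task and transmits the data reading task request to the drive unit through the CDP manager;

S4. The drive unit transmits the corresponding file data block change information in the memory to the CDP manager, and the CDP manager extracts the first address data of the file data block change information and transmits it to the client;

S5. According to the received first address data, the client completes the backup operation of the corresponding file data.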

3. The method according to claim 2, wherein the request of the directory monitoring task includes a file path to be traced.

4. The method for monitoring and managing the file data block change information according to claim 3, wherein the step S2 specifically comprises the following steps:

S21. After receiving the directory monitoring task request, the drive unit stores the file path to be traced in the memory as a binary tree and then starts the tracing mode for that path;

S22. When an IO operation occurs under the file path to be traced in the operating system, the drive unit calculates the hash value of the file path and stores it in the corresponding bitmap linked list to obtain the file data block change information, which it stores in the memory as a directory binary tree.

5. The method as claimed in claim 4, wherein the bitmap linked list adopts a skip list structure to insert and obtain bitmaps quickly, the bitmap linked list includes a plurality of bitmap offsets and corresponding bitmap pointers, and the plurality of bitmap pointers together form a file name pointer.

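6. The method for monitoring and managing file data block change information according to claim 4, wherein the step S22 specifically includes the following steps:

S221. When an IO operation occurs under the file path to be traced in the operating system, the drive unit takes the full path name of the file to be traced as input and calculates the corresponding hash value according to a preset hash table;

S222. According to the calculated hash value, the drive unit generates or updates the bitmap linked list;

S223. According to the bitmap linked list information, the drive unit updates the current directory binary tree stored in the memory to generate a new directory binary tree;

S224. Using the memory fragmentation processing mode, the drive unit applies to the memory for an allocated memory space and stores the new directory binary tree in that space.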

7. The method according to claim 6, wherein the IO in the file path to be traced includes a file name modification, a start address of a data change, and a length of the data change.

8. The method according to claim 6, wherein the specific process of the memory fragmentation processing is as follows: the method comprises the steps of constructing a corresponding main array according to preset memory allocation space capacity, wherein each array entry node in the main array corresponds to a space block, each space block consists of an addressing head and a data memory, the space blocks jointly form an idle linked list, a pointer used for pointing to a first idle node in the idle linked list is further arranged in the main array, and when a driving unit applies for allocating memory space to the memory, the space blocks are allocated through the pointer.

9. The method for monitoring and managing the change information of the file data block according to claim 6, wherein the binary directory tree has a specific structure including a parent node, a left child node and a right child node, the left child node points to a child directory or a child file, and the right child node points to a sibling directory or a sibling file of the same level.