CN117076417B - File snapshot implementation method and device, computer equipment and storage medium

Info

Publication number: CN117076417B
Application number: CN202311337820.5A
Authority: CN
Inventors: 孟祥奎; 李盈; 马德川
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2024-02-06
Anticipated expiration: 2043-10-16
Also published as: CN117076417A

Abstract

The invention relates to the technical field of computers and discloses a file snapshot implementation method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring data sent by an upper layer and analyzing the data; when the type of the data is determined to be the target type, performing secondary analysis on the data to obtain the data input/output type; when the data input/output type is determined to be the data type, storing the data in a first preset storage area, or, when the data input/output type is determined to be the metadata type, storing the data in a second preset storage area; when the storage space occupied by the data in the first preset storage area is determined to meet the preset condition, performing a disc-dropping operation on the data; after determining that the disc-dropping operation on the data in the first preset storage area has completed, performing the disc-dropping operation on the metadata in the second preset storage area; and, at the preset trigger time, executing a snapshot operation on the data in the first preset storage area and on the metadata and the data in the data storage device.

Description

File snapshot implementation method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for implementing a file snapshot, a computer device, and a storage medium.

Background

For network attached storage (NAS) devices, the file system is the basic structure for providing services to the outside and is also the direct logical carrier of user data. Ensuring the accuracy and high availability of data is important to users, and snapshot techniques emerged to meet this need. When a snapshot operation is performed, a freeze operation usually has to be performed on the file system. After freezing, all input/output (IO) of the file system is paused, which interrupts the file system's service, drops IO to zero, and seriously affects the user experience.

Disclosure of Invention

In view of the above, the present invention provides a method, an apparatus, a computer device, and a storage medium for implementing a file snapshot, which execute a snapshot operation of a file system on the premise of ensuring that the file system is not frozen.

In a first aspect, the present invention provides a method for implementing a file snapshot, where the method includes: after data sent by an upper layer are acquired, the data are analyzed;

when the type of the data is determined to be the target type, performing secondary analysis on the data to obtain the data input/output type;

when the data input/output type is determined to be the data type, storing the data in a first preset storage area;

or, when the data input/output type is determined to be the metadata type, storing the data in a second preset storage area;

when the storage space occupied by the data in the first preset storage area meets the preset condition, executing the disc-dropping operation on the data, and storing the data in the first preset storage area into a third preset storage area of the data storage device;

after determining that the disc-dropping operation on the data in the first preset storage area has completed, executing the disc-dropping operation on the metadata in the second preset storage area, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device;

and executing snapshot operation on the data in the first preset storage area according to the preset trigger time, and executing snapshot operation on the metadata and the data in the data storage device.

The file snapshot implementation method provided by the invention has the following advantages:

After the cache device obtains the data sent by the upper layer, it analyzes the data to obtain a data type and, if the data type is the target type, performs secondary analysis on the data to obtain a data input/output (IO) type. When the data IO type is the data type, the data is stored in the first preset storage area; when the data IO type is the metadata type, the data is stored in the second preset storage area. That is, data and metadata are cached separately. Data of the data type sent by the upper layer is acquired first, and the metadata corresponding to that data is received only after the data of the data type has been received. When the storage space occupied by the data in the first preset storage area meets the preset condition, the disc-dropping operation is executed on the data, storing the data in the first preset storage area into a third preset storage area of the data storage device. After it is determined that the disc-dropping operation on the data in the first preset storage area has completed, the disc-dropping operation is executed on the metadata in the second preset storage area, storing that metadata into a fourth preset storage area of the data storage device. At the preset trigger time, a snapshot operation is executed on the data in the first preset storage area and on the metadata and the data in the data storage device.

In the above manner, the metadata corresponding to the data to be written is flushed into the cache only after the data to be written has been completely flushed into the cache. Similarly, the data to be written is dropped into the data storage device first, and the metadata is dropped into the data storage device afterwards. When a snapshot is executed, the snapshot operation is carried out only on the data in the cache device and on all the data in the data storage device. In this way, a snapshot operation can be executed at any time without freezing the file system. The situation in which metadata has been flushed into the cache device while the corresponding data has not been flushed, or has not been completely flushed, is avoided, and so the metadata and the data in the snapshot cannot become inconsistent. Furthermore, service suspension, IO dropping to zero, and similar problems caused by freezing the file system do not occur.

In an alternative embodiment, the target type is a network attached storage type.

In an optional implementation, performing the disc-dropping operation on the metadata in the second preset storage area after determining that the disc-dropping operation on the data in the first preset storage area has completed, so as to store the metadata in the second preset storage area into a fourth preset storage area of the data storage device, specifically includes:

after it is determined that the data in the first preset storage area completed the data disc-dropping operation at a first moment, performing, with the first moment as a reference, the disc-dropping operation on the metadata in the second preset storage area within a preset time range, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device, wherein the first moment is any moment;

or, when it is determined that the data in the first preset storage area completed the data disc-dropping operation at the first moment, immediately performing the disc-dropping operation on the metadata in the second preset storage area, so as to store the metadata in the second preset storage area into a fourth preset storage area of the data storage device.

Specifically, when it is determined that the data in the first preset storage area completed the data disc-dropping operation at the first moment, the disc-dropping operation on the metadata must be executed within a preset time range measured from the first moment. This completes the disc-dropping of all the data within a short time and ensures that the data is consistent when the snapshot operation is executed. Alternatively, the disc-dropping operation on the metadata may be performed immediately once it is determined that the data in the first preset storage area completed the disc-dropping operation at the first moment.

In an alternative embodiment, the ratio between the storage space of the first preset storage area and the storage space of the second preset storage area is greater than or equal to a preset threshold.

In particular, the storage space occupied by the metadata itself is small, and the storage space occupied by the data is necessarily large. Therefore, metadata and data may be stored separately, but in order to ensure that there is sufficient storage space to store the data, it is required that the ratio between the storage space of the first preset storage area and the storage space of the second preset storage area is greater than or equal to a preset threshold. In addition, the data and the metadata are stored in an isolated mode, and the cache hit rate and the read-write efficiency of the metadata can be improved.

In an alternative embodiment, the data input/output type is determined based on the type of package of the data structure of the data.

In an alternative embodiment, the method further comprises:

periodically scanning the cache amount of the data stored in the first preset storage area by taking the first unit time period as a period; and periodically scanning the cache amount of the data stored in the second preset storage area by taking the second unit time period as a period; wherein the first unit time period is less than the second unit time period;

executing the disc-dropping operation on the data once it is determined, according to the cache amount of the data stored in the first preset storage area, that the storage space occupied by the data in the first preset storage area meets the preset condition;

and after determining that the disc-dropping operation on the data in the first preset storage area has completed, executing the disc-dropping operation on the metadata in the second preset storage area.

Specifically, when the data disc-dropping operation is performed, in order to drop the data in the cache to the storage device as soon as possible, a polling time may be set and the amount of data held in the cache may be checked periodically. For example, the cache amount of the data stored in the first preset storage area is scanned once every first unit time period, and the cache amount of the metadata in the second preset storage area is scanned once every second unit time period. The first unit time period is shorter than the second unit time period because the amount of cached data is much larger than the amount of cached metadata, so the cached data may reach the preset (disc-dropping) condition relatively quickly and is therefore checked more frequently.
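The relationship between the two scan periods can be illustrated with a small simulation. This is a minimal sketch rather than the implementation described in the patent: the function name `scan_schedule`, the integer tick clock, and the example periods are all assumptions made for illustration.

```python
from typing import List, Tuple


def scan_schedule(first_period: int, second_period: int, horizon: int) -> Tuple[List[int], List[int]]:
    """Return the ticks at which each cache area is scanned, up to `horizon`.

    first_period  -- hypothetical ticks between scans of the data area
                     (first preset storage area)
    second_period -- hypothetical ticks between scans of the metadata area
                     (second preset storage area)
    """
    if not 0 < first_period < second_period:
        # The text requires the data area to be scanned more often.
        raise ValueError("first_period must be positive and less than second_period")
    data_scans = list(range(first_period, horizon + 1, first_period))
    metadata_scans = list(range(second_period, horizon + 1, second_period))
    return data_scans, metadata_scans


data_scans, metadata_scans = scan_schedule(first_period=2, second_period=5, horizon=10)
print(data_scans)      # ticks at which the data area is scanned
print(metadata_scans)  # ticks at which the metadata area is scanned
```

With these example periods the data area is scanned five times in ten ticks and the metadata area twice, which matches the intent that the faster-filling area is checked more frequently.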

In an alternative embodiment, the preset conditions include: the ratio of the memory space occupied by the buffer amount of the data stored in the first preset memory area to the total memory space of the first preset memory area is greater than or equal to a preset ratio threshold.
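The preset condition is a simple occupancy ratio check. The following is a minimal sketch under stated assumptions: the function name `meets_flush_condition`, the byte units, and the example threshold of 0.75 are hypothetical and do not come from the patent.

```python
def meets_flush_condition(used_bytes: int, total_bytes: int, ratio_threshold: float) -> bool:
    """True when the occupancy of the data area reaches the preset ratio threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= ratio_threshold


# Example with an assumed threshold of 0.75 and a 100-byte data area.
print(meets_flush_condition(75, 100, 0.75))
print(meets_flush_condition(74, 100, 0.75))
```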

In an alternative embodiment, the method further includes, before acquiring the data sent by the upper layer and analyzing the data:

after the data to be written is obtained, the data to be written is analyzed, and metadata and data content are obtained;

creating a first data structure for the metadata, and generating identification information corresponding to the first data structure;

encapsulating the first data structure and the identification information as the metadata type and storing them in a fifth preset storage area of the memory, and generating an address pointer corresponding to the fifth preset storage area;

creating a second data structure according to the address pointer corresponding to the fifth preset storage area, the identification information, and the data;

encapsulating the second data structure as the data type and storing it in a sixth preset storage area of the memory, and storing the data content in a seventh preset storage area of the memory;

after sending the data content in the seventh preset storage area to the cache device, sending the metadata in the fifth preset storage area to the cache device.

Specifically, after the data to be written is received, it is analyzed to obtain metadata and data content. The metadata here is data of the metadata type described above, and the data content is data of the data type described above. A first data structure is then created for the metadata, and identification information corresponding to the first data structure is generated. According to the first data structure, a storage space, namely the fifth preset storage area, is allocated in the memory to store the metadata and its corresponding parameter information, that is, to store the first data structure. At the same time, an address pointer corresponding to the fifth preset storage area is generated. A second data structure is created based on the address pointer, the identification information, and the data. Using the second data structure, storage space is allocated in the memory, namely the sixth preset storage area and the seventh preset storage area, to store the second data structure and the data content respectively. The association between the data content and the metadata is established through the first data structure and the second data structure. The data content in the seventh preset storage area is sent to the cache device first, and the metadata in the fifth preset storage area is sent to the cache device afterwards.
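The relationship between the two structures and the send order can be sketched as follows. This is an illustrative model only: the class names `FirstStruct` and `SecondStruct`, their fields, the use of a UUID as the identification information, the object reference standing in for the address pointer, and the 4-byte block size are all assumptions, not details given in the patent.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FirstStruct:
    """Hypothetical model of the first data structure (describes the metadata)."""
    file_name: str
    metadata: bytes
    # Identification information linking metadata and data content (assumed to be a UUID).
    ident: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class SecondStruct:
    """Hypothetical model of the second data structure (describes the data content)."""
    ident: str
    metadata_ref: FirstStruct  # stands in for the address pointer to the fifth area
    blocks: List[bytes]        # stands in for the unit storage spaces of the seventh area


def prepare_write(file_name: str, metadata: bytes, content: bytes,
                  block_size: int = 4) -> Tuple[FirstStruct, SecondStruct]:
    """Split a write into metadata and data blocks that share one identifier."""
    first = FirstStruct(file_name, metadata)
    blocks = [content[i:i + block_size] for i in range(0, len(content), block_size)]
    return first, SecondStruct(first.ident, first, blocks)


def send_order(first: FirstStruct, second: SecondStruct) -> List[Tuple[str, bytes]]:
    """All data blocks are sent to the cache device first, the metadata last."""
    order = [("data", block) for block in second.blocks]
    order.append(("metadata", first.metadata))
    return order


first, second = prepare_write("a.txt", b"meta", b"abcdefghij")
for kind, payload in send_order(first, second):
    print(kind, payload)
```

Sharing one identifier between the two structures is what lets the receiver re-associate the metadata with its data blocks even though they arrive at different times.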

In an alternative embodiment, the seventh preset storage area includes a plurality of unit storage spaces, each unit storage space stores one data block, and the data blocks in the plurality of unit storage spaces constitute data content; the method further comprises the steps of:

determining a first number of data blocks according to the number of unit storage spaces;

counting a second number of data blocks that have been currently sent to the caching device;

and constructing a third data structure body according to the identification information, the first quantity, the second quantity, the first indication information and the second indication information, wherein the first indication information is used for indicating whether a data block is sent to the cache device or not, and the second indication information is used for indicating whether metadata is sent to the cache device or not.

Specifically, whether all data blocks have been transmitted to the cache device is determined by counting the total number of data blocks and the number of data blocks already transmitted to the cache device, so that the metadata is transmitted to the cache device only after all data blocks have been transmitted. The third data structure also includes the first indication information, which indicates whether a data block is sent to the cache device, and the second indication information, which indicates whether the metadata is sent to the cache device. The identification information is used to establish a mapping relationship between the metadata and the data content.
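A third data structure of this kind could be modelled as a small progress tracker. The sketch below is an assumption-laden illustration: the class name `SendProgress`, its field and method names, and the reading of the first indication information as "data blocks still remain to be sent" are hypothetical choices, not definitions taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SendProgress:
    """Hypothetical model of the third data structure."""
    ident: str                # identification information shared with the metadata
    total_blocks: int         # the "first number"
    sent_blocks: int = 0      # the "second number", updated as blocks are sent
    # First indication information (assumed meaning: blocks still remain to be sent).
    blocks_pending: bool = field(init=False, default=True)
    metadata_sent: bool = False  # second indication information

    def __post_init__(self) -> None:
        self.blocks_pending = self.sent_blocks < self.total_blocks

    def record_block_sent(self) -> None:
        """Update the second number after one block reaches the cache device."""
        if self.sent_blocks >= self.total_blocks:
            raise RuntimeError("all data blocks have already been sent")
        self.sent_blocks += 1
        self.blocks_pending = self.sent_blocks < self.total_blocks

    def try_send_metadata(self) -> bool:
        """Allow the metadata to be sent only once every data block has been sent."""
        if self.blocks_pending:
            return False
        self.metadata_sent = True
        return True


progress = SendProgress(ident="file-1", total_blocks=2)
print(progress.try_send_metadata())  # refused while blocks are pending
progress.record_block_sent()
progress.record_block_sent()
print(progress.try_send_metadata())  # allowed after both blocks are sent
```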

In an alternative embodiment, the method further comprises:

the value of the second number in the third data structure is updated in real time according to the number of data blocks that have been sent to the caching device.

Specifically, the value of the second number is updated in real time so that it can be confirmed promptly whether the data content has been completely flushed to the cache device.

In an alternative embodiment, the first data structure includes at least: file structure information, the start address at which the metadata is written in the fifth preset storage area, length information of the metadata, and address pointers pointing to the object of each unit storage space.

In an alternative embodiment, the file structure information includes one or more of the following:

file name of file carrying metadata, creation time of file, size of file.

In an alternative embodiment, the second data structure includes at least:

a metadata file structure corresponding to metadata, an address pointer corresponding to each unit storage space, and a relationship link list for indicating a relationship between data blocks stored in different unit storage spaces.

In a second aspect, the present invention provides a device for implementing a snapshot of a file, where the device includes:

The acquisition module is used for acquiring data sent by an upper layer;

the analysis module is used for analyzing the data; when the type of the data is determined to be the target type, performing secondary analysis on the data to obtain the data input/output type;

the processing module is used for storing the data in the first preset storage area when the data input/output type is determined to be the data type; or, when the data input/output type is determined to be the metadata type, storing the data in a second preset storage area; when the storage space occupied by the data in the first preset storage area meets the preset condition, executing the disc-dropping operation on the data, and storing the data in the first preset storage area into a third preset storage area of the data storage device; after determining that the disc-dropping operation on the data in the first preset storage area has completed, executing the disc-dropping operation on the metadata in the second preset storage area, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device; and, at the preset trigger time, executing a snapshot operation on the data in the first preset storage area and on the metadata and the data in the data storage device.

The file snapshot realizing device provided by the invention has the following advantages:

In an alternative embodiment, the processing module is specifically configured to:

In an alternative embodiment, the processing module is further configured to:

In an alternative embodiment, the apparatus further comprises: a creating module and a sending module;

the acquisition module is also used for acquiring data to be written;

the analysis module is also used for analyzing the data to be written to obtain metadata and data content;

the creation module is used for creating a first data structure body for the metadata and generating identification information corresponding to the first data structure body;

the processing module is further used for encapsulating the first data structure and the identification information as the metadata type and storing them in a fifth preset storage area of the memory, and generating an address pointer corresponding to the fifth preset storage area;

the creation module is further used for creating a second data structure according to the address pointer corresponding to the fifth preset storage area, the identification information, and the data;

the processing module is also used for encapsulating the second data structure as the data type and storing it in a sixth preset storage area of the memory, and storing the data content in a seventh preset storage area of the memory;

And the sending module is used for sending the metadata in the fifth preset storage area to the cache equipment after sending the data content in the seventh preset storage area to the cache equipment.

In an alternative embodiment, the seventh preset storage area includes a plurality of unit storage spaces, each unit storage space stores one data block, and the data blocks in the plurality of unit storage spaces constitute data content; the processing module is further used for:

the creation module is further configured to construct a third data structure according to the identification information, the first number, the second number, the first indication information and the second indication information, where the first indication information is used to indicate whether there is a data block to send to the cache device, and the second indication information is used to indicate whether the metadata is sent to the cache device.

Specifically, whether all data blocks have been transmitted to the cache device is determined by counting the total number of data blocks and the number of data blocks already transmitted to the cache device, so that the metadata is transmitted to the cache device only after all data blocks have been transmitted. The third data structure also includes the first indication information, which indicates whether a data block is sent to the cache device, and the second indication information, which indicates whether the metadata is sent to the cache device. The identification information is used to establish a mapping relationship between the metadata and the data content.

In an alternative embodiment, the processing module is further configured to:

The file structure information includes one or more of the following:

file name of file carrying metadata, creation time of file, size of file.

In an alternative embodiment, the second data structure includes at least:

In a third aspect, the present invention provides a computer device comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions so as to perform the file snapshot implementation method of the first aspect or of any implementation corresponding to the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the file snapshot implementation method of the first aspect or any of the embodiments corresponding thereto.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a simple structure of data interaction among a file system, a cache device and a data storage device in the related art provided by the invention;

FIG. 2 is a schematic flowchart of a file snapshot implementation method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating another method for implementing a file snapshot according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a simple structure of data interaction among a file system, a cache device and a data storage device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a file snapshot implementation device according to an embodiment of the present invention;

FIG. 6 is a schematic hardware structure diagram of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

For NAS storage devices, the file system is the basic structure for providing services to the outside and is also the direct logical carrier of user data. Ensuring the accuracy and high availability of data is important to users, and snapshot techniques emerged to meet this need. A snapshot records the contents of a data set of the file system at a particular moment, much as a camera captures a scene at a particular instant. When the user wants to return to the state at which the snapshot was created, a restore operation can be performed by snapshot rollback.

For centralized NAS storage, it is usually necessary to create a storage pool, then create a file system volume, and then create a file system on the volume before the file system can be used. When data is written to a file system, it is normally first written into the file system's buffer area in memory and, after being passed through a protocol layer, enters the upper-layer cache of the volume. Once the cached data of the volume reaches a certain size, the data is dropped to disk, that is, written into the volume. A file in a file system is generally divided into two parts, a metadata part and a data part, and the file is complete only when the two parts are consistent.

As shown in FIG. 1, once the data of the file system has landed in the upper-layer cache of the volume, the write can be considered complete and completion can be reported to the client; the cache then performs the data disc-dropping action automatically. Therefore, the snapshot module generally performs the backup operation at the layer above the cache, backing up the data in the cache and the data in the volume at the same time, creating a data mapping table between the file system source volume and the snapshot target volume, and recording the data state of the current file system volume.

The cache does not distinguish metadata from data; it simply receives the data transmitted by the upper layer and stores it temporarily. Once a condition is met, the data is flushed down to the lower-layer storage. Because the data of the file system is transmitted continuously, a freezing action usually has to be performed on the file system when a snapshot is created, and after freezing all IO of the file system is paused. Once the data already in transit has landed in the cache, the snapshot is created; what is backed up is then the complete data of the file system, and the file system can continue to run normally after a snapshot rollback. When the file system is not frozen, because data is transferred continuously, it may happen that the metadata of a file has already been written from memory into the cache while the data of the same file has not yet been written into the cache or has been written only partially. In that case the file system data in the cache is not semantically consistent, and there is no guarantee that the metadata and the data of the file system being backed up correspond completely. When such a snapshot is used for rollback, the metadata of the file exists but its data does not, which may damage the file system and require repair actions, resulting in data loss. Freezing the file system, however, inevitably causes a temporary service suspension and IO dropping to zero, which affects the user experience.

In order to solve the above problems, an embodiment of a file snapshot implementation method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system (computer device), for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.

In this embodiment, a method for implementing a file snapshot is provided. FIG. 2 is a schematic flowchart of a file snapshot implementation method provided in an embodiment of the present invention. As shown in FIG. 2, the method includes:

step S201, after data sent by an upper layer are acquired, the data are analyzed.

Specifically, the upper layer may be the memory of the file system, or another device that can communicate with the cache device. In one specific example of this embodiment, the file system and the cache device establish a communication connection, so the upper layer referred to here is the memory of the file system. After the memory has flushed the data to the cache device, the processor may parse the data using the cluster process and then determine the data type.

Step S202, when the type of the data is determined to be the target type, the data is subjected to secondary analysis, and the data input/output type is obtained.

Step S203, when it is determined that the data input/output type is the data type, the data is stored in the first preset storage area.

Or, step S204, when it is determined that the data input/output type is the metadata type, the data is stored in the second preset storage area.

And preferentially acquiring the data of the data type, and receiving metadata corresponding to the data of the data type after the data of the data type is received.

In an alternative example, the target type is a NAS type. When the data type is determined to be the NAS type, the data is subjected to secondary analysis, and the IO type of the data is obtained.

In one particular example, the data IO type may include a metadata type and a data type. And when the IO type is the data type, storing the data in a first preset storage area. Or when the data IO type is the metadata type, the data is stored in a second preset storage area.

That is, the data and the metadata are stored separately. It should be understood that, throughout the process of acquiring the data sent by the upper layer, the memory transmits the data content of the data type first and transmits the metadata only after that transmission has finished.

The advantage of transmitting the data first and the metadata afterwards is as follows. If a snapshot operation is performed before the metadata has been transmitted, the snapshot contains no metadata for that file, so the data cannot later be located through metadata that was never backed up, and the metadata and the data in the snapshot cannot be inconsistent. In addition, because the metadata and the data are cached separately and the volume of metadata is small, later metadata lookups have a higher hit rate and higher read-write efficiency.
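Steps S201 to S204 amount to a two-stage classification followed by routing into separate cache areas. The sketch below is a simplified illustration under stated assumptions: the `Request` and `CacheAreas` classes, the string labels for the types, and the `passthrough` list for data that is not of the target type are all hypothetical, since the text does not say how non-target data is handled.

```python
from dataclasses import dataclass, field
from typing import List

TARGET_TYPE = "NAS"  # the target type named in the optional example


@dataclass
class Request:
    source_type: str  # result of the first analysis, e.g. "NAS"
    io_type: str      # result of the secondary analysis: "data" or "metadata"
    payload: bytes


@dataclass
class CacheAreas:
    data_area: List[bytes] = field(default_factory=list)      # first preset storage area
    metadata_area: List[bytes] = field(default_factory=list)  # second preset storage area
    passthrough: List[bytes] = field(default_factory=list)    # assumption: non-target data

    def store(self, req: Request) -> str:
        """Route one request and return the name of the area that received it."""
        if req.source_type != TARGET_TYPE:
            self.passthrough.append(req.payload)
            return "passthrough"
        if req.io_type == "data":
            self.data_area.append(req.payload)
            return "data_area"
        if req.io_type == "metadata":
            self.metadata_area.append(req.payload)
            return "metadata_area"
        raise ValueError(f"unknown IO type: {req.io_type!r}")


cache = CacheAreas()
print(cache.store(Request("NAS", "data", b"block-0")))
print(cache.store(Request("NAS", "metadata", b"inode-0")))
```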

In step S205, when it is determined that the storage space occupied by the data in the first preset storage area satisfies the preset condition, a disc-drop operation is performed on the data, so as to store the data in the first preset storage area into a third preset storage area of the data storage device.

Step S206, after determining that the data in the first preset storage area is subjected to the data disc-dropping operation, performing the disc-dropping operation on the metadata in the second preset storage area, so as to store the metadata in the second preset storage area into a fourth preset storage area of the data storage device.

Specifically, when it is determined that the storage space occupied by the data in the first preset storage area meets the preset condition, the disc-dropping operation is performed on the data, that is, the data is flushed to disk and stored in a third preset storage area of the data storage device (for example, a volume or a storage pool). After it is determined that the data in the first preset storage area has completed the data disc-dropping operation, the disc-dropping operation is performed on the metadata in the second preset storage area, so as to store the metadata in the second preset storage area in a fourth preset storage area of the data storage device. The reason for dropping the data to disk before the metadata is the same as the reason for flushing the data from the memory before the metadata, and is not repeated here.
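The ordering of steps S205 and S206 can be sketched in C as follows. This is an illustration under assumptions: `cache_state` and `flush_in_order` are hypothetical names, the four storage areas are modeled as byte counters, and the metadata is flushed immediately after the data, which is only one of the timings the embodiment allows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the four storage areas as byte counters. */
struct cache_state {
    size_t data_cached;   /* first preset storage area (cache, data) */
    size_t meta_cached;   /* second preset storage area (cache, metadata) */
    size_t data_on_disk;  /* third preset storage area (storage device) */
    size_t meta_on_disk;  /* fourth preset storage area (storage device) */
};

/* Flush the data first; the metadata is flushed only after the data flush
 * has completed. Returns true if a flush was performed. */
bool flush_in_order(struct cache_state *s, bool data_condition_met)
{
    if (!data_condition_met)
        return false;                 /* nothing is flushed yet */
    s->data_on_disk += s->data_cached;
    s->data_cached = 0;               /* data flush complete */
    s->meta_on_disk += s->meta_cached;
    s->meta_cached = 0;               /* metadata follows the data */
    return true;
}
```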

Step S207, performing snapshot operation on the data in the first preset storage area according to the preset trigger time, and performing snapshot operation on the metadata and the data in the data storage device.

Specifically, the trigger time of snapshot execution may be determined according to the actual situation. For example, the snapshot operation may be executed periodically, or, according to actual needs, it may be executed at any single moment or at several non-fixed moments.

When the snapshot is created, because the metadata of a file is dropped to disk after its data portion, the metadata area of the cache is excluded, and only the data area of the cache and the already-dropped data in the volume are backed up by the snapshot. Because the metadata and the data are stored in distinct areas of the cache, and the data portion always arrives earlier than the metadata portion, the snapshot backup never contains metadata whose data is missing. The file system captured by the snapshot therefore cannot be inconsistent even without freezing, and the impact of a file system freeze on services when the snapshot is created is avoided.
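The scope of the snapshot described above can be expressed as a small predicate. The sketch below is an assumption-laden illustration; the enumerations and the function `included_in_snapshot` are hypothetical names introduced here.

```c
#include <stdbool.h>

/* Hypothetical labels for where an item lives and what it is. */
enum location { IN_CACHE, ON_STORAGE };
enum kind { KIND_DATA, KIND_METADATA };

/* Cached metadata is excluded from the snapshot; cached data and
 * everything already on the data storage device are included. */
bool included_in_snapshot(enum location where, enum kind what)
{
    if (where == IN_CACHE)
        return what == KIND_DATA;
    return true;
}
```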

According to the file snapshot implementation method provided by this embodiment, after the cache device acquires the data sent by the upper layer, it parses the data to obtain the data type, and if the data type is the target type, it performs secondary analysis on the data to obtain the data IO type. When the data IO type is the data type, the data is stored in the first preset storage area; when the data IO type is the metadata type, the data is stored in the second preset storage area. That is, data and metadata are stored separately when they are cached. The data of the data type sent by the upper layer is acquired first, and the metadata corresponding to that data is received only after the data has been received. When the storage space occupied by the data in the first preset storage area meets the preset condition, the disc-dropping operation is performed on the data, and the data in the first preset storage area is stored in a third preset storage area of the data storage device. After it is determined that the data in the first preset storage area has completed the data disc-dropping operation, the disc-dropping operation is performed on the metadata in the second preset storage area, and the metadata in the second preset storage area is stored in a fourth preset storage area of the data storage device. At the preset trigger time, the snapshot operation is performed on the data in the first preset storage area and on the metadata and the data in the data storage device.

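In an alternative embodiment of the present invention, either of the following trigger timings may be used when performing the disc-dropping operation on the metadata. That is, after determining that the data in the first preset storage area has completed the data disc-dropping operation, performing the disc-dropping operation on the metadata in the second preset storage area, so as to store the metadata in the second preset storage area into the fourth preset storage area of the data storage device, specifically includes:

after determining that the data in the first preset storage area has completed the data disc-dropping operation at a first moment, performing the disc-dropping operation on the metadata in the second preset storage area within a preset time range that takes the first moment as a reference, wherein the first moment is any moment;

or, when it is determined that the data in the first preset storage area has completed the data disc-dropping operation at the first moment, immediately performing the disc-dropping operation on the metadata in the second preset storage area.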

The preset time range should not be set too large. Its purpose is to drop the metadata onto the data storage device as soon as possible after the data has been dropped, so that when a snapshot is taken, the snapshot operation covers both the metadata and the data corresponding to it. For example, the disc-dropping operation on the metadata is performed within 10 seconds of the first moment.

Of course, when conditions allow, the disc-dropping operation on the metadata may also be performed immediately once it is determined that the data in the first preset storage area has completed the data disc-dropping operation at the first moment.
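The preset time range can be checked with a simple comparison. The following C sketch assumes millisecond timestamps on a monotonic clock; the function name `within_flush_window` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true while `now_ms` is still inside the preset time range that
 * starts at the first moment (the moment the data flush completed). */
bool within_flush_window(uint64_t first_moment_ms, uint64_t now_ms,
                         uint64_t window_ms)
{
    return now_ms >= first_moment_ms &&
           now_ms - first_moment_ms <= window_ms;
}
```

With a 10-second window, a metadata flush issued 4 seconds after the first moment is inside the range, and a window of 0 corresponds to the immediate-flush variant.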

In an alternative embodiment of the present invention, the metadata is accessed far more frequently than the data area, and the metadata and the data of a file differ greatly in volume. The metadata and the data are therefore cached in isolation, and the ratio between the storage space of the first preset storage area and the storage space of the second preset storage area is greater than or equal to a preset threshold. That is, the storage space of the first preset storage area, which stores the data, is larger than the storage space of the second preset storage area, which stores the metadata. Moreover, to ensure that the metadata has sufficient cache space while the data area still has sufficient storage space, in an alternative example the ratio between the storage space of the first preset storage area and the storage space of the second preset storage area may be set to 19:1. In this way, the cache hit rate of the metadata and the read-write efficiency can be improved.
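As a worked illustration of the 19:1 example, the C sketch below divides a total cache capacity between the two areas. The structure `cache_layout` and the function `split_cache` are hypothetical, and the caller is assumed to pass a non-zero sum of parts.

```c
#include <stdint.h>

/* Hypothetical result of partitioning the cache. */
struct cache_layout {
    uint64_t data_bytes;  /* first preset storage area */
    uint64_t meta_bytes;  /* second preset storage area */
};

/* Split `total` bytes in the ratio data_parts : meta_parts.
 * Any rounding remainder is given to the data area. */
struct cache_layout split_cache(uint64_t total, uint64_t data_parts,
                                uint64_t meta_parts)
{
    struct cache_layout l;
    l.meta_bytes = total / (data_parts + meta_parts) * meta_parts;
    l.data_bytes = total - l.meta_bytes;
    return l;
}
```

For a 20 GiB cache and a ratio of 19:1, this yields 19 GiB for data and 1 GiB for metadata.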

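In an alternative embodiment of the present invention, again because of the large difference in volume between the metadata and the data, a separate timer may be set for each of the metadata and the data to periodically scan the cache amount in the corresponding storage area when deciding whether to drop the data to disk. Thus, the method further comprises: periodically scanning the cache amount of the data stored in the first preset storage area with a first unit time period as the period; and periodically scanning the cache amount of the metadata stored in the second preset storage area with a second unit time period as the period, wherein the first unit time period is less than the second unit time period.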

In a specific example, the cache amount of the data stored in the first preset storage area is checked once every 100 ms, and the cache amount of the metadata stored in the second preset storage area is checked once every 2 s. When it is determined, according to the cache amount of the data stored in the first preset storage area, that the storage space occupied by the data in the first preset storage area meets the preset condition, the disc-dropping operation is performed on the data; and after it is determined that the data in the first preset storage area has completed the data disc-dropping operation, the disc-dropping operation is performed on the metadata in the second preset storage area.
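The two scan periods can be modeled with a pair of software timers. The sketch below is a simplified illustration; `scan_timer` and `scan_due` are hypothetical names, and a real implementation would be driven by the platform's own timer facility rather than by polling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical periodic timer for scanning one storage area. */
struct scan_timer {
    uint64_t period_ms;    /* e.g. 100 for data, 2000 for metadata */
    uint64_t next_due_ms;  /* next time at which a scan should run */
};

/* Returns true and re-arms the timer when a scan is due at `now_ms`. */
bool scan_due(struct scan_timer *t, uint64_t now_ms)
{
    if (now_ms < t->next_due_ms)
        return false;
    t->next_due_ms = now_ms + t->period_ms;
    return true;
}
```

Polled every 100 ms over a 2-second interval, a data timer with a 100 ms period fires 20 times, while a metadata timer with a 2 s period fires once.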

In an alternative embodiment of the present invention, the preset conditions may include: the ratio of the memory space occupied by the buffer amount of the data stored in the first preset memory area to the total memory space of the first preset memory area is greater than or equal to a preset ratio threshold.

As introduced above, the processor checks the cache amount of the data stored in the first preset storage area at a fixed interval, for example every 100 ms. The disc-dropping operation is performed when the storage space occupied by the cached data in the first preset storage area reaches a certain proportion of the total capacity. In a specific example, the preset ratio threshold may be 75%. That is, when the cached data in the first preset storage area reaches 75% of the total capacity of the first preset storage area, the data is actively flushed.
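The preset condition reduces to a ratio comparison. The C sketch below uses integer arithmetic to avoid floating point; the function name `should_flush_data` is hypothetical, and the multiplication is assumed not to overflow for realistic cache sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when used / capacity >= threshold_percent / 100.
 * Assumes used * 100 fits in 64 bits. */
bool should_flush_data(uint64_t used, uint64_t capacity,
                       unsigned threshold_percent)
{
    return capacity != 0 &&
           used * 100 >= capacity * (uint64_t)threshold_percent;
}
```

With a threshold of 75, a data area holding 75 of 100 units triggers a flush, while one holding 74 units does not.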

This embodiment provides a method for implementing a file snapshot. Fig. 3 is a schematic diagram of another file snapshot implementation method provided in the embodiment of the present invention. As shown in Fig. 3, before the data sent by the upper layer is acquired and parsed, the method further includes the following steps:

Step S301, after the data to be written is obtained, the data to be written is parsed to obtain metadata and data content.

For the specific parsing process, reference may be made to existing mature techniques, which are not described in detail here.

Step S302, a first data structure is created for metadata, and identification information corresponding to the first data structure is generated.

In a specific embodiment, the first data structure may include, for example:

file structure information, the first address at which the metadata is written in the fifth preset storage area, length information of the metadata, and an address pointer pointing to the object of each unit storage space.

The file structure information may include, for example, one or more of the following:

file name of file carrying metadata, creation time of file, size of file.

The following is a specific example of the first data structure:

#include <stdint.h>

struct meta_data_file {
    unsigned type;
    uint64_t obj_id;          /* object identifier */
    file_info_t file;         /* file structure information: file name, creation time, size, etc. */
    uint64_t offset;          /* position where the data write begins */
    uint64_t length;          /* length of the data write */
    struct data_file *data;   /* pointer to the object that records the data area */
};

In step S303, the first data structure and the identification information are packaged as the metadata type and then stored in a fifth preset storage area of the memory, and an address pointer corresponding to the fifth preset storage area is generated.

Step S304, a second data structure body is created according to the address pointer, the identification information and the data corresponding to the fifth preset storage area.

Specifically, a mapping relationship between metadata and data is established through an address pointer and identification information corresponding to a fifth preset storage area. A second data structure is created based on the address pointer, the identification information, and the data.

In an alternative embodiment of the present invention, the second data structure includes at least:

In step S305, the second data structure is packaged as the data type and then stored in a sixth preset storage area of the memory, and the data content is stored in a seventh preset storage area of the memory.

In an alternative embodiment of the present invention, the seventh preset storage area includes a plurality of unit storage spaces, each of which stores one data block, and the data blocks in the plurality of unit storage spaces constitute the data content. Accordingly, in another alternative embodiment, the second data structure also includes configuration parameters for constructing a doubly linked list, which indicates the mapping relationship between the metadata and each unit storage space. Here, void *data is the address pointer of each unit storage space.

The following is a specific example of the second data structure:

struct data_file {
    unsigned type;
    struct meta_data_file *meta_data;     /* points to the metadata structure */
    struct data_object {
        struct data_object *next, *prev;  /* used to build the doubly linked list */
        void *data;                       /* pointer to the data area */
    } *data;                              /* points to the data area structure */
};

Here, struct meta_data_file *meta_data carries the identification information in the metadata and information such as the address pointer corresponding to the fifth preset storage area.
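To illustrate how the doubly linked list of unit storage spaces might be built, the following C sketch appends nodes to a list and counts them. Only `struct data_object` follows the structure shown in this embodiment; the helper functions `list_append` and `list_count` are hypothetical.

```c
#include <stddef.h>

/* One node per unit storage space, as in the second data structure. */
struct data_object {
    struct data_object *next, *prev;  /* doubly linked list links */
    void *data;                       /* pointer to the data area */
};

/* Append `node` at the tail of the list described by *head and *tail. */
void list_append(struct data_object **head, struct data_object **tail,
                 struct data_object *node)
{
    node->next = NULL;
    node->prev = *tail;
    if (*tail)
        (*tail)->next = node;
    else
        *head = node;   /* first node in the list */
    *tail = node;
}

/* Count the nodes, i.e. the number of data blocks in the list. */
size_t list_count(const struct data_object *head)
{
    size_t n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}
```

Counting the nodes gives the number of unit storage spaces, which is the quantity later used to decide whether all data blocks have been sent.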

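In an alternative embodiment, to ensure that the metadata is flushed only after the data has been completely flushed, the method further comprises: determining a first number of the data blocks according to the number of the unit storage spaces; counting a second number of data blocks that have currently been sent to the cache device; and constructing a third data structure according to the identification information, the first number, the second number, first indication information and second indication information.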

Specifically, one data block is stored in each unit storage space, so the first number of data blocks can be determined from the number of unit storage spaces. The second number of data blocks that have currently been sent to the cache device is then counted, and whether all data blocks have been sent to the cache device is determined by comparing the second number with the first number. After it is determined that all data blocks are already in the cache device, the flush operation of the metadata is performed.

In addition, the third data structure body further includes first indication information for indicating whether the data block is sent to the cache device, and second indication information for indicating whether the metadata is sent to the cache device. The identification information is used for establishing a mapping relation between the metadata and the data content.

In an alternative embodiment, the third data structure may be represented as follows:

struct meta_data_map {
    string   meta_id;
    uint64_t data_count;
    uint64_t cur_data_count;
    bool     meta_data_exist;
    bool     data_exist;
};

Here, meta_id records the identification information of the first data structure. In a specific example, the identification information is an id that is randomly generated and never repeated, so that the metadata IO and the data IO of the same data to be written can be matched. data_count records the number of data IOs belonging to the current metadata IO, that is, the first number. cur_data_count records the number of data IOs that have currently been sent to the cache device, that is, the second number. meta_data_exist indicates whether the metadata IO has been sent to the cache, that is, the second indication information. data_exist indicates whether the data IO has been sent to the cache, that is, the first indication information. Both meta_id and data_count are recorded when the metadata and the data are packaged.
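How the third data structure could gate the metadata flush is sketched below in C. This is an illustration under assumptions: a fixed-size character array stands in for the `string` field, the helper names `on_data_block_sent` and `metadata_may_be_sent` are hypothetical, and setting `data_exist` as soon as any data block has been sent is one possible reading of the first indication information.

```c
#include <stdbool.h>
#include <stdint.h>

/* C rendering of the mapping record; the char array replaces `string`. */
struct meta_data_map {
    char     meta_id[37];       /* identification information */
    uint64_t data_count;        /* first number: data IOs expected */
    uint64_t cur_data_count;    /* second number: data IOs already sent */
    bool     meta_data_exist;   /* metadata IO already sent to the cache? */
    bool     data_exist;        /* data IO already sent to the cache? */
};

/* Record that one more data block has been sent to the cache device. */
void on_data_block_sent(struct meta_data_map *m)
{
    if (m->cur_data_count < m->data_count)
        m->cur_data_count++;
    m->data_exist = true;
}

/* The metadata may be sent only once every data block has been sent
 * and the metadata itself has not been sent yet. */
bool metadata_may_be_sent(const struct meta_data_map *m)
{
    return !m->meta_data_exist && m->cur_data_count == m->data_count;
}
```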

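In an alternative embodiment, the method further comprises: updating the value of the second number in the third data structure in real time according to the number of data blocks sent to the cache device.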

Because the value of the second number is updated in real time, it can be confirmed promptly whether the data content has been completely sent to the cache device.

Step S306, after the data content in the seventh preset storage area is sent to the cache device, the metadata in the fifth preset storage area is sent to the cache device.

That is, as introduced above, the data content is flushed to the cache device first, and the metadata is flushed to the cache device afterwards. After the file system parses out the metadata and the data, they are stored in the memory in isolation so that they can be flushed separately later. This prevents the metadata and the data in a snapshot from being inconsistent because a flush has not taken place, or has not finished, when the snapshot is subsequently executed.

Fig. 4 is a simplified schematic diagram of the data interaction among a file system, a cache device and a data storage device. Referring to Fig. 4, a specific application example of the file snapshot implementation method provided by the present invention involves: a file system, a memory in the file system, a cache device, and a data storage device (a volume, a storage pool, etc.). As can be seen from Fig. 4, in the file system, data and metadata are obtained after the data to be written is parsed; in the subsequent transmission process, the data and the metadata are transmitted and stored separately. When the snapshot operation is performed, it is performed only on the data in the cache device and on all the data, including the metadata, in the data storage device. The specific implementation process has been described in detail above and is not repeated here.

This embodiment provides a device for implementing a file snapshot. Fig. 5 is a schematic structural diagram of the file snapshot implementation device provided in the embodiment of the present invention. As shown in Fig. 5, the device includes: an obtaining module 501, a parsing module 502, and a processing module 503.

An obtaining module 501, configured to obtain data sent by an upper layer;

the parsing module 502 is configured to parse the data; when the type of the data is determined to be the target type, performing secondary analysis on the data to obtain the data input/output type;

a processing module 503, configured to store data in a first preset storage area when determining that the data input/output type is a data type; or, when the data input/output type is determined to be the metadata type, storing the data in a second preset storage area; when the storage space occupied by the data in the first preset storage area meets the preset condition, executing the disc-dropping operation on the data, and storing the data in the first preset storage area into a third preset storage area of the data storage device; after the data in the first preset storage area is determined to be executed and the data disc-dropping operation is completed, executing the disc-dropping operation on the metadata in the second preset storage area, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device; and executing snapshot operation on the data in the first preset storage area according to the preset trigger time, and executing snapshot operation on the metadata and the data in the data storage device.

In an alternative embodiment, the processing module 503 is specifically configured to:

In an alternative embodiment, the processing module 503 is further configured to:

In an alternative embodiment, the apparatus further comprises: a creation module 504 and a transmission module 505;

the obtaining module 501 is further configured to obtain data to be written;

The parsing module 502 is further configured to parse the data to be written to obtain metadata and data content;

a creation module 504, configured to create a first data structure for metadata, and generate identification information corresponding to the first data structure;

the processing module 503 is further configured to package the first data structure and the identification information as the metadata type, store them in a fifth preset storage area of the memory, and generate an address pointer corresponding to the fifth preset storage area;

the creating module 504 is further configured to create a second data structure according to the address pointer, the identification information, and the data corresponding to the fifth preset storage area;

the processing module 503 is further configured to package the second data structure as the data type and then store it in a sixth preset storage area of the memory, and to store the data content in a seventh preset storage area of the memory;

the sending module 505 is configured to send the metadata in the fifth preset storage area to the cache device after sending the data content in the seventh preset storage area to the cache device.

In an alternative embodiment, the seventh preset storage area includes a plurality of unit storage spaces, each unit storage space stores one data block, and the data blocks in the plurality of unit storage spaces constitute data content; the processing module 503 is further configured to:

the creation module 504 is further configured to construct a third data structure according to the identification information, the first number, the second number, the first indication information, and the second indication information, where the first indication information is used to indicate whether there is a data block to send to the cache device, and the second indication information is used to indicate whether the metadata is sent to the cache device.

The file structure information includes one or more of the following:

file name of file carrying metadata, creation time of file, size of file.

In an alternative embodiment, the second data structure includes at least:

The file snapshot implementing apparatus in this embodiment is presented in the form of functional modules, where the modules refer to application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASICs), processors and memories that execute one or more software or firmware programs, and/or other devices that provide the above described functions.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

According to the file snapshot implementation device provided by the embodiment of the invention, after the cache device acquires the data sent by the upper layer, it parses the data to obtain the data type, and if the data type is the target type, it performs secondary analysis on the data to obtain the data IO type. When the data IO type is the data type, the data is stored in the first preset storage area; when the data IO type is the metadata type, the data is stored in the second preset storage area. That is, data and metadata are stored separately when they are cached. The data of the data type sent by the upper layer is acquired first, and the metadata corresponding to that data is received only after the data has been received. When the storage space occupied by the data in the first preset storage area meets the preset condition, the disc-dropping operation is performed on the data, and the data in the first preset storage area is stored in a third preset storage area of the data storage device. After it is determined that the data in the first preset storage area has completed the data disc-dropping operation, the disc-dropping operation is performed on the metadata in the second preset storage area, and the metadata in the second preset storage area is stored in a fourth preset storage area of the data storage device. At the preset trigger time, the snapshot operation is performed on the data in the first preset storage area and on the metadata and the data in the data storage device.

The embodiment of the invention also provides a computer device which is provided with the file snapshot realizing device shown in the figure 5.

Referring to Fig. 6, which is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multi-processor system). One processor 10 is taken as an example in Fig. 6.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a generic array logic device, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 implements the method of the embodiments described above.

The memory 20 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device further comprises an input device 30 and an output device 40. The processor 10, the memory 20, the input device 30, and the output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 6.

The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device. Examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diode displays and plasma displays. In some alternative implementations, the display device may be a touch screen.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A method for implementing a file snapshot, the method comprising:

after data sent by an upper layer are acquired, the data are analyzed;

when the type of the data is determined to be the target type, carrying out secondary analysis on the data to obtain a data input/output type;

when the data input/output type is determined to be a data type, storing the data in a first preset storage area;

or when the data input/output type is determined to be a metadata type, storing the data in a second preset storage area;

when the storage space occupied by the data in the first preset storage area is determined to meet the preset condition, performing a disc-dropping operation on the data, wherein the disc-dropping operation is used for storing the data in the first preset storage area into a third preset storage area of a data storage device;

2. The method of claim 1, wherein the target type is a network attached storage type.

3. The method according to claim 1, wherein after determining that the data in the first preset storage area has completed the data disc-dropping operation, performing the disc-dropping operation on the metadata in the second preset storage area to store the metadata in the second preset storage area in a fourth preset storage area of the data storage device specifically includes:

after determining that the data in the first preset storage area completes the data disc-dropping operation at a first moment, performing the disc-dropping operation on the metadata in the second preset storage area in a preset time range by taking the first moment as a reference, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device, wherein the first moment is any moment;

or when it is determined that the data in the first preset storage area is executed at the first moment to complete the data disc-dropping operation, immediately executing the disc-dropping operation on the metadata in the second preset storage area, so as to store the metadata in the second preset storage area into a fourth preset storage area of the data storage device.

4. The method of claim 1, wherein a ratio between a storage space of the first predetermined storage area and a storage space of the second predetermined storage area is greater than or equal to a predetermined threshold.

5. The method of claim 1, wherein the data input/output type is determined based on a package type of a data structure of the data.

6. The method according to any one of claims 1-5, further comprising:

periodically scanning the cache amount of the data stored in the first preset storage area by taking the first unit time period as a period; and periodically scanning the cache amount of the metadata stored in the second preset storage area by taking the second unit time period as a period; wherein the first unit time period is less than the second unit time period;

executing a disc-dropping operation on the data until the storage space occupied by the data in the first preset storage area is determined to meet the preset condition according to the cache quantity of the data stored in the first preset storage area;

7. The method according to any one of claims 1-5, wherein the preset conditions comprise: the ratio of the storage space occupied by the buffer quantity of the data stored in the first preset storage area to the total storage space of the first preset storage area is larger than or equal to a preset ratio threshold value.

8. The method according to any one of claims 1-5, wherein before the obtaining the data sent by the upper layer and parsing the data, the method further comprises:

after the data to be written is obtained, analyzing the data to be written to obtain metadata and data content;

encapsulating the first data structure and the identification information as the metadata type, storing the result in a fifth preset storage area of a memory, and generating an address pointer corresponding to the fifth preset storage area;

creating a second data structure according to the address pointer corresponding to the fifth preset storage area, the identification information and the data;

encapsulating the second data structure as the data type and storing it in a sixth preset storage area of the memory, and storing the data content in a seventh preset storage area of the memory;

and caching the metadata in the fifth preset storage area after caching the data content in the seventh preset storage area.

9. The method of claim 8, wherein the seventh preset storage area includes a plurality of unit storage spaces each storing one data block, the data blocks in the plurality of unit storage spaces constituting the data content; the method further comprises:

determining a first quantity of the data blocks according to the number of the unit storage spaces;

and constructing a third data structure according to the identification information, the first quantity, the second quantity, the first indication information and the second indication information, wherein the first indication information is used for indicating whether the data blocks have been sent to the cache device, and the second indication information is used for indicating whether the metadata has been sent to the cache device.

10. The method according to claim 9, wherein the method further comprises:

and updating the value of the second quantity in the third data structure in real time according to the number of the data blocks sent to the cache device.

11. The method according to claim 9 or 10, wherein the first data structure comprises at least:

file structure information, a starting address of the metadata written in the fifth preset storage area, length information of the metadata, and an address pointer pointing to the object of each of the unit storage spaces.

12. The method of claim 11, wherein the file structure information includes one or more of the following:

a file name of the file carrying the metadata, a creation time of the file, and a size of the file.

13. The method according to claim 9 or 10, wherein the second data structure comprises at least:

a metadata file structure corresponding to the metadata, an address pointer corresponding to each unit storage space, and a relation linked list for indicating the relation between the data blocks stored in different unit storage spaces.

14. A file snapshot implementation apparatus, the apparatus comprising:

the acquisition module is used for acquiring data sent by an upper layer;

the analysis module is used for analyzing the data; when the type of the data is determined to be the target type, carrying out secondary analysis on the data to obtain the data input/output type;

the processing module is used for storing the data in a first preset storage area when the data input/output type is determined to be the data type; or when the data input/output type is determined to be a metadata type, storing the data in a second preset storage area; when the storage space occupied by the data in the first preset storage area is determined to meet the preset condition, performing a disc-dropping operation on the data, wherein the disc-dropping operation is used for storing the data in the first preset storage area into a third preset storage area of a data storage device; after determining that the data disc-dropping operation on the data in the first preset storage area is completed, performing the disc-dropping operation on the metadata in the second preset storage area, and storing the metadata in the second preset storage area into a fourth preset storage area of the data storage device; and performing a snapshot operation on the data in the first preset storage area according to the preset trigger time, and performing a snapshot operation on the metadata and the data in the data storage device.

15. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the file snapshot implementation method of any of claims 1-13.

16. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the file snapshot implementation method of any of claims 1-13.
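
For illustration only, the threshold-triggered disc-dropping order described in the claims (data first, then metadata, once the occupancy ratio of the first preset storage area reaches a preset ratio threshold) can be sketched roughly as follows. This is a minimal model under stated assumptions, not the patented implementation: the class name `TwoAreaCache`, its fields, the threshold value 0.75, and the in-memory `disk_log` standing in for the data storage device are all hypothetical, and the sketch only covers the variant in which metadata is flushed immediately after the data.

```python
from dataclasses import dataclass, field


@dataclass
class TwoAreaCache:
    """Illustrative model only; every name here is hypothetical."""
    data_capacity: int              # total bytes of the first preset storage area
    ratio_threshold: float = 0.75   # assumed preset ratio threshold
    data_area: list = field(default_factory=list)  # first preset storage area
    meta_area: list = field(default_factory=list)  # second preset storage area
    disk_log: list = field(default_factory=list)   # stand-in for the data storage device

    def put(self, kind: str, payload: bytes) -> None:
        # Route the payload by its input/output type: "data" or "metadata".
        if kind == "data":
            self.data_area.append(payload)
        elif kind == "metadata":
            self.meta_area.append(payload)
        else:
            raise ValueError(f"unknown type: {kind}")

    def data_usage_ratio(self) -> float:
        # Occupied bytes of the data area divided by its total capacity.
        used = sum(len(p) for p in self.data_area)
        return used / self.data_capacity

    def scan(self) -> bool:
        """One scan pass: flush only if the occupancy ratio meets the threshold."""
        if self.data_usage_ratio() < self.ratio_threshold:
            return False
        # Data is written out first ...
        for p in self.data_area:
            self.disk_log.append(("data", p))
        self.data_area.clear()
        # ... and metadata only after the data flush has completed.
        for m in self.meta_area:
            self.disk_log.append(("metadata", m))
        self.meta_area.clear()
        return True
```

In this sketch, calling `scan()` on a cache with `data_capacity=8` holding 4 bytes of data leaves everything cached, while a second 4-byte write followed by `scan()` flushes both areas, with all data entries appearing in `disk_log` before any metadata entry. Flushing metadata last means that metadata on the storage device never refers to data that has not yet been written.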