CN113254394A

CN113254394A - Snapshot processing method, system, equipment and storage medium

Info

Publication number: CN113254394A
Application number: CN202110529051.3A
Authority: CN
Inventors: 赵鑫
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2021-08-13
Anticipated expiration: 2041-05-14
Also published as: CN113254394B

Abstract

The invention discloses a snapshot processing method, system, device, and storage medium. The method comprises: constructing a file name list from the names of all files in a data engine, storing the file name list in a metadata file under a snapshot directory, and at the same time recording the log number at the current moment; and, when an operation instruction sent by a requester is detected, performing a file deletion operation, file copying operation, or file recovery operation of a distributed storage system that uses a consistency algorithm, based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log.

Description

Snapshot processing method, system, equipment and storage medium

Technical Field

The invention belongs to the technical field of storage, and relates to a snapshot processing method, a snapshot processing system, snapshot processing equipment and a storage medium.

Background

A distributed storage system that uses a consistency algorithm generally divides the cluster into replica groups, and each replica group uses the consistency algorithm to keep the data held by the members of the same group identical. Specifically, all client IO is encapsulated into logs, and the logs are distributed within the replica group. Each replica group member appends the logs it receives to its own log file and, at the same time, restores the logs into specific operations and applies those operations to its own data engine (i.e., the persistently stored data). Therefore, if one or more members (fewer than the minimum majority of the replica group) go offline partway through, only the missing logs need to be transmitted to those members after they restart. By replaying these logs, the members catch up with the authoritative log of the replica group, so that the data is brought level and consistency is achieved.
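The append, apply, and catch-up behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `Member` class, the `(key, value)` entry format, and the `catch_up` helper are all hypothetical names, and a dictionary stands in for the persistent data engine.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One replica-group member: an append-only log plus a data engine (hypothetical model)."""
    log: list = field(default_factory=list)     # entries are (key, value) pairs
    engine: dict = field(default_factory=dict)  # stand-in for the persisted data engine

    def append_and_apply(self, entry):
        key, value = entry
        self.log.append(entry)    # append the received log entry to the member's own log
        self.engine[key] = value  # restore the entry into an operation and apply it


def catch_up(lagging: Member, leader: Member) -> int:
    """Send only the entries the lagging member is missing; return how many were sent."""
    missing = leader.log[len(lagging.log):]
    for entry in missing:
        lagging.append_and_apply(entry)
    return len(missing)


leader, follower = Member(), Member()
for entry in [("a", 1), ("b", 2)]:
    leader.append_and_apply(entry)
    follower.append_and_apply(entry)

# The follower is offline while the leader accepts two more writes.
leader.append_and_apply(("a", 3))
leader.append_and_apply(("c", 4))

# After restart, only the two missing entries are transmitted and replayed.
sent = catch_up(follower, leader)
```

After `catch_up` returns, the follower's engine holds the same content as the leader's, which is the "levelled" state the paragraph refers to.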

However, the log cannot grow indefinitely; otherwise it would take up twice the space. Therefore, the consistency algorithm also uses snapshots to solidify the data engine at a certain moment. After the data engine has been snapshotted, the logs before the time point recorded by the snapshot can be deleted to release space. As a result, when a replica group member needs to restore data, it is first determined whether the member can be restored through the logs. If the gap is too large and exceeds the length of the existing authoritative log, the snapshot must first be copied in full to that node, and the data is then restored through the logs. The leader of the replica group is generally regarded as authoritative by default, and the leader transmits the data to the member during data restoration. Therefore, a method for taking a snapshot of the data engine is needed in storage systems that use this kind of consistency algorithm.
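The decision between catching up through logs and copying a full snapshot can be sketched as below. The `LeaderLog` class, its index fields, and `choose_recovery` are hypothetical; the sketch assumes log entries carry consecutive integer indices and that a snapshot taken at index `n` covers every entry up to and including `n`.

```python
from dataclasses import dataclass


@dataclass
class LeaderLog:
    """Index range of the authoritative log retained by the leader (hypothetical model)."""
    first_index: int  # index of the oldest entry still retained
    last_index: int   # index of the newest entry

    def truncate_before(self, snapshot_index: int) -> None:
        """Drop entries already covered by a snapshot taken at snapshot_index."""
        self.first_index = max(self.first_index, snapshot_index + 1)


def choose_recovery(log: LeaderLog, member_last_index: int) -> str:
    """Return 'log' if every entry the member is missing is still retained, else 'snapshot'."""
    next_needed = member_last_index + 1
    if next_needed >= log.first_index:
        return "log"       # the gap can be filled by sending missing entries only
    return "snapshot"      # the gap exceeds the retained log: copy the snapshot first


log = LeaderLog(first_index=1, last_index=100)
log.truncate_before(60)                # snapshot taken at entry 60, older logs released

near = choose_recovery(log, 80)        # member stopped at entry 80
far = choose_recovery(log, 40)         # member stopped at entry 40
```

In this sketch a member that stopped at entry 80 only needs entries 81 to 100, whereas a member that stopped at entry 40 needs entries that have already been released and therefore requires the snapshot.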

Traditional incremental snapshot algorithms, such as copy-on-write or redirect-on-write, are complex to implement, and it is difficult to implement them while meeting requirements such as high performance and small space occupation. Moreover, whenever a snapshot is taken of the data in a data engine, additional disk operations are required. These operations occupy considerable bandwidth of the underlying file system and in turn take up part of the service bandwidth, which is very disadvantageous in performance-sensitive service scenarios.

Disclosure of Invention

The present invention aims to overcome the drawbacks of the prior art and provides a snapshot processing method, system, device, and storage medium that eliminate additional disk operations, occupy less bandwidth of the underlying file system, and avoid crowding out service bandwidth.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a snapshot processing method, including:

constructing a file name list from the names of all files in a data engine, and storing the file name list in a metadata file under a snapshot directory;

when a data copying operation instruction sent by a requester is detected, sending a metadata file to be copied to the requester, wherein the metadata file to be copied stores file names of all files to be copied in the data copying operation instruction; when a file pulling request sent by a requester is detected, sending a pulled file to be copied to the requester according to the file pulling request;

when a file recovery operation instruction sent by a requester is detected, downloading and copying files from a data engine according to the data recovery operation instruction sent by the requester, creating a snapshot directory, and constructing a metadata file to be recovered, wherein the metadata file to be recovered stores the names of the files obtained by downloading and copying; storing the metadata file to be recovered in the created snapshot directory; and then replaying the log on the data engine.

The method further comprises: when a deletion operation instruction sent by a requester is detected, creating, in the snapshot directory, an empty file with the same name as the file to be deleted specified in the deletion operation instruction, and then deleting the file to be deleted from the data engine.

And when a data copying operation instruction sent by a requester is detected, sending a metadata file to be copied to the requester according to the data copying operation instruction under the snapshot directory.

When a file pulling request sent by a requester is detected, a file is searched from the data engine according to the file name in the file pulling request, and then the searched file is sent to the requester.

When a file pulling request sent by a requester is detected, a file is searched for in the data engine according to the file name in the file pulling request. When a file with the same name as the file name in the file pulling request is found in the data engine, the found file is sent to the requester. When no such file is found in the data engine, the empty file with the same name as the file name in the file pulling request is looked up in the snapshot directory, and that empty file is sent to the requester.

When a data recovery operation request sent by a requester is detected, downloading a file requested to be recovered by the data recovery operation request from a data engine, copying the file, and modifying the name of a download directory in the file downloading process;

creating a snapshot directory, constructing a metadata file to be restored, and storing the metadata file to be restored in the newly-created snapshot directory;

and deleting the directory of the data engine, renaming the download directory to the directory of the data engine, and then replaying the log on the data engine to complete the data recovery.

In a second aspect, the present invention provides a snapshot processing system, including:

a creating module, configured to construct a file name list from the names of all files in a data engine and to store the file name list in a metadata file under a snapshot directory;

the file copying operation module is used for sending a metadata file to be copied to a requester when a data copying operation instruction sent by the requester is detected, wherein the metadata file to be copied stores file names of all files to be copied in the data copying operation instruction; when a file pulling request sent by a requester is detected, sending a pulled file to be copied to the requester according to the file pulling request;

the file recovery operation module is used for downloading and copying files from the data engine according to the data recovery operation instruction sent by the requester when the file recovery operation instruction sent by the requester is detected, then creating a snapshot directory, and constructing a metadata file to be recovered, wherein the name of the file obtained by downloading and copying is stored in the metadata file to be recovered, and the metadata file to be recovered is stored in the created snapshot directory and then log playback is carried out on the data engine.

Further comprising:

and the file deleting operation module is used for creating a blank file with the same name as the file to be deleted in the deleting operation instruction in the snapshot directory and then deleting the file to be deleted in the data engine when the deleting operation instruction sent by the requester is detected.

In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the snapshot processing method when executing the computer program.

In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the snapshot processing method.

The invention has the following beneficial effects:

In operation, the snapshot processing method, system, device, and storage medium of the invention construct a file name list from the names of all files in the data engine and store the file name list in a metadata file under the snapshot directory. When a file copying operation or file recovery operation is performed, it can be completed simply by transmitting the snapshot files and then replaying the log, so no additional disk operations are needed. That is, the bandwidth occupied on the underlying file system during a snapshot is reduced to a minimum, and the occupation of service bandwidth is reduced as far as possible.

Furthermore, when a deletion operation is performed, an empty file with the same name is stored in the snapshot directory at the time the file is deleted, which reduces the time spent downloading data during data recovery.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it. In the drawings:

FIG. 1 is a schematic structural view of the present invention;

FIG. 2 is a schematic diagram of a snapshot processing system in accordance with the present invention;

FIG. 3 is a flow chart of the present invention during a recovery operation.

Wherein, 1 is a creation module, 2 is an operation module, 21 is a file deletion operation module, 22 is a file copying operation module, 23 is a file recovery operation module, 221 is a first acquisition module, 222 is a second acquisition module, 223 is a push module, 231 is a third acquisition module, 232 is a storage module, and 233 is a log replay module.

Detailed Description

The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.

As is known, snapshots are mainly used for online data backup and recovery: when an application failure or file corruption occurs on a storage device, the data can be quickly restored to its state at an available point in time. Another use of a snapshot is to provide storage users with an additional data access channel.

The principle of the invention is to exploit the property that the content of the data engine remains unchanged after the data engine replays the log. Because log replay exists in any storage system that uses a consistency algorithm, the data content at each time point does not need to be strictly solidified. For example, suppose a snapshot is taken of a file in a replica group to generate a snapshot file. At that time point the snapshot file is identical to the original file; afterwards the engine file is updated as the service continues to run, so the engine file comes to differ from the snapshot file. When a member that needs to restore data appears later, sending it the snapshot file and then replaying the log has exactly the same effect as sending it the file from the data engine and then replaying the log: the resulting data engine file is the same. In other words, the data engine file is idempotent with respect to the log replay operation.
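The idempotence argument can be checked on a toy model. The sketch below assumes, purely for illustration, that log entries are `put` and `delete` operations on keys (operations whose repeated application yields the same state); the `replay` function and the entry format are hypothetical and are not taken from the disclosure.

```python
def replay(engine: dict, log: list, from_index: int) -> dict:
    """Apply log entries with index >= from_index to a copy of the engine state."""
    state = dict(engine)
    for op, key, value in log[from_index:]:
        if op == "put":
            state[key] = value       # overwriting with the same value is harmless
        elif op == "delete":
            state.pop(key, None)     # deleting an absent key is a no-op
    return state


log = [
    ("put", "a", 1),
    ("put", "b", 2),       # snapshot is taken after this entry (log number 2)
    ("put", "a", 5),
    ("delete", "b", None),
    ("put", "c", 7),
]
snapshot_log_number = 2

# State strictly solidified at snapshot time (what a traditional snapshot would hold).
frozen = replay({}, log[:snapshot_log_number], 0)

# State of the live engine some time later, after two more entries were applied.
live = replay({}, log[:4], 0)

# Replaying from the snapshot's log number gives the same result from either start.
from_frozen = replay(frozen, log, snapshot_log_number)
from_live = replay(live, log, snapshot_log_number)
```

Both starting points converge to the same final state, which is why the live engine files can be sent in place of a strictly frozen copy, provided the log is replayed from the recorded log number.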

Example one

Referring to fig. 1, the snapshot processing method according to the present invention includes:

constructing a file name list from the names of all files in a data engine, storing the file name list in a metadata file under a snapshot directory, and at the same time recording the log number at the current moment;

specifically, when a deletion operation instruction sent by a requester is detected, an empty file with the same name as the file to be deleted specified in the deletion operation instruction is created in the snapshot directory, and then the file to be deleted in the data engine is deleted.

For example, when the file to be deleted is named file, an empty file named file is created in the snapshot directory, and then the file to be deleted is removed from the data engine.

It should be noted that saving an empty file with the same name at the time the file is deleted helps reduce the time spent downloading data during data recovery.
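Snapshot creation and the deletion rule of this example can be sketched with ordinary file-system calls. The metadata file name `snapshot.meta`, its JSON layout, and the function names are assumptions made for the illustration; the disclosure does not specify a metadata format.

```python
import json
import tempfile
from pathlib import Path


def create_snapshot(engine_dir: Path, snapshot_dir: Path, log_number: int) -> Path:
    """Record only the file names and the current log number; no file data is copied."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in engine_dir.iterdir() if p.is_file())
    meta = snapshot_dir / "snapshot.meta"  # hypothetical metadata file name
    meta.write_text(json.dumps({"log_number": log_number, "files": names}))
    return meta


def delete_file(engine_dir: Path, snapshot_dir: Path, name: str) -> None:
    """Create an empty same-named file in the snapshot directory, then delete from the engine."""
    (snapshot_dir / name).touch()
    (engine_dir / name).unlink()


with tempfile.TemporaryDirectory() as tmp:
    engine = Path(tmp) / "engine"
    snap = Path(tmp) / "snapshot"
    engine.mkdir()
    for name in ("file1", "file2"):
        (engine / name).write_text("data-" + name)

    meta = create_snapshot(engine, snap, log_number=42)
    delete_file(engine, snap, "file1")

    listed = json.loads(meta.read_text())
    engine_files = sorted(p.name for p in engine.iterdir())
    placeholder_size = (snap / "file1").stat().st_size
```

Taking the snapshot writes a single small metadata file, and the later deletion leaves a zero-byte `file1` in the snapshot directory so that the name recorded in the metadata can still be served.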

Example two

The file copying operation of the distributed storage system using the consistency algorithm is performed based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log. The specific process is as follows:

when a data copying operation instruction sent by a requester is detected, sending a metadata file to be copied to the requester according to the data copying operation instruction under a snapshot directory;

The requester then sends file pulling requests one by one according to the file names in the received metadata file to be copied, so that the files are pulled in sequence.

For example, suppose the files the requester needs are named file1, file2, and file3. The requester sends a data copy operation instruction to the leader, which acts as the manager, and the leader sends the metadata file to be copied, which stores the file names file1, file2, and file3, to the requester. After obtaining the metadata file to be copied, the requester sends requests to pull file1, file2, and file3 to the leader in sequence. After receiving each request, the leader looks up file1, file2, and file3 in the data engine in turn and sends them to the requester in sequence.

It should be noted that, when the corresponding file is not found in the data engine, an empty file with the same file name is sent to the requester. Because the content of this file is empty, it does not cause subsequent operations to go wrong, and it also informs the requester that the file was not found in the data engine. In addition, since the file has already been deleted, it will eventually be deleted again once the log is replayed, so transmitting the empty file saves network consumption.
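The pull-by-name flow with the empty-file fallback can be sketched locally, with directories standing in for the leader and the requester. The function names and the direct function call in place of a network request are assumptions for the illustration.

```python
import tempfile
from pathlib import Path


def serve_pull(engine_dir: Path, snapshot_dir: Path, name: str) -> bytes:
    """Leader side: answer one file pulling request by file name."""
    engine_file = engine_dir / name
    if engine_file.is_file():
        return engine_file.read_bytes()        # file still exists in the data engine
    # Not in the engine: fall back to the empty same-named file in the snapshot directory.
    return (snapshot_dir / name).read_bytes()


def copy_from_leader(file_names, engine_dir: Path, snapshot_dir: Path, dest_dir: Path) -> None:
    """Requester side: pull the files listed in the metadata file one at a time."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    for name in file_names:
        (dest_dir / name).write_bytes(serve_pull(engine_dir, snapshot_dir, name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    engine, snap, dest = root / "engine", root / "snapshot", root / "requester"
    engine.mkdir()
    snap.mkdir()
    (engine / "file1").write_bytes(b"one")
    (engine / "file3").write_bytes(b"three")
    (snap / "file2").touch()                   # file2 was deleted after the snapshot

    metadata_names = ["file1", "file2", "file3"]  # names stored in the metadata file
    copy_from_leader(metadata_names, engine, snap, dest)
    pulled = {p.name: p.read_bytes() for p in dest.iterdir()}
```

The requester ends up with real content for `file1` and `file3` and a zero-byte `file2`, so no data is transferred for a file that log replay would delete anyway.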

Example three

The file recovery operation of the distributed storage system using the consistency algorithm is performed based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log. The specific process is as follows:

when a data recovery operation request sent by a requester is acquired, downloading a file requested to be recovered by the data recovery operation request from a data engine, copying the file, and modifying the name of a download directory in the file downloading process;

Referring to fig. 3, for example, the manager obtains a data recovery operation instruction sent by the requester for recovering file1, file2, and file3. Files file1, file2, and file3 are first downloaded from the data engine and copied. A snapshot directory is then created, and a file name list containing file1, file2, and file3 is stored in the metadata file of the newly created snapshot. Next, the directory of the data engine is deleted and the download directory is renamed to the directory of the data engine. Finally, the log is replayed on the data engine so that the files are written into the data engine.
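The recovery sequence (download, write metadata, swap directories, replay) can be sketched as below. The directory names `download.tmp`, `snapshot`, and `engine`, the `snapshot.meta` JSON layout, and the `write`/`delete` log entry format are all assumptions for the illustration; the downloaded files are passed in as a dictionary instead of being fetched over a network.

```python
import json
import shutil
import tempfile
from pathlib import Path


def recover(downloaded: dict, node_dir: Path, log: list, log_number: int) -> Path:
    """Rebuild the data engine directory from downloaded files, then replay the log."""
    download = node_dir / "download.tmp"       # hypothetical temporary download directory
    download.mkdir()
    for name, content in downloaded.items():
        (download / name).write_bytes(content)

    snapshot = node_dir / "snapshot"
    snapshot.mkdir(exist_ok=True)
    meta = {"log_number": log_number, "files": sorted(downloaded)}
    (snapshot / "snapshot.meta").write_text(json.dumps(meta))

    engine = node_dir / "engine"
    shutil.rmtree(engine, ignore_errors=True)  # delete the old data engine directory
    download.rename(engine)                    # the download directory becomes the engine

    for op, name, content in log[log_number:]:  # replay entries after the snapshot point
        if op == "write":
            (engine / name).write_bytes(content)
        elif op == "delete":
            (engine / name).unlink(missing_ok=True)
    return engine


log = [
    ("write", "file1", b"v1"),
    ("write", "file2", b"b"),
    ("write", "file3", b"c"),    # snapshot recorded log number 3 after this entry
    ("write", "file1", b"v2"),
    ("delete", "file2", None),
]

with tempfile.TemporaryDirectory() as tmp:
    node = Path(tmp) / "node"
    (node / "engine").mkdir(parents=True)
    (node / "engine" / "stale").write_bytes(b"old")

    # file1 was pulled in its newer state; file2 arrived as an empty placeholder.
    downloaded = {"file1": b"v2", "file2": b"", "file3": b"c"}
    engine = recover(downloaded, node, log, log_number=3)

    result = {p.name: p.read_bytes() for p in engine.iterdir()}
    meta = json.loads((node / "snapshot" / "snapshot.meta").read_text())
```

Replaying the two entries after the snapshot point rewrites `file1` with the content it already has and removes the placeholder `file2`, leaving the engine in the same state as the source.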

It should be noted that the invention is a solution designed specifically for storage systems that use a distributed consistency algorithm. Compared with conventional snapshot solutions, it is simple and reliable to implement, and it can largely eliminate both the consumption of underlying disk resources and the disturbance to services that occur when the consistency algorithm takes a snapshot of the data engine.

Example four

Referring to fig. 2, the snapshot processing system according to the present invention includes:

a creating module 1, configured to construct a file name list from the names of all files in a data engine and to store the file name list in a metadata file under a snapshot directory;

and an operation module 2, configured to, when an operation instruction sent by a requester is detected, perform a file deletion operation, file copying operation, or file recovery operation of the distributed storage system using the consistency algorithm, based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log.

The operation module 2 includes:

the file deleting operation module 21 is configured to, when a deletion operation instruction sent by the requester is detected, create in the snapshot directory an empty file with the same name as the file to be deleted specified in the deletion operation instruction, and then delete the file to be deleted in the data engine;

the file copying operation module 22 is configured to send a metadata file to be copied to the requester when a data copying operation instruction sent by the requester is detected, where file names of all files to be copied in the data copying operation instruction are stored in the metadata file to be copied; when a file pulling request sent by a requester is detected, sending a pulled file to be copied to the requester according to the file pulling request;

the file recovery operation module 23 is configured to, when a file recovery operation instruction sent by a requester is detected, perform downloading and copying of a file from the data engine according to the data recovery operation instruction sent by the requester, create a snapshot directory, and construct a metadata file to be recovered, where a name of the file obtained by downloading and copying is stored in the metadata file to be recovered, store the metadata file to be recovered in the created snapshot directory, and perform log playback on the data engine.

The file copy operation module 22 includes:

the first obtaining module 221 is configured to detect a data copy operation instruction sent by a requester, and send a metadata file to be copied to the requester according to the data copy operation instruction in a snapshot directory;

a second obtaining module 222, configured to obtain a file pulling request sent by a requestor;

the pushing module 223 is configured to search for a file in the data engine according to the file name in the file pulling request; when a file with the same name as the file name in the file pulling request is found in the data engine, to send the found file to the requester; and when no such file is found in the data engine, to look up in the snapshot directory the empty file with the same name as the file name in the file pulling request and send that empty file to the requester.

The file restore operation module 23 includes:

a third obtaining module 231, configured to detect a data recovery operation request sent by a requestor, download, from a data engine, a file requested to be recovered by the data recovery operation request, copy the file, and modify a name of a download directory in a file downloading process;

the storage module 232 is configured to create a snapshot directory, construct a metadata file to be restored, and store the metadata file to be restored in the newly created snapshot directory;

and a log replay module 233, configured to delete the directory of the data engine, modify the download directory into the directory of the data engine, and replay the log of the data engine to complete data recovery.

Example five

A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the snapshot processing method when executing the computer program. The memory may include internal memory, such as high-speed random access memory, and may further include non-volatile memory, such as at least one disk memory. The processor, the network interface, and the memory are connected to each other through an internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like; the bus may be divided into an address bus, a data bus, a control bus, and so on. The memory is used for storing programs; specifically, a program may include program code comprising computer operation instructions. The memory may include both internal memory and non-volatile storage, and it provides instructions and data to the processor.

Example six

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the snapshot processing method. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include Random Access Memory (RAM) and/or cache memory (cache), among others. The non-volatile memory may include Read Only Memory (ROM), hard disk, flash memory, optical disk, magnetic disk, and the like.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A snapshot processing method, comprising:

2. The snapshot processing method of claim 1, further comprising: when a deletion operation instruction sent by a requester is detected, creating in the snapshot directory an empty file with the same name as the file to be deleted specified in the deletion operation instruction, and then deleting the file to be deleted in the data engine.

3. The snapshot processing method according to claim 1, wherein when a data copy operation instruction sent by a requestor is detected, a metadata file to be copied is sent to the requestor according to the data copy operation instruction under a snapshot directory.

4. The snapshot processing method of claim 1, wherein when a file pulling request sent by a requestor is detected, a file is searched from a data engine according to a file name in the file pulling request, and the searched file is sent to the requestor.

5. The snapshot processing method according to claim 3, wherein when a file pulling request sent by a requester is detected, a file is searched for in the data engine according to the file name in the file pulling request; when a file with the same name as the file name in the file pulling request is found in the data engine, the found file is sent to the requester; and when no such file is found in the data engine, the empty file with the same name as the file name in the file pulling request is looked up in the snapshot directory and then sent to the requester.

6. The snapshot processing method according to claim 1, wherein when a data recovery operation request sent by a requester is detected, a file requested to be recovered by the data recovery operation request is downloaded from a data engine and copied, and then a name of a download directory in a file download process is modified;

7. A snapshot processing system, comprising:

a creating module (1), configured to construct a file name list from the names of all files in a data engine and to store the file name list in a metadata file under a snapshot directory;

the file copying operation module (22) is used for sending a metadata file to be copied to a requester when a data copying operation instruction sent by the requester is detected, wherein the metadata file to be copied stores file names of all files to be copied in the data copying operation instruction; when a file pulling request sent by a requester is detected, sending a pulled file to be copied to the requester according to the file pulling request;

and the file recovery operation module (23) is used for downloading and copying files from the data engine according to the data recovery operation instruction sent by the requester when the file recovery operation instruction sent by the requester is detected, then creating a snapshot directory, and constructing a metadata file to be recovered, wherein the name of the file obtained by downloading and copying is stored in the metadata file to be recovered, and the metadata file to be recovered is stored in the created snapshot directory and then carries out log replay on the data engine.

8. The snapshot processing system of claim 7, further comprising:

a file deleting operation module (21), configured to, when a deletion operation instruction sent by the requester is detected, create in the snapshot directory an empty file with the same name as the file to be deleted specified in the deletion operation instruction, and then delete the file to be deleted in the data engine.

9. Computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor realizes the steps of the snapshot processing method according to any one of claims 1 to 6 when executing said computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the snapshot processing method of one of the claims 1 to 6.