WO2024021492A1 - Data recycling method and apparatus, electronic device, and storage medium - Google Patents

Data recycling method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024021492A1
WO2024021492A1 (PCT/CN2022/141825, CN2022141825W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
target
recycled
fragments
Prior art date
Application number
PCT/CN2022/141825
Other languages
French (fr)
Chinese (zh)
Inventor
邓宇羽
赵真
范哲豪
姚永坤
吴淮
Original Assignee
天翼云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天翼云科技有限公司 filed Critical 天翼云科技有限公司
Publication of WO2024021492A1 publication Critical patent/WO2024021492A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data recovery method, device, electronic equipment and storage medium.
  • The overall framework of traditional disaster recovery products is shown in Figure 1.
  • the data from the production center is sent to the disaster recovery center through the Agent, and then split into metadata and data parts and written to the database and back-end cloud storage respectively. Since the production center of enterprise users generates a large amount of data every moment, and when a disaster occurs, enterprise users generally only choose to restore to the latest point in time, so expired data in back-end storage will occupy a lot of space and increase user costs.
  • this application provides a data recovery method, device, electronic equipment and storage medium.
  • a data recovery method including:
  • a merged file is generated based on the data fragments to be recycled, and the data recycling operation is performed through the merged file.
  • the method also includes:
  • traversing the target file to obtain data recovery information corresponding to the data fragments to be recovered includes:
  • the data fragment identification and the metadata are determined as the data recycling information.
  • writing the data recovery information into the target hash queue includes:
  • the calculation result and the data fragment identifier are stored in a key-value pair structure and written into the target hash queue, where the calculation result is used as the key name in the key-value pair structure and the data fragment identifier is used as the key value in the key-value pair structure.
  • using the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file includes:
  • the data fragments matching the data fragment identifiers are determined as the data fragments to be recycled.
  • the generation of merged files based on the data fragments to be recycled includes:
  • the current storage amount of the merge queue is detected, and when the current storage amount reaches the upper limit of the storage amount, the merge file is generated based on the merge queue.
  • the data recovery operation performed through the merged file includes:
  • a data recovery device including:
  • the acquisition module is used to acquire at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file;
  • a traversal module configured to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
  • An extraction module configured to use the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file;
  • a processing module configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
  • a storage medium including a stored program, where the above steps are executed when the program runs.
  • an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein:
  • the memory is used to store computer programs; the processor is used to execute the steps in the above method by running the program stored in the memory.
  • Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps in the above method.
  • the method provided by the embodiments of this application can make the disaster recovery program consume less system performance during data recovery, ensure the IO resources for the core business of the disaster recovery product, reduce hardware costs, and improve user experience. It effectively solves the problems faced by traditional disaster recovery programs during data recovery: large amounts of data to be cleaned, long cleaning time, difficulty in concurrency with backup and recovery, and easy file fragmentation.
  • Figure 1 is a flow chart of a data recovery method provided by an embodiment of the present application.
  • FIG. 2 is a block diagram of the overall data recovery system of the disaster recovery software provided by the embodiment of the present application.
  • FIG. 3 is a block diagram of a data recovery module provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the storage of data fragmentation identifiers provided by the embodiment of the present application.
  • Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Embodiments of the present application provide a data recovery method, device, electronic equipment and storage medium.
  • the method provided by the embodiment of the present invention can be applied to any required electronic equipment, for example, it can be a server, a terminal and other electronic equipment, which are not specifically limited here. For convenience of description, they will be referred to as electronic equipment for short in the following.
  • Figure 1 is a flow chart of a data recovery method provided by an embodiment of the present application. As shown in Figure 1, the method includes:
  • Step S11 Obtain at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file.
  • the method provided by the embodiment of this application is applied to the data recovery module.
  • the data recovery module is deployed in the overall data recovery system of the disaster recovery software.
  • the overall data recovery system of the disaster recovery software includes:
  • the top-level layer, labeled Disaster recovery system, represents the disaster recovery product.
  • the program user interface layer is where the user's cloud host data is connected to the disaster recovery product.
  • Data and meta indicate that the protected files are split into data parts and metadata parts within the disaster recovery program, and are stored in the backup storage and the database respectively.
  • the backup storage can be connected to traditional NAS storage or S3 object storage. Taking S3 as an example, the stored data files are fixed-size block files; each block file contains multiple data slices, and each data slice is a fragment obtained after deduplication.
  • GC in Figure 2 stands for Garbage Collection.
  • This module interacts with the database to obtain the metadata to be cleaned and the file path and offset of the back-end files where the data slices to be cleaned are located, then interacts with the Storage module to download the corresponding files for cleaning, and finally updates the offsets of the cleaned slices in the database.
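The block-file layout described above can be sketched as a small data model. This is a minimal illustration, not the patent's implementation; all class and field names here (`SliceRecord`, `BlockFile`, `ref_count`, and the example S3-style path) are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SliceRecord:
    """Metadata for one deduplicated data slice inside a block file (hypothetical layout)."""
    slice_id: str       # content-derived identifier of the slice
    offset: int         # byte offset of the slice inside its block file
    length: int         # slice size in bytes
    ref_count: int = 1  # number of backup files still referencing this slice

@dataclass
class BlockFile:
    """A fixed-size object in backend storage (e.g. an S3 key) aggregating many slices."""
    path: str                                       # storage path / object key
    slices: List[SliceRecord] = field(default_factory=list)

    def zero_slices(self) -> List[SliceRecord]:
        # Slices no longer referenced by any file are candidates for recycling.
        return [s for s in self.slices if s.ref_count == 0]

# Example: a block file holding one live slice and one unreferenced slice.
blk = BlockFile("s3://backup/blocks/000001", [
    SliceRecord("a1", 0, 4096, ref_count=2),
    SliceRecord("b2", 4096, 4096, ref_count=0),
])
```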
  • the embodiment of this application optimizes the structure of the data recovery module.
  • the tomb mark module modifies the database record through the meta server to add a deletion mark to the target file, so that the file appears deleted to the user and backup/restore tasks are not affected. The data recovery module can therefore obtain, during the current detection cycle, at least one target file carrying a deletion mark detected in the metadata database.
  • the data recycling module mainly uses three threads to perform data recycling.
  • the three threads are: the collection thread, the merge thread, and the update thread.
  • the number of three types of threads can be dynamically configured.
  • the configuration method includes: obtaining a configuration file, where the configuration file is derived from the running status of the target disaster recovery software, and using the configuration file to configure the number of collection threads, merge threads, and update threads.
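A minimal sketch of such dynamic thread-count configuration, assuming a JSON config file; the key names and defaults here are illustrative assumptions, not values from the patent:

```python
import json

# Default pool sizes; a real deployment would derive the overriding config file
# from the disaster recovery software's current running status.
DEFAULTS = {"collection_threads": 2, "merge_threads": 4, "update_threads": 4}

def load_thread_config(text: str) -> dict:
    """Merge a JSON config file over the defaults, ignoring unknown or invalid keys."""
    cfg = dict(DEFAULTS)
    for key, value in json.loads(text).items():
        if key in cfg and isinstance(value, int) and value > 0:
            cfg[key] = value
    return cfg

# Example: scale up merge/update threads during an off-peak window.
cfg = load_thread_config('{"merge_threads": 8, "update_threads": 8}')
```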
  • the collection thread circularly checks data fragments with a reference count of 0. Such fragments will not be referenced by any files. Recycling these fragments has no impact on backup/recovery, so concurrency is fully supported.
  • the merging thread will merge the downloaded fragmented data and upload it, thereby reducing the generation of file fragments.
  • the IO paths of these three threads are independent of the backup/recovery process, and the number of threads can be dynamically adjusted so that the core business of the disaster recovery product is not affected during peak periods.
  • the backup performance formula is as follows:
  • Formula (1) calculates the processing speed of the merge thread from the average number of slices received by the server per second within T seconds, scaled by a coefficient that represents the impact of data recycling on backup.
  • For traditional data recycling, the IO path is not independent of backup, so the coefficient is less than 1; for the data recycling proposed here, which operates directly on data fragments with a reference count of 0, backup is completely unaffected and the coefficient equals 1.
  • Formula (2) assumes that each block file stored on the cloud is aggregated from a fixed number of data slices, and gives the number of block files generated by the merging thread within T seconds.
  • It is assumed that the number of merge threads and update threads is equal, that is, merged files are uploaded immediately.
  • the upload speed of the backup thread (number of fragments/second) can be calculated through formulas (2) and (3).
  • In formula (4), assuming that the recycling thread needs T seconds of processing in total, one term represents the number of data fragments with a reference count of 0 received by the recycling thread at the i-th moment, and the sum represents the total number of data fragments on the cloud that the recycling thread needs to download within T seconds.
  • Since the recycling thread reads the slices to be recycled from the database, the number of slices changes over time. The average retained shard ratio for this round of data recycling can be calculated from the formula.
  • Formula (5) gives the average number of data fragments with a reference count of 0 contained in each block file on the cloud.
  • In formula (6), the average number of zero slices received by the recycling thread per second, divided by the average number of zero slices contained in each block file, gives the average number of block files that need to be processed per second; multiplying by the retained shard ratio gives the average number of shards to be merged per second; finally, accounting for the number of recycling threads gives the number of slices each recycling thread needs to merge per second.
  • the upload speed of the recycling thread (number of fragments/second) can be calculated through formulas (7) and (8).
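The original formula symbols did not survive into this text, so the relations described above can only be sketched under assumed notation; every symbol below ($s_i$, $\alpha$, $N$, $z_i$, $\bar{z}$, $r$, $n_{\mathrm{rec}}$, $B_{\mathrm{cloud}}$) is our placeholder, not the patent's:

```latex
% s_i: slices received by the server at second i;  \alpha: backup-impact coefficient;
% N: slices aggregated per block file;  z_i: zero-slices received at second i;
% B_cloud: block files on the cloud;  r: retained shard ratio;  n_rec: recycling threads.
\begin{align*}
v_{\mathrm{merge}} &= \alpha \cdot \frac{1}{T}\sum_{i=1}^{T} s_i ,
  \qquad \alpha \le 1 \text{ (here } \alpha = 1\text{)}
  && \text{cf. (1)} \\
B &= \frac{v_{\mathrm{merge}} \, T}{N}
  && \text{cf. (2)} \\
\bar{z} &= \frac{1}{B_{\mathrm{cloud}}}\sum_{i=1}^{T} z_i
  && \text{cf. (5)} \\
v_{\mathrm{rec}} &= \frac{1}{n_{\mathrm{rec}}}\cdot
  \frac{\tfrac{1}{T}\sum_{i=1}^{T} z_i}{\bar{z}} \cdot r \cdot N
  && \text{cf. (6)}
\end{align*}
```

Read as: merge-thread speed is the average slice arrival rate scaled by the backup-impact coefficient; block files produced in T seconds follow from the per-block slice count; and the per-recycling-thread merge rate is block files processed per second times the retained slices per block, divided across the recycling threads.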
  • Step S12 Traverse the target file to obtain data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue.
  • step S12 traverses the target file to obtain data recovery information corresponding to the data fragments to be recovered, including the following steps A1-A4:
  • Step A1 Call the collection thread to traverse the reference counts of all data fragments corresponding to the target file.
  • Step A2 Decrement the reference count to obtain the decremented reference count, and take the data fragments whose decremented reference count is 0 as the data fragments to be recycled.
  • Step A3 Obtain the data fragment identifier and metadata corresponding to the data fragment to be recycled from the target file.
  • Step A4 Determine the data fragment identification and metadata as data recycling information.
  • the collection thread can periodically check whether there are files marked for deletion in the metadata server (meta server). If the metadata server returns a file that has been marked for deletion, it queries the reference management module to decrement the reference count of the file's data slices by one, and returns the metadata of the data slices (zero slices) whose reference count is 0.
  • metadata includes: backend storage file path and offset where the data shards are located.
  • the collection thread circularly checks data fragments with a reference count of 0. Such fragments will not be referenced by any files. Recycling these fragments has no impact on backup/recovery, so concurrency is fully supported.
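Steps A1-A4 as performed by the collection thread can be sketched as follows; this is a simplified single-threaded illustration with assumed names (`collection_pass`, the dict-based stand-ins for the meta server and reference management module), not the patent's code:

```python
def collection_pass(marked_files, ref_counts):
    """One pass of the collection thread.

    marked_files: mapping of tomb-marked file -> list of slice ids it referenced
                  (what the meta server would return for the current cycle).
    ref_counts:   mapping of slice id -> current reference count.
    Returns the ids of slices whose count dropped to zero (data to be recycled).
    """
    zero_slices = []
    for _file, slice_ids in marked_files.items():
        for sid in slice_ids:
            ref_counts[sid] -= 1       # the deleted file no longer references the slice
            if ref_counts[sid] == 0:   # unreferenced -> safe to recycle concurrently
                zero_slices.append(sid)
    return zero_slices

# Example: two tomb-marked backup files sharing slice "b".
counts = {"a": 1, "b": 2, "c": 5}
zs = collection_pass({"vm1.bak": ["a", "b"], "vm2.bak": ["b"]}, counts)
```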
  • writing the data recovery information into the target hash queue includes: performing a hash calculation on the metadata to obtain the calculation result, and storing the calculation result and the data fragment identifier in a key-value pair structure written into the target hash queue, in which the calculation result is used as the key name and the data fragment identifier is used as the key value.
  • the data slice identifier (zero slice) will be stored in a hash queue, with the hash calculation result of the metadata as the key and the data slice identifier (zero slice) as the value.
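A minimal sketch of this key-value structure, assuming the metadata is the back-end file path plus offset and SHA-256 as the hash; the function names and the dict used as the queue are illustrative assumptions:

```python
import hashlib

def queue_key(path: str, offset: int) -> str:
    """Hash of a slice's metadata (back-end file path + offset) -> key name."""
    return hashlib.sha256(f"{path}:{offset}".encode()).hexdigest()

hash_queue = {}  # key: metadata hash (key name), value: zero-slice identifier (key value)

def enqueue(slice_id: str, path: str, offset: int) -> None:
    hash_queue[queue_key(path, offset)] = slice_id

# Example: enqueue one zero slice located at offset 4096 of a block file.
enqueue("b2", "s3://backup/blocks/000001", 4096)
```

Keying on the metadata hash means a slice at a given (path, offset) is enqueued at most once, which deduplicates repeated collection results.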
  • Step S13 Use the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file.
  • the data recycling information in the target hash queue is used to obtain the data fragments to be recycled from the back-end storage file, including the following steps B1-B3:
  • Step B1 Call the merge thread to sequentially extract data fragment identifiers from the target hash queue.
  • Step B2 Obtain the back-end storage file corresponding to the data fragment identifier from the backup database, and download all data fragments included in the back-end storage file.
  • Step B3 Determine the data fragments matching the data fragment identifiers as the data fragments to be recycled.
  • the merging thread will sequentially extract the data fragment identifiers from the target hash queue, and then obtain the back-end storage files corresponding to the data fragment identifiers from the backup database.
  • the back-end storage files correspond to multiple data fragments.
  • the merge process will actively download all data fragments corresponding to the back-end storage files.
  • the data fragments are then matched with the data fragment identifiers extracted from the hash queue, thereby finding the data fragments matching the data fragment identifiers and determining them as data fragments to be recycled.
  • the merging thread will merge the downloaded fragmented data and then upload it, thereby reducing the generation of file fragments.
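Step B3's matching of downloaded fragments against the identifiers pulled from the hash queue can be sketched as below; the function name and the (id, payload) tuple representation are assumptions for illustration:

```python
def select_recyclable(block_slices, wanted_ids):
    """Split one downloaded block file's slices by whether they match the hash queue.

    block_slices: list of (slice_id, payload) pairs downloaded from one
                  back-end block file.
    wanted_ids:   set of data fragment identifiers extracted from the hash queue.
    Returns (matching fragments, i.e. the data to be recycled; remaining fragments).
    """
    recycle, keep = [], []
    for sid, payload in block_slices:
        (recycle if sid in wanted_ids else keep).append((sid, payload))
    return recycle, keep

# Example: one block file where only slice "b2" was flagged by the collection thread.
rec, keep = select_recyclable([("a1", b"live"), ("b2", b"dead")], {"b2"})
```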
  • Step S14 Generate a merged file based on the data fragments to be recycled, and perform the data recycling operation through the merged file.
  • generating a merged file based on the data fragments to be recycled includes: inserting the data fragments to be recycled into the merge queue, detecting the current storage amount of the merge queue, and, when the current storage amount reaches the upper limit of the storage amount, generating the merged file based on the merge queue.
  • the merging process will insert the data fragments to be recycled into the merge queue, then detect the amount of data already inserted into the merge queue to obtain the current storage amount, and compare the current storage amount with the storage upper limit of the merge queue. If the current storage amount reaches the upper limit, all the data in the merge queue is used to generate a merged file.
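The bounded merge queue described above can be sketched as a small class; the class name, capacity value, and byte-string fragments are assumptions, and a real implementation would upload the merged file rather than keep it in memory:

```python
class MergeQueue:
    """Bounded queue whose contents are flushed into one merged file when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity     # storage upper limit of the merge queue
        self.items = []              # fragments currently queued
        self.merged_files = []       # merged files generated so far (stand-in for uploads)

    def insert(self, fragment: bytes) -> None:
        self.items.append(fragment)
        if len(self.items) >= self.capacity:            # upper limit reached
            self.merged_files.append(b"".join(self.items))  # generate the merged file
            self.items = []

# Example: a queue holding at most three fragments before flushing.
q = MergeQueue(capacity=3)
for frag in (b"aa", b"bb", b"cc", b"dd"):
    q.insert(frag)
```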
  • performing data recovery operations by merging files includes the following steps C1-C2:
  • Step C1 Invoke the update thread to upload the merged file, and modify the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain the target data fragment offsets.
  • Step C2 Write the target data fragment offset to the metadata server, and delete the backend storage file corresponding to the data to be recycled.
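Steps C1-C2 of the update thread can be sketched as below, using plain dicts as stand-ins for the metadata server and the back-end storage; the function name and layout tuple are assumptions, not the patent's interfaces:

```python
def update_pass(merged_file_path, fragment_layout, meta_server, storage):
    """Record each fragment's new location, then delete the superseded block files.

    fragment_layout: list of (slice_id, old_path, new_offset) for fragments that
                     now live in the uploaded merged file.
    meta_server:     dict slice_id -> (file path, offset), standing in for the
                     real metadata server.
    storage:         dict path -> bytes, standing in for back-end storage.
    """
    old_paths = set()
    for slice_id, old_path, new_offset in fragment_layout:
        # Write the target data fragment offset to the metadata server.
        meta_server[slice_id] = (merged_file_path, new_offset)
        old_paths.add(old_path)
    for path in old_paths:
        storage.pop(path, None)  # delete the back-end storage file that was recycled
    return old_paths

# Example: one fragment relocated from an old block file into the merged file.
meta, store = {}, {"blocks/old1": b"..."}
update_pass("blocks/merged1", [("a1", "blocks/old1", 0)], meta, store)
```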
  • the method provided by the embodiments of this application can make the disaster recovery program consume less system performance during data recovery, ensure the IO resources for the core business of the disaster recovery product, reduce hardware costs, and improve user experience. It effectively solves the problems faced by traditional disaster recovery programs during data recovery: large amounts of data to be cleaned, long cleaning time, difficulty in concurrency with backup and recovery, and easy file fragmentation.
  • Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application.
  • the device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 5, the device includes:
  • the acquisition module 51 is used to acquire at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file;
  • the traversal module 52 is used to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
  • the extraction module 53 is used to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
  • the processing module 54 is configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
  • the data recovery device includes: a configuration module for obtaining a configuration file, where the configuration file is obtained based on the running status of the target disaster recovery software; the configuration file is used to configure the number of collection threads, merge threads, and update threads.
  • the traversal module 52 is used to call the collection thread to traverse the reference counts of all data fragments corresponding to the target file; decrement the reference counts to obtain the decremented reference counts, and take the data fragments whose decremented reference count is 0 as the data fragments to be recycled; obtain the data fragment identifiers and metadata corresponding to the data fragments to be recycled from the target file; and determine the data fragment identifiers and metadata as the data recycling information.
  • the traversal module 52 is used to perform hash calculations on the metadata to obtain calculation results, and to store the calculation results and data fragment identifiers in a key-value pair structure written into the target hash queue, where the calculation results are used as the key names in the key-value pair structure and the data fragment identifiers are used as the key values.
  • the extraction module 53 is used to call the merge thread to sequentially extract the data fragment identifiers from the target hash queue; obtain the back-end storage file corresponding to the data fragment identifier from the backup database, and download the back-end storage file All data fragments included; the data fragments matching the data fragment identifiers are determined as data fragments to be recycled.
  • the processing module 54 is used to insert the data fragments to be recycled into the merge queue; detect the current storage amount of the merge queue, and generate a merge file based on the merge queue when the current storage amount reaches the upper limit of the storage amount.
  • the processing module 54 is used to invoke the update thread to upload the merged file, modify the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain the target data fragment offsets, write the target data fragment offsets to the metadata server, and delete the back-end storage files corresponding to the data to be recycled.
  • the electronic device may include: a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504.
  • the processor 1501, the communication interface 1502, and the memory 1503 communicate with each other through the communication bus 1504.
  • the memory 1503 is used to store computer programs.
  • the processor 1501 is used to implement the steps of the above embodiment when executing the computer program stored on the memory 1503.
  • the communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc.
  • the communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above terminal and other devices.
  • the memory may include Random Access Memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • the memory may also be at least one storage device located remotely from the aforementioned processor.
  • the above-mentioned processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the data recovery method described in any one of the above embodiments.
  • a computer program product containing instructions is also provided, which when run on a computer causes the computer to execute the data recovery method described in any of the above embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another; e.g., the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, integrating one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), etc.

Abstract

The present application discloses a data recycling method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring at least one target file which is obtained by detecting a metadata database within the current detection period and carries a deletion mark, wherein the deletion mark is used for indicating that data slices to be recycled are present in a backend storage file corresponding to the target file; traversing the target file to obtain data recycling information corresponding to said data slices, and writing the data recycling information into a target hash queue; acquiring said data slices from the backend storage file by using the data recycling information in the target hash queue; and generating a merge file on the basis of said data slices, and executing a data recycling operation by means of the merge file. According to the present application, the performance consumption of a system during data recycling is lower, I/O resources of core services of a disaster recovery program are ensured, the hardware cost is reduced, and the user experience is improved. The problems of large data cleaning amount and high cleaning time consumption of conventional disaster recovery software during data recycling are effectively solved.

Description

一种数据回收方法、装置、电子设备及存储介质A data recovery method, device, electronic equipment and storage medium 技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种数据回收方法、装置、电子设备及存储介质。The present application relates to the field of computer technology, and in particular, to a data recovery method, device, electronic equipment and storage medium.
背景技术Background technique
着数字经济和数字社会的发展,各种大型企业面临越来越多的业务连续性挑战,与此同时也诞生了大量的灾备产品用于应对各种意外导致的业务中断。With the development of the digital economy and society, various large enterprises are facing more and more business continuity challenges. At the same time, a large number of disaster recovery products have been born to deal with business interruptions caused by various accidents.
技术问题technical problem
传统的灾备产品整体框架如图1所示。生产中心的数据通过Agent代理发送到灾备中心,然后拆分成元数据部分和数据部分分别写入数据库和后端云存储。由于企业用户的生产中心每时每刻都会产生大量的数据,而灾难发生时企业用户一般只会选择恢复到最近的时间点,因此后端存储里面的过期数据会占用大量空间,增加用户成本。The overall framework of traditional disaster recovery products is shown in Figure 1. The data from the production center is sent to the disaster recovery center through the Agent, and then split into metadata and data parts and written to the database and back-end cloud storage respectively. Since the production center of enterprise users generates a large amount of data every moment, and when a disaster occurs, enterprise users generally only choose to restore to the latest point in time, so expired data in back-end storage will occupy a lot of space and increase user costs.
然而,目前传统数据回收方法会影响备份/恢复业务,进而影响整个灾备产品的灾难恢复时间目标(Recovery Time Object缩写:RTO)和数据恢复点目标(Recovery Point Objective缩写:RPO)。However, the current traditional data recovery methods will affect the backup/recovery business, thereby affecting the disaster recovery time objective (Recovery Time Object abbreviation: RTO) and data recovery point objective (Recovery Point) of the entire disaster recovery product. Objective abbreviation: RPO).
技术解决方案Technical solutions
To solve the above technical problems, or at least partially solve them, the present application provides a data recycling method and apparatus, an electronic device, and a storage medium.
According to one aspect of the embodiments of the present application, a data recycling method is provided, including:
acquiring at least one target file carrying a deletion mark, obtained by scanning a metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled;
traversing the target file to obtain data recycling information corresponding to the data slices to be recycled, and writing the data recycling information into a target hash queue;
acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue;
generating a merged file based on the data slices to be recycled, and performing a data recycling operation through the merged file.
Further, the method also includes:
acquiring a configuration file, where the configuration file is obtained according to the running state of the target disaster recovery software;
configuring the numbers of collection threads, merge threads, and update threads using the configuration file.
Further, traversing the target file to obtain the data recycling information corresponding to the data slices to be recycled includes:
calling the collection thread to traverse the reference counts of all data slices corresponding to the target file;
decrementing each reference count to obtain a decremented reference count, and taking the data slices whose decremented reference count is 0 as the data slices to be recycled;
acquiring, from the target file, the data slice identifiers and metadata corresponding to the data slices to be recycled;
determining the data slice identifiers and the metadata as the data recycling information.
Further, writing the data recycling information into the target hash queue includes:
performing a hash calculation on the metadata to obtain a calculation result;
writing the calculation result and the data slice identifier into the target hash queue in a key-value pair structure, where the calculation result serves as the key and the data slice identifier serves as the value.
Further, acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue includes:
calling the merge thread to extract data slice identifiers from the target hash queue in sequence;
acquiring the back-end storage file corresponding to each data slice identifier from the backup database, and downloading all data slices included in the back-end storage file;
determining the data slices matching the data slice identifiers as the data slices to be recycled.
Further, generating the merged file based on the data slices to be recycled includes:
inserting the data slices to be recycled into a merge queue;
detecting the current storage amount of the merge queue, and generating the merged file based on the merge queue when the current storage amount reaches its upper limit.
Further, performing the data recycling operation through the merged file includes:
invoking the update thread to upload the merged file, and modifying the original data slice offsets corresponding to the data to be recycled in the merged file to obtain target data slice offsets;
writing the target data slice offsets into the metadata server, and deleting the back-end storage file corresponding to the data to be recycled.
According to another aspect of the embodiments of the present application, a data recycling apparatus is also provided, including:
an acquisition module, configured to acquire at least one target file carrying a deletion mark, obtained by scanning the metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled;
a traversal module, configured to traverse the target file to obtain the data recycling information corresponding to the data slices to be recycled, and write the data recycling information into a target hash queue;
an extraction module, configured to acquire the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue;
a processing module, configured to generate a merged file based on the data slices to be recycled and perform a data recycling operation through the merged file.
According to another aspect of the embodiments of the present application, a storage medium is also provided. The storage medium includes a stored program that, when run, performs the above steps.
According to another aspect of the embodiments of the present application, an electronic device is also provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program, and the processor is configured to perform the steps of the above method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the above method.
Beneficial Effects
The method provided by the embodiments of the present application lets the disaster recovery program consume fewer system resources during data recycling, preserves the IO resources of the disaster recovery product's core services, reduces hardware cost, and improves user experience. It effectively solves the problems traditional disaster recovery programs face during data recycling: a large amount of data to clean per run, long cleaning time, difficulty running concurrently with backup/restore, and frequent file fragmentation.
Description of Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Figure 1 is a flow chart of a data recycling method provided by an embodiment of the present application;
Figure 2 is a block diagram of the overall data recycling system of the disaster recovery software provided by an embodiment of the present application;
Figure 3 is a block diagram of the data recycling module provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the storage of data slice identifiers provided by an embodiment of the present application;
Figure 5 is a block diagram of a data recycling apparatus provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Embodiments of the Invention
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application; the illustrative embodiments and their descriptions are used to explain the application and do not constitute an improper limitation of it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of this application.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another similar entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the stated element.
Embodiments of the present application provide a data recycling method and apparatus, an electronic device, and a storage medium. The method provided by the embodiments of the present invention can be applied to any electronic device as needed, for example a server or a terminal, which is not specifically limited here; for convenience of description, it is simply referred to as the electronic device below.
According to one aspect of the embodiments of the present application, a method embodiment of a data recycling method is provided. Figure 1 is a flow chart of a data recycling method provided by an embodiment of the present application. As shown in Figure 1, the method includes:
Step S11: acquire at least one target file carrying a deletion mark, obtained by scanning the metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled.
The method provided by the embodiments of this application is applied to a data recycling module deployed in the overall data recycling system of the disaster recovery software. As shown in Figure 2, the system includes a top-level Disaster Recovery System representing the user interface layer of the disaster recovery program, through which the user's cloud host data is connected to the disaster recovery product. "data" and "meta" indicate that the disaster recovery program internally splits each protected file into a data part and a metadata part, stored in the backup storage (Storage) and the database (Database) respectively. The backup storage can be backed by traditional NAS storage or S3 object storage. Taking S3 as an example, the stored data files are fixed-size block files; each block file contains multiple data slices, and each slice is a post-deduplication fragment.
The GC (Garbage Collection) module in Figure 2 is the data recycling module. It interacts with the Database to obtain the metadata to be cleaned and the data-file path and offset of each back-end data slice to be cleaned, then interacts with the Storage module to download the corresponding files for cleaning, and finally updates the post-cleaning slice offsets into the database.
To solve the difficulty traditional disaster recovery software has in running data recycling concurrently with backup/restore tasks, the embodiments of this application optimize the structure of the data recycling module. As shown in Figure 3, the tomb-mark module modifies the database record through the meta server to add a deletion mark to the target file; to the user the file appears deleted, so backup/restore tasks are not affected. The data recycling module can therefore acquire, within the current detection cycle, at least one target file carrying a deletion mark obtained by scanning the metadata database.
In the embodiments of this application, the data recycling module mainly uses three kinds of threads to perform data recycling: collection threads, merge threads, and update threads. The number of each kind of thread can be configured dynamically. The configuration method includes: acquiring a configuration file, where the configuration file is obtained according to the running state of the target disaster recovery software, and using the configuration file to configure the numbers of collection threads, merge threads, and update threads.
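The dynamic thread configuration described above can be sketched as follows. This is a minimal illustration only; the file format, key names, and default values are assumptions chosen here and are not specified in the application.

```python
import json

# Illustrative defaults; the application does not fix these numbers.
DEFAULTS = {"collect_threads": 2, "merge_threads": 4, "update_threads": 4}

def load_thread_config(text):
    """Parse a JSON config snippet derived from the disaster recovery
    software's observed load, falling back to defaults for missing keys."""
    cfg = dict(DEFAULTS)
    cfg.update({k: int(v) for k, v in json.loads(text).items() if k in DEFAULTS})
    return cfg

# During a business peak the GC thread pools can be shrunk so that
# backup/restore keeps the IO bandwidth.
peak = load_thread_config('{"collect_threads": 1, "merge_threads": 1}')
```

Because the three thread pools have IO paths independent of backup/restore, shrinking them throttles only the recycling work.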
It should be noted that the collection threads cyclically check for data slices whose reference count is 0; such slices are not referenced by any file, so recycling them has no impact on backup/restore, and concurrency is fully supported. Second, the merge threads merge the downloaded slice data before uploading it, reducing file fragmentation. Finally, the IO paths of these three kinds of threads are independent of the backup/restore flow, and the thread counts can be adjusted dynamically so that the core services of the disaster recovery product are unaffected during peak periods.
As an example, taking the backup process, the backup performance formulas are as follows:
Formula (1) calculates the processing speed of the merge threads, based on the average number of slices the server receives per second within T seconds and a coefficient representing the impact of data recycling on backup. For traditional data recycling, which works from copy time points, the IO path is not independent of backup, so the coefficient is less than 1; for the data recycling proposed here, which operates directly on data slices whose reference count is 0, there is no impact on backup at all, and the coefficient equals 1.
Together with the number of merge threads active within T seconds during backup, the merge speed can be computed, i.e. the number of slices each backup thread needs to merge per second.
Formula (2) assumes that each block file stored on the cloud is aggregated from a fixed number of data slices, and gives the number of block files generated by the merge threads within T seconds.
Generally the numbers of merge threads and update threads are equal, i.e. merged files are uploaded immediately. Finally, the upload speed of the backup threads (slices per second) can be computed from formulas (2) and (3).
Meanwhile, the data recycling performance formulas are as follows:
Formula (4) assumes the recycling threads need to run for a total of T seconds, with one quantity denoting the number of zero-reference-count data slices received by the recycling threads at the i-th moment and another denoting the total number of slices on the cloud that the recycling threads need to download within T seconds. Since the recycling threads read the slices to be recycled from the database, the former quantity varies over time. From the formula, the average retained-slice ratio of this data recycling run can be computed.
Formula (5) gives the average number of zero-reference-count data slices contained in each block file on the cloud.
In formula (6), the average number of zero-count slices (zero_slice) received per second by the recycling threads, divided by the average number of zero_slice entries per block file, gives the average number of block files to process per second; multiplying by the retained-slice ratio gives the average number of slices the recycling threads need to merge per second; dividing by the number of recycling threads then gives the number of slices each recycling thread needs to merge per second.
Finally, the upload speed of the recycling threads (slices per second) can be computed from formulas (7) and (8).
Formula (9) assumes that the maximum number of blocks the back-end storage can receive within T seconds is C, so the total number of blocks uploaded by the backup threads and the recycling threads within T seconds must be at most C. During the business peak of a disaster recovery product, the backup threads are likely to saturate the upload bandwidth; at that point, a traditional data recycling method based on copy time points would, by formula (1), directly slow down backup. The optimized data recycling, however, can dynamically reduce the numbers of collect, merge, and update threads, lowering the impact on backup and, overall, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) of the disaster recovery product.
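The original formula images are not reproduced in this text. Under the verbal definitions above, one plausible reconstruction of the key relations is the following; every symbol name is chosen here for illustration and none is taken from the application itself.

```latex
% Illustrative reconstruction only.
% R: slices/s received by the server; \alpha: GC-impact coefficient
%    (\alpha < 1 traditionally, \alpha = 1 for the proposed scheme);
% N_m: merge threads; k: slices per block file; z_i: zero-count slices
% received at moment i; S: total slices to download within T seconds.
\begin{align}
v_{\text{merge}} &= \frac{\alpha R}{N_m}
    && \text{per-thread merge speed, cf. formula (1)} \\
B_{\text{backup}} &= \frac{\alpha R\, T}{k}
    && \text{block files generated in } T \text{ s, cf. formula (2)} \\
\rho &= 1 - \frac{\sum_{i=1}^{T} z_i}{S}
    && \text{average retained-slice ratio, cf. formula (4)} \\
B_{\text{backup}} + B_{\text{gc}} &\le C
    && \text{shared upload budget, cf. formula (9)}
\end{align}
```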
Step S12: traverse the target file to obtain the data recycling information corresponding to the data slices to be recycled, and write the data recycling information into the target hash queue.
In the embodiments of this application, step S12, traversing the target file to obtain the data recycling information corresponding to the data slices to be recycled, includes the following steps A1-A4:
Step A1: call the collection thread to traverse the reference counts of all data slices corresponding to the target file.
Step A2: decrement each reference count to obtain a decremented reference count, and take the data slices whose decremented reference count is 0 as the data slices to be recycled.
Step A3: acquire, from the target file, the data slice identifiers and metadata corresponding to the data slices to be recycled.
Step A4: determine the data slice identifiers and the metadata as the data recycling information.
In the embodiments of this application, the collection threads can periodically check whether the metadata server (meta server) contains files marked for deletion. If the metadata server returns files marked for deletion, the reference-management module decrements by one the reference count of each data slice corresponding to the file and returns the metadata of the data slices whose reference count is 0 (zero slices). The metadata includes the path and offset of the back-end storage file in which the data slice resides.
It should be noted that the collection threads cyclically check for data slices whose reference count is 0; such slices are not referenced by any file, and recycling them has no impact on backup/restore, so concurrency is fully supported.
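Steps A1-A4 can be sketched as follows. The in-memory dicts stand in for the metadata database and the reference-management module; all names are illustrative assumptions.

```python
def collect_recyclable(target_file_slices, ref_counts):
    """Steps A1-A4: decrement the reference count of every slice of a
    delete-marked file and return recycling info for the zero-count ones.

    target_file_slices: {slice_id: (block_path, offset)} for the target file
    ref_counts:         {slice_id: int} shared reference-count table
    """
    recycle_info = []
    for slice_id, meta in target_file_slices.items():  # A1: traverse slices
        ref_counts[slice_id] -= 1                      # A2: decrement count
        if ref_counts[slice_id] == 0:                  # zero slice found
            recycle_info.append((slice_id, meta))      # A3/A4: id + metadata
    return recycle_info

counts = {"s1": 1, "s2": 3}
info = collect_recyclable({"s1": ("blk_a", 0), "s2": ("blk_a", 4096)}, counts)
```

Only "s1" drops to a reference count of 0 here, so only its identifier and metadata become recycling information; "s2" is still referenced by other files and is untouched.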
In the embodiments of this application, writing the data recycling information into the target hash queue includes: performing a hash calculation on the metadata to obtain a calculation result, and writing the calculation result and the data slice identifier into the target hash queue in a key-value pair structure, where the calculation result serves as the key and the data slice identifier serves as the value.
As an example, as shown in Figure 4, the data slice identifiers (zero slices) are stored in a hash queue, with the hash of the metadata as the key and the data slice identifier as the value.
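A minimal sketch of such a hash queue follows; the insertion-ordered dict lets entries also be consumed FIFO, and the choice of SHA-1 over the "path:offset" metadata is an assumption made here for illustration.

```python
import hashlib
from collections import OrderedDict

class HashQueue:
    """Key: hash of the slice metadata (block path + offset);
    value: the data slice identifier, as in Figure 4."""
    def __init__(self):
        self._items = OrderedDict()

    def put(self, slice_id, block_path, offset):
        key = hashlib.sha1(f"{block_path}:{offset}".encode()).hexdigest()
        self._items[key] = slice_id   # identical metadata de-duplicates

    def pop(self):
        _, slice_id = self._items.popitem(last=False)  # FIFO extraction
        return slice_id

q = HashQueue()
q.put("s1", "blk_a", 0)
q.put("s2", "blk_a", 4096)
first = q.pop()
```

Keying by the metadata hash means a slice reported twice (for example by two collection threads) occupies one entry, while the queue order preserves arrival order for the merge threads.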
Step S13: acquire the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue.
In the embodiments of this application, acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue includes the following steps B1-B3:
Step B1: call the merge thread to extract data slice identifiers from the target hash queue in sequence.
Step B2: acquire the back-end storage file corresponding to each data slice identifier from the backup database, and download all data slices included in the back-end storage file.
Step B3: determine the data slices matching the data slice identifiers as the data slices to be recycled.
In the embodiments of this application, the merge thread extracts data slice identifiers from the target hash queue in sequence and then obtains from the backup database the back-end storage file corresponding to each identifier. A back-end storage file corresponds to multiple data slices, and the merge process actively downloads all data slices of that file. The downloaded slices are then matched against the identifiers extracted from the hash queue, so that the slices matching the identifiers are found and determined to be the data slices to be recycled.
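Steps B1-B3 can be sketched as follows; the backup-database lookup and the block download are simulated with plain dicts, and all names are illustrative assumptions.

```python
def fetch_recyclable(slice_ids, slice_to_block, block_store):
    """B1: consume slice ids in sequence; B2: locate and "download" each
    id's block file with all its slices; B3: keep the matching slices."""
    to_recycle = []
    for slice_id in slice_ids:                           # B1: in-order ids
        block = slice_to_block[slice_id]                 # B2: locate block file
        for sid, payload in block_store[block].items():  # B2: all slices
            if sid == slice_id:                          # B3: match by id
                to_recycle.append((sid, payload))
    return to_recycle

store = {"blk_a": {"s1": b"aa", "s2": b"bb", "s3": b"cc"}}
found = fetch_recyclable(["s1", "s3"], {"s1": "blk_a", "s3": "blk_a"}, store)
```

Note that the whole block file is downloaded even though only some of its slices are garbage; the non-matching slices ("s2" here) are the retained data that the merge step repacks.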
It should be noted that the merge threads merge the downloaded slice data before uploading it, reducing file fragmentation. The scheme also addresses the large per-run cleanup volume faced by traditional disaster recovery software: recycling granularity is refined down to individual data slices, and invalid slices are detected periodically and recycled immediately, improving the responsiveness of the disaster recovery program.
Step S14: generate a merged file based on the data slices to be recycled, and perform the data recycling operation through the merged file.
In the embodiments of this application, generating the merged file based on the data slices to be recycled includes: inserting the data slices to be recycled into a merge queue, detecting the current storage amount of the merge queue, and generating the merged file based on the merge queue when the current storage amount reaches its upper limit.
In the embodiments of this application, the merge process inserts the data slices to be recycled into the merge queue, then detects how much data to be recycled has already been inserted into the merge queue to obtain the current storage amount, and compares it with the merge queue's storage upper limit. If the current storage amount reaches the upper limit, a merged file is generated from all the data to be recycled in the merge queue.
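The threshold-driven merge can be sketched as follows; capacity is counted in slices here, which is an assumption, since the application does not fix the unit of the storage upper limit.

```python
class MergeQueue:
    """Buffers data slices and emits a merged file once the
    configured storage upper limit is reached."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []        # slices inserted so far
        self.merged_files = []   # generated merged files

    def insert(self, slice_id, payload):
        self.pending.append((slice_id, payload))
        if len(self.pending) >= self.capacity:  # upper limit reached
            self.merged_files.append(b"".join(p for _, p in self.pending))
            self.pending.clear()

mq = MergeQueue(capacity=2)
mq.insert("s1", b"aa")
mq.insert("s3", b"cc")  # second insert hits the limit and triggers the merge
```

Batching slices into one file before upload is what keeps the back end free of small-fragment objects.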
In the embodiments of this application, performing the data recycling operation through the merged file includes the following steps C1-C2:
Step C1: invoke the update thread to upload the merged file, and modify the original data slice offsets corresponding to the data to be recycled in the merged file to obtain target data slice offsets.
Step C2: write the target data slice offsets into the metadata server, and delete the back-end storage file corresponding to the data to be recycled.
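Steps C1-C2 can be sketched as follows; the upload, the metadata server, and the deletion are simulated with dicts, whereas in the real system the merged file would go to NAS/S3 and the offsets to the database. All names are illustrative assumptions.

```python
def apply_recycle(merged_file, slice_layout, meta_server, storage, old_blocks):
    """C1: upload the merged file and compute each slice's new offset
    inside it; C2: persist the new offsets and drop the old block files."""
    new_name = f"merged_{len(storage)}"
    storage[new_name] = merged_file                 # C1: upload merged file
    offset = 0
    for slice_id, size in slice_layout:             # C1: target offsets
        meta_server[slice_id] = (new_name, offset)  # C2: update metadata
        offset += size
    for block in old_blocks:                        # C2: delete old files
        storage.pop(block, None)

storage = {"blk_a": b"aabbcc"}
meta = {}
apply_recycle(b"aacc", [("s1", 2), ("s3", 2)], meta, storage, ["blk_a"])
```

After the call the metadata points every surviving slice at the new merged file, so the old block file can be deleted without breaking any restore path.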
The method provided by the embodiments of this application lets the disaster recovery program consume fewer system resources during data recycling, preserves the IO resources of the disaster recovery product's core services, reduces hardware cost, and improves user experience. It effectively solves the problems traditional disaster recovery programs face during data recycling: a large amount of data to clean per run, long cleaning time, difficulty running concurrently with backup/restore, and frequent file fragmentation.
图5为本申请实施例提供的一种数据回收装置的框图,该装置可以通过软件、硬件或者两者的结合实现成为电子设备的部分或者全部。如图5所示,该装置包括:Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 5, the device includes:
获取模块51,用于获取当前检测周期对元数据数据库进行检测得到的携带有删除标记的至少一个目标文件,其中,删除标记用于表示目标文件所对应的后端存储文件中存在待回收数据分片;The acquisition module 51 is used to acquire at least one target file carrying a deletion mark that is detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there is data to be recycled in the back-end storage file corresponding to the target file. piece;
遍历模块52,用于遍历目标文件得到待回收数据分片对应的数据回收信息,并将数据回收信息写入目标哈希队列;The traversal module 52 is used to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
提取模块53,用于利用目标哈希队列中的数据回收信息从后端存储文件中获取待回收数据分片;The extraction module 53 is used to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
处理模块54,用于基于待回收数据分片生成合并文件,并通过合并文件执行数据回收操作。The processing module 54 is configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
在本申请实施例中,数据回收装置包括:配置模块,用于获取配置文件,其中,配置文件是依据目标灾备软件的运行情况得到;利用配置文件配置收集线程、合并线程以及更新线程的数量。In the embodiment of the present application, the data recovery device includes: a configuration module for obtaining a configuration file, where the configuration file is obtained based on the operation status of the target disaster recovery software; the configuration file is used to configure the number of collection threads, merging threads, and update threads .
在本申请实施例中,遍历模块52,用于调用收集线程遍历目标文件对应所有数据分片的引用计数;对引用计数进行自减,得到自减后的引用计数,并将自减后的引用计数为0的数据分片作为待回收数据分片;从目标文件中获取待回收数据分片对应的数据分片标识以及元数据;将数据分片标识以及元数据确定为数据回收信息。In the embodiment of this application, the traversal module 52 is used to call the collection thread to traverse the reference counts of all data fragments corresponding to the target file; decrement the reference count to obtain the decremented reference count, and add the decremented reference count to The data fragments with a count of 0 are used as data fragments to be recycled; the data fragment identifiers and metadata corresponding to the data fragments to be recycled are obtained from the target file; the data fragment identifiers and metadata are determined as data recycling information.
在本申请实施例中,遍历模块52,用于对元数据进行哈希计算得到计算结果;将计算结果以及数据分片标识以键值对结构存储写入目标哈希队列,其中,计算结果作为键值对结构中的键名,数据分片标识作为键值对结构中的键值。In the embodiment of this application, the traversal module 52 is used to perform hash calculation on metadata to obtain calculation results; store the calculation results and data fragmentation identifiers in a key-value pair structure and write them into the target hash queue, where the calculation results are as The key name in the key-value pair structure, and the data shard identifier is used as the key value in the key-value pair structure.
In an embodiment of the present application, the extraction module 53 is configured to call the merging thread to sequentially extract data fragment identifiers from the target hash queue; obtain, from the backup database, the back-end storage file corresponding to each data fragment identifier and download all data fragments included in that back-end storage file; and determine the data fragments matching the data fragment identifiers as the data fragments to be recycled.
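The matching step can be pictured as a simple filter over the downloaded fragments. This is a sketch under assumed data shapes, not the patent's implementation.

```python
# Hypothetical sketch: the merging thread matches the fragments
# downloaded from one back-end storage file against the identifiers
# drawn from the target hash queue.

downloaded = {                      # all fragments in one back-end file
    "f1": b"live data",
    "f2": b"dead data",
    "f3": b"more dead data",
}
ids_to_recycle = ["f2", "f3"]       # identifiers pulled from the hash queue

to_recycle = {fid: downloaded[fid]
              for fid in ids_to_recycle if fid in downloaded}
print(sorted(to_recycle))  # the fragments determined to be recycled
```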
In an embodiment of the present application, the processing module 54 is configured to insert the data fragments to be recycled into a merge queue, detect the current fill level of the merge queue, and, when the current fill level reaches its upper limit, generate the merged file based on the merge queue.
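The capacity-triggered merge can be sketched as follows; the capacity value and byte-concatenation format are assumptions made for the example, not details from the patent.

```python
# Hypothetical sketch: accumulate fragments in a merge queue and emit a
# merged file once the queue reaches its assumed capacity limit.

MERGE_CAPACITY = 3  # assumed upper limit of the merge queue

merge_queue = []
merged_files = []

def insert_fragment(payload: bytes):
    merge_queue.append(payload)
    if len(merge_queue) >= MERGE_CAPACITY:          # capacity reached
        merged_files.append(b"".join(merge_queue))  # generate merged file
        merge_queue.clear()

for chunk in [b"aaa", b"bbb", b"ccc", b"ddd"]:
    insert_fragment(chunk)

print(len(merged_files), len(merge_queue))  # 1 merged file, 1 pending fragment
```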
In an embodiment of the present application, the processing module 54 is configured to invoke the update thread to upload the merged file and modify the original data fragment offsets corresponding to the data to be recycled in the merged file, obtaining target data fragment offsets; and to write the target data fragment offsets to the metadata server and delete the back-end storage files corresponding to the data to be recycled.
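The offset rewrite might be sketched as below. The in-memory dict standing in for the metadata server, and all names, are assumptions for illustration; the patent does not expose this interface.

```python
# Hypothetical sketch: after fragments are packed into the merged file,
# each fragment's original offset is replaced with its target offset
# inside the merged file, and the metadata server record is updated.

metadata_server = {                 # fragment id -> current location
    "f1": {"file": "backend-001", "offset": 8192},
    "f2": {"file": "backend-001", "offset": 16384},
}

def apply_merged_file(merged_name, fragment_order, fragment_sizes):
    """Rewrite offsets so each fragment points into the merged file."""
    offset = 0
    for frag_id, size in zip(fragment_order, fragment_sizes):
        metadata_server[frag_id] = {"file": merged_name, "offset": offset}
        offset += size
    # The obsolete back-end storage file could then be deleted.

apply_merged_file("merged-042", ["f1", "f2"], [4096, 4096])
print(metadata_server["f2"])  # now points into merged-042
```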
An embodiment of the present application further provides an electronic device. As shown in Figure 6, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504.
The memory 1503 is used to store a computer program;
The processor 1501 is used to implement the steps of the above embodiments when executing the computer program stored in the memory 1503.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the data recycling method described in any of the above embodiments.
In yet another embodiment provided by this application, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute the data recycling method described in any of the above embodiments.
The above embodiments may be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., a solid-state disk), or the like.
The above are only preferred embodiments of this application and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application falls within its protection scope.
The above are only specific embodiments of this application, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims (10)

  1. A data recycling method, characterized by comprising:
    obtaining at least one target file carrying a deletion mark, detected in the metadata database during the current detection cycle, wherein the deletion mark indicates that the back-end storage file corresponding to the target file contains data fragments to be recycled;
    traversing the target file to obtain data recycling information corresponding to the data fragments to be recycled, and writing the data recycling information into a target hash queue;
    obtaining the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
    generating a merged file based on the data fragments to be recycled, and performing a data recycling operation through the merged file.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining a configuration file, wherein the configuration file is derived from the running status of the target disaster recovery software;
    using the configuration file to configure the numbers of collection threads, merging threads, and update threads.
  3. The method according to claim 2, characterized in that traversing the target file to obtain the data recycling information corresponding to the data fragments to be recycled comprises:
    calling the collection thread to traverse the reference counts of all data fragments corresponding to the target file;
    decrementing the reference counts to obtain decremented reference counts, and taking the data fragments whose decremented reference count is 0 as the data fragments to be recycled;
    obtaining, from the target file, the data fragment identifiers and metadata corresponding to the data fragments to be recycled;
    determining the data fragment identifiers and the metadata as the data recycling information.
  4. The method according to claim 3, characterized in that writing the data recycling information into the target hash queue comprises:
    performing a hash calculation on the metadata to obtain a calculation result;
    writing the calculation result and the data fragment identifier into the target hash queue in a key-value pair structure, wherein the calculation result serves as the key in the key-value pair structure and the data fragment identifier serves as the value in the key-value pair structure.
  5. The method according to claim 2, characterized in that obtaining the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue comprises:
    calling the merging thread to sequentially extract data fragment identifiers from the target hash queue;
    obtaining, from the backup database, the back-end storage file corresponding to the data fragment identifier, and downloading all data fragments included in the back-end storage file;
    determining the data fragments matching the data fragment identifiers as the data fragments to be recycled.
  6. The method according to claim 5, characterized in that generating a merged file based on the data fragments to be recycled comprises:
    inserting the data fragments to be recycled into a merge queue;
    detecting the current fill level of the merge queue, and, when the current fill level reaches its upper limit, generating the merged file based on the merge queue.
  7. The method according to claim 2, characterized in that performing the data recycling operation through the merged file comprises:
    invoking the update thread to upload the merged file, and modifying the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain target data fragment offsets;
    writing the target data fragment offsets to the metadata server, and deleting the back-end storage file corresponding to the data to be recycled.
  8. A data recycling apparatus, characterized by comprising:
    an acquisition module, configured to obtain at least one target file carrying a deletion mark, detected in the metadata database during the current detection cycle, wherein the deletion mark indicates that the back-end storage file corresponding to the target file contains data fragments to be recycled;
    a traversal module, configured to traverse the target file to obtain data recycling information corresponding to the data fragments to be recycled, and to write the data recycling information into a target hash queue;
    an extraction module, configured to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
    a processing module, configured to generate a merged file based on the data fragments to be recycled, and to perform a data recycling operation through the merged file.
  9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when run, the program executes the method steps of any one of claims 1 to 7.
  10. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; wherein:
    the memory is used to store a computer program;
    the processor is used to execute the method steps of any one of claims 1 to 7 by running the program stored in the memory.
PCT/CN2022/141825 2022-07-29 2022-12-26 Data recycling method and apparatus, electronic device, and storage medium WO2024021492A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210908251.4 2022-07-29
CN202210908251.4A CN115757269A (en) 2022-07-29 2022-07-29 Data recovery method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2024021492A1 true WO2024021492A1 (en) 2024-02-01

Family

ID=85349091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141825 WO2024021492A1 (en) 2022-07-29 2022-12-26 Data recycling method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115757269A (en)
WO (1) WO2024021492A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170031772A1 (en) * 2015-07-31 2017-02-02 Netapp Inc. Incremental transfer with unused data block reclamation
CN109976668A (en) * 2019-03-14 2019-07-05 北京达佳互联信息技术有限公司 Data-erasure method, data deletion apparatus and computer readable storage medium
CN112328549A (en) * 2020-10-29 2021-02-05 无锡先进技术研究院 Small file storage method, electronic device and storage medium
CN113434465A (en) * 2021-06-25 2021-09-24 北京金山云网络技术有限公司 Data processing method and device and electronic equipment
CN113900991A (en) * 2021-10-11 2022-01-07 北京青云科技股份有限公司 Data interaction method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN115757269A (en) 2023-03-07


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22952921; Country of ref document: EP; Kind code of ref document: A1