WO2024021492A1 - Data recycling method and apparatus, electronic device, and storage medium - Google Patents

Data recycling method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024021492A1
WO2024021492A1 (PCT/CN2022/141825, CN2022141825W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
target
recycled
fragments
Prior art date
Application number
PCT/CN2022/141825
Other languages
French (fr)
Chinese (zh)
Inventor
邓宇羽
赵真
范哲豪
姚永坤
吴淮
Original Assignee
天翼云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天翼云科技有限公司 filed Critical 天翼云科技有限公司
Publication of WO2024021492A1 publication Critical patent/WO2024021492A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data recovery method, device, electronic equipment and storage medium.
  • The overall framework of traditional disaster recovery products is shown in Figure 1.
  • the data from the production center is sent to the disaster recovery center through the Agent, and then split into metadata and data parts and written to the database and back-end cloud storage respectively. Since the production center of enterprise users generates a large amount of data every moment, and when a disaster occurs, enterprise users generally only choose to restore to the latest point in time, so expired data in back-end storage will occupy a lot of space and increase user costs.
  • this application provides a data recovery method, device, electronic equipment and storage medium.
  • a data recovery method including:
  • a merged file is generated based on the data fragments to be recycled, and the data recycling operation is performed through the merged file.
  • the method also includes:
  • traversing the target file to obtain data recovery information corresponding to the data fragments to be recovered includes:
  • the data fragment identification and the metadata are determined as the data recycling information.
  • writing the data recovery information into the target hash queue includes:
  • the calculation result and the data fragment identifier are stored in a key-value pair structure and written into the target hash queue, where the calculation result is used as the key name in the key-value pair structure and the data fragment identifier is used as the key value in the key-value pair structure.
  • using the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file includes:
  • the data fragments matching the data fragment identifiers are determined as the data fragments to be recycled.
  • the generation of merged files based on the data fragments to be recycled includes:
  • the current storage amount of the merge queue is detected, and when the current storage amount reaches the upper limit of the storage amount, the merge file is generated based on the merge queue.
  • the data recovery operation performed through the merged file includes:
  • a data recovery device including:
  • the acquisition module is used to acquire at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file;
  • a traversal module configured to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
  • An extraction module configured to use the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file;
  • a processing module configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
  • a storage medium including a stored program, where the above steps are executed when the program runs.
  • an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein:
  • the memory is used to store computer programs; the processor is used to execute the steps in the above method by running the program stored in the memory.
  • Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps in the above method.
  • the method provided by the embodiments of this application can make the disaster recovery program consume less system performance during data recovery, ensure the IO resources for the core business of the disaster recovery product, reduce hardware costs, and improve user experience. It effectively solves the problems faced by traditional disaster recovery programs during data recovery: large amounts of data to be cleaned, long cleaning time, difficulty in concurrency with backup and recovery, and easy file fragmentation.
  • Figure 1 is a flow chart of a data recovery method provided by an embodiment of the present application.
  • FIG. 2 is a block diagram of the overall data recovery system of the disaster recovery software provided by the embodiment of the present application.
  • FIG. 3 is a block diagram of a data recovery module provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the storage of data fragmentation identifiers provided by the embodiment of the present application.
  • Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Embodiments of the present application provide a data recovery method, device, electronic equipment and storage medium.
  • the method provided by the embodiment of the present invention can be applied to any required electronic equipment, for example, it can be a server, a terminal and other electronic equipment, which are not specifically limited here. For convenience of description, they will be referred to as electronic equipment for short in the following.
  • Figure 1 is a flow chart of a data recovery method provided by an embodiment of the present application. As shown in Figure 1, the method includes:
  • Step S11 Obtain at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file.
  • the method provided by the embodiment of this application is applied to the data recovery module.
  • the data recovery module is deployed in the overall data recovery system of the disaster recovery software.
  • the overall data recovery system of the disaster recovery software includes:
  • the top-level layer, labeled Disaster recovery system, represents the disaster recovery product.
  • the program user interface layer is where the user's cloud host data is connected to the disaster recovery product.
  • Data and meta indicate that the protected files are split into data parts and metadata parts within the disaster recovery program, and are stored in the backup storage and the database respectively.
  • the backup storage can be connected to traditional NAS storage or S3 object storage. Taking S3 as an example, the stored data files are fixed-size block files; each block file contains multiple data slices, and each data slice is a fragment obtained after deduplication.
  • GC in Figure 2 stands for Garbage Collection.
  • This module interacts with the database to obtain the metadata to be cleaned and the file path and offset of the back-end files where the data slices to be cleaned are located, then interacts with the Storage module to download the corresponding files for cleaning, and finally updates the offsets of the cleaned slices in the database.
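The block-file layout described above can be sketched as a small data model. This is a minimal illustration, not the patent's implementation; all class and field names here (`SliceRecord`, `BlockFile`, `ref_count`, and the example S3-style path) are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SliceRecord:
    """Metadata for one deduplicated data slice inside a block file (hypothetical layout)."""
    slice_id: str       # content-derived identifier of the slice
    offset: int         # byte offset of the slice inside its block file
    length: int         # slice size in bytes
    ref_count: int = 1  # number of backup files still referencing this slice

@dataclass
class BlockFile:
    """A fixed-size object in backend storage (e.g. an S3 key) aggregating many slices."""
    path: str                                       # storage path / object key
    slices: List[SliceRecord] = field(default_factory=list)

    def zero_slices(self) -> List[SliceRecord]:
        # Slices no longer referenced by any file are candidates for recycling.
        return [s for s in self.slices if s.ref_count == 0]

# Example: a block file holding one live slice and one unreferenced slice.
blk = BlockFile("s3://backup/blocks/000001", [
    SliceRecord("a1", 0, 4096, ref_count=2),
    SliceRecord("b2", 4096, 4096, ref_count=0),
])
```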
  • the embodiment of this application optimizes the structure of the data recovery module.
  • the tomb mark module modifies the database record through the meta server to add a deletion mark to the target file, so that the file appears deleted to the user and backup/restore tasks are not affected. The data recovery module can therefore obtain, during the current detection cycle, at least one target file carrying a deletion mark detected in the metadata database.
  • the data recycling module mainly uses three threads to perform data recycling.
  • the three threads are: the collection thread, the merge thread, and the update thread.
  • the number of three types of threads can be dynamically configured.
  • the configuration method includes: obtaining a configuration file, where the configuration file is derived from the running status of the target disaster recovery software, and using the configuration file to configure the number of collection threads, merge threads, and update threads.
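A minimal sketch of such dynamic thread-count configuration, assuming a JSON config file; the key names and defaults here are illustrative assumptions, not values from the patent:

```python
import json

# Default pool sizes; a real deployment would derive the overriding config file
# from the disaster recovery software's current running status.
DEFAULTS = {"collection_threads": 2, "merge_threads": 4, "update_threads": 4}

def load_thread_config(text: str) -> dict:
    """Merge a JSON config file over the defaults, ignoring unknown or invalid keys."""
    cfg = dict(DEFAULTS)
    for key, value in json.loads(text).items():
        if key in cfg and isinstance(value, int) and value > 0:
            cfg[key] = value
    return cfg

# Example: scale up merge/update threads during an off-peak window.
cfg = load_thread_config('{"merge_threads": 8, "update_threads": 8}')
```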
  • the collection thread circularly checks data fragments with a reference count of 0. Such fragments will not be referenced by any files. Recycling these fragments has no impact on backup/recovery, so concurrency is fully supported.
  • the merging thread will merge the downloaded fragmented data and upload it, thereby reducing the generation of file fragments.
  • the IO paths of these three threads are independent of the backup/recovery process, and the number of threads can be dynamically adjusted so that the core business of the disaster recovery product is not affected during peak periods.
  • the backup performance formula is as follows:
  • Formula (1) calculates the processing speed of the merge thread from the average number of slices received by the server per second within T seconds, scaled by a coefficient that represents the impact of data recycling on backup.
  • For traditional data recycling, the IO path is not independent of backup, so the coefficient is less than 1; for the data recycling proposed here, which operates directly on data fragments with a reference count of 0, backup is completely unaffected and the coefficient equals 1.
  • Formula (2) assumes that each block file stored on the cloud is aggregated from a fixed number of data slices, and gives the number of block files generated by the merging thread within T seconds.
  • It is assumed that the number of merge threads and update threads is equal, that is, merged files are uploaded immediately.
  • the upload speed of the backup thread (number of fragments/second) can be calculated through formulas (2) and (3).
  • In formula (4), assuming that the recycling thread needs T seconds of processing in total, one term represents the number of data fragments with a reference count of 0 received by the recycling thread at the i-th moment, and the sum represents the total number of data fragments on the cloud that the recycling thread needs to download within T seconds.
  • Since the recycling thread reads the slices to be recycled from the database, the number of slices changes over time. The average retained shard ratio for this round of data recycling can be calculated from the formula.
  • Formula (5) gives the average number of data fragments with a reference count of 0 contained in each block file on the cloud.
  • In formula (6), the average number of zero slices received by the recycling thread per second, divided by the average number of zero slices contained in each block file, gives the average number of block files that need to be processed per second; multiplying by the retained shard ratio gives the average number of shards to be merged per second; finally, accounting for the number of recycling threads gives the number of slices each recycling thread needs to merge per second.
  • the upload speed of the recycling thread (number of fragments/second) can be calculated through formulas (7) and (8).
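The original formula symbols did not survive into this text, so the relations described above can only be sketched under assumed notation; every symbol below ($s_i$, $\alpha$, $N$, $z_i$, $\bar{z}$, $r$, $n_{\mathrm{rec}}$, $B_{\mathrm{cloud}}$) is our placeholder, not the patent's:

```latex
% s_i: slices received by the server at second i;  \alpha: backup-impact coefficient;
% N: slices aggregated per block file;  z_i: zero-slices received at second i;
% B_cloud: block files on the cloud;  r: retained shard ratio;  n_rec: recycling threads.
\begin{align*}
v_{\mathrm{merge}} &= \alpha \cdot \frac{1}{T}\sum_{i=1}^{T} s_i ,
  \qquad \alpha \le 1 \text{ (here } \alpha = 1\text{)}
  && \text{cf. (1)} \\
B &= \frac{v_{\mathrm{merge}} \, T}{N}
  && \text{cf. (2)} \\
\bar{z} &= \frac{1}{B_{\mathrm{cloud}}}\sum_{i=1}^{T} z_i
  && \text{cf. (5)} \\
v_{\mathrm{rec}} &= \frac{1}{n_{\mathrm{rec}}}\cdot
  \frac{\tfrac{1}{T}\sum_{i=1}^{T} z_i}{\bar{z}} \cdot r \cdot N
  && \text{cf. (6)}
\end{align*}
```

Read as: merge-thread speed is the average slice arrival rate scaled by the backup-impact coefficient; block files produced in T seconds follow from the per-block slice count; and the per-recycling-thread merge rate is block files processed per second times the retained slices per block, divided across the recycling threads.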
  • Step S12 Traverse the target file to obtain data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue.
  • step S12 traverses the target file to obtain data recovery information corresponding to the data fragments to be recovered, including the following steps A1-A4:
  • Step A1 Call the collection thread to traverse the reference counts of all data fragments corresponding to the target file.
  • Step A2 Decrement the reference count to obtain the decremented reference count, and take the data fragments whose decremented reference count is 0 as the data fragments to be recycled.
  • Step A3 Obtain the data fragment identifier and metadata corresponding to the data fragment to be recycled from the target file.
  • Step A4 Determine the data fragment identification and metadata as data recycling information.
  • the collection thread can periodically check whether there are files marked for deletion in the metadata server (meta server). If the metadata server returns a file that has been marked for deletion, it queries the reference management module to decrement the reference count of the file's data slices by one, and returns the metadata of the data slices (zero slices) whose reference count is 0.
  • metadata includes: backend storage file path and offset where the data shards are located.
  • the collection thread circularly checks data fragments with a reference count of 0. Such fragments will not be referenced by any files. Recycling these fragments has no impact on backup/recovery, so concurrency is fully supported.
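Steps A1-A4 as performed by the collection thread can be sketched as follows; this is a simplified single-threaded illustration with assumed names (`collection_pass`, the dict-based stand-ins for the meta server and reference management module), not the patent's code:

```python
def collection_pass(marked_files, ref_counts):
    """One pass of the collection thread.

    marked_files: mapping of tomb-marked file -> list of slice ids it referenced
                  (what the meta server would return for the current cycle).
    ref_counts:   mapping of slice id -> current reference count.
    Returns the ids of slices whose count dropped to zero (data to be recycled).
    """
    zero_slices = []
    for _file, slice_ids in marked_files.items():
        for sid in slice_ids:
            ref_counts[sid] -= 1       # the deleted file no longer references the slice
            if ref_counts[sid] == 0:   # unreferenced -> safe to recycle concurrently
                zero_slices.append(sid)
    return zero_slices

# Example: two tomb-marked backup files sharing slice "b".
counts = {"a": 1, "b": 2, "c": 5}
zs = collection_pass({"vm1.bak": ["a", "b"], "vm2.bak": ["b"]}, counts)
```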
  • writing the data recovery information into the target hash queue includes: performing a hash calculation on the metadata to obtain the calculation result, and storing the calculation result and the data fragment identifier in a key-value pair structure written into the target hash queue, in which the calculation result is used as the key name and the data fragment identifier is used as the key value.
  • the data slice identifier (zero slice) will be stored in a hash queue, with the hash calculation result of the metadata as the key and the data slice identifier (zero slice) as the value.
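A minimal sketch of this key-value structure, assuming the metadata is the back-end file path plus offset and SHA-256 as the hash; the function names and the dict used as the queue are illustrative assumptions:

```python
import hashlib

def queue_key(path: str, offset: int) -> str:
    """Hash of a slice's metadata (back-end file path + offset) -> key name."""
    return hashlib.sha256(f"{path}:{offset}".encode()).hexdigest()

hash_queue = {}  # key: metadata hash (key name), value: zero-slice identifier (key value)

def enqueue(slice_id: str, path: str, offset: int) -> None:
    hash_queue[queue_key(path, offset)] = slice_id

# Example: enqueue one zero slice located at offset 4096 of a block file.
enqueue("b2", "s3://backup/blocks/000001", 4096)
```

Keying on the metadata hash means a slice at a given (path, offset) is enqueued at most once, which deduplicates repeated collection results.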
  • Step S13 Use the data recycling information in the target hash queue to obtain the data fragments to be recycled from the back-end storage file.
  • the data recycling information in the target hash queue is used to obtain the data fragments to be recycled from the back-end storage file, including the following steps B1-B3:
  • Step B1 Call the merge thread to sequentially extract data fragment identifiers from the target hash queue.
  • Step B2 Obtain the back-end storage file corresponding to the data fragment identifier from the backup database, and download all data fragments included in the back-end storage file.
  • Step B3 Determine the data fragments matching the data fragment identifiers as the data fragments to be recycled.
  • the merging thread will sequentially extract the data fragment identifiers from the target hash queue, and then obtain the back-end storage files corresponding to the data fragment identifiers from the backup database.
  • the back-end storage files correspond to multiple data fragments.
  • the merge process will actively download all data fragments corresponding to the back-end storage files.
  • the data fragments are then matched with the data fragment identifiers extracted from the hash queue, thereby finding the data fragments matching the data fragment identifiers and determining them as data fragments to be recycled.
  • the merging thread will merge the downloaded fragmented data and then upload it, thereby reducing the generation of file fragments.
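Step B3's matching of downloaded fragments against the identifiers pulled from the hash queue can be sketched as below; the function name and the (id, payload) tuple representation are assumptions for illustration:

```python
def select_recyclable(block_slices, wanted_ids):
    """Split one downloaded block file's slices by whether they match the hash queue.

    block_slices: list of (slice_id, payload) pairs downloaded from one
                  back-end block file.
    wanted_ids:   set of data fragment identifiers extracted from the hash queue.
    Returns (matching fragments, i.e. the data to be recycled; remaining fragments).
    """
    recycle, keep = [], []
    for sid, payload in block_slices:
        (recycle if sid in wanted_ids else keep).append((sid, payload))
    return recycle, keep

# Example: one block file where only slice "b2" was flagged by the collection thread.
rec, keep = select_recyclable([("a1", b"live"), ("b2", b"dead")], {"b2"})
```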
  • Step S14 Generate a merged file based on the data fragments to be recycled, and perform the data recycling operation through the merged file.
  • generating a merged file based on the data fragments to be recycled includes: inserting the data fragments to be recycled into the merge queue, detecting the current storage amount of the merge queue, and, when the current storage amount reaches the upper limit of the storage amount, generating the merged file based on the merge queue.
  • the merging process will insert the data fragments to be recycled into the merge queue, then detect the amount of data already inserted into the merge queue to obtain the current storage amount, and compare the current storage amount with the storage upper limit of the merge queue. If the current storage amount reaches the upper limit, all the data in the merge queue is used to generate a merged file.
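The bounded merge queue described above can be sketched as a small class; the class name, capacity value, and byte-string fragments are assumptions, and a real implementation would upload the merged file rather than keep it in memory:

```python
class MergeQueue:
    """Bounded queue whose contents are flushed into one merged file when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity     # storage upper limit of the merge queue
        self.items = []              # fragments currently queued
        self.merged_files = []       # merged files generated so far (stand-in for uploads)

    def insert(self, fragment: bytes) -> None:
        self.items.append(fragment)
        if len(self.items) >= self.capacity:            # upper limit reached
            self.merged_files.append(b"".join(self.items))  # generate the merged file
            self.items = []

# Example: a queue holding at most three fragments before flushing.
q = MergeQueue(capacity=3)
for frag in (b"aa", b"bb", b"cc", b"dd"):
    q.insert(frag)
```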
  • performing data recovery operations by merging files includes the following steps C1-C2:
  • Step C1 Invoke the update thread to upload the merged file, and modify the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain the target data fragment offsets.
  • Step C2 Write the target data fragment offset to the metadata server, and delete the backend storage file corresponding to the data to be recycled.
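Steps C1-C2 of the update thread can be sketched as below, using plain dicts as stand-ins for the metadata server and the back-end storage; the function name and layout tuple are assumptions, not the patent's interfaces:

```python
def update_pass(merged_file_path, fragment_layout, meta_server, storage):
    """Record each fragment's new location, then delete the superseded block files.

    fragment_layout: list of (slice_id, old_path, new_offset) for fragments that
                     now live in the uploaded merged file.
    meta_server:     dict slice_id -> (file path, offset), standing in for the
                     real metadata server.
    storage:         dict path -> bytes, standing in for back-end storage.
    """
    old_paths = set()
    for slice_id, old_path, new_offset in fragment_layout:
        # Write the target data fragment offset to the metadata server.
        meta_server[slice_id] = (merged_file_path, new_offset)
        old_paths.add(old_path)
    for path in old_paths:
        storage.pop(path, None)  # delete the back-end storage file that was recycled
    return old_paths

# Example: one fragment relocated from an old block file into the merged file.
meta, store = {}, {"blocks/old1": b"..."}
update_pass("blocks/merged1", [("a1", "blocks/old1", 0)], meta, store)
```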
  • the method provided by the embodiments of this application can make the disaster recovery program consume less system performance during data recovery, ensure the IO resources for the core business of the disaster recovery product, reduce hardware costs, and improve user experience. It effectively solves the problems faced by traditional disaster recovery programs during data recovery: large amounts of data to be cleaned, long cleaning time, difficulty in concurrency with backup and recovery, and easy file fragmentation.
  • Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application.
  • the device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 5, the device includes:
  • the acquisition module 51 is used to acquire at least one target file carrying a deletion mark detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there are data fragments to be recycled in the back-end storage file corresponding to the target file;
  • the traversal module 52 is used to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
  • the extraction module 53 is used to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
  • the processing module 54 is configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
  • the data recovery device includes: a configuration module for obtaining a configuration file, where the configuration file is obtained based on the running status of the target disaster recovery software; the configuration file is used to configure the number of collection threads, merge threads, and update threads.
  • the traversal module 52 is used to call the collection thread to traverse the reference counts of all data fragments corresponding to the target file; decrement the reference counts to obtain the decremented reference counts, and take the data fragments whose decremented reference count is 0 as the data fragments to be recycled; obtain the data fragment identifiers and metadata corresponding to the data fragments to be recycled from the target file; and determine the data fragment identifiers and metadata as the data recycling information.
  • the traversal module 52 is used to perform hash calculations on the metadata to obtain calculation results, and to store the calculation results and data fragment identifiers in a key-value pair structure written into the target hash queue, where the calculation results are used as the key names in the key-value pair structure and the data fragment identifiers are used as the key values.
  • the extraction module 53 is used to call the merge thread to sequentially extract the data fragment identifiers from the target hash queue; obtain the back-end storage file corresponding to the data fragment identifier from the backup database, and download the back-end storage file All data fragments included; the data fragments matching the data fragment identifiers are determined as data fragments to be recycled.
  • the processing module 54 is used to insert the data fragments to be recycled into the merge queue; detect the current storage amount of the merge queue, and generate a merge file based on the merge queue when the current storage amount reaches the upper limit of the storage amount.
  • the processing module 54 is used to invoke the update thread to upload the merged file, modify the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain the target data fragment offsets, write the target data fragment offsets to the metadata server, and delete the back-end storage files corresponding to the data to be recycled.
  • the electronic device may include: a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504.
  • the processor 1501, the communication interface 1502, and the memory 1503 communicate with each other through the communication bus 1504.
  • the memory 1503 is used to store computer programs.
  • the processor 1501 is used to implement the steps of the above embodiment when executing the computer program stored on the memory 1503.
  • the communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc.
  • the communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above terminal and other devices.
  • the memory may include Random Access Memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • the memory may also be at least one storage device located remotely from the aforementioned processor.
  • the above-mentioned processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the data recovery method described in any one of the above embodiments.
  • a computer program product containing instructions is also provided, which when run on a computer causes the computer to execute the data recovery method described in any of the above embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another; e.g., the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, integrating one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), etc.

Abstract

The present application discloses a data recycling method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring at least one target file which is obtained by detecting a metadata database within the current detection period and carries a deletion mark, wherein the deletion mark is used for indicating that data slices to be recycled are present in a backend storage file corresponding to the target file; traversing the target file to obtain data recycling information corresponding to said data slices, and writing the data recycling information into a target hash queue; acquiring said data slices from the backend storage file by using the data recycling information in the target hash queue; and generating a merge file on the basis of said data slices, and executing a data recycling operation by means of the merge file. According to the present application, the performance consumption of a system during data recycling is lower, I/O resources of core services of a disaster recovery program are ensured, the hardware cost is reduced, and the user experience is improved. The problems of large data cleaning amount and high cleaning time consumption of conventional disaster recovery software during data recycling are effectively solved.

Description

一种数据回收方法、装置、电子设备及存储介质A data recovery method, device, electronic equipment and storage medium 技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种数据回收方法、装置、电子设备及存储介质。The present application relates to the field of computer technology, and in particular, to a data recovery method, device, electronic equipment and storage medium.
背景技术Background technique
着数字经济和数字社会的发展,各种大型企业面临越来越多的业务连续性挑战,与此同时也诞生了大量的灾备产品用于应对各种意外导致的业务中断。With the development of the digital economy and society, various large enterprises are facing more and more business continuity challenges. At the same time, a large number of disaster recovery products have been born to deal with business interruptions caused by various accidents.
技术问题technical problem
传统的灾备产品整体框架如图1所示。生产中心的数据通过Agent代理发送到灾备中心,然后拆分成元数据部分和数据部分分别写入数据库和后端云存储。由于企业用户的生产中心每时每刻都会产生大量的数据,而灾难发生时企业用户一般只会选择恢复到最近的时间点,因此后端存储里面的过期数据会占用大量空间,增加用户成本。The overall framework of traditional disaster recovery products is shown in Figure 1. The data from the production center is sent to the disaster recovery center through the Agent, and then split into metadata and data parts and written to the database and back-end cloud storage respectively. Since the production center of enterprise users generates a large amount of data every moment, and when a disaster occurs, enterprise users generally only choose to restore to the latest point in time, so expired data in back-end storage will occupy a lot of space and increase user costs.
然而,目前传统数据回收方法会影响备份/恢复业务,进而影响整个灾备产品的灾难恢复时间目标(Recovery Time Object缩写:RTO)和数据恢复点目标(Recovery Point Objective缩写:RPO)。However, the current traditional data recovery methods will affect the backup/recovery business, thereby affecting the disaster recovery time objective (Recovery Time Object abbreviation: RTO) and data recovery point objective (Recovery Point) of the entire disaster recovery product. Objective abbreviation: RPO).
技术解决方案Technical solutions
To solve the above technical problems, or at least partially solve them, the present application provides a data recycling method and apparatus, an electronic device, and a storage medium.
According to one aspect of the embodiments of the present application, a data recycling method is provided, including:
acquiring at least one target file carrying a deletion mark, obtained by scanning a metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled;
traversing the target file to obtain data recycling information corresponding to the data slices to be recycled, and writing the data recycling information into a target hash queue;
acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue;
generating a merged file based on the data slices to be recycled, and performing a data recycling operation through the merged file.
Further, the method also includes:
acquiring a configuration file, where the configuration file is obtained according to the running state of the target disaster recovery software;
configuring the numbers of collection threads, merge threads, and update threads using the configuration file.
Further, traversing the target file to obtain the data recycling information corresponding to the data slices to be recycled includes:
calling the collection thread to traverse the reference counts of all data slices corresponding to the target file;
decrementing each reference count to obtain a decremented reference count, and taking the data slices whose decremented reference count is 0 as the data slices to be recycled;
acquiring, from the target file, the data slice identifiers and metadata corresponding to the data slices to be recycled;
determining the data slice identifiers and the metadata as the data recycling information.
Further, writing the data recycling information into the target hash queue includes:
performing a hash calculation on the metadata to obtain a calculation result;
writing the calculation result and the data slice identifier into the target hash queue in a key-value pair structure, where the calculation result serves as the key and the data slice identifier serves as the value.
Further, acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue includes:
calling the merge thread to extract data slice identifiers from the target hash queue in sequence;
acquiring the back-end storage file corresponding to each data slice identifier from the backup database, and downloading all data slices included in the back-end storage file;
determining the data slices matching the data slice identifiers as the data slices to be recycled.
Further, generating the merged file based on the data slices to be recycled includes:
inserting the data slices to be recycled into a merge queue;
detecting the current storage amount of the merge queue, and generating the merged file based on the merge queue when the current storage amount reaches its upper limit.
Further, performing the data recycling operation through the merged file includes:
invoking the update thread to upload the merged file, and modifying the original data slice offsets corresponding to the data to be recycled in the merged file to obtain target data slice offsets;
writing the target data slice offsets into the metadata server, and deleting the back-end storage file corresponding to the data to be recycled.
According to another aspect of the embodiments of the present application, a data recycling apparatus is also provided, including:
an acquisition module, configured to acquire at least one target file carrying a deletion mark, obtained by scanning the metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled;
a traversal module, configured to traverse the target file to obtain the data recycling information corresponding to the data slices to be recycled, and write the data recycling information into a target hash queue;
an extraction module, configured to acquire the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue;
a processing module, configured to generate a merged file based on the data slices to be recycled and perform a data recycling operation through the merged file.
According to another aspect of the embodiments of the present application, a storage medium is also provided. The storage medium includes a stored program that, when run, performs the above steps.
According to another aspect of the embodiments of the present application, an electronic device is also provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program, and the processor is configured to perform the steps of the above method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the above method.
Beneficial Effects
The method provided by the embodiments of the present application lets the disaster recovery program consume fewer system resources during data recycling, preserves the IO resources of the disaster recovery product's core services, reduces hardware cost, and improves user experience. It effectively solves the problems traditional disaster recovery programs face during data recycling: a large amount of data to clean per run, long cleaning time, difficulty running concurrently with backup/restore, and frequent file fragmentation.
Description of Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Figure 1 is a flow chart of a data recycling method provided by an embodiment of the present application;
Figure 2 is a block diagram of the overall data recycling system of the disaster recovery software provided by an embodiment of the present application;
Figure 3 is a block diagram of the data recycling module provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the storage of data slice identifiers provided by an embodiment of the present application;
Figure 5 is a block diagram of a data recycling apparatus provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Embodiments of the Invention
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application; the illustrative embodiments and their descriptions are used to explain the application and do not constitute an improper limitation of it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of this application.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another similar entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the stated element.
Embodiments of the present application provide a data recycling method and apparatus, an electronic device, and a storage medium. The method provided by the embodiments of the present invention can be applied to any electronic device as needed, for example a server or a terminal, which is not specifically limited here; for convenience of description, it is simply referred to as the electronic device below.
According to one aspect of the embodiments of the present application, a method embodiment of a data recycling method is provided. Figure 1 is a flow chart of a data recycling method provided by an embodiment of the present application. As shown in Figure 1, the method includes:
Step S11: acquire at least one target file carrying a deletion mark, obtained by scanning the metadata database in the current detection cycle, where the deletion mark indicates that the back-end storage file corresponding to the target file contains data slices to be recycled.
The method provided by the embodiments of this application is applied to a data recycling module deployed in the overall data recycling system of the disaster recovery software. As shown in Figure 2, the system includes a top-level Disaster Recovery System representing the user interface layer of the disaster recovery program, through which the user's cloud host data is connected to the disaster recovery product. "data" and "meta" indicate that the disaster recovery program internally splits each protected file into a data part and a metadata part, stored in the backup storage (Storage) and the database (Database) respectively. The backup storage can be backed by traditional NAS storage or S3 object storage. Taking S3 as an example, the stored data files are fixed-size block files; each block file contains multiple data slices, and each slice is a post-deduplication fragment.
The GC (Garbage Collection) module in Figure 2 is the data recycling module. It interacts with the Database to obtain the metadata to be cleaned and the data-file path and offset of each back-end data slice to be cleaned, then interacts with the Storage module to download the corresponding files for cleaning, and finally updates the post-cleaning slice offsets into the database.
To solve the difficulty traditional disaster recovery software has in running data recycling concurrently with backup/restore tasks, the embodiments of this application optimize the structure of the data recycling module. As shown in Figure 3, the tomb-mark module modifies the database record through the meta server to add a deletion mark to the target file; to the user the file appears deleted, so backup/restore tasks are not affected. The data recycling module can therefore acquire, within the current detection cycle, at least one target file carrying a deletion mark obtained by scanning the metadata database.
In the embodiments of this application, the data recycling module mainly uses three kinds of threads to perform data recycling: collection threads, merge threads, and update threads. The number of each kind of thread can be configured dynamically. The configuration method includes: acquiring a configuration file, where the configuration file is obtained according to the running state of the target disaster recovery software, and using the configuration file to configure the numbers of collection threads, merge threads, and update threads.
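The dynamic thread configuration described above can be sketched as follows. This is a minimal illustration only; the file format, key names, and default values are assumptions chosen here and are not specified in the application.

```python
import json

# Illustrative defaults; the application does not fix these numbers.
DEFAULTS = {"collect_threads": 2, "merge_threads": 4, "update_threads": 4}

def load_thread_config(text):
    """Parse a JSON config snippet derived from the disaster recovery
    software's observed load, falling back to defaults for missing keys."""
    cfg = dict(DEFAULTS)
    cfg.update({k: int(v) for k, v in json.loads(text).items() if k in DEFAULTS})
    return cfg

# During a business peak the GC thread pools can be shrunk so that
# backup/restore keeps the IO bandwidth.
peak = load_thread_config('{"collect_threads": 1, "merge_threads": 1}')
```

Because the three thread pools have IO paths independent of backup/restore, shrinking them throttles only the recycling work.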
It should be noted that the collection threads cyclically check for data slices whose reference count is 0; such slices are not referenced by any file, so recycling them has no impact on backup/restore, and concurrency is fully supported. Second, the merge threads merge the downloaded slice data before uploading it, reducing file fragmentation. Finally, the IO paths of these three kinds of threads are independent of the backup/restore flow, and the thread counts can be adjusted dynamically so that the core services of the disaster recovery product are unaffected during peak periods.
As an example, taking the backup process, the backup performance formulas are as follows:
Formula (1) calculates the processing speed of the merge threads, based on the average number of slices the server receives per second within T seconds and a coefficient representing the impact of data recycling on backup. For traditional data recycling, which works from copy time points, the IO path is not independent of backup, so the coefficient is less than 1; for the data recycling proposed here, which operates directly on data slices whose reference count is 0, there is no impact on backup at all, and the coefficient equals 1.
Together with the number of merge threads active within T seconds during backup, the merge speed can be computed, i.e. the number of slices each backup thread needs to merge per second.
Formula (2) assumes that each block file stored on the cloud is aggregated from a fixed number of data slices, and gives the number of block files generated by the merge threads within T seconds.
Generally the numbers of merge threads and update threads are equal, i.e. merged files are uploaded immediately. Finally, the upload speed of the backup threads (slices per second) can be computed from formulas (2) and (3).
Meanwhile, the data recycling performance formulas are as follows:
Formula (4) assumes the recycling threads need to run for a total of T seconds, with one quantity denoting the number of zero-reference-count data slices received by the recycling threads at the i-th moment and another denoting the total number of slices on the cloud that the recycling threads need to download within T seconds. Since the recycling threads read the slices to be recycled from the database, the former quantity varies over time. From the formula, the average retained-slice ratio of this data recycling run can be computed.
Formula (5) gives the average number of zero-reference-count data slices contained in each block file on the cloud.
In formula (6), the average number of zero-count slices (zero_slice) received per second by the recycling threads, divided by the average number of zero_slice entries per block file, gives the average number of block files to process per second; multiplying by the retained-slice ratio gives the average number of slices the recycling threads need to merge per second; dividing by the number of recycling threads then gives the number of slices each recycling thread needs to merge per second.
Finally, the upload speed of the recycling threads (slices per second) can be computed from formulas (7) and (8).
Formula (9) assumes that the maximum number of blocks the back-end storage can receive within T seconds is C, so the total number of blocks uploaded by the backup threads and the recycling threads within T seconds must be at most C. During the business peak of a disaster recovery product, the backup threads are likely to saturate the upload bandwidth; at that point, a traditional data recycling method based on copy time points would, by formula (1), directly slow down backup. The optimized data recycling, however, can dynamically reduce the numbers of collect, merge, and update threads, lowering the impact on backup and, overall, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) of the disaster recovery product.
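The original formula images are not reproduced in this text. Under the verbal definitions above, one plausible reconstruction of the key relations is the following; every symbol name is chosen here for illustration and none is taken from the application itself.

```latex
% Illustrative reconstruction only.
% R: slices/s received by the server; \alpha: GC-impact coefficient
%    (\alpha < 1 traditionally, \alpha = 1 for the proposed scheme);
% N_m: merge threads; k: slices per block file; z_i: zero-count slices
% received at moment i; S: total slices to download within T seconds.
\begin{align}
v_{\text{merge}} &= \frac{\alpha R}{N_m}
    && \text{per-thread merge speed, cf. formula (1)} \\
B_{\text{backup}} &= \frac{\alpha R\, T}{k}
    && \text{block files generated in } T \text{ s, cf. formula (2)} \\
\rho &= 1 - \frac{\sum_{i=1}^{T} z_i}{S}
    && \text{average retained-slice ratio, cf. formula (4)} \\
B_{\text{backup}} + B_{\text{gc}} &\le C
    && \text{shared upload budget, cf. formula (9)}
\end{align}
```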
Step S12: traverse the target file to obtain the data recycling information corresponding to the data slices to be recycled, and write the data recycling information into the target hash queue.
In the embodiments of this application, step S12, traversing the target file to obtain the data recycling information corresponding to the data slices to be recycled, includes the following steps A1-A4:
Step A1: call the collection thread to traverse the reference counts of all data slices corresponding to the target file.
Step A2: decrement each reference count to obtain a decremented reference count, and take the data slices whose decremented reference count is 0 as the data slices to be recycled.
Step A3: acquire, from the target file, the data slice identifiers and metadata corresponding to the data slices to be recycled.
Step A4: determine the data slice identifiers and the metadata as the data recycling information.
In the embodiments of this application, the collection threads can periodically check whether the metadata server (meta server) contains files marked for deletion. If the metadata server returns files marked for deletion, the reference-management module decrements by one the reference count of each data slice corresponding to the file and returns the metadata of the data slices whose reference count is 0 (zero slices). The metadata includes the path and offset of the back-end storage file in which the data slice resides.
It should be noted that the collection threads cyclically check for data slices whose reference count is 0; such slices are not referenced by any file, and recycling them has no impact on backup/restore, so concurrency is fully supported.
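Steps A1-A4 can be sketched as follows. The in-memory dicts stand in for the metadata database and the reference-management module; all names are illustrative assumptions.

```python
def collect_recyclable(target_file_slices, ref_counts):
    """Steps A1-A4: decrement the reference count of every slice of a
    delete-marked file and return recycling info for the zero-count ones.

    target_file_slices: {slice_id: (block_path, offset)} for the target file
    ref_counts:         {slice_id: int} shared reference-count table
    """
    recycle_info = []
    for slice_id, meta in target_file_slices.items():  # A1: traverse slices
        ref_counts[slice_id] -= 1                      # A2: decrement count
        if ref_counts[slice_id] == 0:                  # zero slice found
            recycle_info.append((slice_id, meta))      # A3/A4: id + metadata
    return recycle_info

counts = {"s1": 1, "s2": 3}
info = collect_recyclable({"s1": ("blk_a", 0), "s2": ("blk_a", 4096)}, counts)
```

Only "s1" drops to a reference count of 0 here, so only its identifier and metadata become recycling information; "s2" is still referenced by other files and is untouched.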
In the embodiments of this application, writing the data recycling information into the target hash queue includes: performing a hash calculation on the metadata to obtain a calculation result, and writing the calculation result and the data slice identifier into the target hash queue in a key-value pair structure, where the calculation result serves as the key and the data slice identifier serves as the value.
As an example, as shown in Figure 4, the data slice identifiers (zero slices) are stored in a hash queue, with the hash of the metadata as the key and the data slice identifier as the value.
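A minimal sketch of such a hash queue follows; the insertion-ordered dict lets entries also be consumed FIFO, and the choice of SHA-1 over the "path:offset" metadata is an assumption made here for illustration.

```python
import hashlib
from collections import OrderedDict

class HashQueue:
    """Key: hash of the slice metadata (block path + offset);
    value: the data slice identifier, as in Figure 4."""
    def __init__(self):
        self._items = OrderedDict()

    def put(self, slice_id, block_path, offset):
        key = hashlib.sha1(f"{block_path}:{offset}".encode()).hexdigest()
        self._items[key] = slice_id   # identical metadata de-duplicates

    def pop(self):
        _, slice_id = self._items.popitem(last=False)  # FIFO extraction
        return slice_id

q = HashQueue()
q.put("s1", "blk_a", 0)
q.put("s2", "blk_a", 4096)
first = q.pop()
```

Keying by the metadata hash means a slice reported twice (for example by two collection threads) occupies one entry, while the queue order preserves arrival order for the merge threads.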
Step S13: acquire the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue.
In the embodiments of this application, acquiring the data slices to be recycled from the back-end storage file using the data recycling information in the target hash queue includes the following steps B1-B3:
Step B1: call the merge thread to extract data slice identifiers from the target hash queue in sequence.
Step B2: acquire the back-end storage file corresponding to each data slice identifier from the backup database, and download all data slices included in the back-end storage file.
Step B3: determine the data slices matching the data slice identifiers as the data slices to be recycled.
In the embodiments of this application, the merge thread extracts data slice identifiers from the target hash queue in sequence and then obtains from the backup database the back-end storage file corresponding to each identifier. A back-end storage file corresponds to multiple data slices, and the merge process actively downloads all data slices of that file. The downloaded slices are then matched against the identifiers extracted from the hash queue, so that the slices matching the identifiers are found and determined to be the data slices to be recycled.
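Steps B1-B3 can be sketched as follows; the backup-database lookup and the block download are simulated with plain dicts, and all names are illustrative assumptions.

```python
def fetch_recyclable(slice_ids, slice_to_block, block_store):
    """B1: consume slice ids in sequence; B2: locate and "download" each
    id's block file with all its slices; B3: keep the matching slices."""
    to_recycle = []
    for slice_id in slice_ids:                           # B1: in-order ids
        block = slice_to_block[slice_id]                 # B2: locate block file
        for sid, payload in block_store[block].items():  # B2: all slices
            if sid == slice_id:                          # B3: match by id
                to_recycle.append((sid, payload))
    return to_recycle

store = {"blk_a": {"s1": b"aa", "s2": b"bb", "s3": b"cc"}}
found = fetch_recyclable(["s1", "s3"], {"s1": "blk_a", "s3": "blk_a"}, store)
```

Note that the whole block file is downloaded even though only some of its slices are garbage; the non-matching slices ("s2" here) are the retained data that the merge step repacks.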
It should be noted that the merge threads merge the downloaded slice data before uploading it, reducing file fragmentation. The scheme also addresses the large per-run cleanup volume faced by traditional disaster recovery software: recycling granularity is refined down to individual data slices, and invalid slices are detected periodically and recycled immediately, improving the responsiveness of the disaster recovery program.
Step S14: generate a merged file based on the data slices to be recycled, and perform the data recycling operation through the merged file.
In the embodiments of this application, generating the merged file based on the data slices to be recycled includes: inserting the data slices to be recycled into a merge queue, detecting the current storage amount of the merge queue, and generating the merged file based on the merge queue when the current storage amount reaches its upper limit.
In the embodiments of this application, the merge process inserts the data slices to be recycled into the merge queue, then detects how much data to be recycled has already been inserted into the merge queue to obtain the current storage amount, and compares it with the merge queue's storage upper limit. If the current storage amount reaches the upper limit, a merged file is generated from all the data to be recycled in the merge queue.
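The threshold-driven merge can be sketched as follows; capacity is counted in slices here, which is an assumption, since the application does not fix the unit of the storage upper limit.

```python
class MergeQueue:
    """Buffers data slices and emits a merged file once the
    configured storage upper limit is reached."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []        # slices inserted so far
        self.merged_files = []   # generated merged files

    def insert(self, slice_id, payload):
        self.pending.append((slice_id, payload))
        if len(self.pending) >= self.capacity:  # upper limit reached
            self.merged_files.append(b"".join(p for _, p in self.pending))
            self.pending.clear()

mq = MergeQueue(capacity=2)
mq.insert("s1", b"aa")
mq.insert("s3", b"cc")  # second insert hits the limit and triggers the merge
```

Batching slices into one file before upload is what keeps the back end free of small-fragment objects.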
In the embodiments of this application, performing the data recycling operation through the merged file includes the following steps C1-C2:
Step C1: invoke the update thread to upload the merged file, and modify the original data slice offsets corresponding to the data to be recycled in the merged file to obtain target data slice offsets.
Step C2: write the target data slice offsets into the metadata server, and delete the back-end storage file corresponding to the data to be recycled.
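Steps C1-C2 can be sketched as follows; the upload, the metadata server, and the deletion are simulated with dicts, whereas in the real system the merged file would go to NAS/S3 and the offsets to the database. All names are illustrative assumptions.

```python
def apply_recycle(merged_file, slice_layout, meta_server, storage, old_blocks):
    """C1: upload the merged file and compute each slice's new offset
    inside it; C2: persist the new offsets and drop the old block files."""
    new_name = f"merged_{len(storage)}"
    storage[new_name] = merged_file                 # C1: upload merged file
    offset = 0
    for slice_id, size in slice_layout:             # C1: target offsets
        meta_server[slice_id] = (new_name, offset)  # C2: update metadata
        offset += size
    for block in old_blocks:                        # C2: delete old files
        storage.pop(block, None)

storage = {"blk_a": b"aabbcc"}
meta = {}
apply_recycle(b"aacc", [("s1", 2), ("s3", 2)], meta, storage, ["blk_a"])
```

After the call the metadata points every surviving slice at the new merged file, so the old block file can be deleted without breaking any restore path.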
The method provided by the embodiments of this application lets the disaster recovery program consume fewer system resources during data recycling, preserves the IO resources of the disaster recovery product's core services, reduces hardware cost, and improves user experience. It effectively solves the problems traditional disaster recovery programs face during data recycling: a large amount of data to clean per run, long cleaning time, difficulty running concurrently with backup/restore, and frequent file fragmentation.
图5为本申请实施例提供的一种数据回收装置的框图,该装置可以通过软件、硬件或者两者的结合实现成为电子设备的部分或者全部。如图5所示,该装置包括:Figure 5 is a block diagram of a data recovery device provided by an embodiment of the present application. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 5, the device includes:
获取模块51,用于获取当前检测周期对元数据数据库进行检测得到的携带有删除标记的至少一个目标文件,其中,删除标记用于表示目标文件所对应的后端存储文件中存在待回收数据分片;The acquisition module 51 is used to acquire at least one target file carrying a deletion mark that is detected in the metadata database during the current detection cycle, where the deletion mark is used to indicate that there is data to be recycled in the back-end storage file corresponding to the target file. piece;
遍历模块52,用于遍历目标文件得到待回收数据分片对应的数据回收信息,并将数据回收信息写入目标哈希队列;The traversal module 52 is used to traverse the target file to obtain the data recovery information corresponding to the data fragments to be recovered, and write the data recovery information into the target hash queue;
提取模块53,用于利用目标哈希队列中的数据回收信息从后端存储文件中获取待回收数据分片;The extraction module 53 is used to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
处理模块54,用于基于待回收数据分片生成合并文件,并通过合并文件执行数据回收操作。The processing module 54 is configured to generate a merged file based on the data fragments to be recycled, and perform data recycling operations through the merged file.
在本申请实施例中,数据回收装置包括:配置模块,用于获取配置文件,其中,配置文件是依据目标灾备软件的运行情况得到;利用配置文件配置收集线程、合并线程以及更新线程的数量。In the embodiment of the present application, the data recovery device includes: a configuration module for obtaining a configuration file, where the configuration file is obtained based on the operation status of the target disaster recovery software; the configuration file is used to configure the number of collection threads, merging threads, and update threads .
在本申请实施例中,遍历模块52,用于调用收集线程遍历目标文件对应所有数据分片的引用计数;对引用计数进行自减,得到自减后的引用计数,并将自减后的引用计数为0的数据分片作为待回收数据分片;从目标文件中获取待回收数据分片对应的数据分片标识以及元数据;将数据分片标识以及元数据确定为数据回收信息。In the embodiment of this application, the traversal module 52 is used to call the collection thread to traverse the reference counts of all data fragments corresponding to the target file; decrement the reference count to obtain the decremented reference count, and add the decremented reference count to The data fragments with a count of 0 are used as data fragments to be recycled; the data fragment identifiers and metadata corresponding to the data fragments to be recycled are obtained from the target file; the data fragment identifiers and metadata are determined as data recycling information.
在本申请实施例中,遍历模块52,用于对元数据进行哈希计算得到计算结果;将计算结果以及数据分片标识以键值对结构存储写入目标哈希队列,其中,计算结果作为键值对结构中的键名,数据分片标识作为键值对结构中的键值。In the embodiment of this application, the traversal module 52 is used to perform hash calculation on metadata to obtain calculation results; store the calculation results and data fragmentation identifiers in a key-value pair structure and write them into the target hash queue, where the calculation results are as The key name in the key-value pair structure, and the data shard identifier is used as the key value in the key-value pair structure.
In an embodiment of the present application, the extraction module 53 is configured to call the merging thread to sequentially extract data fragment identifiers from the target hash queue; obtain, from the backup database, the back-end storage file corresponding to each data fragment identifier and download all data fragments included in that back-end storage file; and determine the data fragments matching the data fragment identifiers as the data fragments to be recycled.
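The matching step can be pictured as a simple filter over the downloaded fragments. This is a sketch under assumed data shapes, not the patent's implementation.

```python
# Hypothetical sketch: the merging thread matches the fragments
# downloaded from one back-end storage file against the identifiers
# drawn from the target hash queue.

downloaded = {                      # all fragments in one back-end file
    "f1": b"live data",
    "f2": b"dead data",
    "f3": b"more dead data",
}
ids_to_recycle = ["f2", "f3"]       # identifiers pulled from the hash queue

to_recycle = {fid: downloaded[fid]
              for fid in ids_to_recycle if fid in downloaded}
print(sorted(to_recycle))  # the fragments determined to be recycled
```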
In an embodiment of the present application, the processing module 54 is configured to insert the data fragments to be recycled into a merge queue, detect the current fill level of the merge queue, and, when the current fill level reaches its upper limit, generate the merged file based on the merge queue.
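The capacity-triggered merge can be sketched as follows; the capacity value and byte-concatenation format are assumptions made for the example, not details from the patent.

```python
# Hypothetical sketch: accumulate fragments in a merge queue and emit a
# merged file once the queue reaches its assumed capacity limit.

MERGE_CAPACITY = 3  # assumed upper limit of the merge queue

merge_queue = []
merged_files = []

def insert_fragment(payload: bytes):
    merge_queue.append(payload)
    if len(merge_queue) >= MERGE_CAPACITY:          # capacity reached
        merged_files.append(b"".join(merge_queue))  # generate merged file
        merge_queue.clear()

for chunk in [b"aaa", b"bbb", b"ccc", b"ddd"]:
    insert_fragment(chunk)

print(len(merged_files), len(merge_queue))  # 1 merged file, 1 pending fragment
```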
In an embodiment of the present application, the processing module 54 is configured to invoke the update thread to upload the merged file and modify the original data fragment offsets corresponding to the data to be recycled in the merged file, obtaining target data fragment offsets; and to write the target data fragment offsets to the metadata server and delete the back-end storage files corresponding to the data to be recycled.
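The offset rewrite might be sketched as below. The in-memory dict standing in for the metadata server, and all names, are assumptions for illustration; the patent does not expose this interface.

```python
# Hypothetical sketch: after fragments are packed into the merged file,
# each fragment's original offset is replaced with its target offset
# inside the merged file, and the metadata server record is updated.

metadata_server = {                 # fragment id -> current location
    "f1": {"file": "backend-001", "offset": 8192},
    "f2": {"file": "backend-001", "offset": 16384},
}

def apply_merged_file(merged_name, fragment_order, fragment_sizes):
    """Rewrite offsets so each fragment points into the merged file."""
    offset = 0
    for frag_id, size in zip(fragment_order, fragment_sizes):
        metadata_server[frag_id] = {"file": merged_name, "offset": offset}
        offset += size
    # The obsolete back-end storage file could then be deleted.

apply_merged_file("merged-042", ["f1", "f2"], [4096, 4096])
print(metadata_server["f2"])  # now points into merged-042
```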
An embodiment of the present application further provides an electronic device. As shown in Figure 6, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504.
The memory 1503 is used to store a computer program;
The processor 1501 is used to implement the steps of the above embodiments when executing the computer program stored in the memory 1503.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the data recycling method described in any of the above embodiments.
In yet another embodiment provided by this application, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute the data recycling method described in any of the above embodiments.
The above embodiments may be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., a solid-state disk), or the like.
The above are only preferred embodiments of this application and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application falls within its protection scope.
The above are only specific embodiments of this application, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims (10)

  1. A data recycling method, characterized by comprising:
    obtaining at least one target file carrying a deletion mark, detected in the metadata database during the current detection cycle, wherein the deletion mark indicates that the back-end storage file corresponding to the target file contains data fragments to be recycled;
    traversing the target file to obtain data recycling information corresponding to the data fragments to be recycled, and writing the data recycling information into a target hash queue;
    obtaining the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
    generating a merged file based on the data fragments to be recycled, and performing a data recycling operation through the merged file.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining a configuration file, wherein the configuration file is derived from the running status of the target disaster recovery software;
    using the configuration file to configure the numbers of collection threads, merging threads, and update threads.
  3. The method according to claim 2, characterized in that traversing the target file to obtain the data recycling information corresponding to the data fragments to be recycled comprises:
    calling the collection thread to traverse the reference counts of all data fragments corresponding to the target file;
    decrementing the reference counts to obtain decremented reference counts, and taking the data fragments whose decremented reference count is 0 as the data fragments to be recycled;
    obtaining, from the target file, the data fragment identifiers and metadata corresponding to the data fragments to be recycled;
    determining the data fragment identifiers and the metadata as the data recycling information.
  4. The method according to claim 3, characterized in that writing the data recycling information into the target hash queue comprises:
    performing a hash calculation on the metadata to obtain a calculation result;
    writing the calculation result and the data fragment identifier into the target hash queue in a key-value pair structure, wherein the calculation result serves as the key in the key-value pair structure and the data fragment identifier serves as the value in the key-value pair structure.
  5. The method according to claim 2, characterized in that obtaining the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue comprises:
    calling the merging thread to sequentially extract data fragment identifiers from the target hash queue;
    obtaining, from the backup database, the back-end storage file corresponding to the data fragment identifier, and downloading all data fragments included in the back-end storage file;
    determining the data fragments matching the data fragment identifiers as the data fragments to be recycled.
  6. The method according to claim 5, characterized in that generating a merged file based on the data fragments to be recycled comprises:
    inserting the data fragments to be recycled into a merge queue;
    detecting the current fill level of the merge queue, and, when the current fill level reaches its upper limit, generating the merged file based on the merge queue.
  7. The method according to claim 2, characterized in that performing the data recycling operation through the merged file comprises:
    invoking the update thread to upload the merged file, and modifying the original data fragment offsets corresponding to the data to be recycled in the merged file to obtain target data fragment offsets;
    writing the target data fragment offsets to the metadata server, and deleting the back-end storage file corresponding to the data to be recycled.
  8. A data recycling apparatus, characterized by comprising:
    an acquisition module, configured to obtain at least one target file carrying a deletion mark, detected in the metadata database during the current detection cycle, wherein the deletion mark indicates that the back-end storage file corresponding to the target file contains data fragments to be recycled;
    a traversal module, configured to traverse the target file to obtain data recycling information corresponding to the data fragments to be recycled, and to write the data recycling information into a target hash queue;
    an extraction module, configured to obtain the data fragments to be recycled from the back-end storage file by using the data recycling information in the target hash queue;
    a processing module, configured to generate a merged file based on the data fragments to be recycled, and to perform a data recycling operation through the merged file.
  9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when run, the program executes the method steps of any one of claims 1 to 7.
  10. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; wherein:
    the memory is used to store a computer program;
    the processor is used to execute the method steps of any one of claims 1 to 7 by running the program stored in the memory.
PCT/CN2022/141825 2022-07-29 2022-12-26 Data recycling method and apparatus, electronic device, and storage medium WO2024021492A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210908251.4 2022-07-29
CN202210908251.4A CN115757269A (en) 2022-07-29 2022-07-29 Data recovery method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2024021492A1 true WO2024021492A1 (en) 2024-02-01

Family

ID=85349091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141825 WO2024021492A1 (en) 2022-07-29 2022-12-26 Data recycling method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115757269A (en)
WO (1) WO2024021492A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170031772A1 (en) * 2015-07-31 2017-02-02 Netapp Inc. Incremental transfer with unused data block reclamation
CN109976668A (en) * 2019-03-14 2019-07-05 北京达佳互联信息技术有限公司 Data-erasure method, data deletion apparatus and computer readable storage medium
CN112328549A (en) * 2020-10-29 2021-02-05 无锡先进技术研究院 Small file storage method, electronic device and storage medium
CN113434465A (en) * 2021-06-25 2021-09-24 北京金山云网络技术有限公司 Data processing method and device and electronic equipment
CN113900991A (en) * 2021-10-11 2022-01-07 北京青云科技股份有限公司 Data interaction method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN115757269A (en) 2023-03-07


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22952921; Country of ref document: EP; Kind code of ref document: A1