WO2023246754A1 - Data deduplication method and related system - Google Patents

Data deduplication method and related system

Info

Publication number
WO2023246754A1
Authority
WO
WIPO (PCT)
Prior art keywords
partition
data block
data
fingerprint
metadata
Prior art date
Application number
PCT/CN2023/101303
Other languages
English (en)
French (fr)
Inventor
朱洪德
董如良
陈泽晖
罗斯哲
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023246754A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system

Definitions

  • the present application relates to the field of storage technology, and in particular, to a data deduplication method, device, storage system, computer-readable storage medium, and computer program product.
  • Data deduplication, also referred to simply as dedup, divides data into blocks, calculates a fingerprint for each data block based on its content, and then compares the fingerprints of different data blocks to identify and delete data blocks with duplicate content, thereby eliminating data redundancy.
  • In a typical implementation, the fingerprint of a data block is written to a log file in append-write mode.
  • When deduplication is triggered manually or periodically, the fingerprints in the log file are sorted, the sorted fingerprints are merged with the fingerprints in the fingerprint file, and data blocks with duplicate content are deleted based on the merge result.
  • the data stored in the data center can be divided into frequently updated data and infrequently updated data according to the update frequency.
  • the proportion of frequently updated data is usually higher than the proportion of infrequently updated data.
  • Frequently updated data is usually hard to deduplicate, so the proportion of hard-to-deduplicate data is high. This causes resource contention: the share of space allocated to infrequently updated data drops, its metadata is easily evicted, and the deduplication rate is lost.
  • This application provides a data deduplication method that actively partitions the metadata management structure and writes metadata such as the fingerprint and address information of a data block into the partition corresponding to that data block's characteristics. This prevents infrequently updated data from having its resources occupied by frequently updated data and its metadata evicted, thereby increasing the deduplication rate.
  • This application also provides devices, storage systems, computer-readable storage media and computer program products corresponding to the above methods.
  • this application provides a data deduplication method.
  • This method can be applied to storage systems, including centralized storage systems or distributed storage systems.
  • centralized storage systems can also be divided into centralized storage systems with integrated disk and control or separated disk and control
  • distributed storage systems can also be divided into distributed storage systems with integrated storage and computing or distributed storage systems with separated storage and computing.
  • the centralized storage system has an engine.
  • the engine includes a controller.
  • the controller may include a processor and a memory.
  • the processor may load the program code in the memory to execute the data deduplication method of the present application.
  • a distributed storage system includes computing nodes and storage nodes.
  • the computing nodes include processors and memory. The processor can load the program code in the memory to execute the data deduplication method of the present application.
  • The storage system receives a write request that includes a first data block, writes the first data block to a storage device (for example, a hard disk), and then writes the metadata of the first data block into a first partition among multiple partitions of the metadata management structure, where the first partition is determined according to the characteristics of the first data block.
  • the metadata of the first data block includes fingerprint and address information of the first data block.
  • When a fingerprint identical to the fingerprint of the first data block exists in the first partition, the storage system deletes the metadata of the first data block from the first partition and deletes the first data block from the storage device according to the address information of the first data block.
  • the metadata of data blocks with different characteristics can be written into different partitions of the metadata management structure.
  • For example, the metadata of frequently updated data blocks can be written into partitions with smaller capacity, while the metadata of infrequently updated data blocks can be written into partitions with larger capacity. This prevents infrequently updated data from having its resources occupied by frequently updated data and being evicted, thereby improving the deduplication rate.
  • the characteristic of the first data block is a fingerprint of the first data block. It should be noted that different data blocks can correspond to the same fingerprint. For example, during backup, there can be multiple data blocks corresponding to the same fingerprint.
  • When the storage system writes the metadata of the first data block into the metadata management structure, it can first determine the popularity (heat) of the fingerprint corresponding to the first data block, determine, according to that popularity, the first partition among the multiple partitions of the metadata management structure, and then write the metadata of the first data block to the first partition.
  • The storage system determines the popularity of the fingerprint corresponding to the first data block and writes the metadata of the first data block into the corresponding first partition based on that popularity, which prevents infrequently updated data from having its resources occupied by frequently updated data and being evicted, and thus increases the deduplication rate.
  • the address information of the first data block includes the logical address of the first data block.
  • After writing the metadata of the first data block to the first partition, the storage system can also update the popularity of the fingerprint corresponding to the first data block.
  • the storage system may determine the popularity of the logical address of the first data block, and add the popularity of the logical address to the popularity of the fingerprint corresponding to the first data block to update the popularity of the fingerprint corresponding to the first data block.
  • the popularity of the fingerprint is updated, which can provide a reference for subsequent writing of metadata.
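As an illustration only, the following Python sketch shows one way a partitioned, inverse-mapping-table-style metadata structure could route each block's metadata by fingerprint popularity and accumulate the logical address's popularity onto the fingerprint, as described above. All names (MetadataStore, HOT_THRESHOLD, the eviction policy) are hypothetical and are not taken from the patent.

```python
from collections import OrderedDict

HOT_THRESHOLD = 8   # hypothetical popularity value separating "hot" from "cold" fingerprints

class MetadataStore:
    """Toy partitioned metadata management structure (inverse-mapping-table style)."""

    def __init__(self, hot_capacity, cold_capacity):
        # a smaller partition for hot (frequently updated) fingerprints,
        # a larger partition for cold (infrequently updated) fingerprints
        self.partitions = {
            "hot":  {"cap": hot_capacity,  "entries": OrderedDict()},
            "cold": {"cap": cold_capacity, "entries": OrderedDict()},
        }
        self.fp_heat = {}    # fingerprint -> accumulated popularity
        self.lba_heat = {}   # logical address -> observed popularity

    def _partition_for(self, fp):
        # route metadata by the popularity of its fingerprint
        return "hot" if self.fp_heat.get(fp, 0) >= HOT_THRESHOLD else "cold"

    def write_metadata(self, fp, lba):
        # accumulate the popularity of the logical address onto the fingerprint's popularity
        self.lba_heat[lba] = self.lba_heat.get(lba, 0) + 1
        self.fp_heat[fp] = self.fp_heat.get(fp, 0) + self.lba_heat[lba]

        part = self.partitions[self._partition_for(fp)]
        duplicate = fp in part["entries"]          # same fingerprint already in this partition?
        part["entries"].setdefault(fp, []).append(lba)

        # naive eviction of the oldest fingerprint when the partition is full
        if len(part["entries"]) > part["cap"]:
            part["entries"].popitem(last=False)
        return duplicate                            # True -> deduplication can be triggered

store = MetadataStore(hot_capacity=16, cold_capacity=1024)
if store.write_metadata(fp="a1b2c3", lba=0x2000):
    print("duplicate fingerprint in this partition: delete the block and its metadata")
```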
  • the write request may also include a second data block, and the popularity of the fingerprint corresponding to the second data block may be higher than the popularity of the fingerprint corresponding to the first data block.
  • the storage system may also write the second data block to the storage device, and write the metadata of the second data block to the second partition among the multiple partitions of the metadata management structure. The capacity of the second partition is smaller than the capacity of the first partition.
  • This method writes data blocks with different fingerprint popularity into different partitions of the metadata management structure, preventing the metadata of low-popularity data blocks from having its space occupied by the metadata of high-popularity data blocks and being evicted, thereby improving the deduplication rate.
  • the write request may also include a third data block, and the fingerprint of the third data block is the same as the fingerprint of the first data block.
  • When the popularity of the fingerprint corresponding to the first data block is less than a preset popularity and the popularity of the fingerprint corresponding to the third data block is greater than the preset popularity, the storage system can write the metadata of the third data block to the second partition and move the metadata of data blocks in the first partition whose fingerprint is the same as that of the third data block to the second partition.
  • the storage location of the metadata can be adjusted.
  • The metadata of data blocks with higher fingerprint popularity can be moved to the second partition, reserving storage space in the first partition for the metadata of less popular data blocks, while the metadata of data blocks with the same fingerprint is gathered in the second partition so that the second partition can trigger deduplication, further improving the deduplication rate.
  • Alternatively, the storage system can write the metadata of the third data block to the second partition and evict the metadata of data blocks in the first partition that have the same fingerprint as the third data block, without moving it to the second partition. This reduces relocation overhead on the one hand and, on the other hand, reserves storage space in the first partition for the metadata of data blocks with lower fingerprint popularity, further improving the deduplication rate.
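A minimal sketch of this relocation-or-eviction choice, built on the hypothetical MetadataStore from the previous sketch; the helper name promote_fingerprint and its signature are assumptions for illustration.

```python
def promote_fingerprint(store, fp, lba, move=True):
    """When a fingerprint crosses the preset popularity, write the new entry to the hot
    (second) partition and either move (move=True) or simply drop (move=False) the
    same-fingerprint entries from the cold (first) partition."""
    hot = store.partitions["hot"]["entries"]
    cold = store.partitions["cold"]["entries"]

    hot.setdefault(fp, []).append(lba)      # metadata of the newly written (third) data block
    stale = cold.pop(fp, [])                # same-fingerprint metadata in the first partition
    if move:
        hot[fp].extend(stale)               # relocate so the hot partition can trigger dedup
    # with move=False the stale entries are discarded, saving relocation overhead and
    # freeing space in the first partition for less popular fingerprints
```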
  • the capacity of multiple partitions of the metadata management structure is determined according to a partition decision model.
  • The partition decision model is used to predict, for each partition capacity combination among preset partition capacity combinations, the partition benefit obtained after that combination is applied to the metadata management structure, and to determine the combination with the largest partition benefit as the capacities of the multiple partitions of the metadata management structure. The partition benefit is determined based on at least one of the deduplication rate and the partition adjustment cost.
  • This method builds a partition decision model and actively partitions the metadata management structure through it, preventing frequently updated data from occupying the resources of infrequently updated data and thereby preventing the infrequently updated data from being evicted at a loss to the deduplication rate.
  • The partition benefit may be the deduplication rate.
  • the preset partition capacity combination may include the first capacity combination.
  • the partition decision model can predict the deduplication rate after the partition capacity combination is applied to the data management structure by estimating the hit rate of the data.
  • The partition decision model predicts the deduplication rate corresponding to applying the first partition capacity combination to the metadata management structure in the following way: obtaining the workload characteristics corresponding to each of the multiple partitions formed by applying the first partition capacity combination to the metadata management structure; obtaining the data distribution corresponding to each partition according to that partition's workload characteristics; and obtaining the deduplication rate according to the data distribution corresponding to each partition and the capacity of each partition.
  • This method fits the data distribution of each partition from that partition's workload characteristics. From the data distribution and the partition capacity, the hit rate can be predicted, and from that the deduplication rate corresponding to the partition capacity combination. Without having to actually run the storage system under each combination, the partition capacity combination with the highest deduplication rate can be predicted at a lower cost, which can meet business needs.
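The following sketch illustrates the idea of the partition decision model under stated assumptions: each partition's data distribution is given as a sorted list of reference probabilities (how it is fitted from workload characteristics is not specified here), the hit rate is the probability mass a partition's capacity can cover, and the partition benefit optionally subtracts a simple adjustment cost. Function names and the cost model are hypothetical.

```python
def predicted_dedup_rate(capacity_combo, workloads):
    """Estimate one capacity combination's deduplication rate: for each partition, the
    fitted distribution workloads[i]["dist"] lists per-fingerprint reference
    probabilities in descending order, and workloads[i]["refs"] is that partition's
    reference count."""
    total_refs = sum(w["refs"] for w in workloads)
    rate = 0.0
    for cap, w in zip(capacity_combo, workloads):
        hit = sum(w["dist"][:cap])               # probability mass the partition can hold
        rate += hit * w["refs"] / total_refs     # weight by the partition's share of references
    return rate

def choose_capacity_combo(candidates, workloads, current_combo, adjust_weight=0.0):
    """Pick the candidate with the largest predicted partition benefit; with
    adjust_weight > 0 the benefit also subtracts a simple partition-adjustment cost."""
    def benefit(combo):
        cost = adjust_weight * sum(abs(a - b) for a, b in zip(combo, current_combo))
        return predicted_dedup_rate(combo, workloads) - cost
    return max(candidates, key=benefit)
```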
  • Considering that the workload may change, the capacities of the partitions can also be adjusted. For example, the storage system can periodically adjust the partition capacities.
  • partition adjustment requires re-initialization of partitions and other operations, resulting in partition adjustment costs.
  • the partition decision model can predict the partition adjustment cost based on the partition capacity ratio before and after adjustment.
  • In that case, the partition decision model can predict the partition benefit based on the predicted deduplication-rate gain and the partition adjustment cost. For example, the partition decision model can take the difference between the predicted gain and the predicted partition adjustment cost as the predicted partition benefit.
  • By redefining the partition benefit in this way, the evaluation of the partition benefit becomes more accurate and reasonable, the determined partition capacity combination has more reference value, and a balance between the deduplication rate and the partition adjustment cost can be achieved.
  • the storage system can periodically adjust the capacity of multiple partitions of the metadata management structure.
  • Specifically, the storage system can decide whether to adjust the capacities of the multiple partitions according to feedback from the period before the adjustment time, such as the partition benefit, the partition capacity combination, or the workload characteristics corresponding to each partition.
  • Using this feedback to decide whether to adjust the partition capacities allows the partitions to be adjusted flexibly as the workload changes, so that a good partition benefit can be maintained at different stages.
  • The first partition may store fingerprints that are the same as the fingerprint of the first data block; when the number of such fingerprints in the first partition reaches a preset threshold, the storage system deletes the metadata of the first data block and deletes the first data block from the storage device according to the address information of the first data block.
  • The preset threshold can be set based on empirical values. For example, the preset threshold can be set to 1: if a fingerprint identical to that of the first data block already exists in the first partition, the storage system deletes the metadata of the first data block and the first data block itself. As another example, the preset threshold can be set to 2: if two data blocks in the first partition have the same fingerprint as the first data block, the storage system deletes the metadata of the first data block and the first data block, and further retains one of the two same-fingerprint data blocks together with its metadata while deleting the other data block and its metadata.
  • When the preset threshold is set to a smaller value, redundant data blocks and metadata can be deleted promptly. When it is set to a larger value, the number of deduplication operations is reduced, avoiding frequent deduplication that would occupy a large amount of resources and affect normal business operation.
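A small sketch of this threshold-triggered delete path, assuming the partition is a plain dictionary from fingerprint to recorded logical addresses and the storage device is a dictionary from address to block contents; both structures and the function name are illustrative.

```python
def deduplicate_if_needed(partition, storage, fp, lba, threshold=1):
    """With threshold=1 a duplicate is removed immediately; a larger threshold tolerates
    a few copies before deduplicating."""
    if len(partition.get(fp, [])) >= threshold:
        storage.pop(lba, None)        # delete the duplicate block via its address information
        return True                   # the duplicate block's metadata is not kept either
    partition.setdefault(fp, []).append(lba)
    return False
```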
  • the address information of the first data block is the logical address of the first data block.
  • the storage system may also write the logical address and physical address of the first data block into the address mapping table.
  • When deleting the first data block, the storage system can obtain the physical address of the first data block from the address mapping table based on its logical address, locate the first data block in the storage device based on that physical address, and delete it.
  • the storage system directly locates the first data block based on the single-hop mapping from the logical address to the physical address in the address mapping table, which shortens the search time and improves the deduplication efficiency.
  • Further, the storage system can also modify the physical address of the first data block in the forward mapping table to the fingerprint of the first data block.
  • In this way the deduplicated data block is redirected: the physical address of the retained data block with the same fingerprint can later be found through the fingerprint, and the data can be accessed at that physical address.
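The delete-and-redirect step might look roughly as follows, with the address mapping table, forward mapping entry, and fingerprint table modeled as plain dictionaries purely for illustration.

```python
def deduplicate_and_remap(lba, fp, address_map, fingerprint_table, disk):
    """Single-hop lookup followed by redirection: find the duplicate block's physical
    address through the address mapping table, delete the block from the device, then
    replace the physical address in the forward mapping table with the fingerprint of
    the retained copy."""
    pba = address_map[lba]             # single hop: logical address -> physical address
    disk.pop(pba, None)                # delete the duplicate block from the storage device
    address_map[lba] = ("fp", fp)      # the entry now points at the fingerprint
    # fingerprint_table[fp] records the physical address of the retained copy
```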
  • the storage system may also eliminate the metadata in the at least one partition when the at least one partition in the inverse mapping table meets the elimination condition.
  • This method eliminates metadata in the inverse mapping table to reduce the scale of metadata, thereby reducing memory overhead and ensuring system performance.
  • the present application provides a data deduplication device.
  • the device includes:
  • a communication module configured to receive a write request, where the write request includes the first data block
  • a data writing module used to write the first data block into a storage device
  • the data writing module is also configured to write the metadata of the first data block into a first partition among multiple partitions of the metadata management structure, where the first partition is determined based on the characteristics of the first data block, and the metadata of the first data block includes the fingerprint and address information of the first data block;
  • a deduplication module configured to delete the metadata of the first data block in the first partition when a fingerprint identical to the fingerprint of the first data block exists in the first partition, and to delete the first data block from the storage device according to the address information of the first data block.
  • the characteristic of the first data block is the fingerprint corresponding to the first data block
  • the data writing module is specifically used to:
  • the data writing module is also used to:
  • the popularity of the logical address is accumulated to the popularity of the fingerprint corresponding to the first data block to update the popularity of the fingerprint corresponding to the first data block.
  • the write request also includes a second data block, and the fingerprint corresponding to the second data block has a higher popularity than the fingerprint corresponding to the first data block.
  • the data writing module is also used to:
  • the capacities of the multiple partitions of the metadata management structure are determined according to a partition decision model; the partition decision model is used to predict, for each of the preset partition capacity combinations, the partition benefit obtained after that combination is applied to the metadata management structure, and to determine the combination with the largest partition benefit as the capacities of the multiple partitions of the metadata management structure;
  • the partition benefit is determined based on at least one of the deduplication rate and the partition adjustment cost.
  • the partition benefit is the deduplication rate
  • the preset partition capacity combination includes a first partition capacity combination
  • the partition decision model predicts the deduplication rate corresponding to applying the first partition capacity combination to the metadata management structure by:
  • the deduplication rate is obtained according to the data distribution corresponding to each partition and the capacity of each partition.
  • the device further includes a partitioning module, the partitioning module is used for:
  • the deduplication module is specifically used to:
  • the address information of the first data block is the logical address of the first data block
  • the data writing module is also used to:
  • the deduplication module is specifically used for:
  • the deduplication module is also used to:
  • the device further includes:
  • An elimination module configured to eliminate the metadata in the at least one partition when at least one partition in the inverse mapping table meets the elimination condition.
  • this application provides a computer cluster.
  • the computer cluster includes at least one computer including at least one processor and at least one memory.
  • the at least one processor and the at least one memory communicate with each other.
  • the at least one processor is configured to execute instructions stored in the at least one memory, so that the computer or computer cluster executes the data deduplication method as described in the first aspect or any implementation of the first aspect.
  • the present application provides a computer-readable storage medium storing instructions that instruct a computer or a computer cluster to execute the data deduplication method described in the first aspect or any implementation of the first aspect.
  • the present application provides a computer program product containing instructions that, when run on a computer or computer cluster, cause the computer or computer cluster to execute the data deduplication method described in the first aspect or any implementation of the first aspect.
  • Figure 1 is a system architecture diagram of a centralized storage system provided by an embodiment of the present application.
  • Figure 2 is a system architecture diagram of a distributed storage system provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of appending to a log file for metadata management provided by an embodiment of the present application
  • Figure 4 is a schematic diagram of metadata management through log files and inverse mapping tables provided by an embodiment of the present application
  • Figure 5 is a flow chart of a data deduplication method provided by an embodiment of the present application.
  • Figure 6 is a flow chart of a data deduplication method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of system resource feature extraction provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of feature merging provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of obtaining structured features provided by an embodiment of the present application.
  • Figure 10 is a schematic flow chart of partition decision modeling provided by an embodiment of the present application.
  • Figure 11 is a schematic flowchart of an evaluation strategy selection provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of an application scenario of a data deduplication method provided by an embodiment of the present application.
  • Figure 13 is a schematic flowchart of applying a data deduplication method to global cache according to an embodiment of the present application
  • Figure 14 is a schematic structural diagram of a data deduplication device provided by an embodiment of the present application.
  • first and second in the embodiments of this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • Data Deduplication also referred to as deduplication, is a data reduction solution that deletes duplicate data so that only one copy of the same data is stored in the storage medium, thereby saving data storage space.
  • Deduplication can be achieved by dividing the data into blocks, calculating the fingerprint (FP) of each data block based on its content, and then comparing the fingerprints of different data blocks to identify and delete data blocks with duplicate content, thereby achieving the goal of eliminating data redundancy.
  • Fingerprint refers to the identity information used to identify the data block based on the content of the data block.
  • the fingerprint of the data block may be a message digest calculated on the contents of the data block using a message digest algorithm.
  • The message digest algorithm is usually implemented based on a hash function. Therefore, the fingerprint of a data block can also be a hash value determined by a hash function.
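For example, a fingerprint could be computed as a message digest of the block contents; SHA-256 is used here only as one possible digest algorithm, since the text does not mandate a specific hash function.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    # message digest over the block contents; SHA-256 is one possible choice of digest
    return hashlib.sha256(block).hexdigest()

# blocks with identical content always produce the same fingerprint
assert fingerprint(b"\x00" * 4096) == fingerprint(b"\x00" * 4096)
```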
  • Deduplication can also be classified by execution time.
  • deduplication may include pre-deduplication or post-deduplication.
  • Pre-deduplication means that data is deduplicated before it is written to the storage medium (referred to as storage, which can be a hard disk or other device).
  • Pre-deduplication can also be called online deduplication.
  • Post-deduplication means that the data is deduplicated after it is written to the storage medium (such as a hard disk and other devices).
  • Post-deduplication is also called background deduplication and offline deduplication.
  • the data deduplication method provided by the embodiments of this application can be applied to different application scenarios, for example, it can be applied to centralized storage systems or distributed storage systems.
  • the so-called centralized storage system refers to a central node composed of one or more main devices. Data is centrally stored in this central node, and all data processing services of the entire system are centrally deployed on this central node. In other words, in a centralized storage system, the terminal or client is only responsible for data input and output, while the storage and control processing of data are completely handed over to the central node.
  • the biggest feature of the centralized system is that the deployment structure is simple. There is no need to consider how to deploy services on multiple nodes, and there is no need to consider distributed collaboration issues between multiple nodes.
  • the centralized storage system can include a centralized storage system with integrated disk and control, or a centralized storage system with separate disk and control.
  • Disk-control integration means that the storage medium (such as a hard disk) and the controller are integrated, while disk-control separation means that the storage medium and the controller are separated.
  • FIG 1 is a system architecture diagram of a centralized storage system applied in an embodiment of the present application.
  • the computers that run these applications are called "application servers.”
  • the application server 100 may be a physical machine or a virtual machine formed by virtualizing a physical machine. Physical machines include, but are not limited to, desktop computers, servers, laptops, and mobile devices.
  • the application server accesses the storage system through the fiber switch 110 to access data.
  • the switch 110 is only an optional device, and the application server 100 can also directly communicate with the storage system 120 through the network.
  • the optical fiber switch 110 can also be replaced with an Ethernet switch, an InfiniBand (IB) switch, a Remote Direct Memory Access (RDMA over Converged Ethernet, RoCE) switch based on Converged Ethernet, etc.
  • the storage system 120 shown in Figure 1 is a centralized storage system.
  • the characteristic of the centralized storage system is that it has a unified entrance. All data from external devices must pass through this entrance.
  • This entrance is the engine 121 of the centralized storage system.
  • Engine 121 is the most core component in the centralized storage system, and many advanced functions of the storage system are implemented in it.
  • FIG. 1 there are one or more controllers in the engine 121.
  • Figure 1 illustrates this by taking the engine including two controllers as an example.
  • controller 0 writes a copy of data to its memory 124, it can send a copy of the data to controller 1 through the mirror channel.
  • Controller 1 stores the copy in its own local memory 124. Therefore, controller 0 and controller 1 are backups of each other.
  • If controller 0 fails, controller 1 can take over the services of controller 0; if controller 1 fails, controller 0 can take over the services of controller 1, thereby preventing a hardware failure from making the entire storage system 120 unavailable.
  • four controllers are deployed in the engine 121, there are mirror channels between any two controllers, so any two controllers are backups of each other.
  • the engine 121 also includes a front-end interface 125 and a back-end interface 126, where the front-end interface 125 is used to communicate with the application server 100 to provide storage services for the application server 100.
  • the backend interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the backend interface 126, the engine 121 can connect more hard disks 134, thereby forming a very large storage resource pool.
  • the controller 0 at least includes a processor 123 and a memory 124 .
  • The processor 123 is a central processing unit (CPU) used to process data access requests (such as read requests or write requests) from outside the storage system (a server or another storage system) as well as requests generated within the storage system. For example, when the processor 123 receives write requests sent by the application server 100 through the front-end port 125, the data in these write requests is temporarily stored in the memory 124. When the total amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 through the back-end port to the hard disk 134 for persistent storage.
  • Memory 124 refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time and very quickly, and serves as a temporary data storage for the operating system or other running programs.
  • Memory includes at least two types of memory.
  • memory can be either random access memory or read-only memory (ROM).
  • random access memory is dynamic random access memory (Dynamic Random Access Memory, DRAM), or storage class memory (Storage Class Memory, SCM).
  • DRAM is a semiconductor memory that, like most Random Access Memory (RAM), is a volatile memory device.
  • SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory. Storage-level memory can provide faster read and write speeds than hard disks, but is slower than DRAM in terms of access speed and cheaper than DRAM in cost.
  • DRAM and SCM are only exemplary illustrations in this embodiment, and the memory may also include other random access memories, such as static random access memory (Static Random Access Memory, SRAM), etc.
  • read-only memory for example, it can be programmable read-only memory (Programmable Read Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), etc.
  • The memory 124 can also be a dual in-line memory module (Dual In-line Memory Module, DIMM), that is, a module composed of dynamic random access memory (DRAM), or a solid state drive (Solid State Disk, SSD).
  • The controller 0 may be configured with multiple memories 124 of different types. This embodiment does not limit the number and type of the memories 124.
  • The memory 124 can be configured to have a power-protection function.
  • The power-protection function means that the data stored in the memory 124 will not be lost when the system is powered off and then powered on again.
  • Memory with a power-protection function is called non-volatile memory.
  • the memory 124 stores software programs, and the processor 123 runs the software programs in the memory 124 to manage the hard disk.
  • the hard disk is abstracted into a storage resource pool, and then divided into logical unit devices (logic unit number devices, LUNs) for server use.
  • LUN here is actually the hard disk seen on the server.
  • Some centralized storage systems are themselves file servers and can provide shared file services to the server.
  • controller 1 (and other controllers not shown in Figure 1) are similar to controller 0 and will not be described again here.
  • Figure 1 shows a centralized storage system with separate disk and control.
  • the engine 121 may not have a hard disk slot, and the hard disk 134 may be placed in the hard disk frame 130 .
  • the hard disk enclosure 130 may be a Serial Attached Small Computer System Interface (SAS) hard disk enclosure, or it may be a non-volatile memory host controller interface Standard (non-volatile memory express, NVMe) disk enclosures, Internet Protocol (Internet Protocol, IP) disk enclosures and other types of disk enclosures.
  • SAS hard disk enclosures adopt the SAS3.0 protocol, and each enclosure supports 25 SAS hard disks.
  • the engine 121 is connected to the hard disk enclosure 130 through an onboard SAS interface or a SAS interface module.
  • the NVMe hard disk enclosure is more like a complete computer system.
  • the NVMe hard disk is inserted into the NVMe hard disk enclosure.
  • the NVMe hard disk enclosure is connected to engine 121 through the RDMA port.
  • the backend interface 126 communicates with the hard disk enclosure 130 .
  • the back-end interface 126 exists in the engine 121 in the form of an adapter card. Two or more back-end interfaces 126 can be used on one engine 121 to connect multiple hard disk enclosures 130 at the same time.
  • the adapter card can also be integrated on the motherboard, in which case the adapter card can communicate with the processor 112 through the PCIE bus.
  • the storage system may include two or more engines 121, and redundancy or load balancing is performed between the multiple engines 121.
  • the storage system 120 shown in Figure 1 is a storage system with separate disk and control.
  • the centralized storage system can also be a storage system with integrated disk and control.
  • the engine 121 can have a hard disk slot.
  • the hard disk 134 can be directly deployed in the engine 121.
  • In that case the back-end interface 126 is an optional configuration; when the storage space of the system is insufficient, more hard disks or hard disk enclosures can be connected through the back-end interface 126.
  • a distributed storage system refers to a system that stores data distributedly on multiple independent storage nodes.
  • the distributed storage system adopts a scalable system structure and uses multiple storage nodes to share the storage load. It not only improves the reliability, availability and access efficiency of the system, but is also easy to expand.
  • the storage system includes a computing node cluster and a storage node cluster.
  • the computing node cluster includes one or more computing nodes 110 (three computing nodes 110 are shown in FIG. 2 , but are not limited to three computing nodes 110 ), and each computing node 110 can communicate with each other.
  • the computing node 110 is a computing device, such as a server, a desktop computer, or a controller of a storage array.
  • the computing node 110 at least includes a processor 112 , a memory 113 and a network card 114 .
  • the processor 112 is a central processing unit (CPU), used for processing data access requests from outside the computing node 110, or requests generated internally by the computing node 110. For example, when the processor 112 receives write requests sent by the user, the data in these write requests will be temporarily stored in the memory 113 . When the total amount of data in the memory 113 reaches a certain threshold, the processor 112 sends the data stored in the memory 113 to the storage node 100 for persistent storage. In addition, the processor 112 is also used to perform calculations or processing on data, such as metadata management, deduplication, data compression, virtualized storage space, address translation, etc. Only one processor 112 is shown in FIG. 2 . In actual applications, there are often multiple processors 112 , and one processor 112 has one or more processor cores. This embodiment does not limit the number of processors and the number of processor cores.
  • Memory 113 refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time and very quickly, and serves as a temporary data storage for the operating system or other running programs.
  • Memory includes at least two types of memory.
  • memory can be either random access memory (RAM) or read-only memory (ROM).
  • the computing node 110 may be configured with multiple memories 113 and different types of memories 113 . This embodiment does not limit the number and type of memories 113 .
  • The memory 113 can be configured to have a power-protection function. The power-protection function means that the data stored in the memory 113 will not be lost when the system is powered off and then powered on again. Memory with a power-protection function is called non-volatile memory.
  • the network card 114 is used to communicate with the storage node 100. For example, when the total amount of data in the memory 113 reaches a certain threshold, the computing node 110 may send a request to the storage node 100 through the network card 114 to persistently store the data.
  • the computing node 110 may also include a bus for communication between components within the computing node 110 .
  • Since remote storage is used for persistent storage of data, the computing node 110 has less local storage than a conventional server, which saves cost and space. However, this does not mean that the computing node 110 cannot have local storage.
  • the computing node 110 may also have a small number of built-in hard disks or a small number of external hard disks.
  • Any computing node 110 can access any storage node 100 in the storage node cluster through the network.
  • The storage node cluster includes multiple storage nodes 100 (three storage nodes 100 are shown in Figure 2, but the cluster is not limited to three storage nodes 100).
  • a storage node 100 includes one or more controllers 101, network cards 104, and multiple hard disks 105.
  • Network card 104 is used to communicate with computing node 110.
  • the hard disk 105 is used to store data, and can be a magnetic disk or other types of storage media, such as a solid-state hard disk or a shingled magnetic recording hard disk.
  • the controller 101 is configured to write data to the hard disk 105 or read data from the hard disk 105 according to the read/write data request sent by the computing node 110 . During the process of reading and writing data, the controller 101 needs to convert the address carried in the read/write data request into an address that the hard disk can recognize. It can be seen that the controller 101 also has some simple calculation functions.
  • a smart network card refers to a network card that integrates computing resources, such as a network card with a data processing unit (DPU).
  • the DPU has the generality and programmability of a CPU, but is more specialized and can run efficiently on network packets, storage requests, or analysis requests.
  • the storage system shown in Figure 2 is a distributed storage system that separates storage and computing.
  • the storage system can also be a distributed storage system that integrates storage and computing.
  • a storage-integrated distributed storage system includes a storage cluster (also called a storage node cluster).
  • the storage node cluster can include one or more servers, and the servers can communicate with each other.
  • a server is a device that has both computing and storage capabilities.
  • a server at least includes a processor, memory, network card and hard disk.
  • the processor is used to handle data access requests from outside the server (application server or other servers), as well as requests generated within the server. For example, when the processor receives write requests, the data in these write requests will be temporarily stored in the memory.
  • the processor sends the data stored in the memory to the hard disk for persistent storage.
  • the processor is also used to perform calculations or processing on data, such as metadata management, data deduplication, data compression, data verification, virtualized storage space, and address translation.
  • The storage system may also be a distributed storage system with a fully converged architecture or a distributed storage system with a memory fabric architecture; details are not repeated here.
  • metadata mapping can be a direct mapping from a logical block address (Logical Block Address, LBA) to a physical block address (Physical Block Address, PBA).
  • the logical block address is also called a logical address
  • the physical block address is also called a physical address.
  • the logical address is the address of the logical space presented by the storage medium to the host. When the host sends a write request or read request to the storage medium, it will carry the logical address in the write request or read request.
  • When the storage medium receives a write request or a read request, it obtains the logical address carried in the request, performs one or more address translations on the logical address to determine the physical address, and writes data to or reads data from that physical address.
  • the three-dimensional addressing method based on the magnetic head, cylinder and sector of the physical address is transformed into one-dimensional linear addressing, which can improve the addressing efficiency.
  • the mapping from logical address to physical address is a single-hop mapping.
  • After the deduplication function is introduced, the logical address no longer corresponds one-to-one with the data content, so it is difficult to find the data by routing on the logical address alone; an additional hop based on the fingerprint is needed. The single-hop mapping can therefore be changed to a two-level mapping, from logical address to fingerprint, and then from fingerprint to physical address.
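A toy read path showing the difference between the single-hop mapping and the two-level mapping described above; the dictionary-based structures mirror the earlier illustrative sketches and are not the patent's actual data layout.

```python
def read_block(lba, address_map, fingerprint_table, disk):
    """Unique blocks resolve with a single hop (LBA -> PBA), while deduplicated blocks
    take the two-level route LBA -> fingerprint -> PBA."""
    entry = address_map[lba]
    if isinstance(entry, tuple) and entry[0] == "fp":
        pba = fingerprint_table[entry[1]]   # second hop through the fingerprint table
    else:
        pba = entry                          # single-hop mapping
    return disk[pba]
```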
  • the processor (for example, the processor 123 in Figure 1 or the processor 112 in Figure 2) can load the computer readable instructions corresponding to the data deduplication method into the memory (for example, the memory 124 in Figure 1 or the memory 113 in Figure 2), and then the processor executes the above computer-readable instructions to perform the data deduplication method, thereby saving storage space and improving storage performance.
  • the storage system performs post-deduplication on the data generated by the application as an example.
  • the storage system (such as a processor in the storage system) can divide the data into blocks.
  • the data is divided into multiple blocks with a granularity of 4 kilobytes (KB).
  • The deduplication module in the storage system can calculate a fingerprint for each 4 KB data block, write the fingerprint to the log file, and then flush the data block to disk (write it to a storage device such as a hard disk).
  • the user can manually trigger the deduplication operation, or set the deduplication cycle.
  • the application can issue a deduplication command in response to the user's deduplication operation, or periodically issue a deduplication command to the deduplication module, and the deduplication module responds.
  • In response to the deduplication command, the fingerprints in the log file are sorted, the sorted fingerprints are merged with the fingerprints in the fingerprint file, and data blocks with duplicate content are deleted based on the merge result.
  • the metadata such as the fingerprint of the data block can be appended to a log file (Write-ahead Log, WAL).
  • The deduplication module sorts the fingerprints in the log file and merges the sorted fingerprints with the fingerprints in the fingerprint file. Based on the merge result, one data block among multiple data blocks with the same fingerprint can be retained, the data blocks with duplicate content are deleted, and the fingerprint file is updated. It should be noted that in Figure 3, rectangular blocks with different patterns represent different fingerprints.
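The sort-and-merge step of this background deduplication could be sketched as follows, assuming the log file yields (fingerprint, address) pairs and the fingerprint file maps fingerprints to the addresses of retained copies; both assumptions are for illustration only.

```python
def background_dedup(log_entries, fingerprint_file):
    """Sort the (fingerprint, address) pairs appended to the log file, merge them with
    the fingerprint file, and report which entries refer to duplicate content."""
    merged = dict(fingerprint_file)          # fingerprint -> address of the retained copy
    duplicates = []
    for fp, addr in sorted(log_entries):     # sort, then merge
        if fp in merged:
            duplicates.append(addr)          # duplicate content: this block can be deleted
        else:
            merged[fp] = addr
    return merged, duplicates
```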
  • the workload of many applications is usually non-uniform, that is, the data written by the application to storage devices such as hard disks for persistent storage may have different update frequencies.
  • the proportion of frequently updated data is usually higher than the proportion of infrequently updated data.
  • Frequently updated data is usually hard to deduplicate, so the proportion of hard-to-deduplicate data is high. This causes resource contention: the share of space allocated to infrequently updated data drops, its metadata is easily evicted, and the deduplication rate is lost.
  • The data deduplication method of this application actively partitions the metadata management structure. After a data block in the write request is written to the storage device (such as a hard disk), metadata such as the fingerprint and address information of the data block is written to the partition, among the multiple partitions of the metadata management structure, that corresponds to the characteristics of the data block. When a fingerprint identical to that of the data block exists in the partition, the metadata of the data block in the partition is deleted, and the data block is deleted from the storage device according to its address information.
  • the metadata of data blocks with different characteristics can be written to different partitions of the metadata management structure.
  • For example, the metadata of frequently updated data blocks can be written to smaller-capacity partitions, and the metadata of infrequently updated data blocks can be written to larger-capacity partitions, which prevents infrequently updated data from having its resources occupied by frequently updated data and being evicted, thereby improving the deduplication rate.
  • this application introduces a new metadata management structure, that is, an inverse mapping table (Inverse Mapping Table) used to store the mapping relationship between the fingerprint of the data block and the logical address.
  • the embodiment of the present application manages metadata such as fingerprints and logical addresses through an inverse mapping table.
  • When a partition of the inverse mapping table already contains a fingerprint that is the same as that of the currently written data block, deduplication is triggered, without waiting for the user to trigger it manually or for a periodic trigger. Duplicate fingerprints and other metadata in memory can therefore be deleted promptly, reducing metadata memory overhead and ensuring system performance.
  • the logical address and physical address of the currently written data block can be written into the address mapping table, and the physical address of the deduplicated data block can be modified into a fingerprint.
  • the deduplicated data blocks can be addressed through two-level mapping, and the non-duplicated data blocks can be addressed through single-level mapping, which shortens the response time of the non-duplicated data blocks.
  • the fingerprints and logical addresses in the deduplicated reverse mapping table can also be written into the fingerprint table, so that the fingerprint table can store the fingerprints and logical addresses of the deduplicated data blocks.
  • the storage system can also be addressed through fingerprint tables and forward mapping tables. Based on this, each partition of the inverse mapping table can also eliminate the metadata in the partition when the elimination conditions are met, thereby reducing the size of the metadata and reducing memory overhead.
  • the storage system 120 includes an engine 121 and a hard disk enclosure 130.
  • the engine 121 includes a controller 0 and a controller 1 that serve as backups for each other.
  • Figure 5 is illustrated from the perspective of controller 0, and the hard disk enclosure 130 includes multiple hard disks 134.
  • the method includes the following steps:
  • S502 The controller 0 receives the write request from the application server 100.
  • a write request is a request for writing data.
  • the write request includes data, and the write request is used to write the data to the hard disk 134 for persistent storage.
  • the write request may be generated by an application deployed on the application server 100 based on business requirements.
  • a video application can be deployed on the application server 100.
  • the video application can be a short video application or a long video application.
  • the video application can generate a write request, where the write request includes a video stream uploaded by the user.
  • a file management application may be deployed on the application server 100, and the file management application may be a file manager.
  • the file manager may generate a write request, and the write request includes the image to be archived.
  • the controller 0 includes a processor 123 and a front-end interface 125 .
  • the processor 123 of the controller 0 can receive the write request forwarded by the application server 100 through the switch 110 through the front-end interface 125 .
  • Controller 0 divides the data in the write request into blocks and obtains at least one data block.
  • the controller 0 may use fixed-length blocking or variable-length blocking to block the data in the write request, thereby obtaining at least one data block.
  • fixed-length chunking refers to chunking the data stream according to the set chunking granularity.
  • Variable-length chunking divides the data stream into data blocks of variable size.
  • Variable-length chunking can include variable-length chunking based on sliding windows and variable-length chunking based on content (content-defined chunking, CDC).
  • controller 0 can evenly divide the data into one or more data blocks.
  • When the size of the data is not an integer multiple of the block granularity, controller 0 can pad the data, for example by filling zeros at the end of the data so that the padded data is an integer multiple of the block granularity, and then controller 0 evenly divides the padded data into one or more data blocks according to the block granularity.
  • For example, controller 0 can fill zeros at the end of the data so that the padded data is 20 KB in size; controller 0 then divides it at the 4 KB block granularity and obtains five 4 KB data blocks.
  • Controller 0 can select an appropriate chunking strategy according to the storage scenario. For example, in a primary storage scenario, IOs are usually small and the IO pattern is mainly random reads and writes, so controller 0 can choose fixed-length chunking; in a backup storage scenario, IOs are usually large and the IO pattern is mainly sequential reads and writes, so controller 0 can choose variable-length chunking to obtain a better deduplication rate.
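A minimal sketch of fixed-length chunking with zero padding at a 4 KB granularity, matching the 20 KB / five-block example above; the helper name is hypothetical.

```python
CHUNK = 4 * 1024    # 4 KB fixed-length chunking granularity

def fixed_length_chunks(data: bytes):
    """Pad the tail with zeros up to a multiple of the granularity, then split evenly;
    e.g. data padded to 20 KB yields five 4 KB data blocks."""
    pad = (-len(data)) % CHUNK
    data += b"\x00" * pad
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
```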
  • dividing data into blocks is an optional step of the data deduplication method in the embodiment of the present application, and the above steps may not be performed when performing the data deduplication method of the embodiment of the present application.
  • the data when the size of the data is equal to or smaller than the block granularity, the data can be directly treated as a data block.
  • the data when the size of the data is fixed, the data can also be directly used as a data block.
  • Controller 0 determines the fingerprint of at least one data block.
  • the controller 0 can calculate through the message digest algorithm according to the content of the data block to obtain the fingerprint of the data block.
  • the fingerprint of a data block may be the message digest of the data block, such as the hash value of the data block.
  • Controller 0 queries the fingerprint table based on the fingerprint of the first data block in at least one data block.
  • If the fingerprint of the first data block exists in the fingerprint table, S509 is executed.
  • If the fingerprint of the first data block does not exist in the fingerprint table, S510, S511, and S512 are executed.
  • Controller 0 returns a write response to application server 100.
  • the fingerprint table is used to record fingerprints and address information of data blocks stored in the hard disk 134 .
  • the address information may include a logical address. Further, the address information may also include a physical address.
  • the fingerprint table can store fingerprint and address information in key value (kv) mode. Specifically, the fingerprint table can be stored with fingerprints as keys and address information such as logical addresses as values.
  • the fingerprint and address information recorded in the fingerprint table can come from the inverse mapping table.
  • the inverse mapping table is a metadata management structure that stores the fingerprints and logical addresses of data blocks that have been written to disk. Specifically, after the inverse mapping table triggers deduplication, the controller 0 can synchronize the deduplicated inverse mapping table to the fingerprint table, that is, store the metadata (such as fingerprints and logical addresses) in the deduplicated inverse mapping table into the fingerprint table.
  • the fingerprint table stores fingerprints of data blocks that have been written to disk, so it can support pre-deduplication, thereby reducing the storage pressure on the hard disk 134.
  • controller 0 can query the fingerprint table according to the fingerprint of the first data block. For example, the controller 0 can compare the fingerprint of the first data block with the fingerprints in the fingerprint table, or the controller 0 can quickly search for fingerprints based on the fingerprint of the first data block and the index of the fingerprint table.
  • the controller 0 can execute S509 to directly return a write response, which is used to indicate a successful write.
  • If the fingerprint of the first data block does not exist in the fingerprint table, it indicates that the hard disk 134 does not store a data block with the same content.
  • the controller 0 may execute S510 to write the first data block to the disk 134 . Further, the controller 0 may also return a write response to the application server 100 after the first data block is successfully written to the hard disk 134 .
  • In an initial stage, the fingerprint table can be empty.
  • As the application server 100 continues to store data to the hard disk 134, the metadata of the data blocks that have been written to disk can be recorded in the inverse mapping table.
  • After a partition in the inverse mapping table triggers deduplication, the deduplicated metadata in the inverse mapping table can be synchronized to the fingerprint table.
  • controller 0 can query the fingerprint table to achieve pre-deduplication. Compared with post-deduplication, pre-deduplication removes duplicate data before the data is written to disk, eliminating the need to write duplicate data to storage media such as the hard disk 134, thus avoiding the occupation of resources.
  • the above-mentioned S508 and S509 may not be executed when performing the method of the embodiment of the present application.
  • For example, when the fingerprint table is empty, controller 0 can directly write the data blocks to disk and perform post-deduplication; once the metadata recorded in the fingerprint table reaches a preset amount, pre-deduplication can be supported.
  • In some embodiments, controller 0 may not perform pre-deduplication on the data blocks, but directly write the data blocks to disk and perform post-deduplication.
  • Controller 0 writes the logical address and physical address of the first data block into the address mapping table.
  • the address mapping table is used to store address information of data blocks written to the hard disk 134 .
  • an address mapping table can store the logical and physical addresses of data blocks.
  • the address mapping table can store logical addresses and physical addresses in kv mode.
  • the address mapping table can use the logical address as the key and the physical address as the value to store the mapping relationship from the logical address to the physical address. On the one hand, it can facilitate quick addressing when subsequently accessing the data block. On the other hand, it can record the operation to facilitate subsequent traceability or fault recovery.
  • the address mapping table uses the logical address as the key, so it can also be called the forward mapping table.
  • controller 0 can store the logical address and fingerprint of the first data block in the forward mapping table.
  • In this way, the fingerprint of the first data block can be looked up in the forward mapping table, and the fingerprint table can then be queried to find a data block with the same fingerprint as the first data block.
  • The physical address of that data block with the same fingerprint can then be obtained through the forward mapping table, and based on that physical address the data block with the same fingerprint can be accessed, thereby enabling access to the first data block.
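  • A minimal sketch of this addressing path is given below, assuming simple dict-based tables; the table layouts and names (forward_map, fingerprint_table) are hypothetical.

```python
# Hypothetical in-memory tables for illustration only.
forward_map = {            # logical address -> physical address, or fingerprint for a deduplicated block
    "lba_100": "pba_7",
    "lba_200": "fp_abc",   # deduplicated block: physical address replaced by the fingerprint
}
fingerprint_table = {      # fingerprint -> logical address of the retained data block
    "fp_abc": "lba_100",
}

def resolve(lba: str) -> str:
    """Return the physical address of the data block at the given logical address."""
    entry = forward_map[lba]
    if entry.startswith("fp_"):                 # deduplicated: follow the fingerprint table
        retained_lba = fingerprint_table[entry]
        return forward_map[retained_lba]        # physical address of the block with the same fingerprint
    return entry                                # non-deduplicated: single-hop mapping

assert resolve("lba_200") == "pba_7"
```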
  • the controller 0 may also first write the logical address and the physical address in the forward mapping table, and then write the first data block to a storage device such as the hard disk 134.
  • Controller 0 inserts the fingerprint and logical address of the first data block into the first partition of the inverse mapping table.
  • S514 and S516 are executed.
  • the inverse mapping table is used to store fingerprints and logical addresses.
  • the inverse mapping table may be a table structure organized in the form of key-value pairs.
  • the key-value pairs in the inverse mapping table are used to represent the inverse mapping relationship. Different from the mapping form from logical address to physical address in the address mapping table, the inverse mapping table is used to store the inverse mapping relationship from fingerprint to logical address.
  • the key in the key-value pair is the fingerprint, and the value is the logical address.
  • Controller 0 may insert the fingerprint and logical address of the first data block into the inverse mapping table in an orderly manner.
  • controller 0 can perform N-way merge sorting on the fingerprint of the first data block and the fingerprints of the first partition in the inverse mapping table, and then, according to the sorting result, insert the fingerprint and logical address of the first data block into the first partition of the inverse mapping table.
  • Multi-way merge sorting means that the objects to be sorted (such as the fingerprints of data blocks) are divided into multiple paths and sorted separately, and then the sorting results of each path are merged to achieve merge sorting.
  • controller 0 can remove the minimum value from the small set Si after writing that minimum value into the large set, then continue to calculate according to the above formula (1) to determine the minimum value among all the updated small sets, and write that minimum value into the large set. Controller 0 repeats the above process until all elements in the small sets are written into the large set, completing the merge sort.
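  • The following generic Python sketch illustrates multi-way merge sorting (here via heapq.merge, which repeatedly takes the minimum head element across the small sets); it is an illustration only and does not reproduce the embodiment's formula (1).

```python
import heapq

# Several already sorted "small sets" of fingerprints (illustrative values).
runs = [
    ["fp01", "fp07", "fp12"],
    ["fp03", "fp07", "fp20"],
    ["fp02", "fp05"],
]

# heapq.merge repeatedly takes the minimum head element across all small sets and
# appends it to the "large set", which is the essence of multi-way merge sorting.
merged = list(heapq.merge(*runs))
assert merged == sorted(runs[0] + runs[1] + runs[2])
```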
  • controller 0 can use a log-structured merge tree (LSM tree) to sort the fingerprint of the first data block together with the fingerprints in the inverse mapping table, and then insert the fingerprint and logical address of the first data block into the first partition of the inverse mapping table in order.
  • each partition of the inverse mapping table can maintain an LSM tree, so that metadata such as fingerprints and logical addresses to be written to the partition can be inserted in an orderly manner.
  • the inverse mapping table includes multiple partitions, and the first partition can be determined according to the characteristics of the first data block.
  • the characteristic of the first data block may be a fingerprint corresponding to the first data block.
  • multiple data blocks can correspond to the same fingerprint.
  • the controller 0 may determine the heat of the fingerprint based on the fingerprint corresponding to the first data block, and then determine the first partition among the multiple partitions of the inverse mapping table corresponding to the heat based on the heat of the fingerprint corresponding to the first data block. For example, the controller 0 can compare the heat of the fingerprint with the preset heat.
  • the first partition is the partition in the inverse mapping table used to store metadata of cold data, also known as a cold partition
  • the cold partition can be a partition with a larger capacity
  • the hot partition can be a partition with a smaller capacity. The capacity of the cold partition is greater than the capacity of the hot partition.
  • the controller 0 can also update the popularity of the fingerprint corresponding to the first data block, so as to determine the partition for subsequently written data blocks with the same fingerprint. Specifically, the controller 0 can determine the popularity of the logical address of the first data block and accumulate the popularity of the logical address onto the popularity of the fingerprint corresponding to the first data block, thereby updating the popularity of the fingerprint corresponding to the first data block.
  • controller 0 can update the popularity of the fingerprint corresponding to the data block each time after writing the metadata of the data block.
  • For example, assume the first data block written by controller 0 to the hard disk 134 is data block 10, and the fingerprint of data block 10 is recorded as FP3.
  • Because a data block with the same fingerprint has been written before, and the latest data block written with fingerprint FP3 is data block 8, controller 0 can obtain the popularity of FP3 as updated after writing the metadata of data block 8.
  • Assume the popularity of FP3 is 5.
  • Controller 0 may determine the first partition among the multiple partitions of the inverse mapping table based on this popularity. For example, the first partition may be the partition used to store metadata of cold data, also called a cold partition.
  • the controller 0 can also determine the heat of the logical address of the data block 10. Assuming that the heat of the logical address is 2, the heat of the logical address can be accumulated to the heat of the fingerprint, thereby updating the heat of the fingerprint. In this example, the hotness of the updated fingerprint can be 7.
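  • The heat bookkeeping in this example can be sketched as follows; the preset heat value of 6 and the structure names are assumptions for illustration.

```python
PRESET_HEAT = 6                  # assumed threshold separating cold and hot fingerprints

fingerprint_heat = {"FP3": 5}    # heat of FP3 after the metadata of data block 8 was written
lba_heat = {"lba_10": 2}         # heat of the logical address of data block 10

def choose_partition(fp: str) -> str:
    """Select the partition for a data block's metadata from the heat of its fingerprint."""
    return "hot_partition" if fingerprint_heat.get(fp, 0) > PRESET_HEAT else "cold_partition"

# Data block 10 (fingerprint FP3, heat 5) is routed to the cold partition ...
assert choose_partition("FP3") == "cold_partition"

# ... and after its metadata is written, the LBA heat is accumulated onto the fingerprint heat: 5 + 2 = 7.
fingerprint_heat["FP3"] += lba_heat["lba_10"]
assert fingerprint_heat["FP3"] == 7
```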
  • the controller 0 can also move the metadata of data blocks to different partitions based on the popularity of the fingerprints corresponding to the data blocks. For example, in the initial stage, the fingerprints corresponding to the data blocks are usually less popular, and the metadata of these data blocks can be written to the cold partition. As data blocks continue to be written, the heat of some fingerprints keeps increasing. After writing a certain data block, when the heat of the fingerprint corresponding to that data block is greater than the preset heat, the controller 0 can write the metadata of that data block to the hot partition and move the metadata of data blocks with the same fingerprint to the hot partition.
  • In some embodiments, controller 0 may not move the metadata but eliminate it instead. For example, after writing a certain data block, when the heat of the fingerprint corresponding to that data block is greater than the preset heat, controller 0 can write the metadata of that data block to the hot partition and eliminate the metadata of data blocks with the same fingerprint from the cold partition.
  • the controller may trigger deduplication when a fingerprint identical to the fingerprint of the first data block exists in the first partition, or trigger deduplication and execute S514 and S516 when a fingerprint identical to the fingerprint of the first data block exists in the first partition and the number of fingerprints in the first partition identical to the fingerprint of the first data block reaches a preset threshold.
  • the preset threshold can be set based on experience values.
  • the preset threshold can be set to 2, as illustrated in Figure 4.
  • the preset threshold can also be set to 1, that is, if there is a fingerprint that is the same as the fingerprint of the first data block, deduplication can be triggered.
  • the controller 0 can execute S514 and S516 to implement deduplication, but the embodiment of the present application does not limit the timing of deduplication.
  • the controller 0 can trigger deduplication when the number of identical fingerprints in the first partition reaches a preset threshold, or can trigger deduplication when identical fingerprints exist.
  • the controller 0 directly writes the first data block to the hard disk 134 and then performs post-deduplication. This can prevent the consumption of computing resources from affecting normal business operations, and avoid storage performance degradation when computing bottlenecks occur.
  • the inverse mapping table is a new type of metadata management structure introduced in this embodiment.
  • the metadata management structure may also adopt other organizational forms.
  • S514 The controller 0 deletes the first data block from the hard disk 134 according to the address information of the first data block.
  • controller 0 may retain one data block and delete other data blocks with the same fingerprint from hard disk 134 .
  • the controller 0 can obtain the physical address of the first data block from the address mapping table according to the logical address of the first data block, then locate the first data block in a storage device such as the hard disk 134 based on the physical address, and delete the first data block.
  • the controller 0 can also retain a data block with the same fingerprint as the first data block and delete other data blocks with the same fingerprint. Specifically, for data blocks with the same fingerprint, controller 0 can retain the data block written first and delete the data block written later.
  • Controller 0 deletes the fingerprint and logical address of the first data block from the first partition of the inverse mapping table.
  • controller 0 can search for the fingerprint of the first data block from the inverse mapping table, and then delete the fingerprint and logical address of the first data block. It should be noted that when the key-value pairs of fingerprints and logical addresses in the inverse mapping table are stored in the LSM tree, controller 0 can use table merging to delete the fingerprint of the first data block and the corresponding logical address.
  • the controller 0 can also retain the metadata of a data block with the same fingerprint as the first data block and delete the metadata of other data blocks with the same fingerprint. Specifically, for data blocks with the same fingerprint, controller 0 can retain the metadata of the data block written first and delete the metadata of the data block written later.
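  • A simplified sketch of this retention rule (keep the first-written data block and its metadata, remove later duplicates) is shown below; the in-memory structures are hypothetical stand-ins for the inverse mapping table and the hard disk 134.

```python
# Entries of one partition of the inverse mapping table, in write order (illustrative values).
inverse_map = [
    ("fp_abc", "lba_100"),   # written first: retained
    ("fp_abc", "lba_250"),   # written later with the same fingerprint: to be deduplicated
]
stored_blocks = {"lba_100", "lba_250"}   # logical addresses of the data blocks currently on disk

def deduplicate(entries, blocks):
    """Keep the first-written data block and metadata per fingerprint; drop later duplicates."""
    seen, kept = set(), []
    for fp, lba in entries:
        if fp in seen:
            blocks.discard(lba)      # delete the later-written data block from the disk
        else:
            seen.add(fp)
            kept.append((fp, lba))   # retain the first-written metadata
    return kept

inverse_map = deduplicate(inverse_map, stored_blocks)
assert inverse_map == [("fp_abc", "lba_100")] and stored_blocks == {"lba_100"}
```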
  • when controller 0 executes the above-mentioned S514 and S516, they can be executed in parallel, or one after another in a set order. For example, controller 0 may also execute S516 first and then execute S514. This embodiment does not limit the execution order of S514 and S516.
  • At least one data block may also include a second data block.
  • the heat of the fingerprint corresponding to the second data block may be higher than the heat of the fingerprint corresponding to the first data block.
  • the second data block may be hot data and the first data block may be cold data.
  • the controller 0 may write the second data block to the hard disk 134, and write the metadata of the second data block, such as the fingerprint and logical address of the second data block, into the second of the plurality of partitions of the inverse mapping table.
  • the second partition is specifically a hot partition used to store metadata of hot data.
  • the capacity of the second partition is smaller than the capacity of the first partition.
  • the heat of the fingerprint corresponding to the data block can also be divided into more types or levels.
  • for example, the heat of the fingerprint can also be divided into three levels: hot, warm, and cold.
  • the inverse mapping table may include more partitions.
  • the inverse mapping table may include a partition for storing metadata of hot data, a partition for storing metadata of warm data, and a partition for storing metadata of cold data.
  • the above S516 is an implementation method of deleting the metadata of the first data block in the first partition in the embodiment of the present application.
  • when the metadata management structure adopts another organizational form, the controller 0 can delete the metadata accordingly, for example, by deleting the fingerprint, logical address, and physical address of the first data block in the first partition of the management structure.
  • Controller 0 writes the fingerprints and logical addresses in the deduplicated reverse mapping table into the fingerprint table.
  • the controller 0 can also write metadata such as fingerprints and logical addresses in the deduplicated reverse mapping table to the fingerprint table in a synchronous manner.
  • controller 0 can synchronize the metadata in the deduplicated inverse mapping table into the fingerprint table at partition granularity. For example, after the first partition triggers deduplication, controller 0 can synchronize the deduplicated metadata of the first partition into the fingerprint table; after the second partition triggers deduplication, controller 0 can then synchronize the deduplicated metadata of the second partition into the fingerprint table. Considering that a partition can trigger deduplication multiple times, in order to reduce resource usage, controller 0 can use an incremental synchronization mechanism to write the deduplicated metadata into the fingerprint table.
  • controller 0 eliminates the metadata in at least one partition.
  • a water level for elimination can be set for each partition in the inverse mapping table.
  • when the amount of resources occupied by the metadata in a partition reaches the water level, the controller 0 can eliminate the metadata in the partition, thereby avoiding overflow of the metadata in the partition.
  • the water levels of different partitions can be different. For example, when the capacity of the first partition is 80% of the total capacity of the inverse mapping table and the capacity of the second partition is 20% of the total capacity of the inverse mapping table, the water level of the first partition can be 70% of the total capacity of the inverse mapping table, and the water level of the second partition can be 10% of the total capacity of the inverse mapping table.
  • the water level of the partition in the inverse mapping table may include a high water level and a low water level.
  • in this case, the metadata in the partition can be eliminated so that the amount of resources occupied by the metadata in the partition after elimination is not lower than the above low water level and not higher than the high water level.
  • Controller 0 eliminates the metadata in the partitions of the inverse mapping table, especially the metadata in the partitions corresponding to frequently updated hot data, which can significantly reduce the scale of metadata and reduce memory overhead.
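  • A rough sketch of water-level based elimination is given below; measuring the water levels as entry counts and evicting the least popular metadata first are assumptions for illustration.

```python
def evict_to_water_level(partition, high, low):
    """When the metadata in a partition exceeds its high water level, evict the least
    popular entries so that the amount left lies between the low and high water levels
    (here the partition is shrunk down to the low water level, an assumed policy)."""
    if len(partition) <= high:
        return partition
    # entries are (fingerprint, logical_address, heat); evict the coldest entries first
    return sorted(partition, key=lambda entry: entry[2], reverse=True)[:low]

entries = [(f"fp{i}", f"lba{i}", heat) for i, heat in enumerate([9, 1, 4, 7, 2, 3])]
assert len(evict_to_water_level(entries, high=4, low=2)) == 2
```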
  • Controller 0 modifies the physical address of the deduplicated first data block in the address mapping table to the fingerprint of the first data block.
  • the controller 0 can modify the physical address of the deduplicated first data block in the address mapping table to the fingerprint of the first data block, which indicates that the first data block is a deduplicated data block. The data block with the same fingerprint can then be found through the fingerprint table and its physical address determined, so that addressing of the deduplicated first data block can be achieved.
  • data blocks that have not been deduplicated can still be addressed based on the mapping relationship between logical addresses and physical addresses in the forward mapping table, without the need for two-level metadata mapping, which greatly shortens the addressing time, further shortens the response time, and improves response efficiency.
  • In some embodiments, the controller 0 may not perform the steps of writing the data in the inverse mapping table to the fingerprint table and eliminating the metadata in the inverse mapping table.
  • the data deduplication method provided by this application actively partitions the metadata management structure to differentially process the metadata of data with different update frequencies.
  • data that is updated infrequently can be allocated enough quota resources to support deduplication, while data that is updated frequently is frequently invalidated, so relatively few quota resources are allocated. In this way, the data that can ultimately be deduplicated can basically all be deduplicated instead of being squeezed out, thereby improving the deduplication rate; at the same time, data that is difficult to deduplicate can be eliminated, thereby reducing the overall metadata mapping scale and improving system performance.
  • this method also introduces a new metadata management structure of the inverse mapping table.
  • deduplication can be triggered without requiring the user to trigger it manually or triggering it periodically, which enables timely deduplication and reduces the size of the metadata, thereby reducing metadata memory overhead and ensuring system performance.
  • this method records the logical address and physical address of the data block written to disk in the forward mapping table, and modifies the physical address of the deduplicated data block into a fingerprint, so that non-deduplicated data blocks can still be addressed through single-hop mapping, which shortens the addressing time and improves response efficiency.
  • the key to the embodiment shown in Figure 5 is to partition metadata management structures such as inverse mapping tables.
  • the capacity of each partition in the metadata management structure can be determined according to the partition decision model.
  • the partition decision model is used to predict the corresponding partition revenue after each partition capacity combination among the preset partition capacity combinations is applied to the metadata management structure, and the partition capacity combination with the largest partition revenue is determined as the capacities of the multiple partitions of the metadata management structure.
  • the partition capacity combination represents the capacity of each partition in a group of partitions.
  • the capacity of a partition refers to the amount of resources allocated to the partition.
  • the sum of the capacities of the individual partitions in a set of partitions is equal to the total capacity of the metadata management structure.
  • the partition capacity combination can be characterized by the actual capacity of each partition, or by the capacity ratio of each partition.
  • a partition capacity combination can be expressed as 80%:20%, which is used to characterize that the metadata management structure includes two partitions with capacities of 80% and 20% of the total capacity respectively.
  • a partition capacity combination can be expressed as 60%:30%:10%, which is used to represent that the metadata management structure includes three partitions, with capacities respectively being 60%, 30% and 10% of the total capacity.
  • Partitioning benefits refer to the benefits obtained after partitioning the metadata management structure. For example, after partitioning the metadata management structure, the deduplication rate can be improved. Therefore, the partitioning benefit can be the deduplication rate.
  • the partition decision model can predict the deduplication rate by estimating the hit rate of the data.
  • the first partition capacity combination among the preset partition capacity combinations is used as an example for description.
  • the partition decision model can predict the corresponding deduplication rate after the first partition capacity combination is applied to the metadata management structure in the following way:
  • the deduplication rate is obtained according to the data distribution corresponding to each partition and the capacity of each partition.
  • a workload can be a task that is using or waiting to use computing resources such as CPU for a period of time.
  • controller 0 can adjust the partition.
  • partition adjustment requires re-initialization of partitions and other operations, resulting in partition adjustment costs.
  • the partition revenue can be determined based on at least one of the deduplication rate and the partition adjustment cost.
  • controller 0 may periodically adjust the capacities of the multiple partitions of the metadata management structure, and each cycle may also be called a partition adjustment cycle. When the adjustment time is reached, controller 0 can determine whether to adjust the capacities of the multiple partitions based on the partition revenue, the partition capacity combination, or the workload characteristics corresponding to each partition in the period before the adjustment time (for example, the previous partition adjustment cycle).
  • the workload characteristics refer to features extracted from the workload information. For example, the workload characteristics may include one or more of reuse distance, reuse cycle, and reuse frequency.
  • controller 0 can also perform the following steps after writing the first data block to the hard disk 134:
  • Controller 0 obtains the system resource usage information of the previous partition adjustment cycle and the workload information corresponding to each partition.
  • System resources include one or more of computing resources (such as CPU and other processor resources), memory resources, disk resources, and network resources. Based on this, system resource usage information includes CPU usage ratio, memory usage ratio, disk IO amount, and bandwidth usage.
  • Workload refers to tasks that are using or waiting to use computing resources such as CPU for a period of time.
  • the task can be to write data.
  • the workload information may include one or more of the data's reuse distance (Reuse Distance), reuse cycle, or reuse frequency.
  • the reuse distance can be the number of accesses between two adjacent accesses to the same data. When counting reuse distance, statistics can be made at the granularity of data blocks.
  • the reuse cycle can be the number of write requests between two adjacent accesses to the same data in different write requests.
  • the reuse frequency can be the reciprocal of the reuse period. It should be noted that when controller 0 obtains workload information such as reuse distance, reuse cycle or reuse frequency, it can extract the reuse distance, reuse cycle or reuse frequency for each partition, thereby obtaining the workload information corresponding to each partition.
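  • The reuse distance statistic can be illustrated with the short Python sketch below, counting at data-block granularity over a hypothetical access trace.

```python
def reuse_distances(trace):
    """Reuse distance: number of accesses between two adjacent accesses to the same block."""
    last_seen, distances = {}, []
    for position, block in enumerate(trace):
        if block in last_seen:
            distances.append(position - last_seen[block] - 1)
        last_seen[block] = position
    return distances

# Hypothetical access trace at data-block granularity.
trace = ["A", "B", "C", "A", "B", "A"]
assert reuse_distances(trace) == [2, 2, 1]   # e.g. two accesses occur between the first two accesses to A
```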
  • Controller 0 extracts system resource characteristics from system resource usage information, and extracts workload characteristics corresponding to each partition from workload information corresponding to each partition.
  • the controller 0 can vectorize the system resource usage information to obtain the system resource characteristics, and vectorize the workload information to obtain the workload characteristics.
  • the vectorization of CPU occupancy ratio and reuse distance is illustrated below as an example.
  • controller 0 can compare the CPU occupancy ratio with the CPU occupancy threshold.
  • the CPU usage threshold can be set based on historical business experience. For example, the CPU usage threshold can be set to 70%.
  • when the CPU usage ratio is higher than the CPU usage threshold, "1" can be output; when the CPU usage ratio is not higher than the CPU usage threshold, "0" can be output.
  • F represents the feature, and F can be represented by a vector.
  • controller 0 can compare system resource usage information such as memory usage ratio, disk IO amount, bandwidth usage, etc. with the threshold corresponding to the resource, and output according to the comparison result.
  • the system resource characteristics corresponding to the above system resource usage information are, for example, F (memory usage), F (disk IO usage), and F (bandwidth usage).
  • controller 0 can collect statistics on reuse cycles and reuse frequencies using the same statistical method used to process the reuse distance, so as to fully explore the temporal correlation and spatial correlation, thereby extracting workload features from the workload information.
  • In some embodiments, controller 0 may also skip obtaining the system resource usage information and extracting the system resource characteristics.
  • Controller 0 obtains structured features based on system resource features and workload features.
  • Structural characteristics include system resource characteristics and workload characteristics.
  • the system resource characteristics are extracted from system resource usage information, and the workload characteristics are extracted from workload information.
  • Controller 0 can fuse the system resource features and the workload features to obtain the structured features. For example, controller 0 can splice the system resource features and the workload features to achieve the fusion and obtain the structured features.
  • controller 0 can also merge system resource features. From a business perspective, at certain times a certain system resource may be occupied excessively, which causes a system performance bottleneck; at such times, distinguishing among the same type of system resource features (also called correlated features or common impact features) is no longer necessary, and such features can be merged. As shown in Figure 8, controller 0 can use the "OR" operation to achieve feature merging. For example, F(CPU usage), F(memory usage), F(disk IO volume), and F(bandwidth usage) are common impact features, and controller 0 can merge these common impact features.
  • controller 0 obtains workload characteristics and system resource characteristics with common impacts through multi-source information processing. After merging the common influence features, the controller 0 can standardize and normalize the common influence features. Similarly, the controller 0 can standardize and normalize the workload features after extracting the workload features.
  • the system resource features after standardization and normalization can be a_0, a_1, ..., a_k
  • the workload features after standardization and normalization can be b_0, b_1, ..., b_k
  • Controller 0 can splice the above system resource features a_0, a_1, ..., a_k and workload features b_0, b_1, ..., b_k to obtain the structured features.
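  • A toy sketch of the thresholding, "OR" merging, and splicing steps described above is given below; all threshold values and feature values are illustrative assumptions.

```python
def flag(value, threshold):
    """Binary system resource feature: 1 if the usage exceeds its threshold, else 0."""
    return 1 if value > threshold else 0

# System resource features (threshold values are illustrative, e.g. a 70% CPU usage threshold).
f_cpu  = flag(0.82, 0.70)
f_mem  = flag(0.55, 0.80)
f_disk = flag(0.30, 0.60)
f_bw   = flag(0.20, 0.50)

# "OR" merging of the common impact features: any single bottleneck sets the merged flag.
f_resource = f_cpu | f_mem | f_disk | f_bw

# Standardized and normalized workload features b_0 ... b_k (values illustrative).
workload_features = [0.12, 0.87, 0.40]

# Splice (concatenate) the system resource features and the workload features.
structured_features = [f_resource] + workload_features
assert structured_features == [1, 0.12, 0.87, 0.40]
```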
  • Controller 0 uses general feature processing methods such as feature merging and normalization to clean the features, which can generate a feature model that accurately depicts the current situation with low computational overhead and provide a reliable basis for partitioning decisions.
  • the above-mentioned S602 to S606 are an implementation manner for the controller 0 to obtain the structured features.
  • the controller 0 can also obtain the structured features through other methods.
  • the controller 0 may not execute the above S606.
  • Controller 0 obtains the feedback information of the previous partition adjustment period, and determines whether to trigger partition adjustment based on the feedback information of the previous partition adjustment period. If yes, execute S610; if not, execute S622.
  • Controller 0 can set trigger conditions for partition adjustment. Controller 0 can determine whether the trigger conditions for partition adjustment are met based on the feedback information of the previous partition adjustment cycle, such as the partition revenue (for example, the deduplication rate) of the previous partition adjustment cycle and the workload characteristics corresponding to each partition, so as to determine whether to trigger partition adjustment.
  • the trigger condition for partition adjustment can be set as follows: the deduplication rate of the previous partition adjustment cycle is less than a preset value or its decline reaches a preset range, or the change of the workload characteristics of the previous partition adjustment cycle relative to the workload characteristics of the partition adjustment cycle before it satisfies a preset condition.
  • the preset value, preset range or preset condition can be set based on historical business experience.
  • for example, if the workload characteristics of the partition adjustment cycle before last indicate that the workload is mainly large IO, while the workload characteristics of the previous partition adjustment cycle indicate that the workload is mainly small IO, the workload characteristics of the previous partition adjustment cycle have changed significantly relative to the cycle before it, which can trigger partition adjustment.
  • controller 0 can directly trigger partition adjustment and then perform partition updates based on the results of partition decision modeling.
  • Controller 0 determines a target modeling strategy from the modeling strategy set according to the structured features.
  • Controller 0 selects a target evaluation strategy from the partition revenue evaluation strategy set based on structured features and feedback information.
  • Controller 0 determines the objective function of the partition decision model according to the objective evaluation strategy.
  • Controller 0 performs partition decision modeling through the target modeling strategy and the objective function according to the structural features, and obtains a partition decision model.
  • Controller 0 obtains the partition capacity combination with the largest partition benefit based on the partition decision model.
  • Controller 0 can model partition decisions based on the structural features obtained from multi-source information processing, and with the assistance of early feedback information, such as workload characteristics, deduplication rate and other prior knowledge of the previous partition adjustment cycle.
  • As shown in Figure 10, controller 0 can select a modeling strategy from the modeling strategy set based on the structured features to determine the target modeling strategy, and select an evaluation strategy from the partition revenue evaluation strategy set based on feedback information such as the workload characteristics and deduplication rate of the previous partition adjustment cycle to determine the target evaluation strategy. According to the target evaluation strategy, the objective function of the partition decision model can be determined.
  • Then, based on the structured features, the controller 0 can perform modeling using the target modeling strategy and the objective function, and obtain the partition capacity combination with the largest partition revenue based on the partition decision model obtained through modeling. The partition capacity combination with the largest partition revenue can also be called a partition decision.
  • the modeling strategy set includes: (1) a modeling strategy based on point partitioning; (2) a modeling strategy based on Gaussian process regression.
  • the modeling strategy based on point partitioning is oriented to general scenarios, that is, simple scenarios, and the modeling strategy based on Gaussian process regression is oriented to complex scenarios.
  • point partitioning refers to providing a variety of preset partition capacity combinations and selecting the partition capacity combination with the greatest partition revenue.
  • the workload characteristics in the structured feature vector reflect that the business scenario is a simple scenario.
  • the flag bit related to the CPU usage characteristics is 1, which indicates that the CPU and other system resources are highly occupied.
  • controller 0 can choose the modeling strategy based on point partitioning for modeling.
  • Controller 0 can use the mean μ of the reuse distance and the variance σ of the reuse distance in the structured features obtained by multi-source information processing to fit the probability density function of the reuse distance.
  • Controller 0 can set the following partition combination according to the modeling strategy based on point partitioning:
  • controller 0 can obtain the hit rates of the two types of data distribution through integration, and then obtain the hit rate achieved by this partitioning scheme through the product of the hit rates and the data proportions.
  • F1 and F2 respectively represent the data distribution corresponding to the two partitions. Specifically, they can be represented by the probability density function of the reuse distance of the data corresponding to the two partitions.
  • P is the capacity ratio of one partition.
  • controller 0 can select the evaluation strategy that "maximizes the deduplication rate" from the partition revenue evaluation strategy set based on the structured features as the target evaluation strategy. That is, controller 0 can directly use the function of the above formula (6) as the objective function to perform partition decision modeling. Specifically, controller 0 substitutes the parameters of different partition combinations into the above objective function to obtain the deduplication rates of the different partition combinations, from which controller 0 selects the partition capacity combination with the largest deduplication rate.
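  • The objective of formula (6) can be sketched numerically as below: each partition's fitted reuse-distance density is integrated up to the partition's capacity to obtain a hit rate, which is then weighted by the data proportion; the normal-distribution assumption and all parameter values are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    """Integral of a normal density up to x, standing in for the fitted reuse-distance density."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def dedup_rate(capacities, mus, sigmas, proportions):
    """Deduplication rate of a partition capacity combination: sum over partitions of
    (hit rate within the partition's capacity) x (proportion of data mapped to it)."""
    rate = 0.0
    for cap, mu, sigma, p in zip(capacities, mus, sigmas, proportions):
        hit_rate = normal_cdf(cap, mu, sigma)   # probability that a reuse falls inside the capacity
        rate += hit_rate * p
    return rate

# Two illustrative candidate capacity combinations for a cold and a hot partition.
candidates = [(8000.0, 2000.0), (6000.0, 4000.0)]
mus, sigmas, proportions = (5000.0, 800.0), (2500.0, 500.0), (0.8, 0.2)
best = max(candidates, key=lambda caps: dedup_rate(caps, mus, sigmas, proportions))
print("partition capacity combination with the largest deduplication rate:", best)
```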
  • controller 0 can also reconstruct the partition revenue evaluation strategy in combination with other constraints. For example, during the implementation of a specific project, the change in partition size between two successive adjustments should not be too large, otherwise it will affect partition deployment performance and cause unnecessary overhead. Based on this, other constraints can include minimizing the partition adjustment cost.
  • the evaluation strategy of reconstructed partition benefits may be based on the deduplication rate and partition adjustment cost.
  • the controller 0 can obtain the feedback information of the previous partition adjustment cycle.
  • the feedback information can include one or more of workload characteristics and deduplication rate.
  • the controller 0 can adjust the partition revenue evaluation strategy based on the feedback information. Specifically, if the deduplication rate of the previous partition adjustment cycle is less than the preset value or its decline reaches the preset range, the controller 0 can adjust the partition revenue evaluation strategy to an evaluation strategy based on the deduplication rate and the partition adjustment cost. Similarly, if the change of the workload characteristics of the previous partition adjustment cycle relative to the workload characteristics of the cycle before it meets the preset condition, the controller 0 can adjust the partition revenue evaluation strategy to an evaluation strategy based on the deduplication rate and the partition adjustment cost.
  • the evaluation strategy based on deduplication rate and partition adjustment cost takes into account a variety of factors and is more targeted.
  • the above-mentioned S610 to S616 are an implementation manner for the controller 0 to construct the partition decision model based on the workload characteristics of the previous partition adjustment period.
  • S618 is an implementation method for determining the partition decision (specifically, the target partition combination). It should be noted that controller 0 does not need to execute the above S612 and S614 when modeling the partition decision model. For example, controller 0 can perform partition decision modeling based on the target modeling strategy and a default objective function to obtain the partition decision model.
  • Controller 0 partitions the inverse mapping table according to the partition capacity combination with the largest partition revenue.
  • the partition capacity combination includes the capacity proportions of different partitions, that is, the proportion of resources allocated to different partitions.
  • the partition capacity combination may include the proportion of resources allocated to the hot partition and the proportion of resources allocated to the cold partition.
  • Controller 0 can partition the system resources of the inverse mapping table based on the proportion of resources allocated to the hot partition and the proportion of resources allocated to the cold partition.
  • for example, controller 0 can allocate 20% of the storage space of the inverse mapping table to the hot partition and 80% of the storage space of the inverse mapping table to the cold partition.
  • Controller 0 writes the fingerprint and logical address of the first data block into the first partition.
  • the first partition is a partition determined from the multiple partitions according to the characteristics of the first data block. For example, when the fingerprint corresponding to the first data block has a higher popularity, the first partition can be the hot partition, and when the fingerprint corresponding to the first data block has a lower popularity, the first partition can be the cold partition.
  • for the specific implementation of controller 0 writing the fingerprint and logical address of the first data block into the first partition, please refer to the relevant description of the embodiment shown in Figure 5, and details are not described again here.
  • Controller 0 determines whether to trigger deduplication. If yes, execute S624; if not, execute S626.
  • the controller 0 can compare the fingerprint of the first data block with the fingerprints in the first partition of the inverse mapping table; when a fingerprint identical to the fingerprint of the first data block exists in the first partition, deduplication can be triggered. For example, when the number of fingerprints in the first partition that are identical to the fingerprint of the first data block reaches a preset threshold, S624 may be executed to perform deduplication.
  • Controller 0 performs deduplication based on LSM tree.
  • Controller 0 can deduplicate the fingerprints and logical addresses in the first partition by merging LSM trees, and deduplicate the corresponding data blocks on the hard disk 134 according to the physical addresses corresponding to the logical addresses. For example, controller 0 can delete the fingerprint and logical address of the first data block in the first partition by merging the LSM tree, and delete the first data block from the hard disk 134 based on the physical address corresponding to the logical address of the first data block.
  • For deduplication based on the LSM tree, please refer to the description of S514 to S516 in the embodiment shown in Figure 5, which will not be described again here.
  • Controller 0 eliminates fingerprints and logical addresses in at least one partition of the inverse mapping table.
  • Corresponding elimination conditions can be set for each partition of the inverse mapping table.
  • when a partition meets its elimination condition, metadata such as fingerprints and logical addresses in the partition can be eliminated.
  • the controller 0 can determine the metadata that needs to be eliminated within each partition based on the partition capacity and the popularity of the fingerprints corresponding to the data blocks, such as the fingerprints and logical addresses of non-deduplicated metadata and of partially deduplicated metadata, and then eliminate them from the inverse mapping table, which can reduce the size of the metadata and ensure system performance.
  • Controller 0 stores the deduplicated fingerprint and logical address in the fingerprint table.
  • Controller 0 obtains the feedback information of the current partition adjustment period, which is used by controller 0 to determine whether the next partition adjustment period triggers partition adjustment.
  • the controller 0 can obtain the above feedback information, thereby assisting in adjustment of partitioning decisions and improving partitioning accuracy.
  • the data deduplication method of the embodiment of the present application provides an active partitioning mechanism to turn implicit partitioning into active partitioning.
  • the inverse mapping table is divided into hot partitions and cold partitions.
  • the hot partition and the cold partition are allocated system resources with corresponding quotas. Data that is updated infrequently can be allocated enough storage space to support deduplication, while data that is frequently updated is frequently invalidated, so relatively few quota resources can be allocated. In this way, the data that can ultimately be deduplicated can basically all be deduplicated instead of being squeezed out, thereby improving the deduplication rate; at the same time, data that is difficult to deduplicate can be eliminated, thereby reducing the overall metadata mapping scale and improving system performance.
  • Figure 6 mainly illustrates the modeling example of controller 0 using a modeling strategy based on point partitioning in a simple scenario.
  • in complex scenarios, controller 0 can also adopt a modeling strategy based on Gaussian process regression (GPR) for modeling. The modeling process in complex scenarios is explained below.
  • this embodiment uses a partition capacity combination S to describe the resource proportion of each partition.
  • S is an n-dimensional array.
  • the i-th element Si represents the resource proportion of the i-th partition.
  • the deduplication rate can be regarded as determined by S.
  • the relationship between the deduplication rate and the partition capacity combination can be defined as f(S). Since f(S) cannot be obtained explicitly, this embodiment adopts the modeling strategy of Gaussian process regression to characterize f(S).
  • Modeling f(S) based on Gaussian process regression modeling strategy can include the following stages:
  • Controller 0 randomly generates several partition capacity combinations and adds them to the Set. Then controller 0 applies the above partition capacity combinations to the storage system, and obtains the deduplication rate under each partition capacity combination configuration through storage system operation. Through the above-mentioned correspondence between partition capacity combination and deduplication rate, a Gaussian model G between partition capacity combination and deduplication rate is initially established to describe f(S).
  • Gaussian model G recommends a partition capacity combination, and adds the partition capacity combination to the set Set. Controller 0 applies the partition capacity combination to the storage system, and obtains the deduplication rate under this configuration by running the storage system. Controller 0 feeds back the partition capacity combination and its corresponding deduplication rate to the Gaussian model for model update, and repeats the iterative update step L times (L is the preset number of iterations).
  • Output stage: controller 0 outputs the partition resource configuration corresponding to the highest deduplication rate in the set Set.
  • the modeling strategy based on Gaussian process regression can be implemented in complex scenarios, and a set of partition resource configurations with the highest deduplication rate, that is, the target partition combination, can be provided based on the above Gaussian model.
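  • A condensed sketch of this Gaussian-process-regression search loop is given below, using scikit-learn's GaussianProcessRegressor; the measure_dedup_rate function stands in for actually running the storage system and is purely hypothetical, as is the acquisition rule used to let the model recommend the next combination.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def measure_dedup_rate(s):
    """Placeholder for running the storage system under partition combination s (hypothetical)."""
    return 1.0 - abs(s[0] - 0.8)          # toy objective: best when the cold partition gets ~80%

rng = np.random.default_rng(0)
S = list(rng.dirichlet([1, 1], size=4))   # initialization: random partition capacity combinations
y = [measure_dedup_rate(s) for s in S]    # deduplication rate observed for each combination

L = 10                                    # preset number of iterative updates
for _ in range(L):
    gpr = GaussianProcessRegressor().fit(np.array(S), np.array(y))   # (re)fit Gaussian model G
    candidates = rng.dirichlet([1, 1], size=64)                      # candidate combinations
    mean, std = gpr.predict(candidates, return_std=True)
    s_next = candidates[np.argmax(mean + std)]                       # model recommends a combination
    S.append(s_next)
    y.append(measure_dedup_rate(s_next))                             # feed the result back to the model

best = S[int(np.argmax(y))]               # output: combination with the highest deduplication rate
print("recommended partition capacity combination:", best)
```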
  • a client running on at least one host is connected to multiple storage nodes.
  • Storage nodes such as Node1 and Node2 can start the deduplication service process to perform the data deduplication method.
  • the storage node can also launch the LUN service process to assist the deduplication service process in executing the data deduplication method.
  • the client may send a write request, and the write request includes data, and the data may be divided into at least one piece of data.
  • the storage node can first record the data location information in the forward mapping table and then perform the data disk operation. Then, the deduplication service process on a storage node such as Node1 can execute the data deduplication method to identify and deduplicate redundant data in storage media such as disks, thereby significantly increasing user available capacity and reducing user costs.
  • the storage node uses asynchronous writing to insert the fingerprints and logical addresses of the data blocks into the reverse mapping table in an orderly manner.
  • the inverse mapping table is the fingerprint metadata management structure introduced in this application.
  • for the inverse mapping table, storage nodes construct separate LSM Tree data structures for effective management; this structure can trigger deduplication through table merging.
  • the deduplication service process of the storage node sends the deduplicated metadata in the inverse mapping table to the fingerprint table, and updates the deduplicated data in the forward mapping table (that is, the LUN address mapping table) accordingly, so that the physical address information of the deduplicated data is a fingerprint.
  • the upper layer of the global cache has a write cache and a read cache.
  • the write cache can be used for LBA heat statistics, and there is no need to design an additional statistics module for heat statistics.
  • the client can send a write request through the network, and the write request reaches the write cache through the server adaptation layer of the global cache.
  • the deduplication service process can divide the data in the write request into blocks and then calculate the fingerprint of each data block. Next, the deduplication service process queries the fingerprint table. If the fingerprint of the data block hits the fingerprint table, it means that the storage device stores a data block with the same fingerprint, and the deduplication service process can directly return a write response. If the fingerprint of the data block does not hit the fingerprint table, it means that a data block with the same fingerprint is not stored in the storage device.
  • the deduplication service process can write the data block to a storage device such as a hard disk and then return a write response.
  • the deduplication service process can also insert the fingerprints and logical addresses of the data blocks into the reverse mapping table in an orderly manner.
  • the deduplication service process can also obtain the popularity of the LBA from the write cache through batch acquisition, and update the popularity of the fingerprint corresponding to the data block based on the popularity of the LBA.
  • the deduplication service process can also obtain workload characteristics such as reuse distance and reuse cycle, as well as obtain system resource usage information, and obtain system resource characteristics based on the system resource usage information.
  • the deduplication service process performs general feature processing based on the above features to obtain structured features.
  • the deduplication service process performs partition decision modeling based on the above structural characteristics, and then determines the partition capacity combination with the greatest partition benefit based on the partition decision model obtained through modeling.
  • the deduplication service process partitions the system resources of the inverse mapping table based on the partition capacity combination. Specifically, the system resources of the inverse mapping table are partitioned according to the proportion of resources allocated to the hot partition and the proportion of resources allocated to the cold partition.
  • the deduplication service process builds LSM Tree data structures in different partitions to achieve effective management of metadata such as fingerprints and logical addresses. For example, the deduplication service process can trigger deduplication through table merging based on the LSM tree.
  • the deduplication service process can also determine the metadata that needs to be eliminated based on the partition capacity and the popularity of the fingerprint corresponding to the data block, and eliminate it from the LSM tree of the inverse mapping table, thereby reducing the size of the metadata.
  • the deduplication service process can also send the deduplicated metadata in the inverse mapping table to the fingerprint table after completing the deduplication, and update the address corresponding to the deduplicated data in the address mapping table through the LUN service process accordingly, so that the physical address of the deduplicated data is a fingerprint.
  • the deduplication service process can also obtain feedback information, so that based on the feedback information, in the subsequent stage, it can determine whether to trigger partition adjustment in the next partition adjustment cycle, so that more accurate partitioning can be achieved, thereby improving the deduplication rate.
  • the embodiment of the present application also provides a data deduplication device.
  • the data deduplication device according to the embodiment of the present application will be introduced with reference to the accompanying drawings.
  • the device 1400 includes:
  • Communication module 1402 configured to receive a write request, where the write request includes the first data block
  • Write data module 1404 used to write the first data block to the storage device
  • the data writing module 1404 is also configured to write the metadata of the first data block into the first partition among the multiple partitions of the metadata management structure, where the first partition is determined based on the characteristics of the first data block, and the metadata of the first data block includes the fingerprint and address information of the first data block;
  • the deduplication module 1406 is configured to, when a fingerprint identical to the fingerprint of the first data block exists in the first partition, delete the metadata of the first data block in the first partition and delete the first data block from the storage device according to the address information of the first data block.
  • the communication module 1402 can be used to implement the description of S502 related content in the embodiment shown in FIG. 5 .
  • the write data module 1404 is used to implement the description of S510 related content in the embodiment shown in Figure 5.
  • the write data module 1404 is also used to implement the related content description of S512 in the embodiment shown in Figure 5.
  • the deduplication module 1406 is used to implement the content description related to S514 and S516 in the embodiment shown in FIG. 5 .
  • the characteristic of the first data block is the fingerprint corresponding to the first data block
  • the data writing module 1404 is specifically used to:
  • the data writing module 1404 determines the heat of the fingerprint corresponding to the first data block, determines the first partition based on the heat, and writes the metadata to the first partition.
  • the write data module 1404 is also used to:
  • after writing the metadata of the first data block into the first partition, determine the popularity of the logical address of the first data block, and accumulate the popularity of the logical address onto the popularity of the fingerprint corresponding to the first data block, so as to update the popularity of that fingerprint (see the sketch below).
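As a small illustration of this popularity update, the Python sketch below accumulates the logical-address popularity onto the fingerprint popularity; the dictionaries and the helper name are hypothetical, and the numbers reuse the FP3 example from the description, where a fingerprint popularity of 5 plus a logical-address popularity of 2 gives 7.

```python
def update_fingerprint_popularity(fingerprint_heat, lba_heat, fingerprint, lba):
    """Accumulate the popularity of a logical address onto the popularity of the
    fingerprint of the data block just written, and return the updated value."""
    fingerprint_heat[fingerprint] = (
        fingerprint_heat.get(fingerprint, 0) + lba_heat.get(lba, 0)
    )
    return fingerprint_heat[fingerprint]

# Example from the description: fingerprint FP3 has popularity 5, the logical
# address of data block 10 has popularity 2, so the updated popularity is 7.
heat = {"FP3": 5}
lba_heat = {"LBA_10": 2}
assert update_fingerprint_popularity(heat, lba_heat, "FP3", "LBA_10") == 7
```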
  • the write request also includes a second data block, and the fingerprint corresponding to the second data block has a higher popularity than the fingerprint corresponding to the first data block.
  • the write data module 1404 is also used for:
  • writing the second data block to the storage device, and writing the metadata of the second data block into a second partition among the multiple partitions of the metadata management structure, where the capacity of the second partition is smaller than the capacity of the first partition; for details, please refer to the related description of writing the first data block and its metadata, which will not be repeated here.
  • the capacities of the multiple partitions of the metadata management structure are determined according to a partition decision model; the partition decision model is used to predict, for each partition capacity combination among the preset partition capacity combinations, the corresponding partition revenue after that combination is applied to the metadata management structure, and to determine the partition capacity combination with the largest partition revenue as the capacities of the multiple partitions of the metadata management structure.
  • the partition revenue is determined based on at least one of the deduplication rate and the partition adjustment cost.
  • the partition revenue is the deduplication rate, and the preset partition capacity combinations include a first partition capacity combination.
  • the device 1400 also includes a partition module 1408, which is configured to predict, in the following manner, the deduplication rate corresponding to the first partition capacity combination after it is applied to the metadata management structure:
  • obtain the workload characteristics of each of the multiple partitions formed by applying the first partition capacity combination to the metadata management structure, obtain the data distribution of each partition according to its workload characteristics, and obtain the deduplication rate according to the data distribution of each partition and the capacity of each partition (a sketch of such a prediction is given below).
  • for the specific implementation of the partition module 1408 constructing the partition decision model, please refer to the description of S610 to S616 in the embodiment shown in Figure 6, which will not be repeated here.
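The sketch below illustrates, under simplifying assumptions, how such a prediction could be computed for a two-partition (hot/cold) layout: the reuse-distance distribution of each partition is fitted as a normal distribution, the hit rate is taken as its cumulative value at the partition capacity, and the predicted deduplication rate follows the form f(P) = P·∫F1 + (1−P)·∫F2 used later in the description. The function names, the normal-CDF hit-rate estimate, and the example numbers are illustrative assumptions.

```python
from math import erf, sqrt

def normal_cdf(x, mean, std):
    """P(X <= x) for a normal distribution fitted from workload characteristics."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))

def predicted_dedup_rate(p_hot, capacity, hot_fit, cold_fit):
    """Predict the deduplication rate of a hot/cold partition capacity combination.

    p_hot    -- fraction of the metadata capacity given to the hot partition
    capacity -- total number of metadata entries the structure can hold
    hot_fit  -- (mean, std) of the reuse distance fitted for hot data
    cold_fit -- (mean, std) of the reuse distance fitted for cold data
    """
    hot_hits = normal_cdf(p_hot * capacity, *hot_fit)          # stands in for the integral of F1
    cold_hits = normal_cdf((1 - p_hot) * capacity, *cold_fit)  # stands in for the integral of F2
    return p_hot * hot_hits + (1 - p_hot) * cold_hits

# pick the best of several preset capacity combinations
combos = [0.1, 0.2, 0.3, 0.4, 0.5]
best = max(combos, key=lambda p: predicted_dedup_rate(
    p, 1_000_000, hot_fit=(5_000, 2_000), cold_fit=(400_000, 150_000)))
```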
  • the device 1400 further includes a partition module 1408, which is used to: periodically adjust the capacities of the multiple partitions of the metadata management structure; and, when an adjustment time arrives, determine whether to adjust the capacities of the multiple partitions according to the partition revenue, the partition capacity combination, or the workload characteristics of each partition corresponding to the cycle before the adjustment time.
  • the deduplication module 1406 is specifically used to: when a fingerprint identical to the fingerprint of the first data block exists in the first partition and the number of such fingerprints in the first partition reaches a preset threshold, delete the metadata of the first data block, and delete the first data block from the storage device according to the address information of the first data block.
  • the deduplication module 1406 is used to implement the description of S514 and S516 in the embodiment shown in Figure 5, which will not be repeated here.
  • the address information of the first data block is the logical address of the first data block, and the write data module 1404 is also used to: write the logical address and the physical address of the first data block into an address mapping table.
  • the deduplication module 1406 is specifically used to: obtain the physical address of the first data block from the address mapping table according to the logical address of the first data block, find the first data block in the storage device according to the physical address, and delete the first data block.
  • the write data module 1404 is also used to implement the description of S518 in the embodiment shown in Figure 5, which will not be repeated here.
  • the deduplication module 1406 is also used to: after deleting the metadata of the first data block from the first partition, modify the physical address of the first data block in the address mapping table to the fingerprint of the first data block.
  • the deduplication module 1406 is also used to implement the description of S518 in the embodiment shown in Figure 5, which will not be repeated here.
  • the device 1400 further includes:
  • an elimination module 1409, configured to eliminate the metadata in at least one partition of the inverse mapping table when that partition meets the elimination condition.
  • the elimination module 1409 is also used to implement the description of S520 in the embodiment shown in Figure 5, which will not be repeated here.
  • the data deduplication device 1400 may correspond to performing the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules/units of the data deduplication device 1400 are respectively intended to implement the corresponding processes of the methods in the embodiments shown in Figure 5 and Figure 6; for brevity, they are not described again here.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a data center, that contains one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives).
  • the computer-readable storage medium includes instructions that instruct a computing device or a cluster of computing devices (such as a storage system) to execute the above data deduplication method.
  • An embodiment of the present application also provides a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer program product may be a software installation package; when any of the foregoing data deduplication methods needs to be used, the computer program product may be downloaded and executed on a computing device or a cluster of computing devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请提供了一种数据重删方法,包括:接收写请求,将写请求中的第一数据块写入存储设备,将第一数据块的元数据,如指纹和逻辑地址,写入元数据管理结构的多个分区中根据第一数据块的特征确定的第一分区,当第一分区中存在与第一数据块的指纹相同的指纹时,删除第一分区中第一数据块的元数据,并根据第一数据块的地址信息从存储设备中删除第一数据块。该方法通过对元数据管理结构主动分区,将数据块的元数据写入与该数据块的特征对应的分区,由此避免更新不频繁的数据被更新频繁的数据挤占资源,进而被淘汰,提高了重删率。

Description

一种数据重删方法及相关系统
本申请要求于2022年06月24日提交中国国家知识产权局、申请号为202210730080.0、发明名称为“一种数据重删方法”的中国专利申请的优先权,以及要求于2022年09月16日提交中国国家知识产权局、申请号为202211132110.4、发明名称为“一种数据重删方法及相关系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及存储技术领域,尤其涉及一种数据重删方法、装置、存储系统、计算机可读存储介质、计算机程序产品。
背景技术
随着计算产业的发展,数据价值得到充分释放,数据中心的规模从拍字节(petabyte,PB)级向泽字节(zettabyte,ZB)级增长。数据中心存储的大量数据中存在大量冗余数据,统计表明在主存储系统和备份存储系统这两大主要应用场景中,分别存在约50%和85%的数据冗余。如何有效减少冗余数据,进而降低存储成本已成为研究的热点方向。
业界通常采用重复数据删除(Data Deduplication,DD)以减少数据冗余。数据重复删除也可以简称为重删,具体是通过对数据进行分块,并基于数据块的内容计算得到数据块的指纹,再通过比对不同数据块的指纹,识别并删除内容重复的数据块,进而达到消除数据冗余的目标。
其中,数据块的指纹通常是以追加写入方式写入日志文件。在进行重删时,通过手动或周期性触发的方式将日志文件中的指纹排序,并将排序后的指纹与指纹文件中的指纹合并,根据合并结果删除内容重复的数据块。
数据中心存储的数据可以根据更新频次分为更新频繁的数据和更新不频繁的数据。然而,更新频繁的数据的占比通常高于更新不频繁的数据的占比。更新频繁的数据通常难以被重删,也即难以被重删的数据占比反而高,此时就会出现资源挤占的情况,更新不频繁的数据所分配的空间占比就会较低,由此导致易被淘汰,从而损失重删率。
发明内容
本申请提供了一种数据重删方法,该方法通过对元数据管理结构进行主动分区,将数据块的指纹以及地址信息等元数据写入与数据块的特征对应的分区,由此避免更新不频繁的数据被更新频繁的数据挤占资源,进而被淘汰,提高了重删率。本申请还提供了上述方法对应的装置、存储系统、计算机可读存储介质以及计算机程序产品。
第一方面,本申请提供一种数据重删方法。该方法可以应用于存储系统,包括集中式存储系统或者分布式存储系统。其中,集中存储式系统还可以分为盘控一体或盘控分离的集中式存储系统,分布式存储系统还可以分为存算一体的分布式存储系统或存算分离的分布式存储系统。集中式存储系统具有引擎,该引擎包括控制器,控制器可以包括处理器和内存,处理器可以加载内存中的程序代码,从而执行本申请的数据重删方法。类似地,分布式存储系统包括计算节点和存储节点,计算节点包括处理器和内存,处理器可以加载内存中的程序代码,从而执行本申请的数据重删方法。
具体地,存储系统接收写请求,该写请求中包括第一数据块,然后存储系统将第一数据块写入存储设备(例如是硬盘),接着将第一数据块的元数据写入元数据管理结构的多个分区中的第一分区。该第一分区为根据第一数据块的特征确定的。第一数据块的元数据包括第一数据块的指纹及地址信息。当第一分区中存在与第一数据块的指纹相同的指纹时,存储系统删除第一分区中第一数据块的元数据,并根据第一数据块的地址信息从存储设备中删除第一数据块。
在该方法中,不同特征的数据块的元数据可以写入元数据管理结构的不同分区,例如更新频繁的数据块的元数据可以写入容量较小的分区,更新不频繁的数据块的元数据可以写入容量较大的分区,由此 避免更新不频繁的数据被更新频繁的数据挤占资源,进而被淘汰,提高了重删率。
在一些可能的实现方式中,第一数据块的特征为所述第一数据块的指纹。需要说明的是,不同数据块可以对应相同的指纹,例如在进行备份时,可以存在多个数据块对应同一指纹。存储系统在将所述第一数据块的元数据写入元数据管理结构时,可以先确定第一数据块对应的指纹的热度,根据第一数据块对应的指纹的热度确定该热度对应的元数据管理结构的多个分区中的第一分区,然后将第一数据块的元数据写入所述第一分区。
该方法中,存储系统通过确定第一数据块对应的指纹的热度,并根据该热度将第一数据块写入对应的第一分区,可以避免更新不频繁的数据被更新频繁的数据挤占资源,进而被淘汰,提高了重删率。
在一些可能的实现方式中,第一数据块的地址信息包括第一数据块的逻辑地址,相应地,存储系统还可以在将第一数据块的元数据写入第一分区之后,对第一数据块对应的指纹的热度进行更新。其中,存储系统可以确定第一数据块的逻辑地址的热度,将逻辑地址的热度累加至所述第一数据块对应的指纹的热度,以更新所述第一数据块对应的指纹的热度。
在该方法中,通过将逻辑地址的热度累加至数据块对应的指纹的热度,进行指纹的热度更新,可以为后续元数据写入提供参考。
在一些可能的实现方式中,写请求中还可以包括第二数据块,第二数据块对应的指纹的热度可以高于所述第一数据块对应的指纹的热度。相应地,存储系统还可以将第二数据块写入存储设备,将第二数据块的元数据写入元数据管理结构的多个分区中的第二分区。其中,第二分区的容量小于第一分区的容量。
该方法通过将指纹的热度不同的数据块写入元数据管理结构的不同分区,由此避免热度低的数据块的元数据被热度高的数据块的元数据挤占资源,进而被淘汰,提高了重删率。
在一些可能的实现方式中,写请求中还可以包括第三数据块,第三数据块的指纹与第一数据块的指纹相同。当写入第一数据块的元数据时,第一数据块对应的指纹的热度小于预设热度,当写入第三数据块的元数据时,第三数据块对应的指纹的热度大于预设热度,则存储系统可以将第三数据块的元数据写入第二分区,并将第一分区中与第三数据块具有相同指纹的数据块的元数据移动至第二分区。
如此,可以实现随着数据块的不断写入,对元数据的存储位置进行调整,例如将指纹的热度较高的数据块的元数据移动至第二分区,从而在第一分区中为指纹的热度较低的数据块的元数据留出存储空间,以及在第二分区中存储具有相同指纹的数据块的元数据,以支持第二分区触发重删,进一步提升重删率。
在一些可能的实现方式中,存储系统也可以将第三数据块的元数据写入第二分区,将第一分区中与第三数据块具有相同指纹的数据块的元数据淘汰,无需移动至第二分区,一方面可以减少移动开销,另一方面可以在第一分区中为指纹的热度较低的数据块的元数据留出存储空间,进一步提升重删率。
在一些可能的实现方式中,元数据管理结构的多个分区的容量根据分区决策模型确定。其中,分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于所述元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量,所述分区收益根据重删率和分区调整成本中的至少一个确定。
该方法通过构建分区决策模型,通过分区决策模型对元数据管理结构进行主动分区,由此避免了更新频繁的数据挤占更新不频繁的数据的资源,进而避免更新不频繁的数据被淘汰,导致损失重删率。
在一些可能的实现方式中,分区收益可以为重删率,预设的分区容量组合可以包括第一容量组合。分区决策模型可以通过预估数据的命中率,从而预测分区容量组合应用于数据管理结构后对应的重删率。
具体地,分区决策模型通过如下方式预测所述第一分区容量组合应用于所述元数据管理结构后对应的重删率:获取所述第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征,根据所述各个分区对应的工作负载特征,获得所述各个分区对应的数据分布,根据所述各个分区对应的数据分布以及所述各个分区的容量,获得所述重删率。
该方法基于各个分区对应的工作负载特征,拟合各个分区对应的数据分布,基于各个分区对应的数据分布和分区容量可以预测命中率,进而预测出应用分区容量组合后对应的重删率,无需实际运行存储系统,即可通过较低成本预测出重删率最大的分区容量组合,能够满足业务的需求。
在一些可能的实现方式中,考虑到工作负载可能发生变化,分区的容量还支持调整。例如,存储系 统可以周期性调整分区的容量。在具体工程实施过程中,分区调整需要重新进行分区初始化等操作,产生分区调整成本。分区决策模型可以基于调整前后的分区容量占比预测分区调整成本。分区决策模型可以根据收益率和分区调整成本,预测分区收益。例如,分区决策模型可以将预测的收益率与预测的分区调整成本的差值,作为预测的分区收益。
该方法通过重构分区收益,可以使得分区收益的评估更精准、合理,以重构的分区收益最大化为目标,所确定的分区容量组合更具有参考价值,能够实现重删率和分区调整成本的均衡。
在一些可能的实现方式中,存储系统可以周期性地调整元数据管理结构的多个分区的容量,当到达调整时刻时,根据所述调整时刻前的周期对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整多个分区的容量。
该方法通过调整时刻前的周期对应的反馈信息,如分区收益、分区容量组合或各个分区对应的工作负载特征,决策是否对分区的容量进行调整,使得分区能够基于工作负载的变化灵活调整,尽可能保障在不同阶段均具有较好的分区收益。
在一些可能的实现方式中,存储系统可以在第一分区中存储与第一数据块的指纹相同的指纹,且第一分区中与第一数据块的指纹相同的指纹的数量达到预设阈值时,删除第一数据块的元数据,并根据第一数据块的地址信息从存储设备中删除第一数据块。
其中,预设阈值可以根据经验值设置。例如,预设阈值可以设置为1,则第一分区中存在与第一数据块相同的指纹,存储系统即删除第一数据块的元数据以及第一数据块。又例如,预设阈值可以设置为2,则第一分区中存在2个数据块的指纹与第一数据块的指纹相同时,存储系统删除第一数据块的元数据以及第一数据块,进一步地,存储系统保留与第一数据块的指纹相同的2个数据块中的一个数据块及其元数据,删除另一个数据块及其元数据。
当预设阈值设置为较小值时,可以及时删除冗余的数据块和元数据,当预设阈值设置为较大值时,可以减少重删次数,避免频繁重删占用大量资源,影响业务正常运行。
在一些可能的实现方式中,第一数据块的地址信息为第一数据块的逻辑地址。存储系统还可以将第一数据块的逻辑地址和物理地址写入地址映射表。相应地,存储系统在删除第一数据块时,可以根据所述第一数据块的逻辑地址,从所述地址映射表中获取所述第一数据块的物理地址,然后根据所述物理地址从所述存储设备中找到所述第一数据块,并删除所述第一数据块。
在该方法中,存储系统基于地址映射表中逻辑地址到物理地址的单跳映射直接定位第一数据块,缩短了查找时间,提高了重删效率。
在一些可能的实现方式中,在删除第一分区中的第一数据块的元数据之后,存储系统还可以将前向映射表中第一数据块的物理地址修改为所述第一数据块的指纹。
该方法可以实现对被重删的数据块的重定位,以便于后续可以基于指纹查找到具有相同指纹的数据块的物理地址,并通过访问该物理地址访问该数据块。
在一些可能的实现方式中,存储系统还可以在所述逆映射表中的至少一个分区满足淘汰条件时,对所述至少一个分区中的所述元数据进行淘汰。
该方法通过对逆映射表进行元数据淘汰,以降低元数据的规模,进而降低内存开销,保障系统性能。
第二方面,本申请提供一种数据重删装置。所述装置包括:
通信模块,用于接收写请求,所述写请求中包括第一数据块;
写数据模块,用于将所述第一数据块写入存储设备;
所述写数据模块,还用于将所述第一数据块的元数据写入元数据管理结构的多个分区中的第一分区,所述第一分区为根据所述第一数据块的特征确定的,所述第一数据块的元数据包括所述第一数据块的指纹及地址信息;
重删模块,用于在所述第一分区中存在与所述第一数据块的指纹相同的指纹时,删除所述第一分区中所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
在一些可能的实现方式中,所述第一数据块的特征为所述第一数据块对应的指纹,所述写数据模块具体用于:
确定所述第一数据块对应的指纹的热度;
根据所述第一数据块对应的指纹的热度确定所述热度对应的所述元数据管理结构的多个分区中的所述第一分区;
将所述第一数据块的元数据写入所述第一分区。
在一些可能的实现方式中,所述写数据模块还用于:
在将所述第一数据块的元数据写入所述第一分区之后,确定所述第一数据块的逻辑地址的热度;
将所述逻辑地址的热度累加至所述第一数据块对应的指纹的热度,以更新所述第一数据块对应的指纹的热度。
在一些可能的实现方式中,所述写请求中还包括第二数据块,所述第二数据块对应的指纹的热度高于所述第一数据块对应的指纹的热度,所述写数据模块还用于:
将所述第二数据块写入所述存储设备,将所述第二数据块的元数据写入所述元数据管理结构的多个分区中的第二分区,所述第二分区的容量小于所述第一分区的容量。
在一些可能的实现方式中,所述元数据管理结构的多个分区的容量根据分区决策模型确定,所述分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于所述元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量,所述分区收益根据重删率和分区调整成本中的至少一个确定。
在一些可能的实现方式中,所述分区收益为重删率,所述预设的分区容量组合包括第一分区容量组合,所述分区决策模型通过如下方式预测所述第一分区容量组合应用于所述元数据管理结构后对应的重删率:
获取所述第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征;
根据所述各个分区对应的工作负载特征,获得所述各个分区对应的数据分布;
根据所述各个分区对应的数据分布以及所述各个分区的容量,获得所述重删率。
在一些可能的实现方式中,所述装置还包括分区模块,所述分区模块用于:
周期性地调整所述元数据管理结构的多个分区的容量;
当到达调整时刻时,根据所述调整时刻前的周期对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整所述多个分区的容量。
在一些可能的实现方式中,所述重删模块具体用于:
在所述第一分区中存在与所述第一数据块的指纹相同的指纹,且所述第一分区中的所述指纹的数量达到预设阈值时,删除所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
在一些可能的实现方式中,所述第一数据块的地址信息为所述第一数据块的逻辑地址,所述写数据模块还用于:
将所述第一数据块的逻辑地址和物理地址写入地址映射表;
所述重删模块具体用于:
根据所述第一数据块的逻辑地址,从所述地址映射表中获取所述第一数据块的物理地址;
根据所述物理地址从所述存储设备中找到所述第一数据块,并删除所述第一数据块。
在一些可能的实现方式中,所述重删模块还用于:
在删除所述第一分区中的所述第一数据块的元数据之后,将所述地址映射表中所述第一数据块的物理地址修改为所述第一数据块的指纹。
在一些可能的实现方式中,所述装置还包括:
淘汰模块,用于当所述逆映射表中的至少一个分区满足淘汰条件,对所述至少一个分区中的所述元数据进行淘汰。
第三方面,本申请提供一种计算机集群。所述计算机集群包括至少一台计算机,所述至少一台计算机包括至少一个处理器和至少一个存储器。所述至少一个处理器、所述至少一个存储器进行相互的通信。所述至少一个处理器用于执行所述至少一个存储器中存储的指令,以使得计算机或计算机集群执行如第一方面或第一方面的任一种实现方式所述的数据重删方法。
第四方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,所述指令指示计算机或计算机集群执行上述第一方面或第一方面的任一种实现方式所述的数据重删方法。
第五方面,本申请提供了一种包含指令的计算机程序产品,当其在计算机或计算机集群上运行时,使得计算机或计算机集群执行上述第一方面或第一方面的任一种实现方式所述的数据重删方法。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
为了更清楚地说明本申请实施例的技术方法,下面将对实施例中所需使用的附图作以简单地介绍。
图1为本申请实施例提供的一种集中式存储系统的系统架构图;
图2为本申请实施例提供的一种分布式存储系统的系统架构图;
图3为本申请实施例提供的一种追加写入日志文件进行元数据管理的示意图;
图4为本申请实施例提供的一种通过日志文件和逆映射表进行元数据管理的示意图;
图5为本申请实施例提供的一种数据重删方法的流程图;
图6为本申请实施例提供的一种数据重删方法的流程图;
图7为本申请实施例提供的一种系统资源特征提取的示意图;
图8为本申请实施例提供的一种特征归并的示意图;
图9为本申请实施例提供的一种获取结构化特征的示意图;
图10为本申请实施例提供的一种分区决策建模的流程示意图;
图11为本申请实施例提供的一种评估策略选择的流程示意图;
图12为本申请实施例提供的一种数据重删方法的应用场景示意图;
图13为本申请实施例提供的一种数据重删方法应用于全局缓存的流程示意图;
图14为本申请实施例提供的一种数据重删装置的结构示意图。
具体实施方式
本申请实施例中的术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
首先对本申请实施例中所涉及到的一些技术术语进行介绍。
重复数据删除(Data Deduplication,DD),也可以简称为重删,是一种对重复数据进行删除,使得存储介质中对于相同数据仅存储一份,从而节约数据存储空间的数据缩减方案。重删具体可以通过对数据进行分块,并基于数据块的内容计算得到数据块的指纹(fingerprint,FP),再通过比对不同数据块的指纹,识别并删除内容重复的数据块,进而达到消除数据冗余的目标。
指纹是指基于数据块的内容确定的、用于标识数据块的身份信息。数据块的指纹可以是通过消息摘要算法对数据块的内容计算所得的消息摘要。其中,消息摘要算法通常基于散列函数,也即哈希(hash)函数实现,因此,数据块的指纹也可以是通过散列函数或哈希函数确定的散列值或哈希值。
重删还可以按照执行时间分类。例如,重删可以包括前重删或后重删。前重删是指数据在写入存储介质(简称为存储,例如可以是硬盘等设备)之前,进行重删。前重删也可以称作在线重删。后重删是指数据在写入存储介质(例如是硬盘等设备)之后,进行重删。后重删也称作后台重删、离线重删。
本申请实施例提供的数据重删方法可以应用于不同应用场景,例如可以应用于集中式存储系统或分布式存储系统。
所谓集中式存储系统就是指由一台或多台主设备组成中心节点,数据集中存储于这个中心节点中,并且整个系统的所有数据处理业务都集中部署在这个中心节点上。换言之,集中式存储系统中,终端或客户端仅负责数据的录入和输出,而数据的存储与控制处理完全交由中心节点来完成。集中式系统最大的特点就是部署结构简单,无需考虑如何对服务进行多个节点的部署,也就不用考虑多个节点之间的分布式协作问题。
其中,集中式存储系统可以包括盘控一体的集中式存储系统,或盘控分离的集中式存储系统。盘控 一体是指存储介质(如硬盘)与控制器是一体化的,盘控分离是指存储介质与控制器分离。
图1为本申请实施例所应用的一种集中式存储系统的系统架构图,在图1所示的应用场景中,用户通过应用程序来存取数据。运行这些应用程序的计算机被称为“应用服务器”。应用服务器100可以是物理机,也可以是对物理机进行虚拟化形成的虚拟机。物理机包括但不限于桌面电脑、服务器、笔记本电脑以及移动设备。
应用服务器通过光纤交换机110访问存储系统以存取数据。然而,交换机110只是一个可选设备,应用服务器100也可以直接通过网络与存储系统120通信。或者,光纤交换机110也可以替换成以太网交换机、无限带宽(InfiniBand,IB)交换机、基于融合以太网的远程直接内存访问(RDMA over Converged Ethernet,RoCE)交换机等。
图1所示的存储系统120是一个集中式存储系统。集中式存储系统的特点是有一个统一的入口,所有从外部设备来的数据都要经过这个入口,这个入口就是集中式存储系统的引擎121。引擎121是集中式存储系统中最为核心的部件,许多存储系统的高级功能都在其中实现。
如图1所示,引擎121中有一个或多个控制器,图1以引擎包含两个控制器为例予以说明。控制器0与控制器1之间具有镜像通道,那么当控制器0将一份数据写入其内存124后,可以通过所述镜像通道将所述数据的副本发送给控制器1,控制器1将所述副本存储在自己本地的内存124中。由此,控制器0和控制器1互为备份,当控制器0发生故障时,控制器1可以接管控制器0的业务,当控制器1发生故障时,控制器0可以接管控制器1的业务,从而避免硬件故障导致整个存储系统120的不可用。当引擎121中部署有4个控制器时,任意两个控制器之间都具有镜像通道,因此任意两个控制器互为备份。
引擎121还包含前端接口125和后端接口126,其中前端接口125用于与应用服务器100通信,从而为应用服务器100提供存储服务。而后端接口126用于与硬盘134通信,以扩充存储系统的容量。通过后端接口126,引擎121可以连接更多的硬盘134,从而形成一个非常大的存储资源池。
在硬件上,如图1所示,控制器0至少包括处理器123、内存124。处理器123是一个中央处理器(central processing unit,CPU),用于处理来自存储系统外部(服务器或者其他存储系统)的数据访问请求(如读请求或写请求),也用于处理存储系统内部生成的请求。示例性的,处理器123通过前端端口125接收应用服务器100发送的写请求时,会将这些写请求中的数据暂时保存在内存124中。当内存124中的数据总量达到一定阈值时,处理器123通过后端端口将内存124中存储的数据发送给硬盘134进行持久化存储。
内存124是指与处理器直接交换数据的内部存储器,它可以随时读写数据,而且速度很快,作为操作系统或其他正在运行中的程序的临时数据存储器。内存包括至少两种存储器,例如内存既可以是随机存取存储器,也可以是只读存储器(Read Only Memory,ROM)。举例来说,随机存取存储器是动态随机存取存储器(Dynamic Random Access Memory,DRAM),或者存储级存储器(Storage Class Memory,SCM)。DRAM是一种半导体存储器,与大部分随机存取存储器(Random Access Memory,RAM)一样,属于一种易失性存储器(volatile memory)设备。SCM是一种同时结合传统储存装置与存储器特性的复合型储存技术,存储级存储器能够提供比硬盘更快速的读写速度,但存取速度上比DRAM慢,在成本上也比DRAM更为便宜。然而,DRAM和SCM在本实施例中只是示例性的说明,内存还可以包括其他随机存取存储器,例如静态随机存取存储器(Static Random Access Memory,SRAM)等。而对于只读存储器,举例来说,可以是可编程只读存储器(Programmable Read Only Memory,PROM)、可抹除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)等。另外,内存124还可以是双列直插式存储器模块或双线存储器模块(Dual In-line Memory Module,简称DIMM),即由动态随机存取存储器(DRAM)组成的模块,还可以是固态硬盘(Solid State Disk,SSD)。实际应用中,控制器0中可配置多个内存124,以及不同类型的内存124。本实施例不对内存113的数量和类型进行限定。此外,可对内存124进行配置使其具有保电功能。保电功能是指系统发生掉电又重新上电时,内存124中存储的数据也不会丢失。具有保电功能的内存被称为非易失性存储器。
内存124中存储有软件程序,处理器123运行内存124中的软件程序可实现对硬盘的管理。例如将硬盘抽象化为存储资源池,然后划分为逻辑单元设备(logic unit number device,LUN)提供给服务器使用等。这里的LUN其实就是在服务器上看到的硬盘。当然,一些集中式存储系统本身也是文件服务器, 可以为服务器提供共享文件服务。
控制器1(以及其他图1中未示出的控制器)的硬件组件和软件结构与控制器0类似,这里不再赘述。
图1所示的是一种盘控分离的集中式存储系统。在该系统中,引擎121可以不具有硬盘槽位,硬盘134可以放置在硬盘框130中。按照引擎121与硬盘框130之间通信协议的类型,硬盘框130可能是串行连接小型计算机系统接口(Serial Attached Small Computer System Interface,SAS)硬盘框,也可能是非易失性内存主机控制器接口规范(non-volatile memory express,NVMe)硬盘框,网际协议(Internet Protocol,IP)硬盘框以及其他类型的硬盘框。SAS硬盘框,采用SAS3.0协议,每个框支持25块SAS硬盘。引擎121通过板载SAS接口或者SAS接口模块与硬盘框130连接。NVMe硬盘框,更像一个完整的计算机系统,NVMe硬盘插在NVMe硬盘框内。NVMe硬盘框再通过RDMA端口与引擎121连接。
后端接口126与硬盘框130通信。后端接口126以适配卡的形态存在于引擎121中,一个引擎121上可以同时使用两个或两个以上后端接口126来连接多个硬盘框130。或者,适配卡也可以集成在主板上,此时适配卡可通过PCIE总线与处理器112通信。
需要说明的是,图1中只示出了一个引擎121,然而在实际应用中,存储系统中可包含两个或两个以上引擎121,多个引擎121之间做冗余或者负载均衡。
图1所示的存储系统120为盘控分离的存储系统。在一些可能的实现方式中,集中式存储系统也可以是盘控一体的存储系统。在盘控一体的存储系统中,引擎121可以具有硬盘槽位,硬盘134可直接部署在引擎121中,后端接口126属于可选配置,当系统的存储空间不足时,可通过后端接口126连接更多的硬盘或硬盘框。
分布式存储系统是指将数据分散存储在多台独立的存储节点上的系统。分布式存储系统采用可扩展的系统结构,利用多台存储节点分担存储负荷,它不但提高了系统的可靠性、可用性和存取效率,还易于扩展。
如图2所示,本实施例提供的存储系统包括计算节点集群和存储节点集群。计算节点集群包括一个或多个计算节点110(图2中示出了三个计算节点110,但不限于三个计算节点110),各个计算节点110之间可以相互通信。计算节点110是一种计算设备,如服务器、台式计算机或者存储阵列的控制器等。在硬件上,如图2所示,计算节点110至少包括处理器112、内存113和网卡114。其中,处理器112是一个中央处理器(central processing unit,CPU),用于处理来自计算节点110外部的数据访问请求,或者计算节点110内部生成的请求。示例性的,处理器112接收用户发送的写请求时,会将这些写请求中的数据暂时保存在内存113中。当内存113中的数据总量达到一定阈值时,处理器112将内存113中存储的数据发送给存储节点100进行持久化存储。除此之外,处理器112还用于对数据进行计算或处理,例如元数据管理、重删、数据压缩、虚拟化存储空间以及地址转换等。图2中仅示出了一个处理器112,在实际应用中,处理器112的数量往往有多个,其中,一个处理器112又具有一个或多个处理器核。本实施例不对处理器的数量,以及处理器核的数量进行限定。
内存113是指与处理器直接交换数据的内部存储器,它可以随时读写数据,而且速度很快,作为操作系统或其他正在运行中的程序的临时数据存储器。内存包括至少两种存储器,例如内存既可以是随机存取存储器RAM,也可以是只读存储器ROM。实际应用中,计算节点110中可配置多个内存113,以及不同类型的内存113。本实施例不对内存113的数量和类型进行限定。此外,可对内存113进行配置使其具有保电功能。保电功能是指系统发生掉电又重新上电时,内存113中存储的数据也不会丢失。具有保电功能的内存被称为非易失性存储器。
网卡114用于与存储节点100通信。例如,当内存113中的数据总量达到一定阈值时,计算节点110可通过网卡114向存储节点100发送请求以对所述数据进行持久化存储。另外,计算节点110还可以包括总线,用于计算节点110内部各组件之间的通信。在功能上,由于图1中的计算节点110的主要功能是计算业务,在存储数据时可以利用远程存储器来实现持久化存储,因此它具有比常规服务器更少的本地存储器,从而实现了成本和空间的节省。但这并不代表计算节点110不能具有本地存储器,在实际实现中,计算节点110也可以内置少量的硬盘,或者外接少量硬盘。
任意一个计算节点110可通过网络访问存储节点集群中的任意一个存储节点100。存储节点集群包 括多个存储节点100(图1中示出了三个存储节点100,但不限于三个存储节点100)。一个存储节点100包括一个或多个控制器101、网卡104与多个硬盘105。网卡104用于与计算节点110通信。硬盘105用于存储数据,可以是磁盘或者其他类型的存储介质,例如固态硬盘或者叠瓦式磁记录硬盘等。控制器101用于根据计算节点110发送的读/写数据请求,往硬盘105中写入数据或者从硬盘105中读取数据。在读写数据的过程中,控制器101需要将读/写数据请求中携带的地址转换为硬盘能够识别的地址。由此可见,控制器101也具有一些简单的计算功能。
需要说明的是,网卡114或网卡104为智能网卡时,处理器112的功能如重删等也可以卸载至智能网卡。智能网卡是指融合有计算资源的网卡,例如是具有数据处理单元(data processing unit,DPU)的网卡。DPU具有CPU的通用性和可编程性,但更具有专用性,可以在网络数据包,存储请求或分析请求上高效运行。通过将重删等功能卸载至DPU,一方面可以减少对CPU资源的占用,另一方面可以缩短访问路径。
图2所示的存储系统为存算分离的分布式存储系统,在一些可能的实现方式中,存储系统也可以为存算一体的分布式存储系统。存储一体的分布式存储系统包括存储集群(也称作存储节点集群),存储节点集群可以包括一个或多个服务器,服务器之间可以相互通信。服务器是一种既具有计算能力又具有存储能力的设备。在硬件上,服务器至少包括处理器、内存、网卡和硬盘。处理器用于处理来自服务器外部(应用服务器或者其他服务器)的数据访问请求,也用于处理服务器内部生成的请求。示例性的,处理器接收写请求时,会将这些写请求中的数据暂时保存在内存中。当内存中的数据总量达到一定阈值时,处理器将内存中存储的数据发送给硬盘进行持久化存储。处理器还用于对数据进行计算或处理,例如元数据管理、重复数据删除、数据压缩、数据校验、虚拟化存储空间以及地址转换等。
需要说明,以上仅仅是对存储系统的示例说明,在本申请实施例其他可能的实现方式中,存储系统还可以是全融合架构的分布式存储系统或者是memory fabric架构的分布式存储系统。在此不再赘述。
为了实现重删功能,存储系统通常需要引入“两级元数据映射”(Two-level Metadata Mapping)。
一般情况下,例如是在不支持重删功能的存储系统中,元数据映射可以为逻辑块地址(Logical Block Address,LBA)到物理块地址(Physical Block Address,PBA)的直接映射。其中,逻辑块地址也称作逻辑地址,物理块地址也称作物理地址。逻辑地址是存储介质呈现给主机的逻辑空间的地址。主机在向存储介质发送写请求或读请求时,会将逻辑地址携带在写请求或读请求中。存储介质接收到送写请求或读请求时,会获取写请求或读请求携带的逻辑地址,对逻辑地址经过一次或多次地址转换确定物理地址,向物理地址写入数据或者从物理地址读取数据。通过采用LBA作为数据的地址,将物理地址这种基于磁头、柱面和扇区的三维寻址方式转变为一维的线性寻址,可以提高寻址的效率。
逻辑地址到物理地址的映射为单跳映射,在引入重删功能后,由于逻辑地址和数据内容并不匹配,如果按照逻辑地址进行路由难以找到数据,因此需要增加一跳按指纹的路由,因此,单跳映射可以变更为逻辑地址到指纹,再由指纹到物理地址的两级映射。
在图1或图2所示的存储系统中,处理器(例如是图1中的处理器123或者图2中的处理器112)可以将数据重删方法对应的计算机可读指令加载进内存(例如是图1中的内存124或者图2中的内存113),然后处理器执行上述计算机可读指令,以执行数据重删方法,从而节省存储空间、提升存储性能。
为了便于描述,以存储系统对应用产生的数据进行后重删示例说明。
具体地,应用写入数据时,存储系统(例如是存储系统中的处理器)可以将数据分块,该示例中假设以4千字节(kilo byte,KB)的粒度将数据分为多个数据块,然后存储系统中的重删模块可以对每个4KB的数据块计算指纹,将指纹写入日志文件,接着将数据块下盘(写入硬盘等存储设备)。用户可以手动触发重删操作,或者设置重删周期,如此,应用可以响应于用户的重删操作下发重删命令,或者是周期性地向重删模块下发重删命令,重删模块响应于重删命令,对日志文件中的指纹进行排序,将排序后的指纹与指纹文件中的指纹合并,根据合并结果删除内容重复的数据块。
如图3所示,对于新写入硬盘等存储介质(也可以为存储设备)的数据块,该数据块的指纹等元数据可以采用追加写入日志文件(Write-ahead Log,WAL)的方式进行管理。当触发重删时,重删模块将日志文件中的指纹进行排序,并将排序后的指纹与指纹文件中的指纹合并,基于合并结果可以保留具有相同指纹的多个数据块中的一个数据块,删除内容重复的数据块,并更新指纹文件。需要说明,图3中 不同图案的矩形块代表不同指纹。
然而,很多应用的工作负载(workload)通常是非均匀的,也即应用写入硬盘等存储设备进行持久化存储的数据可以具有不同的更新频次。其中,更新频繁的数据的占比通常高于更新不频繁的数据的占比。更新频繁的数据通常难以被重删,也即难以被重删的数据占比反而高,此时就会出现资源挤占的情况,更新不频繁的数据所分配的空间占比就会较低,由此导致易被淘汰,从而损失重删率。
有鉴于此,本申请提供的数据重删方法可以对元数据管理结构进行主动分区,在将写请求中的数据块写入存储设备(如硬盘)后,将数据块的指纹以及地址信息等元数据写入元数据管理结构的多个分区中与该数据块的特征对应的分区,并在该分区中存在与该数据块的指纹相同的指纹时,删除该分区中该数据块的元数据,以及根据该数据块的地址信息从存储设备删除该数据块。
如此,不同特征的数据块的元数据可以写入元数据管理结构的不同分区,例如更新频繁的数据块的元数据可以写入容量较小的分区,更新不频繁的数据块的元数据可以写入容量较大的分区,由此避免更新不频繁的数据被更新频繁的数据挤占资源,进而被淘汰,提高了重删率。
进一步地,本申请引入了一种新的元数据管理结构,即用于存储数据块的指纹到逻辑地址的映射关系的逆映射表(Inverse Mapping Table)。如图4所示,区别于基于日志文件的元数据管理,本申请实施例通过逆映射表对指纹、逻辑地址等元数据进行管理,当逆映射表的分区中与当前写入的数据块的指纹相同的指纹的数量达到预设阈值,即触发重删,无需等待用户手动触发或者周期性地触发,能够及时地删除内存中重复的指纹等元数据,减少了元数据内存开销,保障了系统性能。此外,当前写入数据块的逻辑地址和物理地址可以写入地址映射表,被重删的数据块的物理地址可以修改为指纹。如此,被重删的数据块可以通过两级映射进行寻址,未被重删的数据块可以通过单级映射进行寻址,缩短了未被重删的数据块的响应时间。
需要说明,重删后的逆映射表中的指纹和逻辑地址还可以写入指纹表,如此指纹表中可以存储重删后的数据块的指纹和逻辑地址。存储系统也可以通过指纹表、前向映射表进行寻址。基于此,逆映射表的各个分区还可以在满足淘汰条件时,对分区中的元数据进行淘汰,从而降低元数据的规模,减小内存开销。
为了使得本申请的技术方案更加清楚、易于理解,下面将以图1所示的存储系统120为例,对本申请提供的数据重删方法进行介绍。
参见图5所示的数据重删方法的流程图,存储系统120包括引擎121和硬盘框130,引擎121中包括互为备份的控制器0和控制器1,为了便于描述,图5从控制器0的角度进行示例说明,硬盘框130中包括多个硬盘134,该方法包括如下步骤:
S502:控制器0接收来自于应用服务器100的写请求。
写请求是指用于写数据的请求。写请求中包括数据,该写请求即用于将该数据写入硬盘134,进行持久化存储。其中,写请求可以由部署于应用服务器100上的应用基于业务需求生成。例如,应用服务器100上可以部署视频应用,视频应用可以为短视频应用或长视频应用,该视频应用可以生成写请求,其中,写请求中包括用户上传的视频流。又例如,应用服务器100上可以部署文件管理应用,该文件管理应用可以是文件管理器,文件管理器可以生成写请求,写请求中包括待归档的图像。
在图1的示例中,控制器0包括处理器123和前端接口125,控制器0的处理器123可以通过前端接口125,接收应用服务器100通过交换机110转发的写请求。
S504:控制器0将写请求中数据分块,获得至少一个数据块。
在本实施例中,控制器0可以采用定长分块或变长分块,对写请求中数据进行分块,从而获得至少一个数据块。其中,定长分块是指按照设置好的分块粒度对数据流进行分块。变长分块是将数据流分为大小不固定的数据块,变长分块可以包括基于滑动窗口的变长分块和基于内容的变长分块(content-defined chunking,CDC)。
为了便于理解,下面以定长分块进行示例说明。具体地,数据流的大小为分块粒度的整数倍时,控制器0可以将数据均匀地切分为一个或多个数据块。数据大小并非分块粒度的整数倍时,控制器0可以将数据进行填充,例如是在数据的末端填零,使得填充后的数据为分块粒度的整数倍,接着控制器0按 照该分块粒度将数据均匀地切分为一个或多个数据块。例如,数据的大小为19KB时,控制器0可以在该数据的末端填零,使得填充后的数据的大小为20KB,然后控制器0按照4KB的分块粒度进行分块,可以获得5个大小为4KB的数据块。
考虑到不同存储场景的输入输出(input output,IO)模式、IO大小、特性要求不同,控制器0可以根据存储场景选择合适的分块策略进行分块。例如,主存储场景中,IO通常较小,并且IO模式以随机读写为主,控制器0可以选择采用定长分块;备份存储场景中,IO通常较大,IO模式以顺序读写为主,控制器0可以选择变长分块,以获得较好的重删率。
需要说明的是,对数据进行分块是本申请实施例中数据重删方法的可选步骤,执行本申请实施例的数据重删方法也可以不执行上述步骤。例如,数据的大小等于分块粒度,或者小于分块粒度时,可以直接将数据作为一个数据块。又例如,数据的大小是固定大小时,也可以直接将数据作为一个数据块。
S506:控制器0确定至少一个数据块的指纹。
针对至少一个数据块中的任意数据块,控制器0可以根据该数据块的内容,通过消息摘要算法进行计算,获得该数据块的指纹。数据块的指纹可以是该数据块的消息摘要,例如是数据块的哈希值。
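As an illustrative sketch only (the 4 KB granularity follows the example in the preceding paragraphs; using SHA-256 as the message digest is an assumption, since the description only requires a digest computed over the block content), fixed-length chunking with zero padding and per-block fingerprinting could look like this:

```python
import hashlib

CHUNK_SIZE = 4 * 1024  # 4 KB fixed-length chunking, as in the example above

def split_and_fingerprint(data: bytes):
    """Pad the data to a multiple of the chunk size with trailing zeros,
    split it into fixed-length chunks, and compute a digest per chunk."""
    if len(data) % CHUNK_SIZE:
        data = data + b"\x00" * (CHUNK_SIZE - len(data) % CHUNK_SIZE)
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]

# A 19 KB write is padded to 20 KB and produces five 4 KB data blocks.
blocks = split_and_fingerprint(b"x" * 19 * 1024)
assert len(blocks) == 5
```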
S508:控制器0根据至少一个数据块中第一数据块的指纹查询指纹表。当第一数据块的指纹在指纹表中存在时,执行S509,当第一数据块的指纹在指纹表中不存在时执行S510、S511、S512。
S509:控制器0向应用服务器100返回写响应。
S510:控制器0将第一数据块写入硬盘134。
指纹表用于记录硬盘134存储的数据块的指纹和地址信息。其中,地址信息可以包括逻辑地址。进一步地,地址信息还可以包括物理地址。指纹表可以采用键值对(key value,kv)方式存储指纹和地址信息。具体地,指纹表可以以指纹为key,以逻辑地址等地址信息为value进行存储。
指纹表中记录的指纹和地址信息可以来自逆映射表。逆映射表是一种存储已下盘的数据块的指纹和逻辑地址的元数据管理结构。具体地,逆映射表触发重删后,控制器0可以将重删后的逆映射表同步至指纹表,具体是将重删后的逆映射表中的元数据(例如是指纹和逻辑地址)存储至指纹表。
该指纹表中存储有已下盘的数据块的指纹,因此可以支持前重删,从而减少硬盘134的存储压力。具体地,控制器0可以根据第一数据块的指纹查询指纹表。例如,控制器0可以将第一数据块的指纹与指纹表中的指纹进行比对,或者控制器0可以根据第一数据块的指纹以及指纹表的索引快速查找指纹。
当第一数据块的指纹在指纹表中存在时,表明具有相同内容的数据块已写入硬盘134,控制器0可以执行S509,以直接返回写响应,该写响应用于表征写成功。当第一数据块的指纹在指纹表中不存在时,表明磁盘134中并未存储相同内容的数据块,控制器0可以执行S510,将第一数据块写入磁盘134。进一步地,控制器0也可以在第一数据块写入硬盘134成功后,向应用服务器100返回写响应。
需要说明的是,在业务的初始阶段,指纹表可以为空。随着应用服务器100不断向硬盘134存储数据,逆映射表中可以记录已下盘的数据块的元数据,当逆映射表中的分区触发重删,重删后的逆映射表中的元数据可以同步至指纹表。在该阶段,控制器0可以查询指纹表,从而实现前重删。与后重删相比,前重删通过在数据落盘之前删除重复数据,无需将重复数据写入硬盘134等存储介质,避免了对资源的占用。
基于此,执行本申请实施例的方法也可以不执行上述S508、S509。例如,在业务的初始节点,指纹表为空,控制器0可以直接将数据块下盘进行后重删,并在指纹表中记录的元数据达到预设条数,支持前重删。又例如,控制器0可以不对数据块进行前重删,直接将数据块下盘进行后重删。
S511:控制器0向地址映射表中写入第一数据块的逻辑地址和物理地址。
地址映射表用于存储写入硬盘134的数据块的地址信息。例如,地址映射表可以存储数据块的逻辑地址和物理地址。具体实现时,地址映射表可以采用kv方式存储逻辑地址和物理地址。为了便于查找或定位数据块,地址映射表可以以逻辑地址为key,以物理地址为value,存储逻辑地址到物理地址的映射关系。一方面可以便于后续访问该数据块时快速寻址,另一方面可以记录该操作,便于后续追溯或故障恢复。
区别于逆映射表以逻辑地址为value,地址映射表以逻辑地址为key,因此也可以称之为前向映射表。需要说明的是,如果第一数据块被前重删,也就表明第一数据块并未被写入硬盘134,也就不存在相应 的物理地址,控制器0可以在前向映射表中存储第一数据块的逻辑地址和指纹。当需要查找第一数据块时,可以根据前向映射表查找该第一数据块的指纹,然后查找指纹表,获得与该第一数据块具有相同指纹的数据块,获得具有相同指纹的数据块的逻辑地址,基于具有相同指纹的数据块的逻辑地址可以通过前向映射表,获得具有相同指纹的数据块的物理地址,基于具有相同指纹的数据块的物理地址可以访问具有相同指纹的数据块,由此可以实现访问第一数据块。
需要说明的是,上述S510、S511的执行顺序并不限定。在一些可能的实现方式中,控制器0也可以先在前向映射表中写入逻辑地址和物理地址,然后再向硬盘134等存储设备中写入第一数据块。
S512:控制器0将第一数据块的指纹和逻辑地址插入逆映射表的第一分区。在第一分区中存在与第一数据块的指纹相同的指纹时,执行S514、S516。
逆映射表用于存储指纹和逻辑地址。其中,逆映射表可以是以键值对形式组织的表结构。逆映射表中的键值对用于表征逆映射关系。区别于地址映射表中自逻辑地址映射到物理地址的映射形式,逆映射表中用于存储指纹到逻辑地址的逆映射关系。其中,键值对中的key为指纹,value为逻辑地址。控制器0可以将第一数据块的指纹和逻辑地址有序插入逆映射表。
其中,控制器0可以将第一数据块的指纹和逆映射表中第一分区的指纹进行多路归并(N-Way Merge)排序,然后根据排序结果,将第一数据块的指纹和逻辑地址插入逆映射表的第一分区。
多路归并排序是指待排序的对象（例如是数据块的指纹）分为多路分别进行排序，然后将各路排序结果进行合并，从而实现归并排序。假设每路的排序结果可以记作具有n个元素的有序小集合 S = {x_i | x_i ≤ x_j，i ≤ j，i, j ∈ [0, n)}，假设待排序的指纹被分为m路，则可以对m个小集合S_1、S_2、…、S_m进行合并以实现归并排序。其中，控制器0可以按照如下公式确定所有小集合中的最小值，写入大集合：
min = min(min(S_1), min(S_2), …, min(S_m))      (1)
假设所有小集合中的最小值为集合Si中最小值,控制器0可以在将该最小值写入大集合后,将该最小值从集合Si去除,然后继续按照上述公式(1)计算,确定更新后的所有小集合的最小值,并将最小值写入大集合。控制器0重复上述过程,直至所有小集合中的元素均写入大集合,完成归并排序。
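A minimal sketch of the multi-way merge described above; Python's standard heapq.merge implements exactly this repeated take-the-minimum-of-all-ways procedure (the example fingerprints are placeholders):

```python
import heapq

# each "way" is an already sorted small set of fingerprints
ways = [
    ["fp01", "fp07", "fp09"],
    ["fp02", "fp03", "fp08"],
    ["fp04", "fp05", "fp06"],
]

# heapq.merge repeatedly takes min(min(S1), min(S2), ..., min(Sm)),
# removes it from its small set, and appends it to the big set.
merged = list(heapq.merge(*ways))
assert merged == sorted(fp for way in ways for fp in way)
```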
在一些可能的实现方式中,控制器0可以采用日志结构合并树(log structured merge tree,LSM tree)将第一数据块的指纹和逆映射表中的指纹进行排序,进而将第一数据块的指纹和逻辑地址有序插入逆映射表的第一分区。其中,逆映射表的各个分区均可以维护一个LSM tree,从而将待写入该分区的指纹和逻辑地址等元数据有序插入。
在本实施例中,逆映射表包括多个分区,第一分区可以根据第一数据块的特征确定。其中,第一数据块的特征可以是第一数据块对应的指纹。在一些情况下,多个数据块可以对应相同指纹。例如,在备份场景中,多个数据块可以对应相同指纹。控制器0可以根据第一数据块对应的指纹,确定该指纹的热度,然后根据第一数据块对应的指纹的热度确定该热度对应的逆映射表的多个分区中的第一分区。例如,控制器0可以将指纹的热度与预设热度进行比较,当指纹的热度小于预设热度,则可以确定第一分区为逆映射表中用于存储冷数据的元数据的分区,也称作冷分区,当指纹的热度大于或等于预设热度,则可以确定第一分区为用于存储热数据的分区,也称作热分区。其中,冷分区可以是容量较大的分区,热分区可以是容量较小的分区。冷分区的容量大于热分区的容量。接着控制器0将第一数据块的指纹和逻辑地址写入第一分区。
进一步地,在将第一数据块的指纹和逻辑地址等元数据写入第一分区后,控制器0还可以更新第一数据块对应的指纹的热度,以用于确定后续写入的具有相同指纹的数据块所对应的分区。具体地,控制器0可以确定第一数据块的逻辑地址的热度,将逻辑地址的热度累加至第一数据块对应的指纹的热度,从而更新第一数据块对应的指纹的热度。
为了便于理解,下面结合一示例进行说明。该示例中,控制器0可以在每次写入数据块的元数据后,更新数据块对应的指纹的热度。例如,控制器0写入硬盘134的第一数据块为数据块10,数据块10的指纹记作FP3,由于之前曾写入具有相同指纹的数据块,并且最近一次写入指纹为FP3的数据块为数据块8,控制器0可以获取写入数据块8的元数据后所更新的FP3的热度。该示例中假设FP3的热度为5。控制器0可以基于该热度确定逆映射表的多个分区中的第一分区,例如第一分区可以是用于存储冷数据 的元数据的分区,也称作冷分区。控制器0还可以确定数据块10的逻辑地址的热度,假设逻辑地址的热度为2,则可以将逻辑地址的热度累计至指纹的热度,从而更新指纹的热度。在该示例中,更新后的指纹的热度可以为7。
随着数据块的不断写入,数据块对应的指纹的热度可以发生变化。为此,控制器0还可以根据数据块对应的指纹的热度,将数据块的元数据在不同分区移动。例如,在初始阶段,各数据块对应的指纹的热度通常较低,可以将这些数据块的元数据写入冷分区。随着数据块的不断写入,部分指纹的热度不断增加,当在写入某个数据块后,该数据块对应的指纹的热度大于预设热度时,控制器0可以将该数据块的元数据写入热分区,以及将具有相同指纹的数据块的元数据移动至热分区。
在一些可能的实现方式中,考虑到在不同分区移动元数据的开销,控制器0也可以不移动元数据,而是将元数据淘汰。例如,当在写入某个数据块后,该数据块对应的指纹的热度大于预设热度时,控制器0可以将该数据块的元数据写入热分区,以及将具有相同指纹的数据块的元数据淘汰出冷分区。
进一步地,控制器可以在第一分区存在与第一数据块的指纹相同的指纹时,即触发重删,也可以在第一分区中存在与第一数据块的指纹相同的指纹,且第一分区中与第一数据块的指纹相同的指纹的数量达到预设阈值时,触发重删,执行S514、S516。
其中,预设阈值可以根据经验值进行设置。例如,预设阈值可以设置为2,以图4示例说明,该示例中,逆映射表的分区1(如第一分区)中与第一数据块的指纹相同的指纹的数量达到2,则可以触发重删。又例如,预设阈值也可以设置为1,也即存在与第一数据块的指纹相同的指纹,即可触发重删。也就是说,当第一分区中存在与第一数据块的指纹相同的指纹时,控制器0可以执行S514、S516实现重删,但本申请实施例对重删时机不作限定,例如,控制器0可以在第一分区中相同的指纹的数量达到预设阈值时触发重删,也可以在存在相同的指纹时即触发重删。
在本实施例中,控制器0通过直接将第一数据块写入硬盘134,然后执行后重删。如此可以避免消耗计算资源影响业务正常运行,以及出现计算瓶颈时导致存储性能下降。
需要说明的是,逆映射表为本实施例引入的一种新型元数据管理结构,在本申请实施例其他可能的实现方式中,元数据管理结构也可以采用其他组织形式。
S514:控制器0根据第一数据块的地址信息从硬盘134中删除第一数据块。
对于具有相同指纹的数据块,控制器0可以保留一个数据块,从硬盘134中删除其他具有相同指纹的数据块。具体实现时,控制器0可以根据第一数据块的逻辑地址,从地址映射表中获取第一数据块的物理地址,然后根据该物理地址从硬盘134等存储设备中找到该第一数据块,并删除第一数据块。
进一步地,预设阈值大于1时,控制器0还可以保留一个与第一数据块指纹相同的数据块,删除其他指纹相同的数据块。具体地,针对指纹相同的数据块,控制器0可以保留最先写入的数据块,删除后写入的数据块。
S516:控制器0从逆映射表的第一分区中删除第一数据块的指纹以及逻辑地址。
具体地,控制器0可以从逆映射表中查找第一数据块的指纹,然后删除第一数据块的指纹以及逻辑地址。需要说明的是,当逆映射表中指纹与逻辑地址的键值对采用LSM tree的方式存储时,控制器0可以采用表格合并的方式删除第一数据块的指纹以及对应的逻辑地址。
进一步地,预设阈值大于1时,控制器0还可以保留一个与第一数据块指纹相同的数据块的元数据,删除其他指纹相同的数据块的元数据。具体地,针对指纹相同的数据块,控制器0可以保留最先写入的数据块的元数据,删除后写入的数据块的元数据。
此外,控制器0在执行上述S514、516时可以并行执行,也可以按照设定的顺序先后执行。例如,控制器0也可以先执行S516,然后执行S514。本实施例对S514和S516的执行顺序不作限定。
需要说明的是,至少一个数据块还可以包括第二数据块。其中,第二数据块对应的指纹的热度可以高于第一数据块对应的指纹的热度,例如第二数据块可以为热数据,第一数据块可以为冷数据。相应地,控制器0可以将第二数据块写入硬盘134,将第二数据块的元数据,如第二数据块的指纹和逻辑地址,写入逆映射表的多个分区中的第二分区。第二分区具体是用于存储热数据的元数据的热分区。第二分区的容量小于第一分区的容量。
进一步地,数据块对应的指纹的热度还可以分为更多的类型或级别,例如指纹的热度也可以分为热、 温、冷三个级别。相应地,逆映射表可以包括更多的分区,例如逆映射表可以包括用于存储热数据的元数据的分区、用于存储温数据的元数据的分区和用于存储冷数据的元数据的分区。
上述S516为本申请实施例中删除第一分区中第一数据块的元数据的一种实现方式,当元数据管理结构用于存储指纹、逻辑地址和物理地址时,控制器0可以删除元数据管理结构的第一分区中第一数据块的指纹、逻辑地址和物理地址。
S518:控制器0将重删后的逆映射表中的指纹和逻辑地址写入指纹表。
具体地,控制器0还可以将重删后的逆映射表中的指纹和逻辑地址等元数据,以同步方式写入至指纹表。其中,控制器0可以以分区为粒度,将重删后的逆映射表中的元数据同步写入指纹表。例如,第一分区触发重删后,控制器0可以将第一分区重删后的元数据同步写入指纹表,第二分区触发重删后,控制器0再将第二分区重删后的元数据同步写入指纹表。考虑到分区可以触发多次重删,为了减少资源占用,控制器0可以采用增量同步机制,将重删后的元数据同步写入指纹表。
S520:当逆映射表中的至少一个分区满足淘汰条件,控制器0对至少一个分区中的元数据进行淘汰。
具体地,逆映射表中的分区可以设置用于淘汰的水位,当该分区中的元数据占用的资源量到达该水位,则控制器0可以对该分区中的元数据进行淘汰,从而避免元数据溢出。其中,由于不同分区存储有不同热度的指纹,分区的水位可以是不同的。例如,第一分区的容量为逆映射表的总容量的80%,第二分区的容量为逆映射表的总容量的20%时,第一分区的水位可以是逆映射表的总容量的70%,第一分区的水位可以是逆映射表的总容量的10%。
在一些可能的实现方式中,逆映射表中的分区的水位可以包括高水位和低水位。当分区中的元数据占用的资源量到达高水位时,可以对分区中的元数据进行淘汰,使得淘汰后的分区中元数据占用的资源量,不低于上述低水位,且不高于高水位。
控制器0对逆映射表的分区中的元数据进行淘汰,尤其是对更新频繁的热数据对应的分区中的元数据进行淘汰,可以大幅降低元数据规模,减少内存开销。
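A hedged sketch of the watermark-based eviction just described; treating the watermarks as entry counts, using a plain dictionary as the partition's metadata, and evicting the coldest fingerprints first are simplifications for illustration:

```python
def evict_to_low_watermark(partition_entries, fingerprint_heat, high, low):
    """When the metadata in a partition reaches the high watermark, evict the
    coldest fingerprints until the usage is no higher than the low watermark.
    partition_entries maps fingerprint -> logical addresses."""
    if len(partition_entries) < high:
        return []
    evicted = []
    while len(partition_entries) > low:
        coldest = min(partition_entries, key=lambda fp: fingerprint_heat.get(fp, 0))
        partition_entries.pop(coldest)
        evicted.append(coldest)
    return evicted
```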
S522:控制器0将地址映射表中被重删的第一数据块的物理地址修改为第一数据块的指纹。
由于第一数据块已经从硬盘134中删除,基于地址映射表中的物理地址难以定位该第一数据块,因此,控制器0可以将地址映射表中被重删的第一数据块的物理地址修改为第一数据块的指纹,表征该第一数据块为被重删的数据块,可以通过指纹表查找到与该数据块具有相同指纹的数据块,进而确定具有相同指纹的数据块的物理地址,如此可以实现被重删的第一数据块的寻址。
基于此,未被重删的数据块仍能基于前向映射表中逻辑地址与物理地址之间的映射关系实现寻址,而不需要通过两级元数据映射,大幅缩短了寻址时间,进而缩短了响应时间,提高了响应效率。
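The resulting read path can be sketched as follows: a block that was not deduplicated resolves in a single hop through the address mapping table, while a block whose forward-map entry was rewritten to a fingerprint takes the extra hop through the fingerprint table to the retained copy. The table layouts and names below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fingerprint:
    digest: str

def resolve_physical_address(lba, forward_map, fingerprint_table):
    """forward_map maps an LBA either to a physical address (block kept on disk)
    or to a Fingerprint (block was deduplicated); fingerprint_table maps a
    fingerprint to the logical address of the retained copy."""
    value = forward_map[lba]
    if not isinstance(value, Fingerprint):
        return value                          # not deduplicated: single-hop addressing
    retained_lba = fingerprint_table[value]   # deduplicated: look up the retained copy
    return forward_map[retained_lba]          # its forward-map entry holds the physical address

# example: LBA 7 was deduplicated against the copy kept at LBA 3
fp = Fingerprint("ab12")
forward_map = {3: "PBA_0x40", 7: fp}
fingerprint_table = {fp: 3}
assert resolve_physical_address(7, forward_map, fingerprint_table) == "PBA_0x40"
```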
需要说明的是,上述S518至S522为本申请实施例的可选步骤,例如内存为大容量内存时,控制器0也可以不执行将逆映射表中的数据写入指纹表,并对逆映射表中的元数据进行淘汰的步骤。
基于上述内容描述,本申请提供的数据重删方法通过对元数据管理结构进行主动分区,以差异化处理不同更新频次的数据的元数据。其中,更新不频繁的数据能够分配足够的配额资源以支持重删,而更新频繁的数据因为被频繁无效掉,因此分配的配额资源就相对较少;这样使得最终可以被重删的数据基本都可以被重删掉而不是被挤占淘汰,获得重删率的提升;同时难以重删的数据能够被淘汰,进而降低整体元数据映射规模,获得系统性能提升。
进一步地,该方法还引入逆映射表这一新型元数据管理结构,当逆映射表的第一分区中存在与第一数据块的指纹相同的指纹可以触发重删,无需用户手动重删,或者周期性地触发重删,如此可以实现及时进行重删,减少元数据规模,进而减少了元数据内存开销,保障了系统性能。而且,该方法在逆映射表中记录下盘的数据块的逻辑地址和物理地址,并将被重删的数据块的物理地址修改为指纹,如此可以使得未被重删的数据块仍能通过单跳映射寻址,缩短了寻址时间,提高了响应效率。
图5所示实施例的关键在于对逆映射表等元数据管理结构进行分区。其中,元数据管理结构中各个分区的容量可以根据分区决策模型确定。分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量。
其中,分区容量组合表示一组分区中各个分区的容量。分区的容量是指分区所分配到的资源量。一组分区中各个分区的容量之和等于元数据管理结构的总容量。基于此,分区容量组合可以通过各分区的实际容量表征,也可以通过各分区的容量占比表征。
为了便于描述,下文均以分区容量组合通过分区容量占比示例说明。例如,一个分区容量组合可以表示为80%:20%,用于表征该元数据管理结构包括两个分区,容量分别为总容量的80%和20%。又例如,一个分区容量组合可以表示为60%:30%:10%,用于表征该元数据管理结构包括三个分区,容量分别为总容量的60%、30%和10%。
分区收益是指对元数据管理结构实施分区后所获得的收益。例如,在对元数据管理结构实施分区后可以获得重删率的提升,因此,分区收益可以为重删率。
分区决策模型可以通过预估数据的命中率,从而预测重删率。为了便于描述,以预设的分区容量组合中的第一分区容量组合示例说明。分区决策模型可以通过以下方式预测第一分区容量组合应用于所述元数据管理结构后对应的重删率:
获取第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征,然后根据各个分区对应的工作负载特征,获得各个分区对应的数据分布,接着根据各个分区对应的数据分布以及各个分区的容量,获得重删率。
进一步地,考虑到工作负载可能发生变化,分区的容量还支持调整。工作负载可以为一段时间内正在使用或等待使用CPU等计算资源的任务。在业务高峰期,正在使用或等待使用CPU等计算资源的任务较多,工作负载较大,在业务低谷期,正在使用或等待使用CPU等计算资源的任务较少,工作负载较小。基于此,控制器0可以对分区进行调整。在具体工程实施过程中,分区调整需要重新进行分区初始化等操作,产生分区调整成本。基于此,分区收益可以根据收益率和分区调整成本中的至少一个确定。
在一些可能的实现方式中,控制器0可以周期性地调整元数据管理结构的多个分区的容量。其中,每个周期也可以称作分区调整周期。当到达调整时刻时,控制器0可以根据调整时刻前的周期(例如是上一分区调整周期)对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整多个分区的容量。其中,工作负载特征是指从工作负载信息中提取的特征,例如工作负载特征可以包括重用距离、重用周期、重用频次中的一种或多种。
为了便于理解,下面结合附图对本申请实施例的数据重删方法中分区过程进行详细介绍。为了便于描述,下面仍以对逆映射表分区进行示例说明。
参见图6所示的数据重删方法的流程图,在图5所示实施例基础上,控制器0在将第一数据块写入硬盘134之后还可以执行如下步骤:
S602:控制器0获取上一分区调整周期的系统资源使用信息和各个分区对应的工作负载信息。
系统资源包括计算资源(如CPU等处理器资源)、内存资源、磁盘资源、网络资源中的一种或多种。基于此,系统资源使用信息包括CPU占用比例、内存占用比例、磁盘IO量、以及带宽占用量。
工作负载(workload)是指一段时间内正在使用或等待使用CPU等计算资源的任务。该任务可以是写数据。基于此,工作负载信息可以包括数据的重用距离(Reuse Distance)、重用周期或重用频次中的一种或多种。重用距离可以为对同一数据的相邻两次访问之间所间隔的访问次数。在统计重用距离时,可以以数据块为粒度进行统计。重用周期可以为不同写请求中对同一数据的相邻两次访问之间所间隔的写请求个数。重用频次可以是重用周期的倒数。需要说明的是,控制器0在获取重用距离、重用周期或重用频次等工作负载信息时,可以针对各个分区分别提取重用距离、重用周期或重用频次,从而获得各个分区对应的工作负载信息。
S604:控制器0从系统资源使用信息中提取系统资源特征,从各个分区对应的工作负载信息中提取各个分区对应的工作负载特征。
具体地,控制器0可以对系统资源使用信息进行向量化,获得系统资源特征,以及对工作负载信息进行向量化,获得工作负载特征。为了便于理解,下文以对CPU占用比例、重用距离的向量化进行示例说明。
在对CPU占用比例进行向量化时,控制器0可以将CPU占用比例与CPU占用阈值进行比较。其中,CPU占用阈值可以根据历史业务经验设置。例如,CPU占用阈值可以设置为70%。当CPU占用比 例高于CPU占用阈值,则可以输出“1”,当CPU占用比例不高于CPU占用阈值,则可以输出“0”,具体可以参见如下公式:
F(CPU占用) = 1，若CPU占用比例高于CPU占用阈值；F(CPU占用) = 0，若CPU占用比例不高于CPU占用阈值      (2)
其中，F表示特征，F可以采用向量表示。
类似地,参见图7所示的系统资源特征提取的示意图,控制器0可以将内存占用比例、磁盘IO量、带宽占用量等系统资源使用信息与该资源对应的阈值进行比较,根据比较结果输出上述系统资源使用信息对应的系统资源特征,例如为F(内存占用)、F(磁盘IO量)、F(带宽占用量)。
在对重用距离进行向量化时，可以累计本批次数据（如本次写请求中数据包括的数据块）的重用距离，然后由此计算出重用距离的均值和方差，具体如下所示:
μ_重用距离均值 = (d_1 + d_2 + … + d_n) / n      (3)
σ_重用距离方差 = ((d_1 − μ_重用距离均值)^2 + (d_2 − μ_重用距离均值)^2 + … + (d_n − μ_重用距离均值)^2) / n      (4)
其中，d_i表示本批次累计的第i个重用距离，n为累计的重用距离个数。
类似地,控制器0可以采用和处理重用距离相同的统计方法,对重用周期、重用周期进行统计,以充分挖掘其中的时间关联性和空间关联性,从而实现从工作负载信息中提取工作负载特征。
需要说明的是,上述S602、S604为本申请实施例的可选实施方式,执行本申请实施例的方法也可以不执行上述S602、S604。例如,控制器0也可以不获取系统资源使用信息,提取系统资源特征。
S606:控制器0根据系统资源特征和工作负载特征,获得结构化特征。
结构化特征包括系统资源特征和工作负载特征。该系统资源特征从系统资源使用信息提取得到,该工作负载特征从工作负载信息提取得到。控制器0可以对系统资源特征和工作负载特征进行融合,从而得到结构化特征。例如,控制器0可以将系统资源特征和工作负责特征进行拼接,从而实现融合得到结构化特征。
其中,控制器0还可以对系统资源特征进行归并。从业务角度分析,某些时刻某一系统资源出现占用过高,就会造成系统性能瓶颈,此时相同类型的系统资源特征(也可以称作关联特征、共性影响特征)已不再必要。对于此类特征,可执行特征归并。如图8所示,控制器0可以采用“或”运算即可实现特征归并。例如,F(CPU占用)、F(内存占用)、F(磁盘IO量)、F(带宽占用量)属于共性影响特征,控制器0可以将上述共性影响特征归并。
如图9所示,控制器0通过多源信息处理,获得工作负载特征以及具有共性影响的系统资源特征。在归并共性影响特征后,控制器0可以对共性影响特征进行标准化、归一化,类似地,控制器0可以在提取到工作负载特征后,对工作负载特征进行标准化、归一化。其中,标准化、归一化后的系统资源特征可以为a0a1…ak,标准化、归一化后的工作负载特征可以为b0b1…bk。控制器0可以将上述系统资源特征a0a1…ak以及工作负载特征b0b1…bk进行拼接,从而获得结构化特征。
控制器0通过特征归并与归一化等通用特征处理手段,对特征进行清洗处理,可实现以较低的计算开销,生成对当前情况准确刻画的特征模型,为分区决策提供可靠依据。
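A minimal sketch of this feature processing: correlated resource flags are merged with an OR, the workload statistics are min-max normalized, and the two parts are concatenated into one structured feature vector. The thresholds are assumed to have been applied upstream and all names are illustrative.

```python
def build_structured_features(resource_flags, workload_stats):
    """resource_flags: 0/1 indicators such as F(CPU), F(memory), F(disk IO), F(bandwidth);
    workload_stats: raw workload statistics such as reuse-distance mean and variance."""
    # correlated resource flags are merged with an OR: any saturated resource is a bottleneck
    merged_flag = 1 if any(resource_flags) else 0
    # min-max normalize the workload statistics to [0, 1]
    lo, hi = min(workload_stats), max(workload_stats)
    span = (hi - lo) or 1.0
    normalized = [(v - lo) / span for v in workload_stats]
    # concatenate a0a1...ak (resource features) with b0b1...bk (workload features)
    return [merged_flag] + normalized
```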
上述S602至S606为控制器0获取结构化特征的一种实现方式,在本申请实施例其他可能的实现方式中,控制器0也可以通过其他方式获取结构化特征。进一步地,当控制器0不获取系统资源使用信息,并从中提取系统资源特征时,控制器0也可以不执行上述S606。
S608:控制器0获取上一分区调整周期的反馈信息,根据上一分区调整周期的反馈信息确定是否触发分区调整。若是,则执行S610;若否,则执行S622。
控制器0可以设置分区调整的触发条件。控制器0可以根据上一分区调整周期的反馈信息,例如是上一分区调整周期的分区收益(如重删率)、各个分区对应的工作负载特征,判断分区调整的触发条件是否被满足,从而确定是否触发分区调整。
其中,针对当前分区调整周期,分区调整的触发条件可以设置为:上一分区调整周期的重删率小于预设值或者下降幅度达到预设幅度,或者上一分区调整周期的工作负载特征相对于上上分区调整周期的工作负载特征的变化满足预设条件。其中,预设值、预设幅度或预设条件可以根据历史的业务经验设置。
例如,上上分区调整周期的工作负载特征表示工作负载以大IO为主,上一分区调整周期的工作负载特征表征工作负载以小IO为主,也即上一分区调整周期的工作负载特征相对于上上分区调整周期的工作负载特征的变化较为显著,可以触发分区调整。
需要说明的是,执行本申请实施例的数据重删方法也可以不执行上述S608。例如,控制器0可以直接触发分区调整,进而根据分区决策建模的结果进行分区更新。
S610:控制器0根据所述结构化特征从建模策略集合中确定目标建模策略。
S612:控制器0根据结构化特征和反馈信息从分区收益评估策略集合中选择目标评估策略。
S614:控制器0根据所述目标评估策略确定所述分区决策模型的目标函数。
S616:控制器0根据所述结构化特征,通过所述目标建模策略和所述目标函数进行分区决策建模,获得分区决策模型。
S618:控制器0根据分区决策模型获得分区收益最大的分区容量组合。
控制器0可以根据多源信息处理所得的结构化特征,通过分区决策建模,并在前期的反馈信息,例如是上一分区调整周期的工作负载特征、重删率等先验知识的辅助下完成分区决策。如图10所示,控制器0可以根据结构化特征,从建模策略集合中进行建模策略选择,以确定目标建模策略,根据上一分区调整周期的工作负载特征、重删率等反馈信息,从分区收益评估策略集合中进行评估策略选择,以确定目标评估策略,根据该目标评估策略可以确定分区决策模型的目标函数,接着控制器0可以基于结构化特征,通过目标建模策略、目标函数进行建模,进而根据建模得到的分区决策模型获得分区收益最大的分区容量组合。其中,分区收益最大的分区容量组合也可以称作分区决策。
为了使得本申请的技术方案更加清楚、易于理解,下面结合示例对建模策略选择、评估策略选择、分区决策建模等过程进行示例说明。
以工作负载较大、系统资源占用比例大的情况为例。
在该示例中,建模策略集合包括:①基于打点分区的建模策略;②基于高斯过程回归的建模策略。基于打点分区的建模策略面向一般场景,也即简单场景,基于高斯过程回归的建模策略面向复杂场景。其中,打点分区是指提供多种预设的分区容量组合,从中选择选择分区收益最大的分区容量组合。
在该示例中,结构化特征向量中工作负载特征反映出该业务场景为简单场景,此外,与CPU占用特征相关的标志位为1,表征CPU等系统资源占用较高,为了避免分区决策建模占用较多的系统资源,控制器0可以选择基于打点分区的建模策略进行建模。
该示例中，假设工作负载的重用距离分布服从正态分布，如下所示:
f(x) = (1/√(2π·σ)) · exp(−(x − μ)^2 / (2σ))
其中，σ表示方差，μ表示均值。控制器0可以采用多源信息处理所得的结构化特征中重用距离的均值μ_重用距离均值和重用距离的方差σ_重用距离方差拟合重用距离的概率密度函数，具体参见如下公式:
F(x) = (1/√(2π·σ_重用距离方差)) · exp(−(x − μ_重用距离均值)^2 / (2·σ_重用距离方差))      (5)
该示例以提供两个分区进行示例说明。两个分区具体为更新频繁的数据对应的热分区和更新不频繁的数据对应的冷分区,控制器0可以根据基于打点分区的建模策略,设置如下分区组合:
表1分区组合
结合上面公式(5)的重用距离概率密度函数,控制器0可通过积分的方式获得两种类型数据分布的命中率,再通过命中率和数据占比的乘积,获得该种分区方案所获得的重删率,如下所示:
f(P) = P · ∫F_1 + (1 − P) · ∫F_2      (6)
其中,F1和F2分别表示两个分区对应的数据分布,具体可以通过两个分区对应的数据的重用距离的概率密度函数表示,P为一个分区的容量占比。
在该示例中,控制器0可以根据结构化特征,从分区收益评估策略集合中选择“重删率最大化”的 评估策略,作为目标评估策略。也就是,控制器0可以直接以上述公式(6)的函数为目标函数,进行分区决策建模。具体地,控制器0将不同分区组合的参数代入上述目标函数,获得不同分区组合的重删率,控制器0从中选择重删率最大的分区容量组合。
上述示例是直接基于结构化特征进行评估策略选择示例说明。在一些可能的实现方式中,控制器0还可以结合其他约束条件对分区收益的评估策略进行重构。例如,在具体工程实施过程中,前后两次分区大小变化幅度不应过大,否则将会影响分区部署性能,造成不必要的开销。基于此,其他约束条件可以包括最小化分区调整成本。重构的分区收益的评估策略可以是基于重删率和分区调整成本的评估策略。
如图11所示,控制器0可以获取上一分区调整周期的反馈信息,该反馈信息可以包括工作负载特征、重删率中的一种或多种,控制器0可以根据该反馈信息调整分区收益的评估策略。具体地,上一分区调整周期的重删率小于预设值或者下降幅度达到预设幅度,控制器0可以将分区收益的评估策略调整为基于重删率和分区调整成本的评估策略。类似地,上一分区调整周期的工作负载特征相对于上上分区调整周期的工作负载特征的变化满足预设条件,控制器0可以将分区收益的评估策略调整为基于重删率和分区调整成本的评估策略。
与基于重删率的评估策略相比,基于重删率和分区调整成本的评估策略考虑了多种因素,更具有针对性,由此选出的分区容量组合也具有更高的分区收益。
分区收益的评估策略调整为基于重删率和分区调整成本的评估策略时,目标函数可以由上述公式(6)调整为:
f(P) = P · ∫F_1 + (1 − P) · ∫F_2 − ‖P − P_当前分区比例‖^2      (7)
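A hedged sketch of this reconstructed objective, which trades the predicted deduplication rate off against the partition adjustment cost (the squared distance to the current hot-partition ratio); the hit-rate functions stand in for the integrals ∫F_1 and ∫F_2, and all names and numbers are illustrative:

```python
def partition_revenue(p, current_p, hit_rate_hot, hit_rate_cold):
    """Objective f(P) = P*∫F1 + (1-P)*∫F2 - ||P - P_current||^2 for a
    two-partition layout, where p is the candidate hot-partition ratio."""
    dedup_rate = p * hit_rate_hot(p) + (1 - p) * hit_rate_cold(1 - p)
    adjustment_cost = (p - current_p) ** 2
    return dedup_rate - adjustment_cost

# choose the preset combination with the largest reconstructed revenue
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]
best = max(candidates,
           key=lambda p: partition_revenue(p, current_p=0.2,
                                           hit_rate_hot=lambda x: min(1.0, 2.0 * x),
                                           hit_rate_cold=lambda x: min(1.0, 1.1 * x)))
```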
其中,上述S610至S616为控制器0根据上一分区调整周期的工作负载特征,构建所述分区决策模型的一种实现方式。其中,S618为确定分区决策(具体为目标分区组合)的一种实现方式。需要说明的是,控制器0在建模分区决策模型时,也可以不执行上述S612、S614,例如,控制器0可以基于目标建模策略以及默认的目标函数,进行分区决策建模,获得分区决策模型。
S620:控制器0根据分区收益最大的分区容量组合对逆映射表进行分区。
分区容量组合包括不同分区的容量占比,也即不同分区所分配到的资源占比。例如,分区容量组合可以包括热分区所分配到的资源占比和冷分区所分配到的资源占比。控制器0可以根据热分区所分配到的资源占比和冷分区所分配到的资源占比,对逆映射表的系统资源进行分区。
例如,分区容量组合为热分区所分配到的资源占比为20%,冷分区所分配到的资源占比为80%,控制器0可以将逆映射表中20%的存储空间分配至热分区,逆映射表中80%的存储空间分配至冷分区。
S620:控制器0将第一数据块的指纹和逻辑地址写入第一分区。
其中,第一分区是根据第一数据块的特征从多个分区中确定的分区,例如第一数据块对应的指纹具有较高热度时,第一分区可以是热分区,第一数据块对应的指纹具有较低热度时,第一分区可以是冷分区。
控制器0将第一数据块的指纹和逻辑地址写入第一分区的具体实现可以参见图5所示实施例相关内容描述,在此不再赘述。
S622:控制器0确定是否触发重删。若是,则执行S624,若否,则执行S626。
参见图5所示实施例相关内容描述,控制器0可以比较第一数据块的指纹与逆映射表的第一分区中的指纹,当第一分区中存在与第一数据块的指纹相同的指纹,可以触发重删。例如,第一分区中与第一数据块的指纹相同的指纹的数量达到预设阈值时,可以执行S624,以进行重删。
S624:控制器0基于LSM tree进行重删。
控制器0可以通过合并LSM tree的方式对第一分区中的指纹和逻辑地址进行重删,并根据逻辑地址对应的物理地址从硬盘134中重删相应的数据块,例如控制器0可以通过合并LSM tree删除第一分区中第一数据块的指纹和逻辑地址,并根据第一数据块的逻辑地址对应的物理地址,从硬盘134删除第一数据块。基于LSM tree进行重删的具体实现可以参见图5所示实施例中S514至S516相关内容描述,在此不再赘述。
S626:控制器0对逆映射表的至少一个分区中的指纹和逻辑地址进行淘汰。
逆映射表的各个分区可以设置相应的淘汰条件,当逆映射表中的分区满足对应的淘汰条件,则可以对该分区中的指纹和逻辑地址等元数据进行淘汰。具体地,控制器0可以在各分区内部,基于分区容量以及数据块对应的指纹的热度,决策需要淘汰的元数据,例如是非重删元数据的指纹和逻辑地址,部分重删元数据的指纹和逻辑地址,然后将其淘汰出逆映射表,如此可以降低元数据规模,保障系统性能。
S628:控制器0将重删后的指纹和逻辑地址存放于指纹表。
S630:控制器0获取当前分区调整周期的反馈信息,以用于控制器0判断下一分区调整周期是否触发分区调整。
在本实施例中,控制器0可以获取上述反馈信息,从而协助进行分区决策调整,提高分区精度。
需要说明的是,上述S628至S630为本申请实施例的可选步骤,执行本申请实施例的数据重删方法也可以不执行上述步骤。
基于上述内容描述,本申请实施例的数据重删方法,提供一种主动分区机制,将隐式分区变为主动分区,例如将逆映射表分为热分区、冷分区,热分区以及冷分区被分配相应配额的系统资源,其中,更新不频繁的数据能够分配足够的存储空间以支持重删,而更新频繁的数据因为被频繁无效掉,因此分能够配的配额资源就相对较少;这样使得最终可以被重删的数据基本都可以被重删掉而不是被挤占淘汰,获得重删率的提升;同时难以重删的数据能够被淘汰,进而降低整体元数据映射规模,获得系统性能提升。
图6主要以简单场景中,控制器0采用基于打点分区的建模策略进行建模示例说明。在复杂场景中,控制器0也可以采用基于高斯过程回归的建模策略进行建模。下面对复杂场景下的建模过程进行说明。
在机器学习中,机器学习算法通常情况下是根据输入值x预测出一个最佳输出值y,用于分类或回归任务。这种情况将y看作普通的变量。某些情况下,任务并不需要预测出一个函数值,而是给出这个函数值的后验概率分布,记作p(y|x)。此时,函数值y可以视作随机变量。高斯过程回归(Gaussian Process Regression,GPR)即是对表达式未知的函数(也称黑盒函数)的一组函数值进行贝叶斯建模,给出函数值的概率分布。
在利用高斯过程回归建模分区策略模型时,可以将问题描述为在分区配置资源有限的前提下,通过合理分配各分区配置的资源占比,进而获得最大重删率。为了更好的描述问题,本实施例采用分区容量组合S来描述各个分区的资源占比,S为一个n维数组,第i个元素Si表示第i个分区的资源占比,在其他因素不变的前提下,重删率可以被视为由S决定,此时可以定义重删率与分区容量组合之间的关系为f(S)。由于f(S)不可以显式获得,本实施例采用高斯过程回归的建模策略来刻画f(S)。
基于高斯过程回归的建模策略来建模f(S),可以包括如下阶段:
初始化阶段:控制器0随机生成若干个分区容量组合并添加进集合Set。然后控制器0分别将上述分区容量组合应用于存储系统,通过存储系统运行得到各分区容量组合配置下的重删率。通过上述分区容量组合与重删率间的对应关系,初步建立分区容量组合与重删率之间的高斯模型G来刻画f(S)。
迭代更新阶段:高斯模型G推荐出一个分区容量组合,将该分区容量组合添加进集合Set。控制器0将该分区容量组合应用到存储系统,通过存储系统运行得到该配置下的重删率。控制器0将该分区容量组合及其对应的重删率反馈给高斯模型进行模型更新,重复执行迭代更新步骤L次(L为预先设定迭代次数)。
输出阶段:输出集合Set中对应重删率最高的一组分区资源配置。
如此,可以实现复杂场景下基于高斯过程回归的建模策略进行建模,并基于上述高斯模型提供重删率最高的一组分区资源配置,即目标分区组合。
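The three stages above could be sketched as follows. This uses scikit-learn's GaussianProcessRegressor purely as an illustration; the library choice, the acquisition rule (pick the candidate with the highest predicted mean), the random candidate generation, and run_storage_system (which stands in for actually running the storage system to measure the deduplication rate of one configuration) are all assumptions, not details from the application.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def search_partition_ratios(run_storage_system, n_partitions=2, n_init=5, n_iter=20):
    """run_storage_system(ratios) -> measured deduplication rate for one
    partition capacity combination (an n-dimensional array summing to 1)."""
    rng = np.random.default_rng(0)

    def random_ratios():
        r = rng.random(n_partitions)
        return r / r.sum()

    # initialization: evaluate a few random combinations on the storage system
    tried = [random_ratios() for _ in range(n_init)]
    rates = [run_storage_system(r) for r in tried]

    gp = GaussianProcessRegressor()
    for _ in range(n_iter):
        gp.fit(np.array(tried), np.array(rates))
        # recommend the candidate with the highest predicted deduplication rate
        candidates = np.array([random_ratios() for _ in range(256)])
        best = candidates[np.argmax(gp.predict(candidates))]
        tried.append(best)
        rates.append(run_storage_system(best))

    # output the combination with the highest measured deduplication rate
    return tried[int(np.argmax(rates))]
```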
为了使得本申请的技术方案更加清楚、易于理解,下面以本申请实施例的数据重删方法应用于具有全局缓存(global cache)的存储系统进行示例说明。
参见图12所示的数据重删方法的应用场景示意图,针对存算分离的分布式存储系统,至少一台主机上运行的客户端连接多个存储节点。存储节点如Node1、Node2可以启动重删服务进程,以执行数据重删方法。其中,存储节点还可以拉起LUN服务进程,以协助重删服务进程执行数据重删方法。
具体地,客户端可以发送写请求,写请求中包括数据,该数据可以被分为至少一个数据。对于数据块,存储节点可以先在前向映射表中记录数据位置信息后即进行数据落盘操作。然后,存储节点如Node1上的重删服务进程可以执行数据重删方法对磁盘等存储介质中的冗余数据进行识别和重删,从而显著提高用户可用容量,降低用户成本。
在数据落盘的基础上,存储节点采用异步写入的方式,将数据块的指纹和逻辑地址有序插入逆映射表。该逆映射表为本申请引入的指纹元数据管理结构。对于不同更新频度的数据,存储节点采用分别构建LSM Tree的数据结构进行有效管理,该结构能够以表格合并的方式触发重删。在完成重删后,存储节点的重删服务进程将逆映射表中重删后的元数据发送至指纹表中,并相应更新前向映射表(也即LUN地址映射表)中重删数据的物理地址信息为指纹。
接着参见图13所示的数据重删方法应用于具有全局缓存的存储系统的流程示意图,全局缓存的上层具有写缓存和读缓存。其中,写缓存可以用于LBA的热度统计,无需额外设计统计模块进行热度统计。
如图13所示,客户端可以通过网络发送写请求,该写请求经过全局缓存的服务端适配层到达写缓存,重删服务进程可以将写请求中数据进行分块,然后计算各个数据块的指纹。接着,重删服务进程查询指纹表。若数据块的指纹在指纹表中命中,则表明存储设备中存储具有相同指纹的数据块,重删服务进程可以直接返回写响应。若数据块的指纹在指纹表中未命中,则表明存储设备中未存储具有相同指纹的数据块,重删服务进程可以将数据块写入硬盘等存储设备,然后返回写响应。
重删服务进程还可以将数据块的指纹与逻辑地址有序插入逆映射表。此外,重删服务进程还可以通过批量获取的方式,从写缓存获取LBA的热度,根据LBA的热度更新数据块对应的指纹的热度。重删服务进程还可以获取重用距离、重用周期等工作负载特征,以及获取系统资源使用信息,并根据系统资源使用信息获取系统资源特征,重删服务进程基于上述特征进行通用特征处理,获得结构化特征。
重删服务进程根据上述结构化特征,进行分区决策建模,进而根据建模得到的分区决策模型确定分区收益最大的分区容量组合。重删服务进程基于该分区容量组合对逆映射表的系统资源进行分区,具体是按照热分区所分配到的资源占比和冷分区所分配到的资源占比,对逆映射表的系统资源进行分区。重删服务进程在不同分区分别构建LSM Tree的数据结构,以实现对指纹、逻辑地址等元数据进行有效管理。例如,重删服务进程可以根据LSM tree通过表格合并的方式触发重删。重删服务进程还可以根据分区容量和数据块对应的指纹的热度,决策需要淘汰的元数据,并将其从逆映射表的LSM tree中淘汰,由此缩减元数据规模。
在该示例中,重删服务进程还可以在完成重删后,将逆映射表中重删后的元数据发送至指纹表中,并通过LUN服务进程相应更新地址映射表中重删数据对应的物理地址为指纹。进一步地,重删服务进程还可以获取反馈信息,以便基于该反馈信息,在后续阶段如下一分区调整周期确定是否触发分区调整,由此可以实现更精确地分区,进而实现重删率的提升。
基于本申请实施例提供的数据重删方法,本申请实施例还提供了一种数据重删装置。接下来,从功能模块化的角度,结合附图对本申请实施例的数据重删装置进行介绍。
参见图14所示的数据重删装置的结构示意图,该装置1400包括:
通信模块1402,用于接收写请求,所述写请求中包括第一数据块;
写数据模块1404,用于将所述第一数据块写入存储设备;
所述写数据模块1404,还用于将所述第一数据块的元数据写入元数据管理结构的多个分区中的第一分区,所述第一分区为根据所述第一数据块的特征确定的,所述第一数据块的元数据包括所述第一数据块的指纹及地址信息;
重删模块1406,用于在所述第一分区中存在与所述第一数据块的指纹相同的指纹时,删除所述第一分区中所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
其中,通信模块1402可以用于实现图5所示实施例中S502相关内容描述。写数据模块1404用于实现图5所示实施例中S510相关内容描述,写数据模块1404还用于实现图5所示实施例中S512相关 内容描述。重删模块1406用于实现图5所示实施例中S514、S516相关内容描述。
在一些可能的实现方式中,所述第一数据块的特征为所述第一数据块对应的指纹,所述写数据模块1404具体用于:
确定所述第一数据块对应的指纹的热度;
根据所述第一数据块对应的指纹的热度确定所述热度对应的所述元数据管理结构的多个分区中的所述第一分区;
将所述第一数据块的元数据写入所述第一分区。
其中,写数据模块1404确定第一数据块对应的指纹的热度,基于该热度确定第一分区,并将元数据写入第一分区的实现可以参见图5所示实施例中S506、S512相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述写数据模块1404还用于:
在将所述第一数据块的元数据写入所述第一分区之后,确定所述第一数据块的逻辑地址的热度;
将所述逻辑地址的热度累加至所述第一数据块对应的指纹的热度,以更新所述第一数据块对应的指纹的热度。
其中,写数据模块1404更新热度的实现可以参见图5所示实施例中相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述写请求中还包括第二数据块,所述第二数据块对应的指纹的热度高于所述第一数据块对应的指纹的热度,所述写数据模块1404还用于:
将所述第二数据块写入所述存储设备,将所述第二数据块的元数据写入所述元数据管理结构的多个分区中的第二分区,所述第二分区的容量小于所述第一分区的容量。
其中,写数据模块1404写入第二数据块以及第二数据块的元数据的具体实现可以参考写入第一数据块及其元数据的相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述元数据管理结构的多个分区的容量根据分区决策模型确定,所述分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于所述元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量,所述分区收益根据重删率和分区调整成本中的至少一个确定。
在一些可能的实现方式中,分区收益为重删率,预设的分区容量组合包括第一分区容量组合。所述装置1400还包括分区模块1408,所述分区模块1408用于通过如下方式预测所述第一分区容量组合应用于所述元数据管理结构后对应的重删率:
获取所述第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征;
根据所述各个分区对应的工作负载特征,获得所述各个分区对应的数据分布;
根据所述各个分区对应的数据分布以及所述各个分区的容量,获得所述重删率。
其中,分区模块1408构建分区决策模型的具体实现可以参见图6所示实施例中S610至S616相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述装置1400还包括分区模块1408,所述分区模块1408用于:
周期性地调整所述元数据管理结构的多个分区的容量;
当到达调整时刻时,根据所述调整时刻前的周期对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整所述多个分区的容量。
在一些可能的实现方式中,所述重删模块1406具体用于:
在所述第一分区中存在与所述第一数据块的指纹相同的指纹,且所述第一分区中的所述指纹的数量达到预设阈值时,删除所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
其中,重删模块1406用于实现图5所示实施例中S514、S516相关内容描述。在此不再赘述。
在一些可能的实现方式中,所述第一数据块的地址信息为所述第一数据块的逻辑地址,所述写数据模块1404还用于:
将所述第一数据块的逻辑地址和物理地址写入地址映射表;
所述重删模块1406具体用于:
根据所述第一数据块的逻辑地址,从所述地址映射表中获取所述第一数据块的物理地址;
根据所述物理地址从所述存储设备中找到所述第一数据块,并删除所述第一数据块。
其中,写数据模块1404还用于实现图5所示实施例中S518相关内容描述。在此不再赘述。
在一些可能的实现方式中,所述重删模块1406还用于:
在删除所述第一分区中的所述第一数据块的元数据之后,将所述地址映射表中所述第一数据块的物理地址修改为所述第一数据块的指纹。
其中,重删模块1406还用于实现图5所示实施例中S518相关内容描述。在此不再赘述。
在一些可能的实现方式中,所述装置1400还包括:
淘汰模块1409,用于当所述逆映射表中的至少一个分区满足淘汰条件,对所述至少一个分区中的所述元数据进行淘汰。
其中,淘汰模块1409还用于实现图5所示实施例中S520相关内容描述。在此不再赘述。
根据本申请实施例的数据重删装置1400可对应于执行本申请实施例中描述的方法,并且数据重删装置1400的各个模块/单元的上述和其它操作和/或功能分别为了实现图5、图6所示实施例中的各个方法的相应流程,为了简洁,在此不再赘述。
本申请实施例还提供了一种计算机可读存储介质。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质的数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘)等。该计算机可读存储介质包括指令,所述指令指示计算设备或计算设备集群(例如是存储系统)执行上述数据重删方法。
本申请实施例还提供了一种计算机程序产品。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机或数据中心进行传输。所述计算机程序产品可以为一个软件安装包,在需要使用前述数据重删方法的任一方法的情况下,可以下载该计算机程序产品并在计算设备或计算设备集群上执行该计算机程序产品。
上述各个附图对应的流程或结构的描述各有侧重,某个流程或结构中没有详述的部分,可以参见其他流程或结构的相关描述。

Claims (25)

  1. 一种数据重删方法,其特征在于,所述方法包括:
    接收写请求,所述写请求中包括第一数据块;
    将所述第一数据块写入存储设备;
    将所述第一数据块的元数据写入元数据管理结构的多个分区中的第一分区,所述第一分区为根据所述第一数据块的特征确定的,所述第一数据块的元数据包括所述第一数据块的指纹及地址信息;
    在所述第一分区中存在与所述第一数据块的指纹相同的指纹时,删除所述第一分区中所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
  2. 根据权利要求1所述的方法,其特征在于,所述第一数据块的特征为所述第一数据块对应的指纹,所述将所述第一数据块的元数据写入元数据管理结构的多个分区中的第一分区,包括:
    确定所述第一数据块对应的指纹的热度;
    根据所述第一数据块对应的指纹的热度确定所述热度对应的所述元数据管理结构的多个分区中的所述第一分区;
    将所述第一数据块的元数据写入所述第一分区。
  3. 根据权利要求2所述的方法,其特征在于,在将所述第一数据块的元数据写入所述第一分区之后,所述方法还包括:
    确定所述第一数据块的逻辑地址的热度;
    将所述逻辑地址的热度累加至所述第一数据块对应的指纹的热度,以更新所述第一数据块对应的指纹的热度。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述写请求中还包括第二数据块,所述第二数据块对应的指纹的热度高于所述第一数据块对应的指纹的热度,所述方法还包括:
    将所述第二数据块写入所述存储设备,将所述第二数据块的元数据写入所述元数据管理结构的多个分区中的第二分区,所述第二分区的容量小于所述第一分区的容量。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述元数据管理结构的多个分区的容量根据分区决策模型确定,所述分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于所述元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量,所述分区收益根据重删率和分区调整成本中的至少一个确定。
  6. 根据权利要求5所述的方法,其特征在于,所述分区收益为重删率,所述预设的分区容量组合包括第一分区容量组合,所述分区决策模型通过如下方式预测所述第一分区容量组合应用于所述元数据管理结构后对应的重删率:
    获取所述第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征;
    根据所述各个分区对应的工作负载特征,获得所述各个分区对应的数据分布;
    根据所述各个分区对应的数据分布以及所述各个分区的容量,获得所述重删率。
  7. 根据权利要求5或6所述的方法,其特征在于,所述方法还包括:
    周期性地调整所述元数据管理结构的多个分区的容量;
    当到达调整时刻时,根据所述调整时刻前的周期对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整所述多个分区的容量。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述在所述第一分区中存在与所述第一数据块的指纹相同的指纹时,删除所述第一分区中所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块,包括:
    在所述第一分区中存在与所述第一数据块的指纹相同的指纹,且所述第一分区中的所述指纹的数量达到预设阈值时,删除所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述第一数据块的地址信息为所述第一数据块的逻辑地址,所述方法还包括:
    将所述第一数据块的逻辑地址和物理地址写入地址映射表;
    所述根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块,包括:
    根据所述第一数据块的逻辑地址,从所述地址映射表中获取所述第一数据块的物理地址;
    根据所述物理地址从所述存储设备中找到所述第一数据块,并删除所述第一数据块。
  10. 根据权利要求9所述的方法,其特征在于,在删除所述第一分区中的所述第一数据块的元数据之后,所述方法还包括:
    将所述地址映射表中所述第一数据块的物理地址修改为所述第一数据块的指纹。
  11. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    当所述元数据管理结构中的至少一个分区满足淘汰条件,对所述至少一个分区中的所述元数据进行淘汰。
  12. 一种数据重删装置,其特征在于,所述装置包括:
    通信模块,用于接收写请求,所述写请求中包括第一数据块;
    写数据模块,用于将所述第一数据块写入存储设备;
    所述写数据模块,还用于将所述第一数据块的元数据写入元数据管理结构的多个分区中的第一分区,所述第一分区为根据所述第一数据块的特征确定的,所述第一数据块的元数据包括所述第一数据块的指纹及地址信息;
    重删模块,用于在所述第一分区中存在与所述第一数据块的指纹相同的指纹时,删除所述第一分区中所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
  13. 根据权利要求12所述的装置,其特征在于,所述第一数据块的特征为所述第一数据块对应的指纹,所述写数据模块具体用于:
    确定所述第一数据块对应的指纹的热度;
    根据所述第一数据块对应的指纹的热度确定所述热度对应的所述元数据管理结构的多个分区中的所述第一分区;
    将所述第一数据块的元数据写入所述第一分区。
  14. 根据权利要求13所述的装置,其特征在于,所述写数据模块还用于:
    在将所述第一数据块的元数据写入所述第一分区之后,确定所述第一数据块的逻辑地址的热度;
    将所述逻辑地址的热度累加至所述第一数据块对应的指纹的热度,以更新所述第一数据块对应的指纹的热度。
  15. 根据权利要求12至14任一项所述的装置,其特征在于,所述写请求中还包括第二数据块,所述第二数据块对应的指纹的热度高于所述第一数据块对应的指纹的热度,所述写数据模块还用于:
    将所述第二数据块写入所述存储设备,将所述第二数据块的元数据写入所述元数据管理结构的多个分区中的第二分区,所述第二分区的容量小于所述第一分区的容量。
  16. 根据权利要求12至15任一项所述的装置,其特征在于,所述元数据管理结构的多个分区的容量根据分区决策模型确定,所述分区决策模型用于预测预设的分区容量组合中每个分区容量组合应用于所述元数据管理结构后对应的分区收益,并确定分区收益最大的分区容量组合作为所述元数据管理结构的多个分区的容量,所述分区收益根据重删率和分区调整成本中的至少一个确定。
  17. 根据权利要求16所述的装置,其特征在于,所述分区收益为重删率,所述预设的分区容量组合包括第一分区容量组合,所述分区决策模型通过如下方式预测所述第一分区容量组合应用于所述元数据管理结构后对应的重删率:
    获取所述第一分区容量组合应用于所述元数据管理结构所形成的多个分区中各个分区对应的工作负载特征;
    根据所述各个分区对应的工作负载特征,获得所述各个分区对应的数据分布;
    根据所述各个分区对应的数据分布以及所述各个分区的容量,获得所述重删率。
  18. 根据权利要求16或17所述的装置,其特征在于,所述装置还包括分区模块,所述分区模块用于:
    周期性地调整所述元数据管理结构的多个分区的容量;
    当到达调整时刻时,根据所述调整时刻前的周期对应的分区收益、分区容量组合或各个分区对应的工作负载特征,确定是否调整所述多个分区的容量。
  19. 根据权利要求12至18任一项所述的装置,其特征在于,所述重删模块具体用于:
    在所述第一分区中存在与所述第一数据块的指纹相同的指纹,且所述第一分区中的所述指纹的数量达到预设阈值时,删除所述第一数据块的元数据,并根据所述第一数据块的地址信息从所述存储设备中删除所述第一数据块。
  20. 根据权利要求12至19任一项所述的装置,其特征在于,所述第一数据块的地址信息为所述第一数据块的逻辑地址,所述写数据模块还用于:
    将所述第一数据块的逻辑地址和物理地址写入地址映射表;
    所述重删模块具体用于:
    根据所述第一数据块的逻辑地址,从所述地址映射表中获取所述第一数据块的物理地址;
    根据所述物理地址从所述存储设备中找到所述第一数据块,并删除所述第一数据块。
  21. 根据权利要求20所述的装置,其特征在于,所述重删模块还用于:
    在删除所述第一分区中的所述第一数据块的元数据之后,将所述地址映射表中所述第一数据块的物理地址修改为所述第一数据块的指纹。
  22. 根据权利要求20所述的装置,其特征在于,所述装置还包括:
    淘汰模块,用于当所述逆映射表中的至少一个分区满足淘汰条件,对所述至少一个分区中的所述元数据进行淘汰。
  23. 一种存储系统,其特征在于,所述存储系统包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储有计算机可读指令;所述至少一个处理器执行所述计算机可读指令,以使得所述存储系统执行如权利要求1至11中任一项所述的方法。
  24. 一种计算机可读存储介质,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至11中任一项所述的方法。
  25. 一种计算机程序产品,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至11中任一项所述的方法。
PCT/CN2023/101303 2022-06-24 2023-06-20 一种数据重删方法及相关系统 WO2023246754A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210730080 2022-06-24
CN202210730080.0 2022-06-24
CN202211132110.4A CN117331487A (zh) 2022-06-24 2022-09-16 一种数据重删方法及相关系统
CN202211132110.4 2022-09-16

Publications (1)

Publication Number Publication Date
WO2023246754A1 true WO2023246754A1 (zh) 2023-12-28

Family

ID=89277971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101303 WO2023246754A1 (zh) 2022-06-24 2023-06-20 一种数据重删方法及相关系统

Country Status (2)

Country Link
CN (1) CN117331487A (zh)
WO (1) WO2023246754A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117931092B (zh) * 2024-03-20 2024-05-24 苏州元脑智能科技有限公司 数据重删调整方法、装置、设备、存储系统及存储介质
CN118466855B (zh) * 2024-07-10 2024-09-27 合肥康芯威存储技术有限公司 一种存储芯片及其数据压缩方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514250A (zh) * 2013-06-20 2014-01-15 易乐天 一种全局重复数据删除的方法和系统及存储装置
CN107329692A (zh) * 2017-06-07 2017-11-07 杭州宏杉科技股份有限公司 一种数据重删的方法及存储设备
US20200133547A1 (en) * 2018-10-31 2020-04-30 EMC IP Holding Company LLC Applying deduplication digests to avoid same-data writes
CN110618789A (zh) * 2019-08-14 2019-12-27 华为技术有限公司 一种重复数据的删除方法及装置
CN111381779A (zh) * 2020-03-05 2020-07-07 深信服科技股份有限公司 数据处理方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN117331487A (zh) 2024-01-02

Similar Documents

Publication Publication Date Title
WO2023246754A1 (zh) 一种数据重删方法及相关系统
KR102457611B1 (ko) 터넌트-어웨어 스토리지 쉐어링 플랫폼을 위한 방법 및 장치
CN106066896B (zh) 一种应用感知的大数据重复删除存储系统及方法
US10374792B1 (en) Layout-independent cryptographic stamp of a distributed dataset
CN110908589B (zh) 数据文件的处理方法、装置、系统和存储介质
US11625169B2 (en) Efficient token management in a storage system
CN111708719B (zh) 计算机存储加速方法、电子设备及存储介质
US11620263B2 (en) Data compression using dictionaries
CN107888687B (zh) 一种基于分布式存储系统的代理客户端存储加速方法及系统
JP2016511478A (ja) データ重複排除における、類似性探索に基づくダイジェスト検索
CN111159176A (zh) 一种海量流数据的存储和读取的方法和系统
CN113867627A (zh) 一种存储系统性能优化方法及系统
US20210397360A1 (en) Regulating storage device rebuild rate in a storage system
WO2023045492A1 (zh) 一种数据预取方法、计算节点和存储系统
US20210342271A1 (en) Cache retention for inline deduplication based on number of physical blocks with common fingerprints among multiple cache entries
CN117312256B (zh) 文件系统、操作系统和电子设备
CN117312261B (zh) 文件的压缩编码方法、装置存储介质及电子设备
US20240070120A1 (en) Data processing method and apparatus
CN116414828A (zh) 一种数据管理方法及相关装置
CN112799590B (zh) 一种针对在线主存储重删的差异化缓存方法
US20240126722A1 (en) Doctrine of MEAN: Realizing High-performance Data Caching at Unreliable Edge
CN117312260A (zh) 文件的压缩存储方法、装置、存储介质和电子设备
WO2022267508A1 (zh) 元数据压缩方法及装置
WO2023050856A1 (zh) 数据处理方法及存储系统
EP4318257A1 (en) Method and apparatus for processing data, reduction server, and mapping server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23826401

Country of ref document: EP

Kind code of ref document: A1