WO2023040200A1 - Data deduplication method and system, and storage medium and device - Google Patents
- Publication number
- WO2023040200A1 (application PCT/CN2022/078324)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- metadata
- unit
- fingerprint value
- unit data
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Definitions
- The present application relates to a data deduplication method, system, storage medium, and device.
- Metadata is data that describes data (data about data). It can be understood as data of a broader scope than data in the ordinary sense: it not only indicates the type, name, value, and similar information of the data, but also provides the context of the data, such as the domain to which the data belongs and the source of the data.
- In a data storage system, metadata is the basis of information storage and the smallest unit of data.
- For large amounts of stored data, querying and analyzing the content and meaning of the data allows it to be used more effectively.
- In a storage system, the efficient organization and management of metadata is an effective means of managing and organizing massive data, and supports the system's management and maintenance of that data. Data therefore becomes more valuable only when metadata is managed effectively.
- All-flash storage is a storage system based on all-flash arrays: an independent storage array or device composed entirely of solid-state storage media. Its main difference from traditional hard disk storage is higher performance, with faster and more stable data processing.
- In an all-flash storage system, online data deduplication is the most important and an indispensable feature, because the back end of the system uses solid-state drives as storage media.
- Given the cost of solid-state drives, all-flash storage systems require online deduplication of data to reduce the storage space actually used on the back-end disks.
- To realize online deduplication in an all-flash storage system, metadata management is critical.
- Metadata management mainly manages the L-P (LBA→PBA), P-L (PBA→LBA), and H-P (HASHKEY→PBA) mapping relationships.
- LBA: Logical Block Address
- PBA: Physical Block Address
- HASHKEY: the hash value
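The three mapping relationships can be pictured as three key-value tables. The following is a minimal sketch, assuming in-memory Python dictionaries; the class `MappingTables`, its `bind` method, and the choice to let one PBA map to a set of LBAs are illustrative assumptions, not details taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MappingTables:
    """Hypothetical in-memory stand-in for the three mapping relationships."""
    l2p: Dict[int, int] = field(default_factory=dict)       # L-P: LBA -> PBA
    p2l: Dict[int, Set[int]] = field(default_factory=dict)  # P-L: PBA -> LBAs (assumed one-to-many)
    h2p: Dict[str, int] = field(default_factory=dict)       # H-P: HASHKEY -> PBA

    def bind(self, lba: int, pba: int, hashkey: str) -> None:
        """Record that logical block `lba` holds content `hashkey` stored at `pba`."""
        self.l2p[lba] = pba
        self.p2l.setdefault(pba, set()).add(lba)
        self.h2p.setdefault(hashkey, pba)  # keep the first PBA seen for this content


tables = MappingTables()
tables.bind(lba=10, pba=500, hashkey="abc")
tables.bind(lba=11, pba=500, hashkey="abc")  # second logical block, same content
```

With deduplication, several logical addresses can resolve to one physical address, which is why the sketch keeps a set of LBAs per PBA.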
- Compared with traditional systems that do not support online deduplication, metadata management must additionally maintain the P-L and H-P mapping metadata, and the resulting large volume of highly concurrent, low-latency accesses puts greater pressure on metadata management.
- An embodiment of the present application provides a data deduplication method, including the following steps:
- first metadata including the mapping relationship between the original physical address and the fingerprint value of the unit data is established, and the first metadata is stored in the metadata management module to perform the deduplication operation on the unit data.
- Judging whether the unit data has been deduplicated based on the fingerprint value and/or logical address of the unit data includes: judging whether the unit data has a corresponding fingerprint value and logical address.
- The method also includes: establishing a second metadata group including the mapping relationship between the original physical address and the logical address of the unit data, and storing the second metadata group in the metadata management module.
- The method also includes: invalidating the logical address of the unit data in response to storing the second metadata group in the metadata management module.
- The method also includes: in response to both the first metadata and the second metadata group being stored in the metadata management module, notifying the garbage collection module to perform garbage collection on the unit data.
- Establishing the second metadata group including the mapping relationship between the original physical address and the logical address of the unit data includes: establishing metadata containing a key-value pair pointing from the original physical address to the logical address of the unit data, and metadata containing a key-value pair pointing from the logical address of the unit data to the original physical address.
- The method also includes: in response to the absence of original metadata, establishing third metadata including the mapping relationship between the fingerprint value of the unit data and its physical address and fourth metadata including the mapping relationship between the logical address of the unit data and its physical address, and storing the third metadata and the fourth metadata in the metadata management module.
- Another aspect of the present application also provides a data deduplication system, including:
- a storage space judging module configured to, in response to the host giving up the data deduplication operation and writing the data to the hard disk in units of granularity, calculate the fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space on the hard disk reaches a preset threshold;
- a deduplication judging module configured to, in response to the occupied storage space reaching the preset threshold, obtain the unit data from the hard disk and judge whether the unit data has been deduplicated based on the fingerprint value and/or logical address of the unit data;
- an original metadata query module configured to, in response to the unit data not having been deduplicated, obtain the fingerprint value of the unit data and query, through the metadata management module, whether there is original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and
- a data deduplication module configured to, in response to the existence of the original metadata, establish the first metadata including the mapping relationship between the original physical address and the fingerprint value of the unit data, and store the first metadata in the metadata management module to perform the deduplication operation on the unit data.
- One or more non-volatile computer-readable storage media storing computer-readable instructions are also provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the data deduplication method in any of the foregoing embodiments.
- A computer device is also provided, including a memory and one or more processors, where computer-readable instructions are stored in the memory and, when executed by the one or more processors, cause the one or more processors to perform the steps of the data deduplication method in any of the foregoing embodiments.
- FIG. 1 is a schematic diagram of a data deduplication method provided according to one or more embodiments of the present application;
- FIG. 2 is a schematic diagram of a data deduplication system provided according to one or more embodiments of the present application;
- FIG. 3 is a schematic diagram of a non-volatile computer-readable storage medium for implementing a data deduplication method provided according to one or more embodiments of the present application;
- FIG. 4 is a schematic diagram of a hardware structure of a computer device for performing a data deduplication method according to one or more embodiments of the present application.
- FIG. 1 is a schematic diagram of a data deduplication method provided by one or more embodiments of the present application.
- the application of the method to computer equipment is taken as an example for illustration.
- the embodiment of this application may include the following steps:
- Step S10: in response to the host giving up the data deduplication operation and writing the data to the hard disk in units of granularity, calculate the fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space on the hard disk reaches a preset threshold;
- Step S20: in response to the occupied storage space reaching the preset threshold, acquire the unit data from the hard disk and judge whether the unit data has been deduplicated based on the fingerprint value and/or logical address of the unit data;
- Step S30: in response to the unit data not having been deduplicated, obtain the fingerprint value of the unit data and query, through the metadata management module, whether there is original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data;
- Step S40: in response to the existence of the original metadata, establish the first metadata including the mapping relationship between the original physical address and the fingerprint value of the unit data, and store the first metadata in the metadata management module to perform the deduplication operation on the unit data.
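Steps S10 to S40 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the application's implementation: the `Disk` class, the `deferred_dedup` function, the use of SHA-256 as the fingerprint, a threshold counted in units, and a plain dictionary standing in for the H-P table of the metadata management module are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

THRESHOLD_UNITS = 3  # hypothetical preset threshold, counted in stored units


def fingerprint(data: bytes) -> str:
    """Fingerprint value of one unit of data (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class Disk:
    """Toy hard disk: physical address -> (data, fingerprint, logical address)."""

    def __init__(self) -> None:
        self.units: Dict[int, Tuple[bytes, str, Optional[int]]] = {}

    def write_without_dedup(self, pba: int, lba: int, data: bytes) -> bool:
        """S10: store the unit with its fingerprint and logical address,
        then report whether the occupied space has reached the threshold."""
        self.units[pba] = (data, fingerprint(data), lba)
        return len(self.units) >= THRESHOLD_UNITS


def deferred_dedup(disk: Disk, h2p: Dict[str, int]) -> List[Tuple[int, int]]:
    """S20-S40: return (unit PBA, original PBA) for every unit matched to
    existing original metadata in the H-P table `h2p`."""
    matched = []
    for pba, (_data, fp, lba) in sorted(disk.units.items()):
        if lba is None:             # S20: no logical address, already deduplicated
            continue
        original_pba = h2p.get(fp)  # S30: query for original metadata
        if original_pba is not None and original_pba != pba:
            h2p[fp] = original_pba  # S40: store first metadata (H -> P0)
            matched.append((pba, original_pba))
    return matched


# Original metadata for content that is already stored at PBA 100.
h2p = {fingerprint(b"AAAA"): 100}
disk = Disk()
reached = [disk.write_without_dedup(p, l, d)
           for p, l, d in [(0, 1, b"AAAA"), (1, 2, b"BBBB"), (2, 3, b"AAAA")]]
result = deferred_dedup(disk, h2p) if reached[-1] else []
```

In this sketch the third write reaches the threshold, and the two units holding `b"AAAA"` are matched to the original physical address 100, while the unit holding `b"BBBB"` has no original metadata and is left alone.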
- Here, the deduplication operation does not literally mean deleting duplicate data.
- The deduplication operation refers to querying the H-P (HASHKEY→PBA) mapping for the data newly written to the hard disk.
- HASHKEY: fingerprint value
- PBA: physical address
- A piece of data that has not undergone a deduplication operation corresponds not only to a logical address (LBA) in a storage volume but also to a physical address (PBA) in a storage pool.
- The fingerprint value is the hash value, which mainly serves as the unique identifier of the data content. If two pieces of data have the same content, their fingerprint values are also the same, but their logical addresses and physical addresses are not necessarily the same.
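The relationship between content and fingerprint value can be illustrated with a short sketch. The application does not name a hash function, so the use of SHA-256 from Python's `hashlib` is an assumption, as are the example addresses; note also that a cryptographic hash makes collisions extremely unlikely rather than impossible.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Fingerprint value of a unit of data (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


# Two units with identical content but different logical and physical addresses,
# and a third unit with different content.
unit_a = {"lba": 10, "pba": 500, "data": b"same content"}
unit_b = {"lba": 42, "pba": 731, "data": b"same content"}
unit_c = {"lba": 43, "pba": 732, "data": b"other content"}

same_fingerprint = fingerprint(unit_a["data"]) == fingerprint(unit_b["data"])
different_fingerprint = fingerprint(unit_a["data"]) != fingerprint(unit_c["data"])
```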
- the granularity represents the minimum capacity unit of data, and data is written according to the granularity unit.
- the unit data in the embodiment of the present application means the data of one granularity unit.
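Writing data in units of granularity can be sketched as splitting a buffer into fixed-size pieces. The function name `split_into_units`, the 4-byte granularity, and the zero-padding of the last unit are assumptions made only for illustration; the application does not specify a granularity value or a padding rule.

```python
from typing import List


def split_into_units(data: bytes, granularity: int) -> List[bytes]:
    """Split `data` into units of exactly `granularity` bytes.

    The final unit is zero-padded (an assumption) so every unit has the same size.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    unit_count = -(-len(data) // granularity)  # ceiling division
    padded = data.ljust(unit_count * granularity, b"\x00")
    return [padded[i:i + granularity] for i in range(0, len(padded), granularity)]


units = split_into_units(b"abcdefghij", granularity=4)
```

Each element of `units` then plays the role of one piece of unit data, with its own fingerprint value, logical address, and physical address.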
- the metadata is stored in the metadata management module.
- The fingerprint value H of the unit data is obtained, and the metadata management module is queried for original metadata containing an original physical address P0 that has a mapping relationship with the fingerprint value H.
- If such original metadata exists, first metadata including the mapping relationship between the original physical address P0 and the fingerprint value H of the unit data is established and stored in the metadata management module. Suppose the fingerprint value in the original metadata is H0 and the unit data has its own physical address P; H and H0 being the same means that the two pieces of data have the same content.
- H and P0 then form a mapping relationship, and the first metadata containing this mapping relationship is stored in the metadata management module.
- The original metadata in the metadata management module is then the same as the first metadata, so that, through the metadata, repeated invocation of data with the same content can be avoided.
- The data deduplication method in the embodiment of the present application obtains the fingerprint value of unit data that has not been deduplicated and, when original metadata containing the mapping relationship between that fingerprint value and a corresponding original physical address is found, establishes the first metadata including the mapping relationship between the original physical address and the fingerprint value of the unit data and stores it in the metadata management module. This avoids an impact on storage system performance while realizing online deduplication, so that the overall deduplication-rate requirement of the storage system is met efficiently and accurately. In addition, providing the metadata management module gives efficient metadata access and improves access concurrency.
- Judging whether the unit data has been deduplicated based on the fingerprint value and/or logical address of the unit data includes: judging whether the unit data has a corresponding fingerprint value and logical address; if the unit data has both a corresponding fingerprint value and a corresponding logical address, it is confirmed that the unit data has not been deduplicated; if the unit data has a corresponding fingerprint value and no corresponding logical address, it is confirmed that the unit data has been deduplicated.
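The judgment reduces to checking which of the two fields is present: a fingerprint value together with a logical address means the unit has not been deduplicated, and a fingerprint value without a logical address means it has. The sketch below assumes a hypothetical `UnitRecord` type and treats a missing fingerprint value as an error, since the text does not describe that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitRecord:
    """Hypothetical per-unit record read back from the hard disk."""
    fingerprint: Optional[str]
    lba: Optional[int]


def already_deduplicated(unit: UnitRecord) -> bool:
    """True if the unit has a fingerprint value but no logical address."""
    if unit.fingerprint is None:
        # Not covered by the text; rejecting it is an assumption of this sketch.
        raise ValueError("unit has no fingerprint value")
    return unit.lba is None


pending = UnitRecord(fingerprint="abc", lba=42)  # fingerprint and logical address
done = UnitRecord(fingerprint="abc", lba=None)   # fingerprint only
```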
- the method further includes: in response to the existence of the original metadata, establishing a second metadata group including the mapping relationship between the original physical address and the logical address of the unit data, and storing the second metadata group in the metadata management module.
- the method further includes: invalidating the logical address of the unit of data in response to storing the second metadata group into the metadata management module.
- the method further includes: in response to storing both the first metadata and the second metadata group in the metadata management module, informing the garbage collection module to perform garbage collection on the unit of data.
- a garbage collection mechanism is activated for the unit of data.
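The ordering described here (invalidate the logical address once the second metadata group is stored, and notify garbage collection only once both the first metadata and the second metadata group are stored) can be sketched as follows. The `MetadataManager` class, its method names, the callback used to reach the garbage collector, and the set used to model invalidated logical addresses are all hypothetical.

```python
from typing import Callable, Dict, List, Set


class MetadataManager:
    """Hypothetical metadata management module tracking what is stored per unit."""

    REQUIRED = frozenset({"first", "second_group"})

    def __init__(self, notify_gc: Callable[[int], None]) -> None:
        self._stored: Dict[int, Set[str]] = {}
        self._notified: Set[int] = set()
        self._notify_gc = notify_gc
        self.invalid_lbas: Set[int] = set()

    def store_first(self, unit_pba: int) -> None:
        self._mark(unit_pba, "first")

    def store_second_group(self, unit_pba: int, lba: int) -> None:
        # The unit's logical address is invalidated once the group is stored.
        self.invalid_lbas.add(lba)
        self._mark(unit_pba, "second_group")

    def _mark(self, unit_pba: int, kind: str) -> None:
        kinds = self._stored.setdefault(unit_pba, set())
        kinds.add(kind)
        # Notify garbage collection once, and only when both kinds are present.
        if self.REQUIRED <= kinds and unit_pba not in self._notified:
            self._notified.add(unit_pba)
            self._notify_gc(unit_pba)


reclaim_queue: List[int] = []
manager = MetadataManager(notify_gc=reclaim_queue.append)
manager.store_first(unit_pba=7)
queued_after_first = list(reclaim_queue)
manager.store_second_group(unit_pba=7, lba=42)
```

Deferring the notification until both pieces of metadata exist means the duplicate copy is only reclaimed after its content and its logical address can both be resolved through the original physical address.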
- establishing the second metadata group including the mapping relationship between the original physical address and the logical address of the unit data includes: establishing metadata containing a key-value pair pointing from the original physical address to the logical address of the unit data, and metadata containing a key-value pair pointing from the logical address of the unit data to the original physical address.
- The two pieces of metadata, each containing one key-value pair of the mapping relationship between the original physical address and the logical address of the unit data, are stored in the metadata management module.
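The second metadata group is a pair of key-value entries pointing in opposite directions. A minimal sketch, assuming two dictionaries as the backing tables and a hypothetical helper `build_second_metadata_group`; for simplicity it maps one physical address to a single logical address, which a real system sharing one block among many logical addresses would need to generalize.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def build_second_metadata_group(original_pba: int, lba: int) -> Tuple[Pair, Pair]:
    """Return the two key-value pairs of the group: (P0 -> L) and (L -> P0)."""
    return (original_pba, lba), (lba, original_pba)


p2l: Dict[int, int] = {}  # P-L table (simplified to one LBA per PBA)
l2p: Dict[int, int] = {}  # L-P table

(p_key, p_val), (l_key, l_val) = build_second_metadata_group(original_pba=100, lba=7)
p2l[p_key] = p_val
l2p[l_key] = l_val
```

After both entries are stored, a read of logical address 7 resolves to the original physical address 100, and the reverse lookup from 100 finds logical address 7.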
- the method further includes: in response to the absence of original metadata, establishing third metadata including the mapping relationship between the fingerprint value of the unit data and its physical address, establishing fourth metadata including the mapping relationship between the logical address of the unit data and its physical address, and storing the third metadata and the fourth metadata in the metadata management module.
- In this case the unit data is new data: no data with the same content exists, and naturally no deduplication operation is performed. The corresponding mapping relationships are therefore established for the fingerprint value, physical address, and logical address of the unit data itself, and the third metadata and fourth metadata containing these mapping relationships are stored in the metadata management module, so that, when other new data arrives, querying the metadata management module reveals whether a fingerprint value matching the content of this unit data exists.
- The fourth metadata in this embodiment includes metadata containing a key-value pair pointing from the logical address of the unit data to its physical address, and also metadata containing a key-value pair pointing from the physical address of the unit data to its logical address.
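The path for new data can be sketched as registering three entries: the third metadata (fingerprint value to physical address) and the two directions of the fourth metadata. The `MetadataStore` class and its methods are hypothetical names, and plain dictionaries stand in for the metadata management module.

```python
from typing import Dict, Optional


class MetadataStore:
    """Hypothetical stand-in for the metadata management module."""

    def __init__(self) -> None:
        self.h2p: Dict[str, int] = {}  # third metadata: fingerprint -> PBA
        self.l2p: Dict[int, int] = {}  # fourth metadata: LBA -> PBA
        self.p2l: Dict[int, int] = {}  # fourth metadata: PBA -> LBA

    def find_original(self, fp: str) -> Optional[int]:
        """Return the original physical address for `fp`, or None if absent."""
        return self.h2p.get(fp)

    def register_new_unit(self, fp: str, lba: int, pba: int) -> None:
        """Store third and fourth metadata for a unit with no original metadata."""
        if self.find_original(fp) is not None:
            raise ValueError("original metadata exists; this path is for new data only")
        self.h2p[fp] = pba
        self.l2p[lba] = pba
        self.p2l[pba] = lba


store = MetadataStore()
before = store.find_original("fp-1")
store.register_new_unit("fp-1", lba=3, pba=900)
after = store.find_original("fp-1")
```

Once registered, a later unit with the same fingerprint value finds physical address 900 as its original physical address and follows the deduplication path instead.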
- Although the steps in the flow chart of FIG. 1 are displayed sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless otherwise specified herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
- A data deduplication system includes: a storage space judging module 10, configured to, in response to the host giving up the data deduplication operation and writing the data to the hard disk in units of granularity, calculate the fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space on the hard disk reaches the preset threshold; a deduplication judging module 20, configured to, in response to the occupied storage space reaching the preset threshold, acquire the unit data from the hard disk and judge whether the unit data has been deduplicated based on the fingerprint value and/or logical address of the unit data; an original metadata query module 30, configured to, in response to the unit data not having been deduplicated, acquire the fingerprint value of the unit data and query whether there is original metadata containing
- The deduplication judging module 20 includes a fingerprint value and logical address judging module configured to judge whether the unit data has a corresponding fingerprint value and logical address; in response to the unit data having a corresponding fingerprint value and logical address, confirm that the unit data has not been deduplicated; and in response to the unit data having a corresponding fingerprint value and no corresponding logical address, confirm that the unit data has been deduplicated.
- The system further includes a second metadata group storage module configured to, in response to the existence of original metadata, establish a second metadata group including the mapping relationship between the original physical address and the logical address of the unit data, and store the second metadata group in the metadata management module.
- The system further includes a logical address invalidation module configured to invalidate the logical address of the unit data in response to storing the second metadata group in the metadata management module.
- The system further includes a garbage collection module configured to, in response to both the first metadata and the second metadata group being stored in the metadata management module, notify the garbage collection module to perform garbage collection on the unit data.
- The second metadata group storage module includes a key-value pair module configured to establish metadata containing a key-value pair pointing from the original physical address to the logical address of the unit data, and metadata containing a key-value pair pointing from the logical address of the unit data to the original physical address.
- The system further includes a metadata storage module configured to, in response to the absence of original metadata, establish third metadata including the mapping relationship between the fingerprint value of the unit data and its physical address, establish fourth metadata including the mapping relationship between the logical address of the unit data and its physical address, and store the third metadata and the fourth metadata in the metadata management module.
- Each module in the above-mentioned data deduplication system can be fully or partially realized by software, hardware, or a combination thereof.
- the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
- FIG. 3 shows a schematic diagram of a non-volatile computer-readable storage medium implementing a data deduplication method. As shown in FIG. 3, the non-volatile computer-readable storage medium 3 stores computer-readable instructions 31. When the computer-readable instructions 31 are executed by a processor, the method in any one of the above-mentioned embodiments is realized.
- The fourth aspect of the embodiments of the present application also provides a computer device, including a memory 402 and one or more processors 401 as shown in FIG. 4.
- FIG. 4 is a schematic diagram of a hardware structure of an embodiment of a computer device performing a data deduplication method provided by one or more embodiments of the present application.
- the computer equipment includes a processor 401 and a memory 402 , and may further include: an input device 403 and an output device 404 .
- the processor 401, the memory 402, the input device 403, and the output device 404 may be connected via a bus or in other ways. In FIG. 4, connection via a bus is taken as an example.
- the input device 403 can receive input numbers or character information, and generate key signal input related to user settings and function control of the data deduplication system.
- the output device 404 may include a display device such as a display screen.
- The memory 402, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the computer-readable instructions/modules corresponding to the data deduplication method in the embodiment of this application.
- the memory 402 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by using the data deduplication method, and the like.
- the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
- the memory 402 may optionally include memory that is remotely located relative to the processor 401, and these remote memories may be connected to the local module through a network.
- Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the processor 401 executes various functional applications and data processing of the server by running non-volatile software programs, instructions and modules stored in the memory 402, that is, implements the data deduplication method in the above method embodiment.
- FIG. 4 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the equipment to which the solution of the application is applied.
- The specific equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM), which can act as external cache memory.
- RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
- Storage devices of the disclosed aspects are intended to include, but are not limited to, these and other suitable types of memory.
Abstract
Provided in the present application are a data deduplication method and system, and a storage medium and a device. The method comprises: in response to a host abandoning a data deduplication operation and writing, into a hard disk, data taking granularity as a unit, calculating a fingerprint value of the data, writing the fingerprint value and a logical address thereof into the hard disk, and determining whether a storage space occupied in the hard disk reaches a preset threshold value; if the preset threshold value is reached, acquiring unit data from the hard disk, and determining whether a deduplication operation has been performed on the unit data; if the deduplication operation has not been performed, acquiring the fingerprint value of the unit data, and querying, by means of a metadata management module, whether there is original metadata which includes an original physical address which has a mapping relationship with the fingerprint value of the unit data; and if so, establishing first metadata which includes the mapping relationship between the original physical address and the fingerprint value of the unit data, and storing the first metadata in the metadata management module for the deduplication operation performed on the unit data.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111090326.4, entitled "Data deduplication method, system, storage medium and device", filed with the China Patent Office on September 17, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to a data deduplication method, system, storage medium, and device.
Metadata is data that describes data (data about data). It can be understood as data of a broader scope than data in the ordinary sense: it not only indicates the type, name, value, and similar information of the data, but also provides the context of the data, such as the domain to which the data belongs and the source of the data. In a data storage system, metadata is the basis of information storage and the smallest unit of data. In recent years, the development of information technology has produced massive amounts of data, and how to manage and organize this data effectively has become a prominent problem. For large amounts of stored data, querying and analyzing the content and meaning of the data allows it to be used more effectively. In a storage system, the efficient organization and management of metadata is an effective means of solving this problem and supports the system's management and maintenance of data. Data therefore becomes more valuable only when metadata is managed effectively.
All-flash storage is a storage system based on all-flash arrays: an independent storage array or device composed entirely of solid-state storage media. Its main difference from traditional hard disk storage is higher performance, with faster and more stable data processing. In an all-flash storage system, online data deduplication is the most important and an indispensable feature: the back end of the system uses solid-state drives as storage media and, given the cost of solid-state drives, all-flash storage systems require online deduplication of data to reduce the storage space actually used on the back-end disks. To realize online deduplication in an all-flash storage system, metadata management is critical. Metadata management mainly manages the L-P (LBA→PBA), P-L (PBA→LBA), and H-P (HASHKEY→PBA) mapping relationships, where LBA (Logical Block Address) denotes the logical block address, PBA (Physical Block Address) denotes the physical block address, and HASHKEY denotes the hash value. Compared with traditional systems that do not support online deduplication, metadata management must additionally maintain the P-L and H-P mapping metadata, and the resulting large volume of highly concurrent, low-latency accesses puts greater pressure on metadata management.
In certain special scenarios, for example when a controller fails or heavy write pressure prevents performance requirements from being met, some online deduplication requests are abandoned to maintain performance. However, the inventor has realized that some data that could have been deduplicated then goes without deduplication, occupying additional storage space on the back-end solid-state drives.
Summary of the Invention
An embodiment of the present application provides a data deduplication method, including the following steps:
in response to the host abandoning the deduplication operation on data and writing the data to the hard disk in units of a granularity, calculating a fingerprint value of the data, writing the fingerprint value of the data and its logical address to the hard disk, and determining whether the occupied storage space in the hard disk reaches a preset threshold;
in response to the occupied storage space reaching the preset threshold, acquiring unit data from the hard disk, and determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation;
in response to the unit data not having undergone a deduplication operation, acquiring the fingerprint value of the unit data, and querying, through a metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and
in response to the original metadata existing, establishing first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and storing the first metadata in the metadata management module to perform the deduplication operation on the unit data.
In one or more embodiments, determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation includes:
determining whether the unit data has a corresponding fingerprint value and logical address;
in response to the unit data having a corresponding fingerprint value and logical address, confirming that the unit data has not undergone a deduplication operation; and
in response to the unit data having a corresponding fingerprint value but no corresponding logical address, confirming that the unit data has undergone a deduplication operation.
In one or more embodiments, the method further includes:
in response to the original metadata existing, establishing a second metadata group containing a mapping relationship between the original physical address and the logical address of the unit data, and storing the second metadata group in the metadata management module.
In one or more embodiments, the method further includes:
in response to the second metadata group being stored in the metadata management module, invalidating the logical address of the unit data.
In one or more embodiments, the method further includes:
in response to both the first metadata and the second metadata group being stored in the metadata management module, notifying a garbage collection module to perform garbage collection on the unit data.
In one or more embodiments, establishing the second metadata group containing the mapping relationship between the original physical address and the logical address of the unit data includes:
establishing metadata containing a key-value pair that points from the original physical address to the logical address of the unit data, and metadata containing a key-value pair that points from the logical address of the unit data to the original physical address.
In one or more embodiments, the method further includes:
in response to the original metadata not existing, establishing third metadata containing a mapping relationship between the fingerprint value of the unit data and its physical address, establishing fourth metadata containing a mapping relationship between the logical address of the unit data and its physical address, and storing the third metadata and the fourth metadata in the metadata management module.
In another aspect, the present application further provides a data deduplication system, including:
a storage space determination module, configured to, in response to the host abandoning the deduplication operation on data and writing the data to the hard disk in units of a granularity, calculate a fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space in the hard disk reaches a preset threshold;
a deduplication determination module, configured to, in response to the occupied storage space reaching the preset threshold, acquire unit data from the hard disk, and determine, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation;
an original metadata query module, configured to, in response to the unit data not having undergone a deduplication operation, acquire the fingerprint value of the unit data, and query, through a metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and
a data deduplication module, configured to, in response to the original metadata existing, establish first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and store the first metadata in the metadata management module to perform the deduplication operation on the unit data.
In yet another aspect, the present application further provides one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data deduplication method of any of the foregoing embodiments.
In still another aspect, the present application further provides a computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the data deduplication method of any of the foregoing embodiments.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
To describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a schematic diagram of a data deduplication method provided according to one or more embodiments of the present application;
FIG. 2 is a schematic diagram of a data deduplication system provided according to one or more embodiments of the present application;
FIG. 3 is a schematic diagram of a non-volatile computer-readable storage medium implementing the data deduplication method, provided according to one or more embodiments of the present application;
FIG. 4 is a schematic diagram of the hardware structure of a computer device that performs the data deduplication method, provided according to one or more embodiments of the present application.
To make the technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present application serve to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be understood as limiting the embodiments of the present application. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion, for example, other steps or units inherent in a process, method, system, product, or device that includes a series of steps or units.
Based on the above purpose, a first aspect of the embodiments of the present application proposes an embodiment of a data deduplication method. FIG. 1 is a schematic diagram of the data deduplication method provided by one or more embodiments of the present application. The method is described using its application to a computer device as an example. As shown in FIG. 1, this embodiment of the present application may include the following steps:
Step S10: in response to the host abandoning the deduplication operation on data and writing the data to the hard disk in units of a granularity, calculating a fingerprint value of the data, writing the fingerprint value of the data and its logical address to the hard disk, and determining whether the occupied storage space in the hard disk reaches a preset threshold;
Step S20: in response to the occupied storage space reaching the preset threshold, acquiring unit data from the hard disk, and determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation;
Step S30: in response to the unit data not having undergone a deduplication operation, acquiring the fingerprint value of the unit data, and querying, through the metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and
Step S40: in response to the original metadata existing, establishing first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and storing the first metadata in the metadata management module to perform the deduplication operation on the unit data.
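Steps S10 to S40 can be pictured as a single background pass over the disk. The following is only a minimal sketch under stated assumptions, not the claimed implementation: the names `UnitRecord` and `background_dedup`, the byte-based threshold, and the use of a plain dictionary for the H-P mapping are all hypothetical, and the case where no original metadata exists is simply skipped here because it belongs to a separate embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class UnitRecord:
    """One grain-sized unit on disk (hypothetical layout for this sketch)."""
    pba: int              # physical address where the unit currently resides
    fingerprint: str      # fingerprint value written alongside the data in S10
    lba: Optional[int]    # logical address written in S10; None if not stored


def background_dedup(units: List[UnitRecord],
                     hp_map: Dict[str, int],
                     used_bytes: int,
                     threshold_bytes: int) -> List[Tuple[str, int]]:
    """Return the first-metadata entries (fingerprint, original PBA) created."""
    first_metadata: List[Tuple[str, int]] = []
    if used_bytes < threshold_bytes:           # S10: threshold not reached
        return first_metadata
    for unit in units:                         # S20: acquire unit data from disk
        if unit.lba is None:                   # no stored LBA: already deduplicated
            continue
        original_pba = hp_map.get(unit.fingerprint)   # S30: query original metadata
        if original_pba is None:               # no original metadata: other embodiment
            continue
        first_metadata.append((unit.fingerprint, original_pba))  # S40
    return first_metadata
```

In this sketch, calling `background_dedup` below the threshold does nothing, which mirrors the fact that the later steps are all conditioned on the threshold check in step S10.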
In the embodiments of the present application, a deduplication operation does not literally mean deleting duplicate data. It means querying the H-P (HASHKEY→PBA) mapping for data newly written to the hard disk: if H (HASHKEY, the fingerprint value) is found to have a corresponding H-P mapping, data with the same content as the data to which H belongs already exists in the storage pool, so no physical address (PBA) is allocated for the new data, which prevents data with the same content from being invoked repeatedly. Note that data that has not undergone a deduplication operation has both a logical address (LBA) corresponding to the storage volume and a physical address (PBA) corresponding to the storage pool. The fingerprint value is the hash value and serves mainly as a unique identifier of the data content: if two pieces of data have the same content, their fingerprint values are also the same, but their logical addresses and physical addresses are not necessarily the same.
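The lookup described above can be illustrated with a small write-path sketch. It is an illustration under assumptions rather than the patented implementation: the function name `write_with_dedup`, the dictionary-based H-P and L-P maps, and the sequential PBA allocator are hypothetical. Pointing the logical address at the existing physical address on a hit is also an assumption of this sketch, since the text only states that no new PBA is allocated.

```python
from typing import Dict, Tuple


def write_with_dedup(lba: int,
                     fingerprint: str,
                     hp_map: Dict[str, int],
                     lp_map: Dict[int, int],
                     next_free_pba: int) -> Tuple[int, int]:
    """Return (PBA used for this write, next free PBA)."""
    existing_pba = hp_map.get(fingerprint)      # query the H-P mapping
    if existing_pba is not None:
        # Same content is already in the storage pool: allocate no new PBA.
        lp_map[lba] = existing_pba              # assumption: reuse the existing PBA
        return existing_pba, next_free_pba
    # New content: allocate a PBA and record both mappings.
    hp_map[fingerprint] = next_free_pba
    lp_map[lba] = next_free_pba
    return next_free_pba, next_free_pba + 1
```

Two writes with the same fingerprint to different logical addresses therefore consume one physical address in this model, while each keeps its own logical address.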
In the embodiments of the present application, granularity (grain) denotes the minimum capacity unit of data, and data is written in grain units. Unit data in the embodiments of the present application denotes one grain unit of data.
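As a rough illustration of grain-based writing, the sketch below splits a buffer into grain-sized units and computes one fingerprint per unit. The grain size of 8192 bytes and the choice of SHA-256 are assumptions made only for this example; the source does not specify either, and `split_into_grains` is a hypothetical helper name.

```python
import hashlib
from typing import List, Tuple

GRAIN_SIZE = 8192  # hypothetical grain size in bytes; not given in the source


def split_into_grains(data: bytes,
                      grain_size: int = GRAIN_SIZE) -> List[Tuple[bytes, str]]:
    """Split data into grain-sized units and fingerprint each unit."""
    grains: List[Tuple[bytes, str]] = []
    for offset in range(0, len(data), grain_size):
        grain = data[offset:offset + grain_size]
        # SHA-256 stands in for the unspecified fingerprint function.
        grains.append((grain, hashlib.sha256(grain).hexdigest()))
    return grains
```

Units with identical content receive identical fingerprints regardless of where they sit in the buffer, which is the property the H-P lookup relies on.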
In the embodiments of the present application, all metadata is stored in the metadata management module. When the hard disk is traversed and one unit of data is acquired from it, if that unit data has not undergone a deduplication operation, its fingerprint value H is acquired, and the metadata management module is queried for original metadata containing an original physical address P0 that has a mapping relationship with the fingerprint value H. If such original metadata exists, first metadata containing the mapping relationship between the original physical address P0 and the fingerprint value H of the unit data is established and stored in the metadata management module. Suppose the fingerprint value in the original metadata is H0 and the unit data has its own physical address P. Because H is the same as H0, the two pieces of data have the same content. Therefore, to achieve deduplication, H is mapped to P0, and the first metadata containing this mapping is stored in the metadata management module. At this point the original metadata and the first metadata in the metadata management module are identical, which prevents data with the same content from being invoked repeatedly through the metadata.
With the data deduplication method of the embodiments of the present application, the fingerprint value of unit data that has not undergone a deduplication operation is acquired; when original metadata containing a mapping relationship between that fingerprint value and a corresponding original physical address is found, first metadata containing the mapping relationship between the original physical address and the fingerprint value of the unit data is established and stored in the metadata management module. This achieves online deduplication of the data while avoiding an impact on storage system performance, thereby meeting the requirement on the overall deduplication ratio of the storage system efficiently and accurately. In addition, providing the metadata management module increases the concurrency of access and thus yields efficient metadata access.
In one or more embodiments, determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation includes: determining whether the unit data has a corresponding fingerprint value and logical address; in response to the unit data having a corresponding fingerprint value and logical address, confirming that the unit data has not undergone a deduplication operation; and in response to the unit data having a corresponding fingerprint value but no corresponding logical address, confirming that the unit data has undergone a deduplication operation.
In the above embodiment, if a unit of data followed the traditional deduplication flow from the start after being issued by the host, only its fingerprint value is stored on the hard disk, and its logical address is not.
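The determination rule above reduces to checking whether a logical address is stored next to the fingerprint. A minimal sketch follows; the function name `is_deduplicated` is hypothetical, and raising an error when the fingerprint is missing is an assumption of this sketch, because the source does not describe that case.

```python
from typing import Optional


def is_deduplicated(fingerprint: Optional[str], lba: Optional[int]) -> bool:
    """Apply the rule: fingerprint + LBA -> not deduplicated; fingerprint only -> deduplicated."""
    if fingerprint is None:
        # Not covered by the source; treated as an error in this sketch.
        raise ValueError("unit data has no stored fingerprint value")
    return lba is None
```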
In one or more embodiments, the method further includes: in response to the original metadata existing, establishing a second metadata group containing a mapping relationship between the original physical address and the logical address of the unit data, and storing the second metadata group in the metadata management module.
In the above embodiment, to ensure that the management function of the metadata management module is effective and that the metadata in it is complete, all metadata related to the unit data is stored in the metadata management module.
In one or more embodiments, the method further includes: in response to the second metadata group being stored in the metadata management module, invalidating the logical address of the unit data.
In the above embodiment, invalidating the logical address of the unit data also helps in determining, from the logical address, whether the unit data has undergone a deduplication operation.
In one or more embodiments, the method further includes: in response to both the first metadata and the second metadata group being stored in the metadata management module, notifying the garbage collection module to perform garbage collection on the unit data.
In the above embodiment, to further prevent data with duplicate content from occupying memory space, a garbage collection mechanism is started for the unit data.
In one or more embodiments, establishing the second metadata group containing the mapping relationship between the original physical address and the logical address of the unit data includes: establishing metadata containing a key-value pair that points from the original physical address to the logical address of the unit data, and metadata containing a key-value pair that points from the logical address of the unit data to the original physical address.
In the above embodiment, to keep the metadata in the metadata management module complete, both pieces of metadata, one for each of the two key-value pairs that express the mapping relationship between the original physical address and the logical address of the unit data, are stored in the metadata management module.
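The embodiments above (second metadata group, logical address invalidation, and garbage collection notification) can be sketched together. This is an illustrative model only: `MetadataManager`, `finish_dedup`, the dictionary-shaped unit record, and the list used as a garbage collection queue are hypothetical, and keeping a set of logical addresses per physical address in the P-L map is an assumption made so that several logical addresses can share one physical address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MetadataManager:
    """Hypothetical in-memory stand-in for the metadata management module."""
    hp: Dict[str, int] = field(default_factory=dict)        # fingerprint -> PBA
    lp: Dict[int, int] = field(default_factory=dict)        # LBA -> PBA
    pl: Dict[int, Set[int]] = field(default_factory=dict)   # PBA -> LBAs


def finish_dedup(mgr: MetadataManager, unit: dict,
                 original_pba: int, gc_queue: List[int]) -> None:
    """Store first metadata and the second metadata group, then clean up."""
    mgr.hp[unit["fingerprint"]] = original_pba               # first metadata (H -> P0)
    # Second metadata group: one key-value pair in each direction.
    mgr.lp[unit["lba"]] = original_pba                       # L -> P0
    mgr.pl.setdefault(original_pba, set()).add(unit["lba"])  # P0 -> L
    unit["lba"] = None               # invalidate the logical address stored with the unit
    gc_queue.append(unit["pba"])     # notify garbage collection about the duplicate copy
```

The logical address is invalidated only after both directions of the second metadata group are stored, matching the order of conditions stated in the embodiments.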
In one or more embodiments, the method further includes: in response to the original metadata not existing, establishing third metadata containing a mapping relationship between the fingerprint value of the unit data and its physical address, establishing fourth metadata containing a mapping relationship between the logical address of the unit data and its physical address, and storing the third metadata and the fourth metadata in the metadata management module.
In the above embodiment, if no original metadata containing a mapping relationship between the fingerprint value of the unit data and a corresponding original physical address is found, the unit data is new data: no data with the same content exists, and naturally no deduplication operation has been performed on it. Corresponding mapping relationships are therefore established from the unit data's own fingerprint value, physical address, and logical address, and the third metadata and fourth metadata carrying these mappings are stored in the metadata management module, so that when other new data arrives later, querying the metadata management module reveals whether a fingerprint value matching the content of this unit data exists. The fourth metadata in this embodiment includes metadata with a key-value pair pointing from the logical address of the unit data to its physical address, and also metadata with a key-value pair pointing from the physical address of the unit data to its logical address.
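For the branch where no original metadata exists, a minimal sketch might register the unit's own mappings as follows. The function name `register_new_unit` and the dictionary-based maps are hypothetical, and storing a set of logical addresses per physical address is an assumption of this sketch.

```python
from typing import Dict, Set


def register_new_unit(fingerprint: str, lba: int, pba: int,
                      hp: Dict[str, int],
                      lp: Dict[int, int],
                      pl: Dict[int, Set[int]]) -> None:
    """Record third metadata (H -> P) and fourth metadata (L -> P and P -> L)."""
    hp[fingerprint] = pba                # third metadata
    lp[lba] = pba                        # fourth metadata, logical to physical
    pl.setdefault(pba, set()).add(lba)   # fourth metadata, physical to logical
```

Once registered, a later unit with the same fingerprint finds this entry in the H-P map and can be deduplicated against it.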
It should be understood that although the steps in the flowchart of FIG. 1 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
A second aspect of the embodiments of the present application further provides a data deduplication system. FIG. 2 is a schematic diagram of the data deduplication system provided according to one or more embodiments of the present application. As shown in FIG. 2, the data deduplication system includes: a storage space determination module 10, configured to, in response to the host abandoning the deduplication operation on data and writing the data to the hard disk in units of a granularity, calculate a fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space in the hard disk reaches a preset threshold; a deduplication determination module 20, configured to, in response to the occupied storage space reaching the preset threshold, acquire unit data from the hard disk, and determine, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has undergone a deduplication operation; an original metadata query module 30, configured to, in response to the unit data not having undergone a deduplication operation, acquire the fingerprint value of the unit data, and query, through the metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and a data deduplication module 40, configured to, in response to the original metadata existing, establish first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and store the first metadata in the metadata management module to perform the deduplication operation on the unit data.
In one or more embodiments, the deduplication determination module 20 includes a fingerprint value and logical address determination module, configured to determine whether the unit data has a corresponding fingerprint value and logical address; in response to the unit data having a corresponding fingerprint value and logical address, confirm that the unit data has not undergone a deduplication operation; and in response to the unit data having a corresponding fingerprint value but no corresponding logical address, confirm that the unit data has undergone a deduplication operation.
In one or more embodiments, the system further includes a second metadata group storage module, configured to, in response to the original metadata existing, establish a second metadata group containing a mapping relationship between the original physical address and the logical address of the unit data, and store the second metadata group in the metadata management module.
In one or more embodiments, the system further includes a logical address invalidation module, configured to, in response to the second metadata group being stored in the metadata management module, invalidate the logical address of the unit data.
In one or more embodiments, the system further includes a garbage collection module, configured to, in response to both the first metadata and the second metadata group being stored in the metadata management module, notify the garbage collection module to perform garbage collection on the unit data.
In one or more embodiments, the second metadata group storage module includes a key-value pair module, configured to establish metadata containing a key-value pair that points from the original physical address to the logical address of the unit data, and metadata containing a key-value pair that points from the logical address of the unit data to the original physical address.
In one or more embodiments, the system further includes a metadata storage module, configured to, in response to the original metadata not existing, establish third metadata containing a mapping relationship between the fingerprint value of the unit data and its physical address, establish fourth metadata containing a mapping relationship between the logical address of the unit data and its physical address, and store the third metadata and the fourth metadata in the metadata management module.
For specific limitations on the data deduplication system, refer to the limitations on the data deduplication method above, which are not repeated here. Each module in the data deduplication system may be implemented in whole or in part by software, hardware, or a combination of the two. Each module may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke it and perform the operations corresponding to that module.
A third aspect of the embodiments of the present application further provides one or more non-volatile computer-readable storage media storing computer-readable instructions. FIG. 3 is a schematic diagram of a non-volatile computer-readable storage medium implementing the data deduplication method, provided according to one or more embodiments of the present application. As shown in FIG. 3, the non-volatile computer-readable storage medium 3 stores computer-readable instructions 31. When executed by a processor, the computer-readable instructions 31 implement the method of any one of the above embodiments.
It should be understood that, where they do not conflict with one another, all implementations, features, and advantages described above for the data deduplication method according to the present application apply equally to the data deduplication system and the storage medium according to the present application.
A fourth aspect of the embodiments of the present application further provides a computer device, including a memory 402 and one or more processors 401 as shown in FIG. 4. The memory 402 stores computer-readable instructions which, when executed by the one or more processors 401, implement the method of any one of the above embodiments.
如图4所示,为本申请一个或多个实施例提供的执行数据重删方法的计算机设备的一个实施例的硬件结构示意图。以如图4所示的计算机设备为例,在该计算机设备中包括一个处理器401以及一个存储器402,并还可以包括:输入装置403和输出装置404。处理器401、存储器402、输入装置403和输出装置404可以通过总线或者其他方式连接,图4中以通过总线连接为例。输入装置403可接收输入的数字或字符信息,以及产生与数据重删系统的用户设置以及功能控制有关的键信号输入。输出装置404可包括显示屏等显示设备。As shown in FIG. 4 , it is a schematic diagram of a hardware structure of an embodiment of a computer device performing a data deduplication method provided by one or more embodiments of the present application. Taking the computer equipment shown in FIG. 4 as an example, the computer equipment includes a processor 401 and a memory 402 , and may further include: an input device 403 and an output device 404 . The processor 401, the memory 402, the input device 403, and the output device 404 may be connected via a bus or in other ways. In FIG. 4, connection via a bus is taken as an example. The input device 403 can receive input numbers or character information, and generate key signal input related to user settings and function control of the data deduplication system. The output device 404 may include a display device such as a display screen.
存储器402作为一种非易失性计算机可读存储介质,可用于存储非易失性软件程序、非易失性计算机可执行程序以及模块,如本申请实施例中的数据重删方法对应的计算机可读指令/模块。存储器402可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储数据重删方法的使用所创建的数据等。此外,存储器402可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一个或多个实施例中,存储器402可选包括相对于处理器401远程设置的存储器,这些 远程存储器可以通过网络连接至本地模块。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 402, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the computer corresponding to the data deduplication method in the embodiment of this application Readable directives/modules. The memory 402 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by using the data deduplication method, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices. In one or more embodiments, the memory 402 may optionally include memory that is remotely located relative to the processor 401, and these remote memories may be connected to the local module through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
By running the non-volatile software programs, instructions, and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, that is, it implements the data deduplication method of the above method embodiments.
Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the devices to which the solution is applied. A specific device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
Finally, it should be noted that the computer-readable storage medium (for example, memory) herein may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. By way of example and not limitation, non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to include, but are not limited to, these and other suitable types of memory.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope disclosed by the embodiments of the present application.
The above are exemplary embodiments disclosed in the present application, but it should be noted that various changes and modifications can be made without departing from the scope of the disclosure of the embodiments of the present application as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items. The serial numbers of the embodiments disclosed above are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present application (including the claims) is limited to these examples. Within the spirit of the embodiments of the present application, the technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above that are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
Claims (10)
- A data deduplication method, characterized by comprising the following steps: in response to a host forgoing a deduplication operation on data and writing the data to a hard disk in units of a granularity, calculating a fingerprint value of the data, writing the fingerprint value of the data and its logical address to the hard disk, and determining whether the occupied storage space on the hard disk reaches a preset threshold; in response to the occupied storage space reaching the preset threshold, acquiring unit data from the hard disk, and determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has already undergone a deduplication operation; in response to the unit data not having undergone a deduplication operation, acquiring the fingerprint value of the unit data, and querying, through a metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and in response to the original metadata existing, establishing first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and storing the first metadata in the metadata management module so as to perform the deduplication operation on the unit data.
- The method according to claim 1, characterized in that determining, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has already undergone a deduplication operation comprises: determining whether the unit data has a corresponding fingerprint value and logical address; in response to the unit data having a corresponding fingerprint value and logical address, confirming that the unit data has not undergone a deduplication operation; and in response to the unit data having a corresponding fingerprint value but no corresponding logical address, confirming that the unit data has undergone a deduplication operation.
- The method according to claim 1, characterized by further comprising: in response to the original metadata existing, establishing a second metadata group containing a mapping relationship between the original physical address and the logical address of the unit data, and storing the second metadata group in the metadata management module.
- The method according to claim 3, characterized by further comprising: in response to the second metadata group being stored in the metadata management module, setting the logical address of the unit data to invalid.
- The method according to claim 3, characterized by further comprising: in response to both the first metadata and the second metadata group being stored in the metadata management module, notifying a garbage collection module to perform garbage collection on the unit data.
- The method according to claim 3, characterized in that establishing the second metadata group containing the mapping relationship between the original physical address and the logical address of the unit data comprises: establishing metadata containing a key-value pair that points from the original physical address to the logical address of the unit data, and metadata containing a key-value pair that points from the logical address of the unit data to the original physical address.
- The method according to claim 1, characterized by further comprising: in response to the original metadata not existing, establishing third metadata containing a mapping relationship between the fingerprint value of the unit data and its physical address, establishing fourth metadata containing a mapping relationship between the logical address of the unit data and its physical address, and storing the third metadata and the fourth metadata in the metadata management module.
- A data deduplication system, comprising: a storage space determination module, configured to, in response to a host forgoing a deduplication operation on data and writing the data to a hard disk in units of a granularity, calculate a fingerprint value of the data, write the fingerprint value of the data and its logical address to the hard disk, and determine whether the occupied storage space on the hard disk reaches a preset threshold; a deduplication determination module, configured to, in response to the occupied storage space reaching the preset threshold, acquire unit data from the hard disk and determine, based on the fingerprint value and/or the logical address of the unit data, whether the unit data has already undergone a deduplication operation; an original metadata query module, configured to, in response to the unit data not having undergone a deduplication operation, acquire the fingerprint value of the unit data and query, through a metadata management module, whether there exists original metadata containing an original physical address that has a mapping relationship with the fingerprint value of the unit data; and a data deduplication module, configured to, in response to the original metadata existing, establish first metadata containing a mapping relationship between the original physical address and the fingerprint value of the unit data, and store the first metadata in the metadata management module so as to perform the deduplication operation on the unit data.
- One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.
- A computer device, characterized by comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.
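
For orientation, the post-process flow recited in claims 1 to 7 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the applicant's implementation: the names `Disk`, `Unit`, `MetadataManager`, `write_without_dedup`, and `dedup_pass` are hypothetical, SHA-256 stands in for the unspecified fingerprint function, the occupancy threshold is modeled as a fraction of unit slots, and garbage collection is reduced to dropping the duplicate unit. The sketch also invalidates the on-disk logical address of a unit that becomes an original so that later passes skip it; the claims do not state this.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


def fingerprint(data: bytes) -> str:
    # Assumption: SHA-256; the claims only require "a fingerprint value".
    return hashlib.sha256(data).hexdigest()


@dataclass
class Unit:
    data: bytes
    fp: str
    logical: Optional[int]  # None models an invalidated logical address


@dataclass
class MetadataManager:
    fp_to_pa: Dict[str, int] = field(default_factory=dict)       # fingerprint -> physical
    la_to_pa: Dict[int, int] = field(default_factory=dict)       # logical -> physical
    pa_to_las: Dict[int, Set[int]] = field(default_factory=dict)  # physical -> logicals


@dataclass
class Disk:
    capacity: int      # number of unit slots (hypothetical model)
    threshold: float   # preset occupancy threshold, as a fraction
    units: Dict[int, Unit] = field(default_factory=dict)

    def write_without_dedup(self, pa: int, la: int, data: bytes) -> bool:
        """Host skips dedup: store data with its fingerprint and logical
        address, then report whether occupancy reached the threshold."""
        self.units[pa] = Unit(data, fingerprint(data), la)
        return len(self.units) / self.capacity >= self.threshold


def is_deduplicated(unit: Unit) -> bool:
    # Claim 2: a fingerprint without a logical address means already processed.
    return unit.logical is None


def dedup_pass(disk: Disk, meta: MetadataManager) -> List[int]:
    """Run one post-process pass; return physical addresses handed to GC."""
    reclaimed: List[int] = []
    for pa in sorted(disk.units):
        unit = disk.units[pa]
        if is_deduplicated(unit):
            continue
        la = unit.logical
        original_pa = meta.fp_to_pa.get(unit.fp)
        if original_pa is not None:
            # First metadata: fingerprint <-> original physical address.
            meta.fp_to_pa[unit.fp] = original_pa
            # Second metadata group: key-value pairs in both directions.
            meta.la_to_pa[la] = original_pa
            meta.pa_to_las.setdefault(original_pa, set()).add(la)
            unit.logical = None   # claim 4: invalidate the logical address
            reclaimed.append(pa)  # claim 5: notify garbage collection
        else:
            meta.fp_to_pa[unit.fp] = pa  # third metadata
            meta.la_to_pa[la] = pa       # fourth metadata
            unit.logical = None          # assumption of this sketch only
    for pa in reclaimed:
        del disk.units[pa]  # stand-in for the garbage collection module
    return reclaimed
```

In this model, writing three units of which two hold identical bytes and then calling `dedup_pass` leaves both logical addresses of the identical data mapped to the first unit's physical address, and the second copy is reclaimed.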
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111090326.4A CN113535708A (en) | 2021-09-17 | 2021-09-17 | Data deduplication method, system, storage medium and equipment |
CN202111090326.4 | 2021-09-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023040200A1 true WO2023040200A1 (en) | 2023-03-23 |
Family
ID=78093359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/078324 WO2023040200A1 (en) | 2021-09-17 | 2022-02-28 | Data deduplication method and system, and storage medium and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113535708A (en) |
WO (1) | WO2023040200A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113535708A (en) * | 2021-09-17 | 2021-10-22 | 苏州浪潮智能科技有限公司 | Data deduplication method, system, storage medium and equipment |
CN114138198B (en) * | 2021-11-29 | 2024-05-28 | 苏州浪潮智能科技有限公司 | Method, device, equipment and readable medium for deleting data |
CN114253472B (en) * | 2021-11-29 | 2023-09-22 | 郑州云海信息技术有限公司 | Metadata management method, device and storage medium |
CN114416676A (en) * | 2021-12-20 | 2022-04-29 | 北京星网锐捷网络技术有限公司 | Data processing method, device, equipment and storage medium |
CN115048064B (en) * | 2022-07-29 | 2024-10-15 | 苏州浪潮智能科技有限公司 | Data management method, device, equipment and storage medium |
CN115437579B (en) * | 2022-11-04 | 2023-03-24 | 苏州浪潮智能科技有限公司 | Metadata management method and device, computer equipment and readable storage medium |
CN115576956B (en) * | 2022-12-07 | 2023-03-10 | 苏州浪潮智能科技有限公司 | Data processing method, system, equipment and storage medium |
CN117931092B (en) * | 2024-03-20 | 2024-05-24 | 苏州元脑智能科技有限公司 | Data deduplication adjustment method, device, equipment, storage system and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106339444A (en) * | 2016-08-23 | 2017-01-18 | 深圳市金立通信设备有限公司 | Method for instantly deleting file and terminal |
CN106527973A (en) * | 2016-10-10 | 2017-03-22 | 杭州宏杉科技股份有限公司 | A method and device for data deduplication |
CN107122130A (en) * | 2017-04-13 | 2017-09-01 | 杭州宏杉科技股份有限公司 | A kind of data delete method and device again |
CN110727404A (en) * | 2019-09-27 | 2020-01-24 | 苏州浪潮智能科技有限公司 | Data deduplication method and device based on storage end and storage medium |
CN110795031A (en) * | 2019-10-17 | 2020-02-14 | 北京浪潮数据技术有限公司 | Data deduplication method, device and system based on full flash storage |
CN113535708A (en) * | 2021-09-17 | 2021-10-22 | 苏州浪潮智能科技有限公司 | Data deduplication method, system, storage medium and equipment |
- 2021-09-17: CN application CN202111090326.4A, published as CN113535708A (active, Pending)
- 2022-02-28: WO application PCT/CN2022/078324, published as WO2023040200A1 (active, Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN113535708A (en) | 2021-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023040200A1 (en) | Data deduplication method and system, and storage medium and device | |
EP3260985B1 (en) | Controller, flash memory apparatus, and method for writing data into flash memory apparatus | |
US10374792B1 (en) | Layout-independent cryptographic stamp of a distributed dataset | |
CN108459826B (en) | Method and device for processing IO (input/output) request | |
US10203899B2 (en) | Method for writing data into flash memory apparatus, flash memory apparatus, and storage system | |
KR101994021B1 (en) | File manipulation method and apparatus | |
US10649905B2 (en) | Method and apparatus for storing data | |
CN111381779B (en) | Data processing method, device, equipment and storage medium | |
US10261908B2 (en) | Method and apparatus for expanding cache size for cache array | |
US10891074B2 (en) | Key-value storage device supporting snapshot function and operating method thereof | |
US9983827B1 (en) | Key-based memory deduplication protection | |
US10061523B2 (en) | Versioning storage devices and methods | |
WO2021073510A1 (en) | Statistical method and device for database | |
CN104199899A (en) | Method and device for storing massive pictures based on Hbase | |
US11449270B2 (en) | Address translation method and system for KV storage device | |
US20150212744A1 (en) | Method and system of eviction stage population of a flash memory cache of a multilayer cache system | |
CN113326005A (en) | Read-write method and device for RAID storage system | |
CN115470156A (en) | RDMA-based memory use method, system, electronic device and storage medium | |
WO2016206070A1 (en) | File updating method and storage device | |
WO2016127807A1 (en) | Method for writing multiple copies into storage device, and storage device | |
CN112764662B (en) | Method, apparatus and computer program product for storage management | |
US20150106884A1 (en) | Memcached multi-tenancy offload | |
JP2018502379A5 (en) | ||
CN113625938B (en) | Metadata storage method and device | |
US20150113204A1 (en) | Data storage device and computing system with the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22868580; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 22868580; Country of ref document: EP; Kind code of ref document: A1 |