CN115729471A - Method, device, equipment and storage medium for deduplication query - Google Patents

Publication number
CN115729471A
Authority
CN
China
Prior art keywords
target data
fingerprint value
target
physical address
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211482862.3A
Other languages
Chinese (zh)
Inventor
王见
孙京本
李佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a deduplication query method comprising: receiving a write request and determining the data corresponding to the currently received write request as target data; calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying the target fingerprint value in a cuckoo filter; if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data. The method reduces how often deduplication metadata must be looked up in the on-disk B+ tree and thereby improves the performance of the whole storage system. The application also provides a deduplication query apparatus, a deduplication query device, and a computer-readable storage medium, which have the same beneficial effects.

Description

Method, device, equipment and storage medium for deduplication query
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a deduplication query method, apparatus, device, and storage medium.
Background
In an all-flash storage system, SSDs are much more expensive than conventional mechanical disks, so deduplication and compression are important features of such a system. Deduplication requires metadata support, and as the storage capacity supported by the system increases, the metadata grows linearly with it. With block mapping in units of 8K, if the data volume reaches several PB, the metadata reaches the TB level.
Obviously, TB-level metadata makes data lookup very difficult, so how to manage the metadata effectively to achieve fast data reads and writes is an urgent problem to be solved by those skilled in the art.
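The scale claim above can be checked with a short calculation. The following is a minimal sketch only: the 8 KiB block size comes from the text, while the 16-byte entry size (an 8-byte fingerprint plus an 8-byte physical address) is an assumption chosen for illustration, not a figure from the application.

```python
# Rough sizing of mapping metadata for 8 KiB block mapping.
BLOCK_BYTES = 8 * 1024   # block mapping unit stated in the text
ENTRY_BYTES = 16         # ASSUMPTION: 8-byte fingerprint + 8-byte physical address

PIB = 1 << 50
TIB = 1 << 40


def metadata_bytes(capacity_bytes: int) -> int:
    """Bytes of mapping metadata needed to cover capacity_bytes of data."""
    blocks = capacity_bytes // BLOCK_BYTES
    return blocks * ENTRY_BYTES


for capacity_pib in (1, 2, 4):
    meta = metadata_bytes(capacity_pib * PIB)
    print(f"{capacity_pib} PiB of data needs {meta / TIB:.0f} TiB of metadata")
```

Under these assumptions the metadata grows linearly with capacity, at 2 TiB per PiB of stored data, which is consistent with the TB-level figure stated above.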
Disclosure of Invention
The invention aims to provide a deduplication query method, a deduplication query apparatus, a deduplication query device, and a computer-readable storage medium, which can improve the disk application performance of the write cache.
To achieve the above object, the present application provides a deduplication query method with the following technical solution:
receiving a write request, and determining the data corresponding to the currently received write request as target data;
calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying the target fingerprint value in a cuckoo filter;
if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data.
Optionally, if the target fingerprint value is found in the cuckoo filter, the method further includes:
querying whether a fingerprint-value-to-physical-address metadata mapping exists for the target fingerprint value; if so, determining the physical address in that mapping as the physical address of the target data, and performing the step of adding the metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, performing the step of storing the target data;
correspondingly, after the target data is stored, adding a metadata mapping from the target fingerprint value to the physical address of the target data.
Optionally, before querying the target fingerprint value in the cuckoo filter, the method further includes:
setting up an array structure and a linked list structure corresponding to the cuckoo filter, wherein the same fingerprint value is stored at two positions in the array structure, and the fingerprint value is a secondary hash computed over some of the bits of the hash corresponding to the physical address.
Optionally, when the hash value of the data at the physical address is inserted into the array structure, the method further includes:
calculating the insertion position of the hash value in the array structure using a hash algorithm of the cuckoo filter.
Optionally, when the insertion position of the hash value in the array structure is calculated using the hash algorithm of the cuckoo filter, if an element already exists at the insertion position, the method further includes:
obtaining some of the bits of the hash value through a hash back-check function;
judging whether those bits are the same;
if they are the same, performing an exact HP query on the hash value;
and if they are not the same, inserting the hash value into the linked list structure corresponding to the array structure.
Optionally, the method further includes:
managing the mapping relationship between the physical address and the corresponding logical address by using a controller on the solid state disk;
after the storage pool is created, consolidating and remapping the physical addresses, and re-encoding the logical address by using the pool information of the storage pool and the storage offset position information.
Optionally, determining the data corresponding to the currently received write request as target data includes:
determining the data of the data block corresponding to the currently received write request as target data;
correspondingly, calculating the fingerprint value of the target data to obtain the target fingerprint value includes:
calculating the fingerprint value of the target data by using a preset hash algorithm to obtain the target fingerprint value.
The present application further provides a deduplication query apparatus, including:
a receiving module, configured to receive a write request and determine the data corresponding to the currently received write request as target data;
a query module, configured to calculate a fingerprint value of the target data to obtain a target fingerprint value, and query the target fingerprint value in a cuckoo filter;
and an execution module, configured to: if the target fingerprint value is found in the cuckoo filter, add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data; otherwise, store the target data, add the target fingerprint value to the cuckoo filter, and add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data.
Optionally, the apparatus further includes:
a mapping query module, configured to: if the target fingerprint value is found in the cuckoo filter, query whether a fingerprint-value-to-physical-address metadata mapping exists for the target fingerprint value; if so, determine the physical address in that mapping as the physical address of the target data, and perform the step of adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, perform the step of storing the target data.
Optionally, the apparatus further includes:
a filter setting module, configured to set up an array structure and a linked list structure corresponding to the cuckoo filter, wherein the same fingerprint value is stored at two positions in the array structure, and the fingerprint value is a secondary hash computed over some of the bits of the hash corresponding to the physical address.
Optionally, the apparatus further includes:
a position calculation module, configured to calculate the insertion position of the hash value in the array structure using a hash algorithm of the cuckoo filter.
Optionally, the apparatus further includes:
an insertion module, configured to: when an element already exists at the insertion position, obtain some of the bits of the hash value through a hash back-check function; judge whether those bits are the same; if they are the same, perform an exact HP query on the hash value; and if they are not the same, insert the hash value into the linked list structure corresponding to the array structure.
Optionally, the apparatus further includes:
a mapping relationship management module, configured to manage the mapping relationship between the physical address and the corresponding logical address by using a controller on the solid state disk; after the storage pool is created, consolidate and remap the physical addresses; and re-encode the logical address by using the pool information of the storage pool and the storage offset position information.
Optionally, the receiving module includes:
a data determining unit, configured to determine the data of the data block corresponding to the currently received write request as target data;
correspondingly, the query module includes:
a hash calculation unit, configured to calculate the fingerprint value of the target data by using a preset hash algorithm to obtain the target fingerprint value.
The application provides a deduplication query method comprising: receiving a write request, and determining the data corresponding to the currently received write request as target data; calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying the target fingerprint value in a cuckoo filter; if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data.
In the present application, after a write request is received, the corresponding target fingerprint value is calculated and queried in the cuckoo filter. If the target fingerprint value is found, the metadata mapping from the current logical address of the target data to its physical address is added directly; if it is not found, the target data is stored and the metadata mapping between the target fingerprint value and the physical address of the target data is added. The B+ tree therefore does not need to be queried, since the cuckoo filter can be used directly to look up the corresponding metadata mapping. This reduces how often deduplication metadata is looked up in the on-disk B+ tree and improves the performance of the whole storage system.
The application further provides a deduplication query apparatus, a deduplication query device, and a computer-readable storage medium, which have the beneficial effects described above; these are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of a deduplication query method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a deduplication query apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a deduplication query device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a deduplication query method provided in an embodiment of the present invention. The specific scheme includes:
S101: receiving a write request, and determining target data corresponding to the currently received write request;
This step receives a write request, which is typically an IO request, i.e. a write IO request. Other forms of write request are of course also possible.
After the write request is received, the target data corresponding to it, that is, the data that the write request is to write, can be determined.
How the target data is determined is not limited. The data block corresponding to the currently received write request may be determined first, and the target data then determined from that data block. The target data may also be determined from information contained in the write request, such as an address.
S102: calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying the target fingerprint value in a cuckoo filter;
This step calculates the fingerprint value of the target data. How the fingerprint value is calculated is not limited: a preset hash algorithm may be used to calculate the fingerprint value of the target data and obtain the target fingerprint value. The hash algorithm may be an original hash algorithm or any variant derived from one; any algorithm that can calculate a fingerprint value for the target data may be applied in this step.
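As an illustration of the fingerprint calculation, the following minimal sketch truncates a SHA-256 digest to 64 bits. The choice of SHA-256, the 64-bit width, and the name `fingerprint64` are assumptions made for the example; the text leaves the hash algorithm open.

```python
import hashlib


def fingerprint64(block: bytes) -> int:
    """Return a 64-bit fingerprint of a data block.

    ASSUMPTION: SHA-256 truncated to its first 8 bytes; any preset hash
    algorithm could be substituted here.
    """
    digest = hashlib.sha256(block).digest()
    return int.from_bytes(digest[:8], "big")


block_a = b"a" * 8192
block_b = b"b" * 8192
print(hex(fingerprint64(block_a)))
print(fingerprint64(block_a) == fingerprint64(b"a" * 8192))  # identical content, identical fingerprint
print(fingerprint64(block_a) == fingerprint64(block_b))
```

Identical blocks always yield the same fingerprint, which is the property deduplication relies on; distinct blocks can collide only with very small probability at this width.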
The target fingerprint value is then queried in the cuckoo filter. The cuckoo filter stores fingerprint values in an array structure plus a linked list structure, and the same fingerprint value is stored at two positions in the array. The fingerprint value stored in the array is a second hash computed over some of the bits of the hash corresponding to the physical address.
The cuckoo filter follows the common design: only two hash tables are needed, and a lookup is guaranteed to complete in two accesses. Compared with the K accesses required by the K hash functions of a Bloom filter, the cuckoo filter performs significantly better when the data volume is large and cannot be fully loaded into memory. Querying deduplication metadata does not require absolute accuracy: a misjudged query affects only the deduplication ratio of the system, not data consistency, so the cuckoo filter is well suited to querying deduplication metadata.
When the insertion position of a hash value in the array structure is calculated using the hash algorithm of the cuckoo filter, if an element already exists at the insertion position, some of the bits of the hash value can be obtained through a hash back-check function, and it is judged whether those bits are the same. If they are the same, an exact HP query is performed on the hash value; otherwise, the hash value is inserted into the linked list structure corresponding to the array structure.
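The collision handling just described can be sketched as follows. This is one possible reading, not the application's implementation: `partial_bits` stands in for the hash back-check function, a Python list per slot stands in for the linked list, `exact_hp_query` is a hypothetical callback for the exact HP query, and the 16-bit tag width and slot calculation are assumptions.

```python
from typing import Callable, List, Optional

TAG_BITS = 16  # ASSUMPTION: number of partial bits kept per slot


def partial_bits(h: int) -> int:
    """Stand-in for the hash back-check function: the low TAG_BITS of the hash."""
    return h & ((1 << TAG_BITS) - 1)


class ChainedTable:
    def __init__(self, size: int = 1024):
        self.size = size
        self.tags: List[Optional[int]] = [None] * size    # array structure
        self.chains: List[List[int]] = [[] for _ in range(size)]  # one overflow chain per slot

    def insert(self, h: int, exact_hp_query: Callable[[int], bool]) -> str:
        pos = (h >> TAG_BITS) % self.size  # insertion position (assumed formula)
        tag = partial_bits(h)
        if self.tags[pos] is None:
            self.tags[pos] = tag
            return "inserted"
        if self.tags[pos] == tag:
            # Partial bits match: confirm with the exact query before treating as a duplicate.
            return "duplicate" if exact_hp_query(h) else "false-positive"
        # Partial bits differ: keep the new entry in the chain for this slot.
        self.chains[pos].append(tag)
        return "chained"


table = ChainedTable(size=8)
h1 = (3 << TAG_BITS) | 0xAAAA
h2 = (11 << TAG_BITS) | 0xBBBB   # same slot as h1 (11 % 8 == 3), different partial bits
print(table.insert(h1, lambda h: True))
print(table.insert(h2, lambda h: True))
print(table.insert(h1, lambda h: True))
```

The exact query runs only when the partial bits already match, so most non-duplicate writes are resolved without touching the on-disk metadata.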
In particular, when this step is performed, the cuckoo filter can relax the query precision of the hash value query interface queryHpinfo, turning the exact query into a fuzzy query and thereby improving query efficiency.
It will be readily appreciated that this embodiment assumes the fingerprint values of previously stored data have been stored in the cuckoo filter before this step. Step S103 is then performed according to whether the target fingerprint value exists in the cuckoo filter.
S103: if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data.
This step performs different processing depending on whether the target fingerprint value is found.
If the target fingerprint value is found, the metadata mapping from the current logical address of the target data to the physical address of the target data can be added directly, which reduces the frequency of B+ tree queries and improves system performance.
If the target fingerprint value is not found, the target data has not previously been recorded in the cuckoo filter. The target data is stored first, the target fingerprint value is added to the cuckoo filter, and the metadata mapping from the current logical address of the target data to the physical address of the target data is added. Adding the target fingerprint value makes it available to later queries: the next time the same fingerprint value is queried, the flow for a found fingerprint value is executed.
In other words, this step involves two distinct branches. Whether or not the target fingerprint value is found, the metadata mapping from the current logical address to the physical address of the target data must be added; the difference is that when the target fingerprint value is found, the target data need not be stored and the target fingerprint value need not be added to the cuckoo filter.
In addition, before the metadata mapping from the current logical address of the target data to the physical address of the target data is added, the logical address of the write request corresponding to the target data may be obtained and determined as the current logical address of the target data; that is, the logical address of the write request is taken as the current logical address of the target data. The physical address of the target data can be obtained directly by querying and is not described further here.
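The two branches of S103 can be sketched as a small in-memory write path. This is a simplified illustration under stated assumptions: a Python set stands in for the cuckoo filter, a dictionary `fp_to_phys` is a hypothetical way of obtaining the physical address by querying, and a list index stands in for the physical address.

```python
import hashlib


def fingerprint(data: bytes) -> int:
    # ASSUMPTION: SHA-256 truncated to 64 bits as the preset hash algorithm.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class DedupStore:
    def __init__(self):
        self.filter = set()    # stand-in for the cuckoo filter
        self.fp_to_phys = {}   # hypothetical lookup: fingerprint -> physical address
        self.l2p = {}          # metadata mapping: logical address -> physical address
        self.blocks = []       # physical storage; list index plays the role of physical address

    def write(self, logical_addr: int, data: bytes) -> bool:
        """Handle one write request; return True if the block was deduplicated."""
        fp = fingerprint(data)
        if fp in self.filter:
            # Found: reuse the existing physical block, store nothing new.
            phys = self.fp_to_phys[fp]
            deduplicated = True
        else:
            # Not found: store the data and record the fingerprint for later queries.
            phys = len(self.blocks)
            self.blocks.append(data)
            self.filter.add(fp)
            self.fp_to_phys[fp] = phys
            deduplicated = False
        # Both branches add the logical-to-physical metadata mapping.
        self.l2p[logical_addr] = phys
        return deduplicated


store = DedupStore()
print(store.write(0, b"x" * 8192))   # first copy is stored
print(store.write(1, b"x" * 8192))   # second copy is deduplicated
print(store.l2p, len(store.blocks))
```

Two logical addresses end up pointing at the same physical block, and only one copy of the data is held.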
The metadata mapping between the physical address and the corresponding logical address may be managed by a controller on the solid state disk. Other storage units may of course be used to store the metadata mapping. Note that storing the metadata mapping and managing it may be done by two separate entities, or both may be done by a single management entity, for example the controller on the solid state disk.
In addition, after the storage pool is created, the physical addresses may be consolidated and remapped, and the logical addresses may be re-encoded using the pool information of the storage pool and the storage offset position information. How the logical address is re-encoded is not limited here and may be configured by those skilled in the art according to the relationship between the physical address and the logical address.
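Since the re-encoding scheme is left open, the following is only one hypothetical layout: a pool identifier packed into the high bits of a 64-bit value and the storage offset into the low bits. The 16/48 bit split and the function names are assumptions for illustration.

```python
POOL_BITS = 16     # ASSUMPTION: width of the pool identifier
OFFSET_BITS = 48   # ASSUMPTION: width of the storage offset


def encode_logical(pool_id: int, offset: int) -> int:
    """Pack pool information and storage offset into one 64-bit logical address."""
    if not 0 <= pool_id < (1 << POOL_BITS):
        raise ValueError("pool_id out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (pool_id << OFFSET_BITS) | offset


def decode_logical(addr: int) -> tuple:
    """Recover (pool_id, offset) from an encoded logical address."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


addr = encode_logical(pool_id=3, offset=8192)
print(hex(addr), decode_logical(addr))
```

With such a layout the pool can be recovered from any logical address without a separate lookup; other encodings would serve equally well.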
In this embodiment of the application, after a write request is received, the corresponding target fingerprint value is calculated and queried in the cuckoo filter. If the target fingerprint value is found, the metadata mapping from the current logical address of the target data to its physical address is added directly; if it is not found, the target data is stored and the metadata mapping between the target fingerprint value and the physical address of the target data is added. The B+ tree therefore does not need to be queried, since the cuckoo filter can be used directly to look up the corresponding metadata mapping. This reduces how often deduplication metadata is looked up in the on-disk B+ tree and improves the performance of the whole storage system.
Based on the above embodiment, as a preferred embodiment, if the target fingerprint value is found in the cuckoo filter, it may further be queried whether a fingerprint-value-to-physical-address metadata mapping exists for the target fingerprint value. If so, the physical address in that mapping is determined as the physical address of the target data, and the metadata mapping from the current logical address of the target data to the physical address of the target data is added. Otherwise, the target data is stored.
How the metadata mapping is queried is not limited; the HP metadata tree may be queried for a fingerprint-value-to-physical-address metadata mapping for the target fingerprint value. The HP metadata tree is a mapping from H to P, where H is a 64-bit fingerprint value and P is the physical address at which the data is stored. The HP metadata tree may specifically be an HP metadata B+ tree.
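The H-to-P lookup can be illustrated with an ordered in-memory index. This sketch uses a sorted list with binary search as a stand-in for the B+ tree; the class name `HPIndex` and its methods are hypothetical, and a real on-disk B+ tree would page its nodes rather than hold everything in memory.

```python
import bisect
from typing import List, Optional


class HPIndex:
    """Ordered map from a 64-bit fingerprint H to a physical address P."""

    def __init__(self):
        self._keys: List[int] = []   # fingerprints, kept sorted
        self._vals: List[int] = []   # physical addresses, parallel to _keys

    def put(self, h: int, p: int) -> None:
        i = bisect.bisect_left(self._keys, h)
        if i < len(self._keys) and self._keys[i] == h:
            self._vals[i] = p        # overwrite the existing mapping
        else:
            self._keys.insert(i, h)
            self._vals.insert(i, p)

    def get(self, h: int) -> Optional[int]:
        """Exact query: return P for H, or None if no mapping exists."""
        i = bisect.bisect_left(self._keys, h)
        if i < len(self._keys) and self._keys[i] == h:
            return self._vals[i]
        return None


index = HPIndex()
index.put(0xDEADBEEF, 42)
index.put(0x1234, 7)
print(index.get(0xDEADBEEF), index.get(0x1234), index.get(0x9999))
```

A `None` result corresponds to the case where the cuckoo filter reported a hit but no fingerprint-to-physical-address mapping actually exists.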
In this case, the flow of this embodiment is as follows:
Step 1: receive a write request, and determine the data corresponding to the currently received write request as target data;
Step 2: calculate a fingerprint value of the target data to obtain a target fingerprint value, and query the target fingerprint value in the cuckoo filter; if the target fingerprint value is found in the cuckoo filter, go to Step 3; if it is not found, go to Step 7;
Step 3: add the metadata mapping from the current logical address of the target data to the physical address of the target data;
Step 4: query whether a fingerprint-value-to-physical-address metadata mapping exists for the target fingerprint value; if so, go to Step 5; otherwise, go to Step 6;
Specifically, the HP metadata tree may be queried for a fingerprint-value-to-physical-address metadata mapping for the target fingerprint value. The HP metadata tree is not limited here; in one possible approach, it may be an HP metadata B+ tree.
Step 5: determine the physical address in the fingerprint-value-to-physical-address metadata mapping of the target fingerprint value as the physical address of the target data, and add the metadata mapping from the current logical address of the target data to the physical address of the target data.
Step 6: store the target data.
Step 7: store the target data, add the target fingerprint value to the cuckoo filter, and add the metadata mapping from the current logical address of the target data to the physical address of the target data.
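The flow above can be sketched end to end as follows. This is an interpretation rather than the application's implementation: a set stands in for the cuckoo filter, a dictionary stands in for the HP metadata tree, and the sketch defers the logical-to-physical update until the physical address is known. In the unconfirmed branch it also adds both the fingerprint-to-physical and logical-to-physical mappings after storing, which is an assumption.

```python
import hashlib


def fp64(data: bytes) -> int:
    # ASSUMPTION: SHA-256 truncated to 64 bits.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class DedupWithHP:
    def __init__(self):
        self.filter = set()   # stand-in for the cuckoo filter
        self.hp = {}          # stand-in for the HP metadata tree: H -> P
        self.l2p = {}         # logical address -> physical address
        self.blocks = []      # list index plays the role of physical address

    def _store(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1

    def write(self, logical_addr: int, data: bytes) -> str:
        h = fp64(data)                          # Steps 1-2
        if h in self.filter:
            phys = self.hp.get(h)               # Step 4: exact HP query
            if phys is not None:
                self.l2p[logical_addr] = phys   # Step 5: reuse the existing block
                return "deduplicated"
            phys = self._store(data)            # Step 6: filter hit was not confirmed
            self.hp[h] = phys
            self.l2p[logical_addr] = phys
            return "stored-unconfirmed"
        phys = self._store(data)                # Step 7: fingerprint not in filter
        self.filter.add(h)
        self.hp[h] = phys
        self.l2p[logical_addr] = phys
        return "stored"


store = DedupWithHP()
print(store.write(0, b"x" * 8192))
print(store.write(1, b"x" * 8192))
```

Only a filter hit triggers the exact HP query, so writes of new data skip the tree lookup entirely.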
Thus, on the basis of the previous embodiment, this embodiment adds a query of the fingerprint-value-to-physical-address metadata mapping for the target fingerprint value, so that the physical address of the target data can be determined more quickly, the metadata mapping from the logical address to the physical address of the target data can be established quickly, and the target fingerprint value is easier to establish and to query later.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a deduplication query apparatus provided in an embodiment of the present invention, including:
a receiving module, configured to receive a write request and determine target data corresponding to the currently received write request;
a query module, configured to calculate a fingerprint value of the target data to obtain a target fingerprint value, and query the target fingerprint value in a cuckoo filter;
and an execution module, configured to: if the target fingerprint value is found in the cuckoo filter, add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data; otherwise, store the target data, add the target fingerprint value to the cuckoo filter, and add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data.
Based on the above embodiment, as a preferred embodiment, the apparatus further includes:
a mapping query module, configured to: if the target fingerprint value is found in the cuckoo filter, query whether a fingerprint-value-to-physical-address metadata mapping exists for the target fingerprint value; if so, determine the physical address in that mapping as the physical address of the target data, and perform the step of adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, perform the step of storing the target data.
Based on the above embodiment, as a preferred embodiment, the apparatus further includes:
a filter setting module, configured to set up an array structure and a linked list structure corresponding to the cuckoo filter, wherein the same fingerprint value is stored at two positions in the array structure, and the fingerprint value is a secondary hash computed over some of the bits of the hash corresponding to the physical address.
Based on the above embodiment, as a preferred embodiment, the apparatus further includes:
a position calculation module, configured to calculate the insertion position of the hash value in the array structure using a hash algorithm of the cuckoo filter.
Based on the above embodiment, as a preferred embodiment, the apparatus further includes:
an insertion module, configured to: when an element already exists at the insertion position, obtain some of the bits of the hash value through a hash back-check function; judge whether those bits are the same; if they are the same, perform an exact HP query on the hash value; and if they are not the same, insert the hash value into the linked list structure corresponding to the array structure.
Based on the above embodiment, as a preferred embodiment, the apparatus further includes:
a mapping relationship management module, configured to manage the mapping relationship between the physical address and the corresponding logical address by using a controller on the solid state disk, wherein, after the storage pool is created, the physical addresses are consolidated and remapped, and the logical address is re-encoded by using the pool information of the storage pool and the storage offset position information.
Based on the above embodiment, as a preferred embodiment, the receiving module includes:
a data determining unit, configured to determine the data of the data block corresponding to the currently received write request as target data;
correspondingly, the query module includes:
a hash calculation unit, configured to calculate the fingerprint value of the target data by using a preset hash algorithm to obtain the target fingerprint value.
The present application further provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program can implement the steps of the method provided by the above embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present application further provides a deduplication query device, which may include a memory and a processor, where the memory stores a computer program, and when the processor calls the computer program in the memory, the steps of the method provided in the foregoing embodiments can be implemented. The deduplication query device may of course further include various network interfaces, power supplies, and other components. Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a deduplication query device provided in an embodiment of the present application. The deduplication query device of this embodiment may include a processor 2101 and a memory 2102.
Optionally, the deduplication query device may further include a communication interface 2103, an input unit 2104, a display 2105, and a communication bus 2106.
The processor 2101, the memory 2102, the communication interface 2103, the input unit 2104, the display 2105 all communicate with each other via the communication bus 2106.
In the embodiment of the present application, the processor 2101 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), a digital signal processor, a Field-Programmable Gate Array (FPGA), or another programmable logic device.
The processor may call a program stored in the memory 2102. In particular, the processor may perform the operations performed by the deduplication query device of the above embodiments.
The memory 2102 stores one or more programs, which may include program code including computer operating instructions, and in this embodiment, at least one program for implementing the following functions is stored in the memory:
receiving a write request, and determining that the data corresponding to the currently received write request is the target data;
calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying a cuckoo filter for the target fingerprint value;
if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data.
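The three functions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the application: the class names `CuckooFilter` and `DedupStore`, the SHA-256 fingerprint, the 16-bit tag, the bucket parameters, and the in-memory dictionaries standing in for on-disk metadata are all hypothetical. The filter is a conventional cuckoo filter with partial-key hashing; because a filter hit can be a false positive, the sketch confirms a hit against the fingerprint-to-physical-address mapping, following the refinement recited in claim 2.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter: short tags, two candidate buckets per item."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # num_buckets must be a power of two so the XOR below stays in range.
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _tag(self, item: bytes) -> int:
        return int.from_bytes(hashlib.sha256(b"tag" + item).digest()[:2], "big")

    def _index(self, data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % self.num_buckets

    def _alt(self, index: int, tag: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on index and tag.
        return index ^ self._index(tag.to_bytes(2, "big"))

    def contains(self, item: bytes) -> bool:
        tag, i1 = self._tag(item), self._index(item)
        return tag in self.buckets[i1] or tag in self.buckets[self._alt(i1, tag)]

    def insert(self, item: bytes) -> bool:
        tag, i1 = self._tag(item), self._index(item)
        i2 = self._alt(i1, tag)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(tag)
                return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random resident tag and move it to its alternate bucket.
            j = random.randrange(len(self.buckets[i]))
            tag, self.buckets[i][j] = self.buckets[i][j], tag
            i = self._alt(i, tag)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(tag)
                return True
        return False  # the filter is treated as full


class DedupStore:
    def __init__(self):
        self.filter = CuckooFilter()
        self.fp_to_pa = {}  # stand-in for fingerprint -> physical address metadata
        self.la_to_pa = {}  # logical address -> physical address metadata
        self.blocks = []    # stand-in for storage; the list index is the physical address

    def write(self, logical_addr: int, data: bytes) -> int:
        fp = hashlib.sha256(data).digest()  # target fingerprint value
        pa = None
        if self.filter.contains(fp):
            # A hit may be a false positive, so confirm it in the metadata.
            pa = self.fp_to_pa.get(fp)
        if pa is None:
            # New data: store it, then record its fingerprint.
            self.blocks.append(data)
            pa = len(self.blocks) - 1
            self.filter.insert(fp)
            self.fp_to_pa[fp] = pa
        self.la_to_pa[logical_addr] = pa  # add the logical -> physical mapping
        return pa


store = DedupStore()
pa1 = store.write(0, b"A" * 4096)
pa2 = store.write(8, b"A" * 4096)   # same content at another logical address
pa3 = store.write(16, b"B" * 4096)  # different content
```

In this sketch the second write stores no new block; it only adds a mapping from logical address 8 to the physical address already holding the data. A filter miss skips the metadata lookup entirely, which is where the reduction in on-disk searches would come from.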
In one possible implementation, the memory 2102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created during use of the computer.
Further, the memory 2102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device or other non-volatile solid-state storage device.
The communication interface 2103 may be an interface of a communication module, such as an interface of a GSM module.
The deduplication query device may also include a display 2105, an input unit 2104, and the like.
The structure of the deduplication query device shown in Fig. 3 does not constitute a limitation on the deduplication query device in the embodiment of the present application; in practical applications, the deduplication query device may include more or fewer components than those shown in Fig. 3, or may combine some components.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. Because the system provided by the embodiments corresponds to the method provided by the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A deduplication query method, comprising:
receiving a write request, and determining that the data corresponding to the currently received write request is the target data;
calculating a fingerprint value of the target data to obtain a target fingerprint value, and querying a cuckoo filter for the target fingerprint value;
if the target fingerprint value is found in the cuckoo filter, adding a metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, storing the target data, adding the target fingerprint value to the cuckoo filter, and adding a metadata mapping from the current logical address of the target data to the physical address of the target data.
2. The method of claim 1, wherein if the target fingerprint value is found in the cuckoo filter, the method further comprises:
querying whether a metadata mapping from fingerprint value to physical address exists for the target fingerprint value;
if the metadata mapping exists, determining that the physical address in that metadata mapping is the physical address of the target data, and executing the step of adding the metadata mapping from the current logical address of the target data to the physical address of the target data; otherwise, executing the step of storing the target data;
correspondingly, after the target data is stored, adding a metadata mapping from the target fingerprint value to the physical address of the target data.
3. The method of claim 1, wherein before querying the target fingerprint value in the cuckoo filter, the method further comprises:
setting an array structure and a linked list structure corresponding to the cuckoo filter, wherein the same fingerprint value is stored in two positions in the array structure, and the fingerprint value is a secondary hash value computed over part of the bits of the hash corresponding to the physical address.
4. The deduplication query method of claim 3, wherein inserting the hash value of the data at the physical address into the array structure further comprises:
calculating, by a hashing algorithm included in the cuckoo filter, an insertion location of the hash value in the array structure.
5. The deduplication query method of claim 4, wherein, when the insertion position of the hash value in the array structure is calculated by the hash algorithm included in the cuckoo filter and an element already exists at the insertion position, the method further comprises:
obtaining part of the bits of the hash value through a hash back-check function;
judging whether the partial bits are the same;
if they are the same, performing an accurate HP query on the hash value;
and if not, inserting the hash value into the linked list structure corresponding to the array structure.
6. The deduplication query method of claim 3, further comprising:
managing the mapping relation between the physical address and the corresponding logical address by using a controller on the solid state disk;
wherein, after a storage pool is created, the physical addresses are integrated and remapped, and the logical address is re-encoded using the pool information of the storage pool and the storage offset position information.
7. The method of claim 1, wherein determining that data corresponding to a currently received write request is target data comprises:
determining data of a data block corresponding to a currently received write request as target data;
correspondingly, calculating the fingerprint value of the target data to obtain a target fingerprint value includes:
and calculating the fingerprint value of the target data by using a preset hash algorithm to obtain a target fingerprint value.
8. A deduplication query apparatus, comprising:
a receiving module, configured to receive a write request and determine that the data corresponding to the currently received write request is the target data;
a query module, configured to calculate a fingerprint value of the target data to obtain a target fingerprint value, and to query a cuckoo filter for the target fingerprint value;
and an execution module, configured to add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data if the target fingerprint value is found in the cuckoo filter, and otherwise to store the target data, add the target fingerprint value to the cuckoo filter, and add a metadata mapping from the logical address in the write request corresponding to the target data to the physical address of the target data.
9. A deduplication query device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the deduplication query method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the deduplication query method of any one of claims 1 to 7.
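The collision handling recited in claims 4 and 5 can be sketched in Python as follows. This is only one possible reading, offered under stated assumptions: the names `partial_bits`, `position`, and `SlotArray`, the array length, the 16-bit tag width, and the choice of which hash bytes feed each function are all hypothetical. A Python list stands in for the linked list structure, "exact-lookup" stands for the accurate HP query (its meaning is not expanded in the claims), and the two-position placement of claim 3 is not modeled.

```python
import hashlib

NUM_SLOTS = 1024  # assumed array length
TAG_BITS = 16     # assumed width of the "partial bits"


def partial_bits(block_hash: bytes) -> int:
    """Secondary hash over the first 8 bytes of the block hash, cut to TAG_BITS."""
    digest = hashlib.sha256(block_hash[:8]).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << TAG_BITS) - 1)


def position(block_hash: bytes) -> int:
    """Insertion position in the array structure (last 4 bytes, an assumption)."""
    return int.from_bytes(block_hash[-4:], "big") % NUM_SLOTS


class SlotArray:
    def __init__(self):
        self.slots = [None] * NUM_SLOTS               # array structure
        self.chains = [[] for _ in range(NUM_SLOTS)]  # per-slot overflow list

    def insert(self, block_hash: bytes) -> str:
        tag, pos = partial_bits(block_hash), position(block_hash)
        if self.slots[pos] is None:
            self.slots[pos] = tag
            return "inserted"
        if self.slots[pos] == tag or tag in self.chains[pos]:
            # Same partial bits: possibly a duplicate, so an exact lookup is needed.
            return "exact-lookup"
        # Different partial bits: keep the entry in the slot's overflow list.
        self.chains[pos].append(tag)
        return "chained"


table = SlotArray()
h1 = b"\x01" * 8 + b"\x00" * 24
h2 = b"\x02" * 8 + b"\x00" * 24  # same position as h1, different leading bytes
r1 = table.insert(h1)  # the slot is empty
r2 = table.insert(h1)  # same partial bits as the resident entry
r3 = table.insert(h2)  # chained unless the two 16-bit tags happen to collide
```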
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211482862.3A CN115729471A (en) 2022-11-24 2022-11-24 Method, device, equipment and storage medium for deduplication query

Publications (1)

Publication Number Publication Date
CN115729471A true CN115729471A (en) 2023-03-03

Family

ID=85298001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211482862.3A Pending CN115729471A (en) 2022-11-24 2022-11-24 Method, device, equipment and storage medium for deduplication query

Country Status (1)

Country Link
CN (1) CN115729471A (en)

Similar Documents

Publication Publication Date Title
CN110018998B (en) File management method and system, electronic equipment and storage medium
CN111125447A (en) Metadata access method, device and equipment and readable storage medium
CN107577436B (en) Data storage method and device
US11580162B2 (en) Key value append
US11221999B2 (en) Database key compression
KR102509913B1 (en) Method and apparatus for maximized dedupable memory
CN111966281B (en) Data storage device and data processing method
CN107273306B (en) Data reading and writing method for solid state disk and solid state disk
CN115114232A (en) Method, device and medium for enumerating historical version objects
US9524236B1 (en) Systems and methods for performing memory management based on data access properties
CN113835639B (en) I/O request processing method, device, equipment and readable storage medium
CN111831691A (en) Data reading and writing method and device, electronic equipment and storage medium
CN107148612B (en) Method and device for expanding user partition
CN109388644B (en) Data updating method and device
CN107229421B (en) Method and device for creating video data storage system, method and device for writing file into video data storage system and method and device for reading video data storage system
CN115437579B (en) Metadata management method and device, computer equipment and readable storage medium
KR102071072B1 (en) Method for managing of memory address mapping table for data storage device
CN116450607A (en) Data processing method, device and storage medium
CN115729471A (en) Method, device, equipment and storage medium for deduplication query
CN115576956A (en) Data processing method, system, equipment and storage medium
CN112486861B (en) Solid state disk mapping table data query method and device, computer equipment and storage medium
CN111104435B (en) Metadata organization method, device and equipment and computer readable storage medium
CN114442961A (en) Data processing method and device, computer equipment and storage medium
CN113703671B (en) Data block erasing method and related device
CN112948376B (en) IP geographical position information query method, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination