CN112463032A - Performance optimization method, system and device for deduplication module of storage system - Google Patents

Performance optimization method, system and device for deduplication module of storage system

Info

Publication number
CN112463032A
Authority
CN
China
Prior art keywords
data
zero data
zero
identification unit
logical volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011238075.5A
Other languages
Chinese (zh)
Inventor
夏方健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011238075.5A priority Critical patent/CN112463032A/en
Publication of CN112463032A publication Critical patent/CN112463032A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0665Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a performance optimization method, system and device for a deduplication module of a storage system, all based on an all-zero data identification unit. The host issues io data to the logical volume; the logical volume divides the io data sent by the host by granularity and sends it to the all-zero data identification unit; the all-zero data identification unit identifies all-zero data in the data issued by the logical volume and issues the non-all-zero data to the deduplication module of the storage system; after identifying all-zero data, the all-zero data identification unit establishes a mapping relationship between a physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data; and the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit. The invention optimizes the performance of the deduplication module in the storage system so as to improve the performance and bandwidth of the storage system.

Description

Performance optimization method, system and device for deduplication module of storage system
Technical Field
The invention relates to the field of storage, in particular to a method, a system and a device for optimizing the performance of a deduplication module of a storage system.
Background
In the big data era, simply adding storage capacity cannot keep pace with the rate of data growth, which gave rise to data deduplication technology.
Data deduplication (Deduplication) is a technology that automatically identifies and deletes duplicate data, and is an advanced form of data compression. Once the deduplication function is enabled, the system identifies duplicate data by an algorithm, keeps only one copy of identical data, deletes the redundant copies, and replaces each deleted copy with a reference to the retained copy. In this way, redundant data is eliminated and the required storage capacity is reduced.
Solid state disks (SSDs) have a limited write lifetime. Post-process deduplication must first write the data to disk, then read it back when the system is idle to remove the duplicates, and then write the result to storage again; these extra writes consume SSD lifetime. Therefore, in an all-flash array, deduplication and compression are typically performed inline.
Deduplication technology contributes significantly to space saving and io efficiency, but in the prior art it is generally used as follows (as shown in fig. 4):
(1) the host issues data (io), and the volume divides the data into blocks of the granularity required by the blocking mode of the deduplication algorithm and sends the blocks to the deduplication module;
(2) the deduplication module calculates a fingerprint value of the data through a Hash algorithm and judges whether it equals the fingerprint value of all-zero data; if not, the flow proceeds to step (3); if so, the module compares the data to check whether it really is all-zero data: if it is, the all-zero data is processed directly, the LP and PL mapping relations are established, and the flow ends; if it is not (a hash collision), the data is written to disk on its own (that is, stored at a PBA);
(3) the fingerprint database is queried with the calculated fingerprint value; if the fingerprint value is present, the data is processed as duplicate data, and if not, as non-duplicate data;
(4) when non-duplicate data is processed, its fingerprint value is inserted into the fingerprint database, the data is written to disk normally, and its logical address is mapped to its physical address, so that the next time the same data is issued the fingerprint database already holds the same fingerprint value and the data can be identified as duplicate data.
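The prior-art flow in steps (1) to (4) can be sketched in software as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: SHA-256 stands in for the unspecified Hash algorithm, Python dictionaries stand in for the fingerprint database and mapping tables, and the names `PriorArtDedup`, `P_ZERO` and `BLOCK_SIZE` are hypothetical. The point it shows is that every block, including an all-zero block, is hashed before it can be classified.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed granularity (the description mentions 4k/8k)
P_ZERO = 0         # hypothetical preset physical address for all-zero data


class PriorArtDedup:
    """Toy model of the prior-art flow: every block is hashed first."""

    def __init__(self):
        self.zero_fp = hashlib.sha256(bytes(BLOCK_SIZE)).digest()
        self.fingerprint_db = {}  # fingerprint -> PBA
        self.lp_map = {}          # LBA -> PBA
        self.disk = {}            # PBA -> block contents (stand-in for storage)
        self.next_pba = 1         # PBA 0 is reserved for P_ZERO
        self.hash_calls = 0

    def _write_to_disk(self, block):
        pba = self.next_pba
        self.next_pba += 1
        self.disk[pba] = block
        return pba

    def write(self, lba, block):
        # Step (2): the fingerprint is computed even for all-zero blocks.
        self.hash_calls += 1
        fp = hashlib.sha256(block).digest()
        if fp == self.zero_fp:
            if block == bytes(len(block)):  # byte-by-byte confirmation
                self.lp_map[lba] = P_ZERO
                return "zero"
            # Hash collision: store the block on its own.
            self.lp_map[lba] = self._write_to_disk(block)
            return "collision"
        # Step (3): query the fingerprint database.
        pba = self.fingerprint_db.get(fp)
        if pba is not None:
            self.lp_map[lba] = pba
            return "duplicate"
        # Step (4): non-duplicate data is stored and its fingerprint recorded.
        pba = self._write_to_disk(block)
        self.fingerprint_db[fp] = pba
        self.lp_map[lba] = pba
        return "unique"
```

In this sketch, writing one all-zero block and two identical non-zero blocks costs three hash computations, one of which is spent on the all-zero block.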
In practical tests of the above process, the performance of the storage system with the deduplication module enabled is found to be approximately 70-80% of that with deduplication disabled. It can be seen that in the prior art, using the deduplication module reduces the performance and bandwidth of the system.
Therefore, the invention provides a performance optimization method, system and device for a deduplication module of a storage system, which are used for solving the problems.
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention provides a method, a system, and a device for optimizing performance of a deduplication module of a storage system, which are used for optimizing performance of the deduplication module in the storage system to improve performance and bandwidth of the storage system.
In a first aspect, the present invention provides a performance optimization method for a deduplication module of a storage system, where the performance optimization method is based on an all-zero data identification unit, the all-zero data identification unit is implemented by hardware, and the performance optimization method includes the steps of:
the host issues io data to the logical volume;
the logical volume carries out granularity division on the io data sent by the host and sends the io data to the all-zero data identification unit;
the all-zero data identification unit identifies all-zero data in the data issued by the logical volume and issues non-all-zero data in the data issued by the logical volume to the deduplication module of the storage system;
after identifying all-zero data in data issued by the logical volume, the all-zero data identification unit establishes a mapping relation between a physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data;
and the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit.
Further, after the mapping relationship between the physical address for storing the all-zero data preset in the system and the storage volume address of the identified all-zero data is established, the all-zero data identification unit returns the processing result of the all-zero data to the logical volume;
when the processing result received by the logical volume is a failure, the logical volume re-issues the data to the all-zero data identification unit; and when the processing result received by the logical volume is a success, the current all-zero data processing ends.
Further, the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit, and the implementation method includes the following steps:
s1, calculating the fingerprint value of the non-all-zero data issued by the all-zero data identification unit through a Hash algorithm;
s2, judging whether the fingerprint value exists in the fingerprint database:
if yes, the non-all-zero data issued by the all-zero data identification unit is judged to be duplicate data, the duplicate data is not written to disk, and then step S3 is executed;
if not, the non-all-zero data issued by the all-zero data identification unit is judged to be non-duplicate data, the non-all-zero data is written to disk, the fingerprint value is written into the fingerprint database, a mapping relation between the fingerprint value and the physical address of the corresponding data is established, and then step S3 is executed;
and step S3, establishing the mapping relation between the volume address and the physical address for storing the non-all-zero data.
In a second aspect, the present invention provides a performance optimization system for a deduplication module of a storage system, where the performance optimization system includes an all-zero data identification unit, where the all-zero data identification unit is implemented by hardware, and the performance optimization system further includes:
the host is used for sending the io data to the logical volume;
the logical volume is used for receiving the io data sent by the host, performing granularity division on the received io data and sending the io data to the all-zero data identification unit;
the all-zero data identification unit is used for receiving the data issued by the logical volume, identifying all-zero data in the data issued by the logical volume, and issuing non-all-zero data in the data issued by the logical volume to the deduplication module of the storage system; after identifying all-zero data in data issued by the logical volume, establishing a mapping relation between a physical address preset in the system and used for storing the all-zero data and an identified storage volume address of the all-zero data;
and the deduplication module is used for performing deduplication processing on the non-all-zero data issued by the all-zero data identification unit.
Furthermore, the all-zero data identification unit is further configured to return a processing result of the all-zero data to the logical volume after establishing a mapping relationship between a physical address preset in the system and used for storing the all-zero data and an identified storage volume address of the all-zero data;
the logical volume is also used for re-issuing the data to the all-zero data identification unit when the received processing result is a failure, and for ending the current all-zero data processing when the received processing result is a success.
Further, the deduplication module comprises:
the fingerprint value calculating unit is used for calculating the fingerprint value of the non-all-zero data sent by the all-zero data identifying unit through a Hash algorithm;
the judging unit is used for judging whether the fingerprint value exists in a fingerprint database or not;
the first processing unit is used for not writing the data to disk when the judging unit judges that the non-all-zero data issued by the all-zero data identifying unit is duplicate data;
the second processing unit is used for destaging the non-all-zero data when the judging unit judges that the non-all-zero data issued by the all-zero data identifying unit is non-repeated data, writing the fingerprint value into a fingerprint database and establishing a mapping relation between the fingerprint value and a physical address of the corresponding data;
and the address mapping unit is used for establishing a mapping relation between the volume address storing the non-all-zero data and the physical address.
In a third aspect, the present invention provides a terminal, including:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of the above aspects.
The beneficial effects of the invention are as follows.
According to the performance optimization method, system and device for the deduplication module of the storage system, all-zero data is identified in advance by hardware, so the fingerprint calculation (hash value calculation) and byte-by-byte comparison that the traditional approach performs on all-zero data are omitted. This saves some processing time in system software and thereby improves the performance of the system during deduplication.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Fig. 4 is a flow chart illustrating a method for using the deduplication technology in the prior art.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following explains key terms appearing in the present invention.
Fingerprint value: a data value, namely the Hash value, obtained by processing the data with a Hash algorithm.
P_zero: the PBA preset in the system for all-zero data.
PBA: the physical address where the io data is stored.
LBA: the volume address where the io data is stored.
HP mapping relation: the mapping relation between the Hash value of the io data and the PBA of the io data.
P_zero-L mapping relation (PL mapping relation for short): the mapping relation between P_zero and the LBA of the all-zero io data.
LP mapping relation: the mapping relation between the LBA and the PBA of the io data.
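As a rough illustration of how the three mapping relations defined above relate to one another, they can be modelled as simple lookup tables. The sketch below is an assumption-laden toy: the class `MappingTables`, its method names, the value chosen for `P_ZERO`, and the rule that a later write replaces an earlier mapping for the same LBA are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field

P_ZERO = 0  # hypothetical value of the preset PBA for all-zero data


@dataclass
class MappingTables:
    hp: dict = field(default_factory=dict)      # HP: Hash value -> PBA
    lp: dict = field(default_factory=dict)      # LP: LBA -> PBA
    p_zero_l: set = field(default_factory=set)  # P_zero-L: LBAs mapped to P_ZERO

    def map_zero(self, lba):
        """Record that the block at this LBA is all-zero data."""
        self.p_zero_l.add(lba)
        self.lp.pop(lba, None)  # assumed: zeros overwrite any earlier LP entry

    def map_data(self, lba, fingerprint, pba):
        """Record a non-all-zero block stored at pba."""
        self.hp[fingerprint] = pba
        self.lp[lba] = pba
        self.p_zero_l.discard(lba)

    def resolve(self, lba):
        """Return the PBA for an LBA, or None if the LBA was never written."""
        if lba in self.p_zero_l:
            return P_ZERO
        return self.lp.get(lba)
```

For example, after `map_zero(10)` and `map_data(11, "fp-a", 5)`, `resolve(10)` returns `P_ZERO` and `resolve(11)` returns `5`.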
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
As shown in fig. 1, the method 100 is based on an all-zero data identification unit, and includes:
step 110, the host issues io data to the logical volume.
And step 120, the logical volume performs granularity division on the io data sent by the host and sends the io data to the all-zero data identification unit.
Specifically, the granularity division may divide the io data sent by the host into different granularities (4k/8k) according to a blocking manner adopted by an existing deduplication algorithm (corresponding to an existing deduplication technology).
The volume layer then calls the all-zero data identification unit and hands it the blocks obtained by the division, so that the blocks pass through this unit on their way to the deduplication module.
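The granularity division of step 120 might look like the following in software. This is a minimal sketch assuming fixed-size blocks of 4096 bytes; the function name `split_into_blocks` and the choice to zero-pad a short final block are assumptions, since the description only says the data is divided according to the blocking mode of the deduplication algorithm.

```python
def split_into_blocks(io_data: bytes, block_size: int = 4096):
    """Split host io data into fixed-size blocks.

    A short final block is padded with zero bytes (an assumption made
    here so that every block has the same length).
    """
    blocks = []
    for offset in range(0, len(io_data), block_size):
        block = io_data[offset:offset + block_size]
        if len(block) < block_size:
            block = block + bytes(block_size - len(block))
        blocks.append(block)
    return blocks
```

A 5000-byte write therefore yields two 4096-byte blocks, the second holding 904 bytes of data followed by padding.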
Step 130, the all-zero data identification unit identifies all-zero data in the data issued by the logical volume, and issues the non-all-zero data to a deduplication module of the storage system; after identifying all-zero data in the data issued by the logical volume, the all-zero data identification unit establishes a mapping relationship between a physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data.
The all-zero data identification unit is implemented by hardware (such as a control chip), and when the all-zero data identification unit is used, all-zero data in data (with different granularities obtained by the division) issued by the logical volume can be identified, and non-all-zero data issued by the logical volume can be directly issued/transmitted to a deduplication module of the storage system for subsequent processing. The invention adopts a hardware mode to identify the all-zero data in the data issued by the logical volume, which is beneficial to increasing the identification efficiency to a certain extent.
Specifically, after identifying all-zero data, the all-zero data identification unit processes it directly and establishes the P_zero-L mapping relation. The P_zero-L mapping relation, PL mapping relation for short, is the mapping relation between the system's P_zero and the LBA (volume address) of the all-zero data.
The all-zero data identification unit can identify all-zero data in data issued by the logical volume and directly process the identified all-zero data, and can continuously issue non-all-zero data in the data issued by the logical volume to start to perform deduplication processing of the non-all-zero data.
In specific implementation, the logical volume may call the all-zero data identification unit through a set interface (the interface is in a context form).
In specific implementation, the all-zero data identification unit can call an interface to establish a mapping relation of P _ zero-L.
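The routing performed in step 130 can be emulated in software as below. The patent implements the all-zero check in hardware, so `is_all_zero` here is only a software stand-in, and the names `route_blocks`, `P_ZERO` and `start_lba`, as well as the use of consecutive integers as LBAs, are assumptions made for illustration.

```python
P_ZERO = 0  # hypothetical preset PBA for all-zero data


def is_all_zero(block: bytes) -> bool:
    # Software stand-in for the hardware check: true when no byte is non-zero.
    return not any(block)


def route_blocks(blocks, start_lba=0):
    """Classify blocks as the all-zero data identification unit would.

    Returns (p_zero_l, forwarded): p_zero_l maps each all-zero LBA to
    P_ZERO, and forwarded lists the (lba, block) pairs that continue on
    to the deduplication module.
    """
    p_zero_l = {}
    forwarded = []
    for index, block in enumerate(blocks):
        lba = start_lba + index
        if is_all_zero(block):
            p_zero_l[lba] = P_ZERO          # P_zero-L mapping, no hashing needed
        else:
            forwarded.append((lba, block))  # handed to the deduplication module
    return p_zero_l, forwarded
```

Only the forwarded blocks reach the fingerprint calculation, which is where the saving described in the text comes from.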
Optionally, the all-zero data identification unit returns a processing result of the all-zero data to the logical volume after establishing a mapping relationship between a physical address preset in the system and used for storing the all-zero data and an identified storage volume address of the all-zero data.
The processing result indicates whether the all-zero data identification unit succeeded in establishing the mapping relationship between the physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data; it is either success or failure.
When the processing result received by the logical volume is a failure, the logical volume re-issues the data to the all-zero data identification unit; and when the processing result received by the logical volume is a success, the current all-zero data processing ends.
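The success/failure handling on the logical volume side amounts to a re-issue loop, sketched below. The description does not state how many times the data may be re-issued, so the `max_attempts` limit is an assumption, as are the function name `submit_with_retry` and the convention that the `identify` callable returns `True` on success.

```python
def submit_with_retry(identify, block, lba, max_attempts=3):
    """Issue a block to the identification unit, re-issuing on failure.

    identify(lba, block) is a hypothetical callable that returns True when
    the mapping was established and False otherwise. Returns the number of
    attempts used; raises RuntimeError once the assumed limit is reached.
    """
    for attempt in range(1, max_attempts + 1):
        if identify(lba, block):
            return attempt  # success: the current all-zero processing ends
    raise RuntimeError(f"mapping for LBA {lba} failed after {max_attempts} attempts")
```

Bounding the loop avoids re-issuing forever if the unit keeps reporting failure; a real system would choose its own policy.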
In step 140, the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit.
Specifically, the implementation method of step 140 includes:
step S1, calculating the fingerprint value of the non-all-zero data issued by the all-zero data identification unit through a Hash algorithm;
step S2, determining whether the fingerprint value exists in the fingerprint database:
if yes, the non-all-zero data issued by the all-zero data identification unit is judged to be duplicate data, the duplicate data is not written to disk, and then step S3 is executed;
if not, the non-all-zero data issued by the all-zero data identification unit is judged to be non-duplicate data, the non-all-zero data is written to disk, the fingerprint value is written into the fingerprint database, a mapping relation between the fingerprint value and the physical address of the corresponding data is established, and then step S3 is executed;
step S3, establishing the LBA-PBA mapping relationship of the non-all-zero data, that is, establishing the mapping relationship between the volume address and the physical address for storing the non-all-zero data.
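Steps S1 to S3 can be sketched as a small class. As before, this is a hedged illustration: SHA-256 is an assumed choice of Hash algorithm, dictionaries stand in for the fingerprint database, the LP mapping and the disk, and the class name `DedupModule` and its sequential PBA allocation are hypothetical.

```python
import hashlib


class DedupModule:
    """Toy deduplication module that only ever sees non-all-zero blocks."""

    def __init__(self):
        self.fingerprint_db = {}  # HP mapping: fingerprint -> PBA
        self.lp_map = {}          # LP mapping: LBA -> PBA
        self.disk = {}            # PBA -> block contents (stand-in for storage)
        self._next_pba = 1

    def write(self, lba, block):
        """Process one block; return True if it was duplicate data."""
        fp = hashlib.sha256(block).hexdigest()  # S1: fingerprint value
        pba = self.fingerprint_db.get(fp)       # S2: query fingerprint database
        is_duplicate = pba is not None
        if not is_duplicate:
            # Non-duplicate data: write to disk and record the HP mapping.
            pba = self._next_pba
            self._next_pba += 1
            self.disk[pba] = block
            self.fingerprint_db[fp] = pba
        self.lp_map[lba] = pba                  # S3: LBA-PBA mapping
        return is_duplicate
```

Writing the same block to two different LBAs stores it once and leaves both LBAs pointing at the same PBA.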
It can be seen that the method 100 identifies all-zero data in advance through hardware and omits the fingerprint calculation (hash value calculation) and byte-by-byte comparison that the conventional approach performs on all-zero data. This saves some processing time in system software and thereby improves the performance of the system during deduplication.
Fig. 2 is an embodiment of a performance optimization system of a deduplication module of a storage system according to the present invention.
As shown in fig. 2, the performance optimization system 200 includes an all-zero data identification unit 201, where the all-zero data identification unit 201 is implemented by hardware, and the performance optimization system 200 further includes:
the host 202 is used for issuing io data to the logical volume;
the logical volume 203 is used for receiving the io data sent by the host, performing granularity division on the received io data, and sending the io data to the all-zero data identification unit;
an all-zero data identification unit 201, configured to receive data issued by the logical volume, identify all-zero data in the data issued by the logical volume, and issue non-all-zero data in the data issued by the logical volume to a deduplication module of the storage system; and further configured to establish, after identifying all-zero data in the data issued by the logical volume, a mapping relationship between the physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data;
and the deduplication module 204 is configured to perform deduplication processing on the non-all-zero data sent by the all-zero data identification unit.
Optionally, as an embodiment of the present invention, the all-zero data identification unit 201 is further configured to, after a mapping relationship between a physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data is established, return a processing result of the all-zero data to the logical volume by callback;
the logical volume 203 is further configured to re-issue the data to the all-zero data identification unit when the received processing result is a failure, and to end the current all-zero data processing when the received processing result is a success.
Optionally, as an embodiment of the present invention, the deduplication module 204 includes:
a fingerprint value calculating unit 2041, configured to calculate, by using a hash algorithm, a fingerprint value of non-all-zero data sent by the all-zero data identifying unit;
a determining unit 2042, configured to determine whether the fingerprint value exists in the fingerprint database;
a first processing unit 2043, configured to not write the data to disk when the determining unit 2042 determines that the non-all-zero data sent by the all-zero data identifying unit is duplicate data;
a second processing unit 2044, configured to, when the determining unit 2042 determines that the non-all-zero data sent by the all-zero data identifying unit is non-duplicate data, write the non-all-zero data to disk, write the fingerprint value into the fingerprint database, and establish a mapping relationship between the fingerprint value and the physical address of the data corresponding to the fingerprint value;
and an address mapping unit 2045, configured to establish a mapping relationship between the volume address and the physical address, where the volume address stores the non-all-zero data.
Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the method 100 according to the embodiment of the present invention.
The terminal 300 may include a processor 310, a memory 320, and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the architecture of the server shown in the figure is not limiting: it may be a bus architecture or a star architecture, and may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 320 may be used for storing instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in memory 320, when executed by processor 310, enable terminal 300 to perform some or all of the steps in the method embodiments described above.
The processor 310 is a control center of the storage terminal, connects various parts of the entire electronic terminal using various interfaces and lines, and performs various functions of the electronic terminal and/or processes data by operating or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may be composed of an Integrated Circuit (IC), for example, a single packaged IC, or a plurality of packaged ICs connected with the same or different functions. For example, the processor 310 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may be a single operation core, or may include multiple operation cores.
The communication unit 330 is configured to establish a communication channel so that the storage terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.
The same and similar parts in the various embodiments in this specification may be referred to each other. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the description in the method embodiment.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention falls within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A performance optimization method of a deduplication module of a storage system is characterized in that the performance optimization method is based on an all-zero data identification unit, the all-zero data identification unit is realized by hardware, and the performance optimization method comprises the following steps:
the host issues io data to the logical volume;
the logical volume carries out granularity division on the io data sent by the host and sends the io data to the all-zero data identification unit;
the all-zero data identification unit identifies all-zero data in the data issued by the logical volume and issues non-all-zero data in the data issued by the logical volume to the deduplication module of the storage system;
after identifying all-zero data in data issued by the logical volume, the all-zero data identification unit establishes a mapping relation between a physical address preset in the system for storing all-zero data and the volume address of the identified all-zero data;
and the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit.
2. The performance optimization method of the deduplication module of the storage system according to claim 1, wherein the all-zero data identification unit returns a processing result of the all-zero data to the logical volume after establishing a mapping relationship between a physical address for storing the all-zero data preset in the system and a storage volume address of the identified all-zero data;
when the processing result received by the logical volume is a failure, the logical volume re-issues the data to the all-zero data identification unit; and when the processing result received by the logical volume is a success, the current all-zero data processing ends.
3. The performance optimization method for the deduplication module of the storage system according to claim 1, wherein the deduplication module performs deduplication processing on the non-all-zero data issued by the all-zero data identification unit through the following steps:
S1, calculating a fingerprint value of the non-all-zero data issued by the all-zero data identification unit through a hash algorithm;
S2, judging whether the fingerprint value exists in the fingerprint database:
if yes, the non-all-zero data issued by the all-zero data identification unit is judged to be duplicate data, the duplicate data is not destaged (written to disk), and then step S3 is executed;
if not, the non-all-zero data issued by the all-zero data identification unit is judged to be non-duplicate data, the non-all-zero data is destaged, the fingerprint value is written into the fingerprint database, a mapping relation between the fingerprint value and the physical address of the corresponding data is established, and then step S3 is executed;
and S3, establishing a mapping relation between the volume address and the physical address at which the non-all-zero data is stored.
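Steps S1 to S3 can be illustrated with a small in-memory sketch. The claim names only "a hash algorithm", so the use of SHA-256 is an assumption, as are the class name `DedupModule`, its dictionaries, and the list that stands in for physical storage.

```python
import hashlib


class DedupModule:
    """In-memory illustration of steps S1 to S3; not the patented implementation."""

    def __init__(self):
        self.fingerprint_db = {}   # fingerprint value -> physical address
        self.volume_map = {}       # volume address -> physical address
        self.disk = []             # stand-in for physical storage; index = address

    def write(self, volume_addr, data):
        # S1: compute the fingerprint (SHA-256 is an assumed choice of hash).
        fingerprint = hashlib.sha256(data).hexdigest()
        # S2: look the fingerprint up in the fingerprint database.
        physical_addr = self.fingerprint_db.get(fingerprint)
        if physical_addr is None:
            # Non-duplicate data: destage it and record its fingerprint.
            physical_addr = len(self.disk)
            self.disk.append(data)
            self.fingerprint_db[fingerprint] = physical_addr
        # S3: map the volume address to the physical address in both cases.
        self.volume_map[volume_addr] = physical_addr
        return physical_addr
```

Writing the same block to two different volume addresses stores it once and leaves both volume addresses pointing at the same physical address, which is the space saving that deduplication provides.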
4. A performance optimization system for a deduplication module of a storage system, characterized by comprising an all-zero data identification unit implemented in hardware, wherein the performance optimization system comprises:
the host, used for issuing I/O data to the logical volume;
the logical volume, used for receiving the I/O data issued by the host, dividing the received I/O data according to a granularity, and sending the divided data to the all-zero data identification unit;
the all-zero data identification unit, used for receiving the data issued by the logical volume, identifying all-zero data in that data, and issuing the non-all-zero data in that data to the deduplication module of the storage system; and, after identifying all-zero data in the data issued by the logical volume, establishing a mapping relation between a physical address preset in the system for storing all-zero data and the storage volume address of the identified all-zero data;
and the deduplication module, used for performing deduplication processing on the non-all-zero data issued by the all-zero data identification unit.
5. The system according to claim 4, wherein the all-zero data identification unit is further configured to return a processing result of the all-zero data to the logical volume after establishing the mapping relation between the physical address preset in the system for storing all-zero data and the storage volume address of the identified all-zero data;
the logical volume is further configured to retransmit the data to the all-zero data identification unit when the received processing result is a failure, and to end the current all-zero data processing when the received processing result is a success.
6. The system according to claim 4, wherein the deduplication module comprises:
the fingerprint value calculating unit, used for calculating a fingerprint value of the non-all-zero data issued by the all-zero data identification unit through a hash algorithm;
the judging unit, used for judging whether the fingerprint value exists in a fingerprint database;
the first processing unit, used for not destaging the data when the judging unit judges that the non-all-zero data issued by the all-zero data identification unit is duplicate data;
the second processing unit, used for destaging the non-all-zero data when the judging unit judges that the non-all-zero data issued by the all-zero data identification unit is non-duplicate data, writing the fingerprint value into the fingerprint database, and establishing a mapping relation between the fingerprint value and the physical address of the corresponding data;
and the address mapping unit, used for establishing a mapping relation between the volume address and the physical address at which the non-all-zero data is stored.
7. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011238075.5A CN112463032A (en) 2020-11-09 2020-11-09 Performance optimization method, system and device for deduplication module of storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011238075.5A CN112463032A (en) 2020-11-09 2020-11-09 Performance optimization method, system and device for deduplication module of storage system

Publications (1)

Publication Number Publication Date
CN112463032A true CN112463032A (en) 2021-03-09

Family

ID=74826664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011238075.5A Withdrawn CN112463032A (en) 2020-11-09 2020-11-09 Performance optimization method, system and device for deduplication module of storage system

Country Status (1)

Country Link
CN (1) CN112463032A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595486A (en) * 2022-05-10 2022-06-07 深圳佰维存储科技股份有限公司 Zero data identification method and device, readable storage medium and electronic equipment
CN114595486B (en) * 2022-05-10 2022-08-05 深圳佰维存储科技股份有限公司 Zero data identification method and device, readable storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
CN110543281A (en) Storage compression implementation method, device, equipment and storage medium
WO2021089036A1 (en) Data transmission method, network device, network system and chip
CN111857574A (en) Write request data compression method, system, terminal and storage medium
US10664193B2 (en) Storage system for improved efficiency of parity generation and minimized processor load
WO2024212783A1 (en) Data write method and apparatus, and solid-state disk, electronic device and non-volatile readable storage medium
CN113794764A (en) Request processing method and medium for server cluster and electronic device
CN111984203A (en) Data deduplication method and device, electronic equipment and storage medium
CN111475335A (en) Method, system, terminal and storage medium for fast recovery of database
CN118312102A (en) IO request processing method and device, storage equipment and storage medium
CN110910249A (en) Data processing method and device, node equipment and storage medium
CN112463032A (en) Performance optimization method, system and device for deduplication module of storage system
CN115904795A (en) Data storage method and device in storage system
CN111338981B (en) Memory fragmentation prevention method and system and storage medium
US8799580B2 (en) Storage apparatus and data processing method
CN116756019A (en) Memory leakage positioning method and device, electronic equipment and readable storage medium
CN115151902A (en) Cluster capacity expansion method and device, storage medium and electronic equipment
CN113986134B (en) Method for storing data, method and device for reading data
KR20200132521A (en) Apparatus for guaranteeing integrity of state database in blockchain-based environment and method thereof
CN111858665B (en) Method, system, terminal and storage medium for improving soft copy reading performance
CN111611104B (en) InfluxDB data backup method, system and terminal equipment
CN109445686B (en) Storage disk and data access method
CN114675995A (en) Data backup method and device and electronic equipment
CN110908821A (en) Method, device, equipment and storage medium for task failure management
CN112445413A (en) Data storage method and device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210309