CN110727404A - Data deduplication method and device based on storage end and storage medium - Google Patents


Info

Publication number
CN110727404A
Authority
CN
China
Prior art keywords
data
fingerprint
storage
deduplication
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910927592.4A
Other languages
Chinese (zh)
Inventor
陈东河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Abstract

The invention discloses a data deduplication method based on a storage end, which comprises the following steps: receiving data transmitted by a host end; judging whether the load utilization rate of the storage end is greater than a threshold value; in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing; in response to the load utilization rate being greater than the threshold, writing the data directly to the storage disk. The invention also discloses a computer device and a readable storage medium. The disclosed method automatically selects a deduplication mode according to the storage load, so that data is still deduplicated to save storage space while the storage performance loss caused by deduplication does not affect user services during load peaks.

Description

Data deduplication method and device based on storage end and storage medium
Technical Field
The invention relates to the field of data processing, in particular to a data deduplication method and device based on a storage end and a storage medium.
Background
Data deduplication is a primary data-reduction technology in enterprise storage. Deduplication stores only one copy of identical data, so for workloads with a large amount of redundant data it can save substantial storage space and reduce an enterprise's storage costs.
While saving space, deduplication significantly affects storage performance, mainly because writing a new data block incurs additional operations: fingerprint calculation, fingerprint writing and comparison, and writing of metadata mapping records. How to save space through deduplication while ensuring that the reduced storage performance does not affect user services has always been a concern for storage deduplication; that is, existing approaches fail to balance capacity savings against guaranteed performance.
Therefore, a data deduplication method is urgently needed.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a data deduplication method based on a storage side, including:
receiving data transmitted by a host end;
judging whether the load utilization rate of the storage end is greater than a threshold value;
in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing;
in response to the load utilization being greater than the threshold, writing the data directly to the storage disk.
In some embodiments, in response to the load utilization being greater than the threshold, writing the data directly to the storage disk further comprises:
marking the data as not deduplicated and then writing the data into the storage disk.
In some embodiments, further comprising:
and in response to the load utilization rate not being greater than the threshold value, performing deduplication processing on the data marked as not being deduplicated in the storage disk.
In some embodiments, the deduplication process comprises:
dividing data to be processed into a plurality of data blocks;
calculating a fingerprint of each data block;
sequentially judging whether each fingerprint exists in a fingerprint hot data cache;
in response to the fingerprint being present in the fingerprint hot data cache, data chunks corresponding to the fingerprint present in the fingerprint hot data cache are deleted.
In some embodiments, further comprising:
in response to the fingerprint not being present in the fingerprint hot data cache, determining whether the fingerprint is present in a fingerprint database;
in response to the fingerprint not being present in the fingerprint database, writing the data chunk corresponding to the fingerprint into the storage disk.
In some embodiments, further comprising:
in response to the fingerprint being present in the fingerprint database, deleting the data chunk corresponding to the fingerprint.
In some embodiments, further comprising:
updating the fingerprint found in the fingerprint database into the fingerprint hot data cache.
In some embodiments, deleting the data chunks further comprises:
acquiring a reference address of the data block which is identical to the deleted data block and is written into the storage disk;
and writing the reference address into the storage disk.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform any of the steps of the storage-side based data deduplication method described above.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor performs the steps of any of the storage-side based data deduplication methods described above.
The invention has one of the following beneficial technical effects: the disclosed method automatically selects a deduplication mode according to the storage load, so that data is still deduplicated to save storage space while the storage performance loss caused by deduplication does not affect user services during load peaks.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other embodiments from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a data deduplication method based on a storage end according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data deduplication method based on a storage end according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention serve to distinguish two entities or parameters that share the same name but are not the same. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
It should be noted that, in the embodiments of the present invention, data deduplication refers to storing only one copy of identical data in the storage; each duplicate data block retains only an address that refers to the unique stored block.
According to an aspect of the present invention, an embodiment of the present invention provides a data deduplication method based on a storage end which, as shown in Fig. 1, may include the steps of: S1, receiving data transmitted by the host end and judging whether the load utilization rate of the storage end is greater than a threshold value; S2, in response to the load utilization rate not being greater than the threshold value, writing the data into a storage disk after deduplication processing; S3, in response to the load utilization rate being greater than the threshold value, writing the data directly into the storage disk.
The method selects how data is deduplicated by judging the load utilization rate of the storage end. When the load pressure on the storage end is low, the online deduplication mode can be selected: data is deduplicated first and then written to disk. This avoids the repeated disk writes that occur when data is first written to disk and deduplicated afterwards, and the lower write count reduces disk wear. When the load pressure on the storage end is high, the offline deduplication mode can be selected: data is written to disk first and deduplicated later, once the load pressure has dropped. The method therefore saves space through deduplication while ensuring that storage-end performance does not degrade to the point of affecting user services.
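The load-based choice between the two modes can be illustrated with a minimal Python sketch. The class name `StorageEnd`, the 0.8 threshold, and the in-memory list standing in for the storage disk are all hypothetical; the source does not specify any of them.

```python
from dataclasses import dataclass, field


@dataclass
class StorageEnd:
    """Toy storage end; every name and value here is an illustrative assumption."""
    threshold: float = 0.8                      # assumed load threshold (0.0 to 1.0)
    disk: list = field(default_factory=list)    # stands in for the storage disk

    def deduplicate(self, data: bytes) -> bytes:
        # Placeholder for the deduplication process described later in the text.
        return data

    def handle_write(self, data: bytes, load_utilization: float) -> str:
        """Pick online or offline deduplication from the current load."""
        if load_utilization > self.threshold:
            # Load greater than the threshold: write directly, deduplicate later.
            self.disk.append(data)
            return "direct"
        # Load not greater than the threshold: deduplicate, then write.
        self.disk.append(self.deduplicate(data))
        return "deduplicated"
```

Note that a load exactly equal to the threshold takes the deduplication path, matching the "not greater than" wording of the method steps.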
The following describes in detail a flow chart of the data deduplication method based on the storage side shown in fig. 2.
First, data transmitted from a host is received.
Specifically, the data transmitted by the host may be IO data. Because the IO data to be stored may contain data that has already been stored, the duplicate data in the IO data needs to be deleted to save storage space.
Then, different deduplication methods need to be selected according to the load utilization of the storage end.
It should be noted that the load utilization rate may be the CPU utilization rate or the memory utilization rate.
In some embodiments, in response to the load utilization not being greater than the threshold, the data is written to a storage disk after deduplication processing.
Specifically, when the CPU or memory utilization is not greater than the set threshold, the online deduplication mode may be selected, which both avoids degrading storage-end performance and reduces disk wear.
In some embodiments, in response to the load utilization being greater than the threshold, the data is written directly to the storage disk.
Specifically, when the CPU or memory utilization is greater than the set threshold, the offline deduplication mode may be selected, so that deduplication does not consume further resources and affect normal services while the storage end is under heavy load.
In some embodiments, in response to the load utilization being greater than the threshold, writing the data directly to the storage disk further comprises: marking the data as not deduplicated and then writing the data into the storage disk. According to some further embodiments, in response to the load utilization not being greater than the threshold, deduplication processing is performed on the data in the storage disk that is marked as not deduplicated.
Specifically, when the load pressure on the storage end causes data transmitted by the host to be stored on disk without deduplication, the data may be marked before it is written so that it can be distinguished from deduplicated data. The data marked as not deduplicated can then be deduplicated when the load pressure is low.
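As a rough illustration of the marking and deferred processing described above, the following Python sketch tags directly written data and processes it once the load allows. `StoredExtent`, `write_direct`, and `background_pass` are hypothetical names, and the deduplication step itself is left out; only the mark handling is shown.

```python
from dataclasses import dataclass


@dataclass
class StoredExtent:
    data: bytes
    deduplicated: bool  # False marks data written directly under high load


def write_direct(disk: list, data: bytes) -> None:
    """High-load path: mark the data as not deduplicated, then write it."""
    disk.append(StoredExtent(data=data, deduplicated=False))


def background_pass(disk: list, load_utilization: float, threshold: float) -> int:
    """Process marked extents once the load is not greater than the threshold.

    Returns the number of extents processed. Actual deduplication is omitted;
    this sketch only clears the mark.
    """
    if load_utilization > threshold:
        return 0  # still under heavy load: leave marked data untouched
    processed = 0
    for extent in disk:
        if not extent.deduplicated:
            extent.deduplicated = True  # deduplication of extent.data would run here
            processed += 1
    return processed
```

How the mark is actually persisted (per block, per extent, or in separate metadata) is not stated in the source; a boolean field is simply the smallest thing that works for a sketch.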
Whether the data is subjected to online deduplication or offline deduplication, the processing method for data deduplication is the same, and only the timing of processing differs.
In some embodiments, as shown in fig. 2, when data is to be subjected to deduplication processing, data to be processed (for example, data transmitted by the host or data marked as not being deduplicated) needs to be divided into a plurality of data chunks, then a fingerprint of each data chunk is calculated, then it is sequentially determined whether each fingerprint exists in a fingerprint hot data cache, and finally, in response to the existence of the fingerprint in the fingerprint hot data cache, data chunks corresponding to the fingerprints existing in the fingerprint hot data cache are deleted.
Specifically, the data transmitted by the host is large, so it needs to be divided into a plurality of small data blocks. The fingerprint of each data block is then calculated and compared first against the hot fingerprint data held in the cache; if a match is found, the data block is deleted.
It should be noted that the MD5 algorithm may be used to calculate the fingerprints of the data chunks. In online deduplication, the real-time IO data written by the host is divided directly into a plurality of smaller data chunks; in offline deduplication, the data scanned from the disk that has not been deduplicated is divided into a plurality of smaller data chunks.
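The chunking and fingerprinting step might look like the following Python sketch, which uses the standard-library `hashlib.md5`. Fixed-size chunking and the 4096-byte chunk size are assumptions made for illustration; the source states only that the data is divided into smaller chunks and that MD5 may be used.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes; the source does not give one


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Divide data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def fingerprint(chunk: bytes) -> str:
    """Return the MD5 digest of a chunk as a 32-character hex string."""
    return hashlib.md5(chunk).hexdigest()
```

Identical chunks yield identical fingerprints, which is what allows duplicates to be detected by comparing fingerprints rather than the chunk contents.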
In some embodiments, as shown in Fig. 2, in response to the fingerprint not being present in the fingerprint hot data cache, it is determined whether the fingerprint is present in the fingerprint database; in response to the fingerprint not being present in the fingerprint database, the data chunk corresponding to that fingerprint is written into the storage disk; in response to the fingerprint being present in the fingerprint database, the data chunk corresponding to that fingerprint is deleted and the fingerprint is updated into the fingerprint hot data cache.
Specifically, if the comparison against the hot data fails, that is, the fingerprint is not in the hot data cache, the fingerprint is looked up in the fingerprint database. If that comparison succeeds, the data block is deleted and the fingerprint is updated into the fingerprint hot data cache. If it fails, meaning the fingerprint does not exist in the current fingerprint database, the fingerprint of the data block is recorded in the fingerprint database and updated into the fingerprint hot data cache, the data block is written into the storage disk, and the address of the data block is recorded.
It should be noted that keeping the hot data of the fingerprint database in the cache speeds up fingerprint comparison.
In some embodiments, for both online and offline deduplication, when a data block is deleted, the reference address of the identical data block already written into the storage disk needs to be acquired; the reference address is then written into the storage disk and the reference count is updated.
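The two-level lookup described above (hot data cache first, then the fingerprint database) can be sketched in Python as follows. The function name `process_chunk`, the use of a `set` for the cache and a `dict` for the database, and the list index used as a block address are all assumptions for illustration; in particular, a real cache would be bounded and would evict entries, which the source does not describe.

```python
def process_chunk(fp: str, chunk: bytes, hot_cache: set,
                  fingerprint_db: dict, disk: list) -> bool:
    """Return True if the chunk is a duplicate and is dropped, False if written."""
    if fp in hot_cache:
        # Fast path: fingerprint found in the in-memory hot data cache.
        return True
    if fp in fingerprint_db:
        # Found in the full fingerprint database: drop the chunk and
        # update the fingerprint into the hot data cache.
        hot_cache.add(fp)
        return True
    # New fingerprint: write the chunk, record its address, update both stores.
    address = len(disk)          # list index stands in for a disk address
    disk.append(chunk)
    fingerprint_db[fp] = address
    hot_cache.add(fp)
    return False
```

The cache check comes first so that the slower fingerprint database lookup is only needed on a cache miss.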
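The reference handling for a deleted duplicate could be sketched as below. `BlockRecord`, `reference_existing_block`, and the `mapping` list that stands in for the on-disk record of reference addresses are hypothetical; the source does not describe the metadata layout.

```python
from dataclasses import dataclass


@dataclass
class BlockRecord:
    address: int        # where the unique copy of the block is stored
    ref_count: int = 1  # number of logical blocks that refer to it


def reference_existing_block(fp: str, fingerprint_db: dict, mapping: list) -> int:
    """Replace a duplicate block with a reference to the stored copy.

    Looks up the stored block by fingerprint, records its address in place of
    the duplicate's data, and increments the reference count.
    """
    record = fingerprint_db[fp]
    mapping.append(record.address)
    record.ref_count += 1
    return record.address
```

Keeping a reference count lets a system tell when the unique copy is no longer referenced, although the source does not say how the count is used.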
The method provided by the invention can automatically select online or offline deduplication according to the real-time load of the storage end. During online deduplication, data written by the host is deduplicated in memory and then written to disk, which avoids the repeated disk writes caused by deduplicating data after it has been written and thus reduces disk wear. Offline deduplication runs in the background when the storage-end load is below the set storage performance threshold, avoiding the load peaks of the storage end. Combining online and offline deduplication ensures both that data is deduplicated to save storage space and that deduplication does not affect the performance of the services the storage end provides. In addition, comparing fingerprints against a hot data cache during deduplication is faster than looking up fingerprint database data directly on disk, which improves deduplication efficiency.
According to an embodiment of the present invention, there is also provided a system, which may include a real-time load monitoring module, a non-deduplicated data marking and scanning module, a fingerprint hot data comparison and update module, a fingerprint database management module, and a data deduplication module.
Specifically, the real-time load monitoring module monitors the current storage load (CPU/memory utilization rate); the non-deduplicated data marking and scanning module marks data that is written to disk without deduplication and finds such data on the disk; the fingerprint hot data comparison and update module identifies hot data in the fingerprint database during deduplication, updates it into the cache, and compares it with the fingerprints of the data blocks being deduplicated; the fingerprint database management module records the fingerprints of data blocks in the fingerprint database during storage deduplication and supports fingerprint lookup and comparison in the fingerprint database; and the data deduplication module selects whether to start online or offline deduplication according to the load monitored by the real-time load monitoring module.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510 storing a computer program 511 executable on the processor, wherein the processor 520 executes the computer program to perform the steps of any of the storage-end-based data deduplication methods described above.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of any one of the above methods for data deduplication based on a storage end.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM). The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A data deduplication method based on a storage end, comprising the following steps:
receiving data transmitted by a host end;
determining whether the load utilization rate of the storage end is greater than a threshold;
in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing;
in response to the load utilization rate being greater than the threshold, writing the data directly into the storage disk.
2. The method of claim 1, wherein writing the data directly into the storage disk in response to the load utilization rate being greater than the threshold further comprises:
marking the data as not deduplicated and then writing the data into the storage disk.
3. The method of claim 2, further comprising:
in response to the load utilization rate not being greater than the threshold, performing deduplication processing on the data in the storage disk that is marked as not deduplicated.
4. The method of any one of claims 1-3, wherein the deduplication processing comprises:
dividing data to be processed into a plurality of data blocks;
calculating a fingerprint of each data block;
sequentially determining whether each fingerprint exists in a fingerprint hot data cache;
in response to the fingerprint existing in the fingerprint hot data cache, deleting the data block corresponding to that fingerprint.
5. The method of claim 4, further comprising:
in response to the fingerprint not existing in the fingerprint hot data cache, determining whether the fingerprint exists in a fingerprint database;
in response to the fingerprint not existing in the fingerprint database, writing the data block corresponding to that fingerprint into the storage disk.
6. The method of claim 5, further comprising:
in response to the fingerprint existing in the fingerprint database, deleting the data block corresponding to that fingerprint.
7. The method of claim 6, further comprising:
updating the fingerprint existing in the fingerprint database into the fingerprint hot data cache.
8. The method of claim 6, wherein deleting the data block further comprises:
acquiring a reference address of the data block that is identical to the deleted data block and has already been written into the storage disk;
writing the reference address into the storage disk.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor, when executing the program, performs the steps of the method according to any one of claims 1-8.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1-8.
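
The flow recited in claims 1-8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The class `StorageEnd`, its methods, the in-memory `disk` dictionary, the fixed `BLOCK_SIZE`, and the use of SHA-256 as the fingerprint function are all hypothetical choices made for illustration. The claims do not state how new fingerprints enter the fingerprint database, nor whether a reference address is written on a hot-cache hit, so both behaviours below are assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical fixed block size in bytes, kept tiny for illustration


class StorageEnd:
    """Toy in-memory model of the storage end (all names are hypothetical)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # load-utilization threshold (claim 1)
        self.hot_cache = {}         # fingerprint hot data cache: fingerprint -> address
        self.fingerprint_db = {}    # fingerprint database: fingerprint -> address
        self.disk = {}              # storage disk: address -> record
        self.pending = []           # addresses of data marked as not deduplicated
        self.next_addr = 0

    def _put(self, record):
        """Store one record on the toy disk and return its address."""
        addr = self.next_addr
        self.disk[addr] = record
        self.next_addr += 1
        return addr

    def write(self, data: bytes, load: float):
        """Claims 1-2: choose the write path from the current load."""
        if load > self.threshold:
            # Busy: mark the data as not deduplicated and write it directly.
            self.pending.append(self._put(("raw", data)))
        else:
            self._dedup_and_write(data)

    def dedup_pending(self, load: float):
        """Claim 3: deduplicate marked data once the load allows it."""
        while self.pending and load <= self.threshold:
            addr = self.pending.pop(0)
            _, data = self.disk.pop(addr)
            self._dedup_and_write(data)

    def _dedup_and_write(self, data: bytes):
        """Claims 4-8: split into blocks, fingerprint them, look them up."""
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()  # assumed fingerprint function
            if fp in self.hot_cache:
                # Claim 4: duplicate found in the hot cache, drop the block.
                # Writing a reference here is an assumption (claim 8 covers claim 6).
                self._put(("ref", self.hot_cache[fp]))
            elif fp in self.fingerprint_db:
                # Claims 6-8: duplicate found in the database; promote the
                # fingerprint to the hot cache and store only a reference.
                addr = self.fingerprint_db[fp]
                self.hot_cache[fp] = addr
                self._put(("ref", addr))
            else:
                # Claim 5: new block, write it to disk.
                # Recording its fingerprint in the database is an assumption.
                self.fingerprint_db[fp] = self._put(("block", block))
```

In this sketch, a call such as `StorageEnd().write(b"AAAABBBBAAAA", load=0.5)` takes the deduplicating path, while the same call with a load above the threshold stores the data unmodified and leaves it for a later `dedup_pending` call.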

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910927592.4A CN110727404A (en) 2019-09-27 2019-09-27 Data deduplication method and device based on storage end and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910927592.4A CN110727404A (en) 2019-09-27 2019-09-27 Data deduplication method and device based on storage end and storage medium

Publications (1)

Publication Number Publication Date
CN110727404A true CN110727404A (en) 2020-01-24

Family

ID=69219511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927592.4A Withdrawn CN110727404A (en) 2019-09-27 2019-09-27 Data deduplication method and device based on storage end and storage medium

Country Status (1)

Country Link
CN (1) CN110727404A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111381779B (en) * 2020-03-05 2024-02-23 深信服科技股份有限公司 Data processing method, device, equipment and storage medium
CN111381779A (en) * 2020-03-05 2020-07-07 深信服科技股份有限公司 Data processing method, device, equipment and storage medium
CN111506260A (en) * 2020-03-20 2020-08-07 新华三信息技术有限公司 Data processing method, device, equipment and machine-readable storage medium
CN112506877A (en) * 2020-12-03 2021-03-16 深圳市木浪云数据有限公司 Data deduplication method, device and system based on deduplication domain and storage equipment
CN112506877B (en) * 2020-12-03 2024-04-19 深圳市木浪云科技有限公司 Data deduplication method, device and system based on deduplication domain and storage equipment
CN113190523A (en) * 2021-04-08 2021-07-30 金钱猫科技股份有限公司 Distributed file system, method and client based on multi-client cooperation
CN113190523B (en) * 2021-04-08 2022-09-13 金钱猫科技股份有限公司 Distributed file system, method and client based on multi-client cooperation
WO2023279833A1 (en) * 2021-07-08 2023-01-12 华为技术有限公司 Data processing method and apparatus
WO2023040200A1 (en) * 2021-09-17 2023-03-23 苏州浪潮智能科技有限公司 Data deduplication method and system, and storage medium and device
CN113535708A (en) * 2021-09-17 2021-10-22 苏州浪潮智能科技有限公司 Data deduplication method, system, storage medium and equipment
CN114138198A (en) * 2021-11-29 2022-03-04 苏州浪潮智能科技有限公司 Method, device and equipment for data deduplication and readable medium
CN114442961A (en) * 2022-02-07 2022-05-06 苏州浪潮智能科技有限公司 Data processing method and device, computer equipment and storage medium
CN114442961B (en) * 2022-02-07 2023-08-08 苏州浪潮智能科技有限公司 Data processing method, device, computer equipment and storage medium
CN116756137A (en) * 2023-08-17 2023-09-15 深圳市木浪云科技有限公司 Method, system and equipment for deleting large-scale data object storage

Similar Documents

Publication Publication Date Title
CN110727404A (en) Data deduplication method and device based on storage end and storage medium
CN108319654B (en) Computing system, cold and hot data separation method and device, and computer readable storage medium
US8055633B2 (en) Method, system and computer program product for duplicate detection
US10303363B2 (en) System and method for data storage using log-structured merge trees
US9134912B2 (en) Performing authorization control in a cloud storage system
KR102564170B1 (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
KR20090026296A (en) Predictive data-loader
CN111176560B (en) Cache management method and device, computer equipment and storage medium
US10884926B2 (en) Method and system for distributed storage using client-side global persistent cache
CN110888837B (en) Object storage small file merging method and device
CN112684975B (en) Data storage method and device
CN113326005B (en) Read-write method and device for RAID storage system
CN113535670B (en) Virtual resource mirror image storage system and implementation method thereof
WO2021184996A1 (en) Data storage method and apparatus for database
CN110618974A (en) Data storage method, device, equipment and storage medium
CN111274245B (en) Method and device for optimizing data storage
WO2017020735A1 (en) Data processing method, backup server and storage system
CN111625203A (en) Method, system, device and medium for hierarchical storage
CN113253932B (en) Read-write control method and system for distributed storage system
CN113420082A (en) Data synchronization anomaly detection method and device
CN110287164B (en) Data recovery method and device and computer equipment
US11803483B2 (en) Metadata cache for storing manifest portion
CN110955682A (en) Method and device for deleting cache data, data cache and reading cache data
CN114461635A (en) MySQL database data storage method and device and electronic equipment
CN116820323A (en) Data storage method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200124