CN110727404A - Data deduplication method and device based on storage end and storage medium
- Publication number
- CN110727404A (application number CN201910927592.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- fingerprint
- storage
- deduplication
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0653—Monitoring storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
Abstract
The invention discloses a data deduplication method based on a storage end, which comprises the following steps: receiving data transmitted by a host end; judging whether the load utilization rate of the storage end is greater than a threshold value; in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing; in response to the load utilization rate being greater than the threshold, writing the data directly to the storage disk. The invention also discloses a computer device and a readable storage medium. The disclosed method automatically selects a deduplication mode according to the storage load, thereby ensuring that data is deduplicated to save storage space while preventing the storage performance loss caused by deduplication from affecting user services during peak load.
Description
Technical Field
The invention relates to the field of data processing, in particular to a data deduplication method and device based on a storage end and a storage medium.
Background
Data deduplication is a primary data reduction technique in enterprise storage. Deduplication stores only one copy of identical data, so applying it to highly redundant data saves a large amount of storage space and reduces an enterprise's storage costs.
While saving space, deduplication significantly affects storage performance, mainly because writing a new data block incurs additional operations: fingerprint calculation, fingerprint writing, fingerprint comparison, and writing of metadata mapping records. How to save space through deduplication while ensuring that the resulting performance reduction does not affect user services has always been a problem for storage deduplication; that is, existing approaches fail to balance capacity savings against performance guarantees.
Therefore, a data deduplication method is urgently needed.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a data deduplication method based on a storage side, including:
receiving data transmitted by a host end;
judging whether the load utilization rate of the storage end is greater than a threshold value;
in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing;
in response to the load utilization being greater than the threshold, writing the data directly to the storage disk.
In some embodiments, in response to the load utilization being greater than the threshold, writing the data directly to the storage disk further comprises:
writing the data into the storage disk after marking the data as not deduplicated.
In some embodiments, further comprising:
and in response to the load utilization rate not being greater than the threshold value, performing deduplication processing on the data marked as not being deduplicated in the storage disk.
In some embodiments, the deduplication process comprises:
dividing data to be processed into a plurality of data blocks;
calculating a fingerprint of each data block;
sequentially judging whether each fingerprint exists in a fingerprint hot data cache;
in response to the fingerprint being present in the fingerprint hot data cache, deleting the data block corresponding to the fingerprint.
In some embodiments, further comprising:
in response to the fingerprint not being present in the fingerprint hot data cache, determining whether the fingerprint is present in a fingerprint database;
in response to the fingerprint not being present in the fingerprint database, writing the data block corresponding to the fingerprint into the storage disk.
In some embodiments, further comprising:
in response to the fingerprint being present in the fingerprint database, deleting the data block corresponding to the fingerprint.
In some embodiments, further comprising:
updating the fingerprints existing in the fingerprint database into the fingerprint hot data cache.
In some embodiments, deleting the data block further comprises:
acquiring a reference address of the data block which is identical to the deleted data block and is written into the storage disk;
and writing the reference address into the storage disk.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform any of the steps of the storage-side based data deduplication method described above.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor performs the steps of any of the storage-side based data deduplication methods described above.
The invention has at least the following beneficial technical effect: the disclosed method automatically selects a deduplication mode according to the storage load, thereby ensuring that data is deduplicated to save storage space while preventing the storage performance loss caused by deduplication from affecting user services during peak load.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data deduplication method based on a storage end according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data deduplication method based on a storage end according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities or parameters that share the same name but are not identical. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
It should be noted that, in the embodiments of the present invention, data deduplication means storing only one copy of identical data in the storage; each other duplicate data block retains only an address that refers to the unique stored block.
According to an aspect of the present invention, an embodiment of the present invention provides a data deduplication method based on a storage end, as shown in Fig. 1, which may include the steps of: S1, receiving data transmitted by the host end and judging whether the load utilization rate of the storage end is greater than a threshold value; S2, in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing; S3, in response to the load utilization rate being greater than the threshold, writing the data directly into the storage disk.
The method chooses how to deduplicate the data by judging the load utilization rate of the storage end. When the load pressure on the storage end is low, the online deduplication mode can be selected: the data is deduplicated first and then written to the disk. This avoids the repeated disk writes that occur when data is written to the disk first and deduplicated afterwards, and fewer writes mean less disk wear. When the load pressure on the storage end is high, the offline deduplication mode can be selected: the data is written to the disk first and deduplicated after the load pressure has dropped. The method can therefore save space through deduplication while ensuring that storage-end performance is not reduced to the point of affecting user services.
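The mode selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `StorageEnd` class, the `handle_write` and `deduplicate` names, the in-memory list standing in for the storage disk, and the threshold value of 0.8 are all assumptions, since the source does not specify them.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.8  # assumed value; the source does not state a threshold


@dataclass
class StorageEnd:
    """Toy storage end; every name here is illustrative."""
    disk: list = field(default_factory=list)  # stands in for the storage disk

    def deduplicate(self, data: bytes) -> bytes:
        # Placeholder for the deduplication process described later in the text.
        return data

    def handle_write(self, data: bytes, load_utilization: float) -> str:
        if load_utilization > THRESHOLD:
            # Load above the threshold: write directly, deduplicate later.
            self.disk.append(data)
            return "offline"
        # Load not greater than the threshold: deduplicate, then write.
        self.disk.append(self.deduplicate(data))
        return "online"
```

Note that a load exactly equal to the threshold takes the online path, matching the "not greater than" wording of the method.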
The following describes in detail a flow chart of the data deduplication method based on the storage side shown in fig. 2.
First, data transmitted from a host is received.
Specifically, the data transmitted by the host may be IO data. Because the IO data to be stored may contain data that has already been stored, the duplicate data in the IO data needs to be deleted in order to save a large amount of storage space.
Then, different deduplication methods need to be selected according to the load utilization of the storage end.
It should be noted that the load utilization rate may be a CPU or a memory utilization rate.
In some embodiments, in response to the load utilization not being greater than the threshold, the data is written to a storage disk after deduplication processing.
Specifically, when the utilization rate of the CPU or the memory is not greater than the set threshold, an online deduplication mode may be selected, which both keeps deduplication from degrading the performance of the storage end and reduces disk wear.
In some embodiments, in response to the load utilization being greater than the threshold, the data is written directly to the storage disk.
Specifically, when the utilization rate of the CPU or the memory is greater than the set threshold, an offline deduplication mode may be selected, so that deduplication does not consume further resources and affect normal services while the storage end is under heavy load.
In some embodiments, in response to the load utilization being greater than the threshold, writing the data directly to the storage disk further comprises: writing the data into the storage disk after marking the data as not deduplicated. According to some further embodiments, in response to the load utilization not being greater than the threshold, deduplication processing is performed on the data in the storage disk that is marked as not deduplicated.
Specifically, because the data transmitted by the host is not subjected to deduplication processing and is directly stored in the disk due to the load pressure of the storage end, the data may be marked and then written into the disk in order to be distinguished from the data subjected to deduplication, and then the data marked as non-deduplication in the disk may be subjected to deduplication processing when the load pressure is small.
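The marking and later background processing can be sketched as follows. This is a hedged illustration only: `StoredExtent`, `write_direct`, `background_pass`, and the boolean `deduplicated` flag are hypothetical names, and the source does not say how the mark is actually stored.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredExtent:
    payload: bytes
    deduplicated: bool  # False marks data written directly under heavy load


def write_direct(disk: list, payload: bytes) -> None:
    """Write without deduplication and mark the data as not deduplicated."""
    disk.append(StoredExtent(payload, deduplicated=False))


def background_pass(disk: list, load_utilization: float, threshold: float,
                    dedup: Callable[[bytes], None]) -> int:
    """Deduplicate marked data only while the load is not above the threshold."""
    if load_utilization > threshold:
        return 0  # still under heavy load; leave the marked data alone
    processed = 0
    for extent in disk:
        if not extent.deduplicated:
            dedup(extent.payload)       # assumed hook into the dedup process
            extent.deduplicated = True  # clear the mark
            processed += 1
    return processed
```

The return value is the number of extents processed in the pass, which makes it easy to see that a second pass over already-processed data does nothing.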
Whether the data is subjected to online deduplication or offline deduplication, the processing method for data deduplication is the same, and only the timing of processing differs.
In some embodiments, as shown in fig. 2, when data is to be subjected to deduplication processing, data to be processed (for example, data transmitted by the host or data marked as not being deduplicated) needs to be divided into a plurality of data chunks, then a fingerprint of each data chunk is calculated, then it is sequentially determined whether each fingerprint exists in a fingerprint hot data cache, and finally, in response to the existence of the fingerprint in the fingerprint hot data cache, data chunks corresponding to the fingerprints existing in the fingerprint hot data cache are deleted.
Specifically, the data transmitted by the host is large, so it needs to be divided into a plurality of small data blocks. The fingerprint of each data block is then calculated and first compared against the hot fingerprint data held in the cache; if a match is found, the data block is deleted.
It should be noted that the MD5 algorithm may be used to calculate the fingerprints of data blocks. In online deduplication, the real-time IO data written by the host is partitioned directly into a plurality of smaller data blocks; in offline deduplication, non-deduplicated data scanned from the disk is partitioned into a plurality of smaller data blocks.
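Block division and fingerprint calculation might look like the sketch below. MD5 comes from the text; the fixed 4096-byte block size and the function names `split_into_blocks` and `fingerprint` are assumptions, since the source does not say how the data is divided.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed; the source does not give a block size


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Return the MD5 digest of a block as a hexadecimal string."""
    return hashlib.md5(block).hexdigest()
```

Identical blocks yield identical fingerprints, which is what allows the later comparison steps to detect duplicates without comparing block contents directly.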
In some embodiments, as shown in Fig. 2, in response to the fingerprint not being present in the fingerprint hot data cache, it is determined whether the fingerprint is present in the fingerprint database; in response to the fingerprint not being present in the fingerprint database, the data block corresponding to the fingerprint is written into the storage disk; in response to the fingerprint being present in the fingerprint database, the data block corresponding to the fingerprint is deleted and the fingerprint is updated into the fingerprint hot data cache.
Specifically, if the hot data comparison fails, meaning the fingerprint is not in the hot data cache, the fingerprint is looked up and compared in the fingerprint database. If that comparison succeeds, the data block is deleted and the fingerprint is updated into the fingerprint hot data cache. If it fails, meaning the fingerprint does not exist in the current fingerprint database, the fingerprint of the data block is recorded in the fingerprint database and updated into the fingerprint hot data cache, the data block is written into the storage disk, and the address of the data block is recorded.
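The two-level lookup can be sketched as below, under stated assumptions: plain dictionaries stand in for the fingerprint hot data cache and the fingerprint database, a list index stands in for a disk address, and `dedup_block` is a hypothetical name. A real cache would be bounded and evict entries, which the source does not describe.

```python
import hashlib


def dedup_block(block: bytes, hot_cache: dict, fingerprint_db: dict,
                disk: list) -> int:
    """Return the address of the stored copy of `block`.

    A duplicate block is not written; only its existing address is returned.
    """
    fp = hashlib.md5(block).hexdigest()
    if fp in hot_cache:
        # Hit in the hot data cache: the block is a duplicate.
        return hot_cache[fp]
    if fp in fingerprint_db:
        # Hit in the fingerprint database: duplicate, so promote the
        # fingerprint into the hot data cache.
        address = fingerprint_db[fp]
        hot_cache[fp] = address
        return address
    # New block: write it, then record fingerprint and address in both places.
    disk.append(block)
    address = len(disk) - 1
    fingerprint_db[fp] = address
    hot_cache[fp] = address
    return address
```

Checking the cache before the database keeps the common case, repeated hot blocks, away from the slower lookup.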
It should be noted that keeping the hot data of the fingerprint database in the cache speeds up fingerprint comparison.
In some embodiments, in both online and offline deduplication, when a data block is deleted, the reference address of the identical data block already written into the storage disk needs to be acquired; the reference address is then written into the storage disk and the reference count is updated.
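Reference handling for a deleted duplicate might be sketched as follows. The `BlockRecord` structure, the `record_duplicate` function, and the list used as a mapping of written references are illustrative assumptions; the source only states that the reference address is written and the reference count updated.

```python
from dataclasses import dataclass


@dataclass
class BlockRecord:
    address: int        # where the unique copy lives on the storage disk
    ref_count: int = 1  # how many logical blocks point at that copy


def record_duplicate(mapping: list, record: BlockRecord) -> None:
    """Store a reference to the existing copy instead of the duplicate data."""
    mapping.append(record.address)  # write the reference address
    record.ref_count += 1           # update the reference count
```

Tracking the count lets a storage end know when the unique copy is no longer referenced, although the source does not describe how the count is later used.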
The method provided by the invention automatically selects an online or offline deduplication mode according to the real-time load of the storage end. During online deduplication, data written by the host is deduplicated in memory and then written directly to the disk, which avoids the repeated disk writes that occur when data is written to the disk first and deduplicated afterwards, and thus reduces disk wear. Offline deduplication is executed in the background when the load of the storage end is below the set storage performance threshold, which avoids the load peak period of the storage end. Combining online and offline deduplication ensures both that data is deduplicated to save storage space and that deduplication processing does not affect the performance of the external services provided by the storage end. In addition, the deduplication process compares fingerprints against a fingerprint hot data cache, which is faster than looking up fingerprint database data directly on the disk and improves deduplication efficiency.
According to an embodiment of the present invention, there is also provided a system, which may include a load real-time monitoring module, a non-deduplicated data marking and scanning module, a fingerprint hot data comparison and update module, a fingerprint database management module, and a data deduplication module.
Specifically, the load real-time monitoring module monitors the current storage load (CPU/memory utilization). The non-deduplicated data marking and scanning module marks data written to the disk without deduplication and finds such data on the disk. The fingerprint hot data comparison and update module identifies hot data in the fingerprint database during deduplication processing, updates it into the cache, and compares it with the fingerprints of the data blocks being deduplicated. The fingerprint database management module records the fingerprints of data blocks into the fingerprint database during storage deduplication and supports fingerprint lookup and comparison in the fingerprint database. The data deduplication module selects whether to start online or offline deduplication according to the load monitored by the load real-time monitoring module.
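The interaction between the load real-time monitoring module and the data deduplication module could be sketched as below. This is an assumption-laden illustration: `LoadMonitor`, the injected `read_cpu` and `read_memory` callables, and the choice to take the larger of the two readings are not from the source, which only says the load may be CPU or memory utilization.

```python
from typing import Callable


class LoadMonitor:
    """Illustrative load monitor with injected utilization readers."""

    def __init__(self, read_cpu: Callable[[], float],
                 read_memory: Callable[[], float], threshold: float):
        self.read_cpu = read_cpu
        self.read_memory = read_memory
        self.threshold = threshold

    def utilization(self) -> float:
        # Assumption: treat the busier resource as the load utilization.
        return max(self.read_cpu(), self.read_memory())

    def select_mode(self) -> str:
        # Above the threshold: offline deduplication; otherwise online.
        return "offline" if self.utilization() > self.threshold else "online"
```

Injecting the readers keeps the sketch independent of any particular operating system interface for sampling CPU or memory usage.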
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510 storing a computer program 511 executable on the processor, wherein the processor 520 executes the computer program to perform the steps of any of the storage-end-based data deduplication methods described above.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of any one of the above methods for data deduplication based on a storage end.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as synchronous RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A data deduplication method based on a storage end comprises the following steps:
receiving data transmitted by a host end;
judging whether the load utilization rate of the storage end is greater than a threshold value;
in response to the load utilization rate not being greater than the threshold, writing the data into a storage disk after deduplication processing;
in response to the load utilization rate being greater than the threshold, writing the data directly to the storage disk.
2. The method of claim 1, wherein writing the data directly to the storage disk in response to the load utilization rate being greater than the threshold further comprises:
marking the data as not deduplicated and then writing the data into the storage disk.
3. The method of claim 2, further comprising:
in response to the load utilization rate not being greater than the threshold, performing deduplication processing on the data in the storage disk that is marked as not deduplicated.
4. The method of any one of claims 1-3, wherein the deduplication process comprises:
dividing data to be processed into a plurality of data blocks;
calculating a fingerprint of each data block;
sequentially judging whether each fingerprint exists in a fingerprint hot data cache;
in response to the fingerprint being present in the fingerprint hot data cache, deleting the data block corresponding to that fingerprint.
5. The method of claim 4, further comprising:
in response to the fingerprint not being present in the fingerprint hot data cache, determining whether the fingerprint is present in a fingerprint repository;
in response to the fingerprint not being present in the fingerprint repository, writing the data block corresponding to that fingerprint into the storage disk.
6. The method of claim 5, further comprising:
in response to the fingerprint being present in the fingerprint repository, deleting the data block corresponding to that fingerprint.
7. The method of claim 6, further comprising:
updating the fingerprints existing in the fingerprint repository into the fingerprint hot data cache.
8. The method of claim 6, wherein deleting the data block further comprises:
acquiring a reference address of the data block that is identical to the deleted data block and has already been written into the storage disk;
writing the reference address into the storage disk.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor executes the program to perform the steps of the method according to any one of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 8.
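As a reading aid, the flow of claims 1-8 can be sketched as a small in-memory Python model. This is only a sketch under stated assumptions, not the implementation disclosed in the application: the class `StorageEnd`, its method names, the fixed 4096-byte block size, SHA-256 as the fingerprint function, the least-recently-used eviction of the hot cache, the threshold value 0.8, and the use of in-memory dictionaries in place of a storage disk are all hypothetical choices made for illustration. Recording the fingerprint of a newly written block in the repository is likewise an assumption that the claims do not state.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumption: fixed-size blocks; the claims do not fix a chunking scheme


class StorageEnd:
    """Hypothetical in-memory model of the storage end in claims 1-8."""

    def __init__(self, threshold=0.8, hot_capacity=1024):
        self.threshold = threshold        # assumed load-utilization threshold
        self.hot_capacity = hot_capacity  # assumed size of the fingerprint hot data cache
        self.disk = {}                    # address -> unique data block
        self.repository = {}              # fingerprint -> address (fingerprint repository)
        self.hot_cache = OrderedDict()    # fingerprint -> address, least recently used first
        self.pending = []                 # raw writes marked "not deduplicated"
        self.objects = []                 # per deduplicated write: list of reference addresses

    def _cache_hot(self, fingerprint, address):
        """Insert or refresh a fingerprint in the hot cache; evict the oldest entry if full."""
        self.hot_cache[fingerprint] = address
        self.hot_cache.move_to_end(fingerprint)
        if len(self.hot_cache) > self.hot_capacity:
            self.hot_cache.popitem(last=False)

    def _deduplicate(self, data):
        """Claims 4-8: split into blocks, fingerprint each, keep one copy per fingerprint."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()  # assumption: SHA-256 fingerprint
            if fingerprint in self.hot_cache:
                # Claim 4: hit in the hot cache, so the duplicate block is dropped.
                address = self.hot_cache[fingerprint]
                self.hot_cache.move_to_end(fingerprint)
            elif fingerprint in self.repository:
                # Claims 6-7: hit in the repository; drop the block and promote the fingerprint.
                address = self.repository[fingerprint]
                self._cache_hot(fingerprint, address)
            else:
                # Claim 5: unseen fingerprint, so the block is written to the disk.
                address = len(self.disk)
                self.disk[address] = block
                self.repository[fingerprint] = address  # assumption: repository is updated here
            refs.append(address)  # claim 8: keep a reference address for every block
        self.objects.append(refs)
        return refs

    def write(self, data, load):
        """Claims 1-2: deduplicate when load <= threshold, otherwise write raw and mark."""
        if load > self.threshold:
            self.pending.append(data)  # stored as-is, marked as not deduplicated
            return None
        return self._deduplicate(data)

    def deduplicate_pending(self, load):
        """Claim 3: once load is back at or below the threshold, process the marked data."""
        if load > self.threshold:
            return 0
        processed = len(self.pending)
        while self.pending:
            self._deduplicate(self.pending.pop(0))
        return processed
```

In this model, writing the same 4096-byte block twice under low load stores it once and records the same reference address for both occurrences, while a write under high load is only queued until `deduplicate_pending` is called with a load at or below the threshold.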
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910927592.4A CN110727404A (en) | 2019-09-27 | 2019-09-27 | Data deduplication method and device based on storage end and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910927592.4A CN110727404A (en) | 2019-09-27 | 2019-09-27 | Data deduplication method and device based on storage end and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110727404A (en) | 2020-01-24 |
Family
ID=69219511
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910927592.4A Withdrawn CN110727404A (en) | 2019-09-27 | 2019-09-27 | Data deduplication method and device based on storage end and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110727404A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111381779A (en) * | 2020-03-05 | 2020-07-07 | 深信服科技股份有限公司 | Data processing method, device, equipment and storage medium |
CN111506260A (en) * | 2020-03-20 | 2020-08-07 | 新华三信息技术有限公司 | Data processing method, device, equipment and machine-readable storage medium |
CN112506877A (en) * | 2020-12-03 | 2021-03-16 | 深圳市木浪云数据有限公司 | Data deduplication method, device and system based on deduplication domain and storage equipment |
CN113190523A (en) * | 2021-04-08 | 2021-07-30 | 金钱猫科技股份有限公司 | Distributed file system, method and client based on multi-client cooperation |
CN113535708A (en) * | 2021-09-17 | 2021-10-22 | 苏州浪潮智能科技有限公司 | Data deduplication method, system, storage medium and equipment |
CN114138198A (en) * | 2021-11-29 | 2022-03-04 | 苏州浪潮智能科技有限公司 | Method, device and equipment for data deduplication and readable medium |
CN114442961A (en) * | 2022-02-07 | 2022-05-06 | 苏州浪潮智能科技有限公司 | Data processing method and device, computer equipment and storage medium |
WO2023279833A1 (en) * | 2021-07-08 | 2023-01-12 | 华为技术有限公司 | Data processing method and apparatus |
CN116756137A (en) * | 2023-08-17 | 2023-09-15 | 深圳市木浪云科技有限公司 | Method, system and equipment for deleting large-scale data object storage |
2019-09-27: CN application CN201910927592.4A, published as CN110727404A, status: Withdrawn
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111381779B (en) * | 2020-03-05 | 2024-02-23 | 深信服科技股份有限公司 | Data processing method, device, equipment and storage medium |
CN111381779A (en) * | 2020-03-05 | 2020-07-07 | 深信服科技股份有限公司 | Data processing method, device, equipment and storage medium |
CN111506260A (en) * | 2020-03-20 | 2020-08-07 | 新华三信息技术有限公司 | Data processing method, device, equipment and machine-readable storage medium |
CN112506877A (en) * | 2020-12-03 | 2021-03-16 | 深圳市木浪云数据有限公司 | Data deduplication method, device and system based on deduplication domain and storage equipment |
CN112506877B (en) * | 2020-12-03 | 2024-04-19 | 深圳市木浪云科技有限公司 | Data deduplication method, device and system based on deduplication domain and storage equipment |
CN113190523A (en) * | 2021-04-08 | 2021-07-30 | 金钱猫科技股份有限公司 | Distributed file system, method and client based on multi-client cooperation |
CN113190523B (en) * | 2021-04-08 | 2022-09-13 | 金钱猫科技股份有限公司 | Distributed file system, method and client based on multi-client cooperation |
WO2023279833A1 (en) * | 2021-07-08 | 2023-01-12 | 华为技术有限公司 | Data processing method and apparatus |
WO2023040200A1 (en) * | 2021-09-17 | 2023-03-23 | 苏州浪潮智能科技有限公司 | Data deduplication method and system, and storage medium and device |
CN113535708A (en) * | 2021-09-17 | 2021-10-22 | 苏州浪潮智能科技有限公司 | Data deduplication method, system, storage medium and equipment |
CN114138198A (en) * | 2021-11-29 | 2022-03-04 | 苏州浪潮智能科技有限公司 | Method, device and equipment for data deduplication and readable medium |
CN114442961A (en) * | 2022-02-07 | 2022-05-06 | 苏州浪潮智能科技有限公司 | Data processing method and device, computer equipment and storage medium |
CN114442961B (en) * | 2022-02-07 | 2023-08-08 | 苏州浪潮智能科技有限公司 | Data processing method, device, computer equipment and storage medium |
CN116756137A (en) * | 2023-08-17 | 2023-09-15 | 深圳市木浪云科技有限公司 | Method, system and equipment for deleting large-scale data object storage |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110727404A (en) | Data deduplication method and device based on storage end and storage medium | |
CN108319654B (en) | Computing system, cold and hot data separation method and device, and computer readable storage medium | |
US8055633B2 (en) | Method, system and computer program product for duplicate detection | |
US10303363B2 (en) | System and method for data storage using log-structured merge trees | |
US9134912B2 (en) | Performing authorization control in a cloud storage system | |
KR102564170B1 (en) | Method and device for storing data object, and computer readable storage medium having a computer program using the same | |
KR20090026296A (en) | Predictive data-loader | |
CN111176560B (en) | Cache management method and device, computer equipment and storage medium | |
US10884926B2 (en) | Method and system for distributed storage using client-side global persistent cache | |
CN110888837B (en) | Object storage small file merging method and device | |
CN112684975B (en) | Data storage method and device | |
CN113326005B (en) | Read-write method and device for RAID storage system | |
CN113535670B (en) | Virtual resource mirror image storage system and implementation method thereof | |
WO2021184996A1 (en) | Data storage method and apparatus for database | |
CN110618974A (en) | Data storage method, device, equipment and storage medium | |
CN111274245B (en) | Method and device for optimizing data storage | |
WO2017020735A1 (en) | Data processing method, backup server and storage system | |
CN111625203A (en) | Method, system, device and medium for hierarchical storage | |
CN113253932B (en) | Read-write control method and system for distributed storage system | |
CN113420082A (en) | Data synchronization anomaly detection method and device | |
CN110287164B (en) | Data recovery method and device and computer equipment | |
US11803483B2 (en) | Metadata cache for storing manifest portion | |
CN110955682A (en) | Method and device for deleting cache data, data cache and reading cache data | |
CN114461635A (en) | MySQL database data storage method and device and electronic equipment | |
CN116820323A (en) | Data storage method, device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200124 |