CN115993939A - Method and device for deleting repeated data of storage system - Google Patents

Method and device for deleting repeated data of storage system

Info

Publication number
CN115993939A
Authority
CN
China
Prior art keywords
data block
data
check
module
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310279522.9A
Other languages
Chinese (zh)
Other versions
CN115993939B (en
Inventor
万春勇
骆政亟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Zhong'an Shulian Information Technology Co ltd
Original Assignee
Shaanxi Zhong'an Shulian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Zhong'an Shulian Information Technology Co ltd filed Critical Shaanxi Zhong'an Shulian Information Technology Co ltd
Priority to CN202310279522.9A priority Critical patent/CN115993939B/en
Publication of CN115993939A publication Critical patent/CN115993939A/en
Application granted granted Critical
Publication of CN115993939B publication Critical patent/CN115993939B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for deleting repeated data in a storage system. The method comprises the following steps: reading a data block that has not been stored; extracting the multiple segments of check bits of the non-stored data block; comparing, one by one, the multiple segments of check bits of the non-stored data block with the multiple segments of check bits of a stored data block; if each segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block, marking the non-stored data block as a repeated data block, deleting the repeated data block, and storing index information of the repeated data block; otherwise, marking the non-stored data block as a non-repeated data block and saving it. Compared with conventional deduplication methods, the method avoids calculating hash values, thereby greatly reducing the computational overhead. In addition, the reliability and the deduplication rate of the stored data are ensured, making the method a good choice for deduplication in practical storage systems.

Description

Method and device for deleting repeated data of storage system
Technical Field
The invention relates to the technical field of data storage, in particular to a method and a device for deleting repeated data of a storage system.
Background
With the rapid development of information technology, the volume of data to be stored is growing explosively, posing ever greater challenges for data storage. Data deduplication is an effective way to save storage space and improve the performance of a storage system.
Deduplication compares and analyzes the data to be stored and deletes the repeated data blocks: only one copy of any repeated data is kept in the system, and an index is established for the other copies. This saves storage space and, in turn, system cost. The technology has attracted great interest in the industry and has been adopted by several large data centers. In recent years, as the storage requirements of mobile data storage systems (e.g., flash memory) have grown, deduplication has also been applied to mobile data storage systems.
Conventionally, repeated data is identified by calculating a hash value of the data and comparing it with the stored hash values. If an identical hash value is found, the data is considered repeated and is stored only as an index to the existing copy; otherwise, the data is considered new. Although hash algorithms have good collision resistance (i.e., different data blocks rarely map to the same hash value), calculating and comparing hash values is a complex process that increases the computational overhead of the deduplication system and limits its performance.
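For reference, the conventional hash-based approach described above can be sketched as follows. This is a minimal illustration only: the choice of SHA-256, the function name `dedup_by_hash`, and the in-memory dictionaries are assumptions made for the example and are not taken from the patent.

```python
import hashlib

def dedup_by_hash(blocks):
    """Conventional deduplication: keep one stored copy per distinct hash value."""
    store = {}   # hash digest -> the single stored copy of the block
    index = []   # for every input block, the digest under which it is stored
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # the costly step the invention avoids
        if digest not in store:
            store[digest] = block   # new data: keep the block itself
        index.append(digest)        # repeated or not, record where the copy lives
    return store, index

# Hypothetical input: five blocks, two of which repeat earlier ones.
blocks = [b"AAAA", b"BBBB", b"AAAA", b"CCCC", b"BBBB"]
store, index = dedup_by_hash(blocks)
restored = [store[d] for d in index]
```

Every incoming block pays for a full hash computation, which is the overhead the invention aims to remove by reusing check bits that the storage system already computes.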
Disclosure of Invention
The invention aims to provide a method and a device for deleting repeated data in a storage system that greatly reduce the computational overhead while ensuring the reliability and the deduplication rate of the stored data, and that at least solve the problem that prior-art deduplication systems must calculate hash values and therefore incur a large computational overhead.
To achieve the above object, one aspect of the present invention provides a method for deduplication in a storage system, including: reading a non-stored data block, each non-stored data block comprising data bits and multiple segments of check bits; extracting the multiple segments of check bits of the non-stored data block; comparing, one by one, the multiple segments of check bits of the non-stored data block with the multiple segments of check bits of a stored data block; if each segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block, marking the non-stored data block as a repeated data block, deleting the repeated data block, and storing index information of the repeated data block, the index information being used for data reading; and if the segments of check bits of the non-stored data block are not all consistent with the corresponding segments of check bits of the stored data block, marking the non-stored data block as a non-repeated data block and saving the non-repeated data block.
Further, the check bits are composed of a plurality of pieces of data check bits and a piece of error correction check bits.
Further, the data check bit is obtained through check code calculation; the error correction check bits are calculated by error correction codes.
Further, the one-to-one retrieval comparison between the multiple segments of check bits of the non-stored data block and the multiple segments of check bits of the stored data block is carried out by parallel processing or by serial processing.
Further, storing the non-duplicate data block includes: the data bits of the non-stored data block are stored in the data bit area and the parity bits of the non-stored data block are stored in the parity bit area.
Further, the data reading includes: reading the stored data block; performing error correction decoding on the read stored data block; performing segment verification on the multi-segment verification bits of the decoded data block; judging whether each section of check bits of the data block subjected to the sectional check pass the check; if all the segments of check bits of the data block after the segment check pass the check, outputting the data bits of the data block after the segment check to all index positions according to the index information; and if the check bits of each segment of the data block subjected to the segment check do not pass the check, outputting error information at the index position corresponding to the data block.
Another aspect of the present invention provides a data de-duplication apparatus for a storage system, including: the system comprises a data input module, an extraction module, a retrieval comparison module, a marking module, a repeated data deleting module, a first data storage module and a second data storage module; the data input module is used for reading the data blocks which are not stored; the extraction module is used for extracting a plurality of sections of check bits of the data block which is not stored; the retrieval comparison module is used for carrying out one-to-one retrieval comparison on the multi-section check bits of the non-stored data block and the multi-section check bits of the stored data block so as to judge whether each section check bit of the non-stored data block is consistent with each section check bit of the stored data block; the marking module is used for marking the data blocks which are not stored as repeated data blocks or non-repeated data blocks; the repeated data deleting module is used for deleting repeated data blocks; the first data storage module is used for storing index information of the repeated data blocks; the second data storage module is used for storing non-repeated data blocks.
Further, the storage system deduplication apparatus further includes a data reading apparatus including: the system comprises a data reading module, an error correction decoding module, a segmentation checking module, a judging module, an output module and an error reporting module; the data reading module is used for reading the stored data blocks; the error correction decoding module is used for performing error correction decoding on the read stored data block; the segmentation check module is used for carrying out segmentation check on the multi-segment check bits of the decoded data block; the judging module is used for judging whether each section of check bits of the data block subjected to the sectional check pass the check; the output module is used for outputting the data bits of the data block subjected to the segmentation verification to all index positions; the error reporting module is used for outputting error information at the index position corresponding to the data block.
The technical scheme of the invention provides a method and a device for deleting repeated data in a storage system. The method comprises the following steps: reading a non-stored data block, each non-stored data block comprising data bits and multiple segments of check bits; extracting the multiple segments of check bits of the non-stored data block; comparing, one by one, the multiple segments of check bits of the non-stored data block with the multiple segments of check bits of a stored data block; if each segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block, marking the non-stored data block as a repeated data block, deleting the repeated data block, and storing index information of the repeated data block, the index information being used for data reading; and if the segments of check bits are not all consistent, marking the non-stored data block as a non-repeated data block and saving it. Compared with conventional deduplication methods, the method avoids calculating hash values, thereby greatly reducing the computational overhead. In addition, the reliability and the deduplication rate of the stored data are ensured, making the method a good choice for deduplication in practical storage systems (particularly mobile data storage systems such as flash memory).
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of an alternative method of deduplication of a storage system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative storage system data storage method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a codeword structure after encoding an alternative memory system data block according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative storage system deduplication principle in accordance with an embodiment of the present invention;
FIG. 5 is a flowchart of an alternative method for deduplication of a storage system according to an embodiment of the present invention;
FIG. 6 is a flowchart of an alternative method of reading data from a storage system according to an embodiment of the invention;
FIG. 7 is a schematic diagram of an alternative storage system deduplication apparatus in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative storage system deduplication data reading apparatus according to an embodiment of the present invention.
Wherein the above figures include the following reference numerals:
10. data entry module; 20. extraction module; 30. retrieval comparison module; 40. marking module; 50. deduplication module; 60. first data storage module; 70. second data storage module; 80. data reading module; 90. error correction decoding module; 100. segment checking module; 110. judging module; 120. output module; 130. error reporting module; 140. storage system; 141. data bit region; 142. check bit region; 201. original data bits; 202. check bit sequence; 203. error correction check bit sequence; 301. first data stream; 302. second data stream; 303. final data stream.
Description of the embodiments
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
FIG. 1 is a flow chart of an alternative method of deduplication of a storage system according to an embodiment of the present invention. The invention provides a method for deleting repeated data of a storage system, which comprises the following steps:
Step S102: read a non-stored data block; each non-stored data block comprises data bits and multiple segments of check bits;
Step S104: extract the multiple segments of check bits of the non-stored data block;
Step S106: compare, one by one, the multiple segments of check bits of the non-stored data block with the multiple segments of check bits of a stored data block;
Step S108: if each segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block, mark the non-stored data block as a repeated data block, delete the repeated data block, and store index information of the repeated data block; the index information is used for data reading;
Step S110: if the segments of check bits of the non-stored data block are not all consistent with the corresponding segments of check bits of the stored data block, mark the non-stored data block as a non-repeated data block and save the non-repeated data block. Steps S108 and S110 are alternative branches: depending on the result of the comparison in step S106, the method proceeds to either step S108 or step S110.
Compared with conventional deduplication methods, the method avoids calculating hash values, thereby greatly reducing the computational overhead while ensuring the reliability and the deduplication rate of the stored data. It is therefore a good choice for deduplication in practical storage systems (particularly mobile data storage systems such as flash memory).
As a refinement of the invention, the one-to-one retrieval comparison between the multiple segments of check bits of the non-stored data block and those of the stored data block may be carried out in parallel or in series. Parallel processing gives a higher comparison speed but requires greater parallel processing capability.
FIG. 2 is a schematic diagram of an alternative storage system data storage method according to an embodiment of the invention. As can be seen from fig. 2, the stored data bits typically need to be checked to improve the reliability of the stored data. The purpose of the check bits is to check the integrity of the data and when errors occur in the stored data, an error correction mechanism may be activated to correct the error bits. In the memory system 140, the data bits and the parity bits of the data segment are typically stored separately, the data bits are stored in the data bit region 141 of the memory system, and the parity bits are stored in the parity bit region 142 of the memory system. Typically, the length of the check bits is much smaller than the length of the data bits.
The method of calculating the check bits depends on the check code and the error correction code adopted. In general, the check code may be a cyclic redundancy check (CRC), a Hamming check, etc., and the error correction code may be a BCH code, a Reed-Solomon (RS) code, a low-density parity-check (LDPC) code, etc.
Fig. 3 is a schematic diagram of a codeword structure after encoding an optional storage system data block according to an embodiment of the present invention. As can be seen from fig. 3, the original data bits 201 are divided into several segments, and the segments are verified to form corresponding check bit sequences 202, which are used as a determination flag for the validity of the data sequences of the segments. The reason for adopting the segment check is that the segment check can improve the check performance and reduce the collision probability of check bits during repeated data deletion under the condition of long data bits.
In order to more clearly illustrate the above procedure, a method of calculating the check bits will be described below by taking CRC check as an example. The CRC has the advantage that the bit length of the input information can be arbitrarily selected, and the CRC has higher flexibility.
Let the data be divided into $L$ segments, each containing $k$ data bits. Assume that the first segment is $[m_0, m_1, \ldots, m_{k-1}]$, with the corresponding polynomial

$$m(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}.$$

Let the degree of the CRC generator polynomial $g(x)$ be $r$. Dividing the polynomial $x^r m(x)$ by $g(x)$ and taking the remainder gives

$$p(x) = x^r m(x) \bmod g(x),$$

that is, a polynomial $p(x)$ of degree less than $r$ whose coefficients $[p_0, p_1, \ldots, p_{r-1}]$ form a sequence of length $r$, which is the check bit sequence 202.
When the check bit sequences 202 of all $L$ segments have been calculated, a sequence of length $L(k+r)$ is obtained. This sequence is then error-correction coded to obtain the error correction check bit sequence 203. As described above, the error correction code may be a BCH code, an RS code, an LDPC code, or the like.
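The CRC calculation above can be illustrated with a small sketch. It assumes a toy generator polynomial $g(x) = x^3 + x + 1$ (so $r = 3$), segments of $k = 4$ bits, and $L = 2$ segments; these values and the function names are hypothetical and chosen only to keep the arithmetic easy to follow. Polynomials are held as integers whose bit $i$ is the coefficient of $x^i$.

```python
def poly_from_bits(bits):
    """Pack [c_0, c_1, ...] into an int whose bit i is the coefficient of x^i."""
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i
    return value

def crc_check_bits(segment_bits, g, r):
    """Return [p_0, ..., p_{r-1}] with p(x) = x^r * m(x) mod g(x) over GF(2)."""
    remainder = poly_from_bits(segment_bits) << r   # x^r * m(x)
    # Long division: cancel the leading term until the degree drops below r.
    while remainder.bit_length() > r:
        shift = remainder.bit_length() - 1 - r      # deg(remainder) - deg(g)
        remainder ^= g << shift
    return [(remainder >> i) & 1 for i in range(r)]

def encode_segments(data_bits, k, g, r):
    """Split the data into k-bit segments and compute a check sequence for each."""
    segments = [data_bits[i:i + k] for i in range(0, len(data_bits), k)]
    checks = [crc_check_bits(seg, g, r) for seg in segments]
    return segments, checks

G, R, K = 0b1011, 3, 4            # assumed toy generator g(x) = x^3 + x + 1
data = [1, 0, 1, 1, 1, 1, 0, 0]   # L = 2 segments of k = 4 bits
segments, checks = encode_segments(data, K, G, R)
total_length = sum(len(s) + len(c) for s, c in zip(segments, checks))
```

With these inputs the two check sequences come out as `[1, 0, 0]` and `[1, 0, 1]`, and the total length is $L(k+r) = 2 \times 7 = 14$ bits. A practical system would use a standard generator polynomial and a table-driven or hardware CRC rather than bit-by-bit division.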
To illustrate the above procedure more clearly, the calculation of the error correction check bit sequence 203 is described below taking an LDPC code as an example. LDPC codes are a class of linear codes defined by a sparse check matrix. Let $D$ be the information sequence to be encoded and $P$ the error correction check bit sequence. For a given $D$, $P$ is calculated such that the vector $C = [D, P]$ satisfies the check equation

$$C \cdot H^{T} = 0,$$

where $H$ is the check matrix of the LDPC code, $T$ denotes the matrix transpose, and $\cdot$ denotes modulo-2 multiplication. The check matrix of an LDPC code is typically a sparse matrix; its construction is not described in detail here. If the length of the data sequence $D$ is $M$ and the length of the encoded codeword is $N$, then the length of $P$ is $N - M$.
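The check equation can be made concrete with a small sketch. It assumes a check matrix in systematic form $H = [A \mid I]$, for which $P = D \cdot A^{T}$ (mod 2) satisfies $C \cdot H^{T} = 0$. The matrix `A` below is a tiny dense example with $M = 4$ and $N = 7$, chosen for readability; it is not sparse and is not a real LDPC matrix, and all names are hypothetical.

```python
# (N - M) x M part of H acting on the data bits (toy values, not a real LDPC code).
A = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]

def build_h(a):
    """H = [A | I], an (N - M) x N check matrix in systematic form."""
    rows = len(a)
    return [row + [1 if i == j else 0 for j in range(rows)]
            for i, row in enumerate(a)]

def encode(d, a):
    """P = D * A^T (mod 2), so that C = [D, P] satisfies C * H^T = 0."""
    return [sum(x * y for x, y in zip(row, d)) % 2 for row in a]

def syndrome(c, h):
    """C * H^T (mod 2); all zeros means every check equation is satisfied."""
    return [sum(x * y for x, y in zip(row, c)) % 2 for row in h]

D = [1, 0, 1, 1]          # information sequence, M = 4
H = build_h(A)
P = encode(D, A)          # error correction check bits, length N - M = 3
C = D + P                 # codeword, N = 7

corrupted = C[:]
corrupted[2] ^= 1         # flip one bit to show the syndrome becomes non-zero
```

Here `syndrome(C, H)` is all zeros, while `syndrome(corrupted, H)` is not, which is what allows a decoder to detect that the stored codeword has been disturbed.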
FIG. 4 is a schematic diagram of an alternative storage system deduplication principle in accordance with an embodiment of the present invention. As shown, the data of the first data stream 301 and the second data stream 302 need to be stored in a storage system. Each data stream contains 4 data blocks, which if stored directly to the storage system, require storage space of 8 data blocks. However, if the repeated data blocks are identified by the repeated data deleting technology, the repeated data (for example, the data block a) only stores one copy in the system, other repeated copies establish index information, and finally, only 4 data blocks need to be stored to obtain the final data stream 303, thereby saving the storage space of the system.
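The storage saving described for FIG. 4 can be sketched as below. The text does not list the contents of the two data streams beyond block A being repeated, so the block labels here are hypothetical, and blocks are compared directly by label purely to show the bookkeeping (one stored copy plus index entries).

```python
def dedup_streams(*streams):
    """Keep one copy of each distinct block and an index entry for every block."""
    stored = []      # each distinct block is kept once
    position = {}    # block -> its position in `stored`
    index = []       # one entry per incoming block: where its copy lives
    for stream in streams:
        for block in stream:
            if block not in position:
                position[block] = len(stored)
                stored.append(block)
            index.append(position[block])
    return stored, index

# Hypothetical stream contents: 8 incoming blocks, 4 distinct ones.
stream_1 = ["A", "B", "C", "A"]
stream_2 = ["D", "A", "B", "C"]
stored, index = dedup_streams(stream_1, stream_2)
rebuilt = [stored[i] for i in index]
```

Eight incoming blocks are reduced to four stored blocks, and the index is sufficient to rebuild both streams in order.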
FIG. 5 is a flowchart of an alternative method for deduplication of a storage system according to an embodiment of the present invention. As can be seen from fig. 5, the method comprises the steps of:
S401: Read data.
S402: Extract check bits.
S403: Retrieve check bits.
S404: Determine whether the check bits exist; if so, go to step S405; otherwise, go to step S406.
S405: Delete the data block and store index information.
S406: Save the data block.
S407: Start the next deduplication process.
To illustrate the deduplication method proposed by the present invention more clearly, its execution is described in detail below.
Step S401: Read data. Data is read block by block; each block is an encoded data block as shown in FIG. 3 and comprises data bits and multiple segments of check bits.
Step S402: Extract check bits. The check bits in the data block read in step S401 are extracted; they typically comprise multiple segments.
Step S403: Retrieve check bits. The check bits extracted in step S402 are searched for among, and compared with, the stored check bits. The segments of check bits of each data block may be searched in parallel or in series.
Step S404: Determine whether the check bits of the data block to be stored already exist. The check bits are considered to exist if and only if every segment of check bits is consistent with the corresponding segment of check bits of some stored data block; otherwise, they are considered not to exist.
For example, let the number of segments per data block be $L = 8$. Assume that the check bit sequences of the 8 data segments are $p_0, p_1, \ldots, p_7$ and that the error correction check bit sequence is $P$. If the check bit sequences of a stored data block are $p'_0, p'_1, \ldots, p'_7$, its error correction check bit sequence is $P'$, and

$$p_0 = p'_0,\; p_1 = p'_1,\; \ldots,\; p_7 = p'_7,\; P = P',$$

then every segment of check bits is considered consistent with the corresponding segment of check bits of that stored data block.
If the check bits exist, go to step S405; otherwise, go to step S406.
Step S405: Delete the data block and store index information. The data block is a repeated data block, so only its index information is saved for data reading.
Step S406: Save the data block. Specifically, the data bits of the data block are stored in the data bit region, and the check bits are stored in the check bit region.
Step S407: Start the next deduplication process.
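Steps S401 to S407 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `zlib.crc32` stands in for the segment check code, the error correction check sequence $P$ is omitted, the encoder `encode_block` is hypothetical (in the described system the check bits already accompany each block), and the retrieval is a plain linear scan.

```python
import zlib
from dataclasses import dataclass

SEGMENTS = 8  # L in the text

@dataclass(frozen=True)
class EncodedBlock:
    data: bytes    # data bits
    checks: tuple  # p_0 .. p_{L-1}; the error correction sequence P is omitted here

def encode_block(data, segments=SEGMENTS):
    """Hypothetical encoder: split the data into L segments and CRC each one."""
    size = -(-len(data) // segments)  # ceiling division
    parts = [data[i * size:(i + 1) * size] for i in range(segments)]
    return EncodedBlock(data, tuple(zlib.crc32(p) for p in parts))

class DedupStore:
    def __init__(self):
        self.data_region = []   # data bits of the saved blocks
        self.check_region = []  # check bits of the saved blocks, same slot order
        self.index = []         # per incoming block: slot that holds its data

    def find(self, checks):
        """S403/S404: segment-by-segment comparison against stored check bits."""
        for slot, stored in enumerate(self.check_region):
            if all(p == q for p, q in zip(checks, stored)):
                return slot
        return None

    def write(self, block):
        """S402-S406 for one block; returns True if it was treated as repeated."""
        checks = block.checks                    # S402: extract check bits
        slot = self.find(checks)                 # S403: retrieve
        duplicate = slot is not None             # S404: do the check bits exist?
        if not duplicate:                        # S406: save data and check bits
            slot = len(self.data_region)
            self.data_region.append(block.data)
            self.check_region.append(checks)
        self.index.append(slot)                  # S405: block data dropped, index kept
        return duplicate

store = DedupStore()
inputs = (b"0123456789abcdef", b"fedcba9876543210", b"0123456789abcdef")
results = [store.write(encode_block(d)) for d in inputs]  # S401/S407: block by block
```

For simplicity the sketch records an index entry for every block, whereas the text only requires index information for repeated blocks. Note that no hash of the block is computed on the write path: the decision rests entirely on comparing check bits that are already part of the encoded block.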
FIG. 6 is a flowchart of an alternative method of reading data from a storage system according to an embodiment of the invention. As can be seen from fig. 6, the method comprises the steps of:
S501: Read data.
S502: Perform error correction decoding.
S503: Check the data.
S504: Determine whether every segment of data passes the check; if so, go to step S505; otherwise, go to step S506.
S505: Output the data according to the index.
S506: Output error information according to the index.
S507: Start reading the next data block.
To illustrate more clearly the method for reading data after deduplication in the storage system, its execution is described in detail below.
Step S501: Read data. Data is read block by block; each block corresponds to an encoded data block as shown in FIG. 3. The data block read may contain bit errors caused by noise interference during the read process, among other factors.
Step S502: Perform error correction decoding. The decoding method depends on the coding scheme employed; its purpose is to correct errors introduced during data reading.
Step S503: Check the data. A segment-by-segment check is performed on the decoded sequence to determine whether each segment passes the check.
To illustrate the above procedure more clearly, the data check is described below taking the CRC as an example. Let the number of segments per data block be $L = 8$. Assume that the 8 data segments read are $d_0, d_1, \ldots, d_7$ and that the check bit sequences read are $p_0, p_1, \ldots, p_7$. Let $p'_0, p'_1, \ldots, p'_7$ be the check bit sequences calculated from $d_0, d_1, \ldots, d_7$. If

$$p_i = p'_i,$$

the check bits read for the $i$-th segment are considered consistent with the check bits calculated for that segment of data ($0 \le i < 8$).
Step S504: Determine whether every segment of data passes the check; if so, go to step S505; otherwise, go to step S506.
Step S505: If every segment of data passes the check, output the data bit sequence to all index positions according to the index information.
Step S506: If not every segment of data passes the check, output error information at the corresponding index positions.
Step S507: Start reading the next data block.
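The read flow of steps S501 to S506 can be sketched for a single stored block as follows. The sketch assumes `zlib.crc32` as the segment check code and treats error correction decoding as a placeholder that returns its input unchanged; the function names, the `ERROR` marker, and the dictionary used as output are hypothetical.

```python
import zlib

ERROR = "read error"  # hypothetical marker written at failing index positions

def segment_checks(data, segments=8):
    """Recompute the per-segment check bits p'_0 .. p'_{L-1} from the data."""
    size = -(-len(data) // segments)  # ceiling division
    return [zlib.crc32(data[i * size:(i + 1) * size]) for i in range(segments)]

def ecc_decode(raw):
    """Placeholder for S502; a real system would run its BCH/RS/LDPC decoder here."""
    return raw

def read_block(raw, stored_checks, index_positions, output):
    """S501-S506 for one stored block; `output` maps index position -> result."""
    data = ecc_decode(raw)                                      # S502
    recomputed = segment_checks(data, len(stored_checks))       # S503
    passed = all(p == q for p, q in zip(stored_checks, recomputed))  # S504
    for pos in index_positions:
        output[pos] = data if passed else ERROR                 # S505 / S506
    return passed

good = b"0123456789abcdef"
checks = segment_checks(good)
out = {}
ok = read_block(good, checks, [0, 3], out)       # one stored copy, two index positions
bad = read_block(b"0123456789abcdeX", checks, [5], out)  # last segment corrupted
```

Because a repeated block is stored only once, a single successful check serves every index position that refers to it, and a single failed segment is reported at each of those positions.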
FIG. 7 is a schematic diagram of an alternative storage system deduplication apparatus in accordance with an embodiment of the present invention. As can be seen from FIG. 7, the storage system deduplication apparatus includes: a data entry module 10, an extraction module 20, a retrieval comparison module 30, a marking module 40, a deduplication module 50, a first data storage module 60, and a second data storage module 70. The data entry module 10 is configured to read non-stored data blocks; the extraction module 20 is configured to extract the multiple segments of check bits of a non-stored data block; the retrieval comparison module 30 is configured to compare, one by one, the multiple segments of check bits of the non-stored data block with the multiple segments of check bits of a stored data block, so as to determine whether each segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block; the marking module 40 is configured to mark the non-stored data block as a repeated data block or a non-repeated data block; the deduplication module 50 is configured to delete repeated data blocks; the first data storage module 60 is configured to store index information of the repeated data blocks; and the second data storage module 70 is configured to save non-repeated data blocks.
FIG. 8 is a schematic diagram of an alternative storage system deduplication data reading apparatus according to an embodiment of the present invention. As can be seen from fig. 8, the data reading apparatus includes: the device comprises a data reading module 80, an error correction decoding module 90, a segmentation checking module 100, a judging module 110, an output module 120 and an error reporting module 130; the data reading module 80 is used for reading the stored data blocks; the error correction decoding module 90 is configured to perform error correction decoding on the read stored data block; the segment checking module 100 is configured to perform segment checking on the multi-segment check bits of the decoded data block; the judging module 110 is configured to judge whether each segment of check bits of the data block after the segment check passes the check; the output module 120 is configured to output the data bits of the data block after the segment verification to all index positions; the error reporting module 130 is configured to output error information at an index position corresponding to the data block.
The elements and model steps of the examples described in the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described generally in terms of functionality in the foregoing description. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for deduplication in a storage system, comprising:
reading the data blocks which are not stored; each of the non-stored data blocks comprises data bits and a plurality of sections of check bits;
extracting a plurality of segments of check bits of the non-stored data block;
the multi-section check bits of the non-stored data block are searched and compared with the multi-section check bits of the stored data block one by one;
if the check bits of each segment of the non-stored data block are consistent with the check bits of each segment of the stored data block, marking the non-stored data block as a repeated data block, deleting the repeated data block, and storing index information of the repeated data block; the index information is used for data reading;
and if the check bits of each segment of the non-stored data block are not consistent with the check bits of each segment of the stored data block, marking the non-stored data block as a non-repeated data block and storing the non-repeated data block.
2. The method of claim 1, wherein
the check bits consist of multiple segments of data check bits and one segment of error correction check bits.
3. The method of claim 2, wherein
the data check bits are computed using a check code, and the error correction check bits are computed using an error correction code.
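The check-bit layout of claims 2 and 3 might look like the sketch below. Everything specific in it is an assumption: the claims name neither the check code nor the error correction code, so CRC-32 stands in for the former and a simple XOR parity over the data slices stands in for the latter. XOR parity can only rebuild one slice whose position is already known to be bad, so it is a placeholder for a real error correction code rather than an example of one. The helper names and `SEGMENT_SIZE` are hypothetical, and the data length is assumed to be a non-zero multiple of `SEGMENT_SIZE`.

```python
import zlib
from functools import reduce

SEGMENT_SIZE = 4  # hypothetical slice width in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_check_bits(data: bytes):
    """Return (data check segments, one error-correction segment) for a block."""
    slices = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    data_checks = [zlib.crc32(s) for s in slices]  # several data check segments
    ecc_segment = reduce(xor_bytes, slices)        # one parity segment (stand-in)
    return data_checks, ecc_segment


def rebuild_slice(slices, bad, ecc_segment):
    """Recover slice number `bad` by XOR-ing the parity with all other slices."""
    others = [s for i, s in enumerate(slices) if i != bad]
    return reduce(xor_bytes, others, ecc_segment)
```

In this arrangement the per-slice CRC locates a damaged slice, and the parity segment supplies the missing bytes.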
4. The method of claim 1, wherein
the one-to-one search and comparison of the multiple segments of check bits of the non-stored data block against the multiple segments of check bits of the stored data block is performed by parallel processing or serial processing.
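The two comparison modes of claim 4 can be contrasted in a short sketch. The function names are hypothetical, and the thread pool only illustrates the structure of a parallel comparison: in CPython, threads would not speed up integer comparisons, and a real system would more plausibly compare segments in hardware or with vectorized instructions.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import eq


def match_serial(new_segments, stored_segments) -> bool:
    """Compare segment pairs one after another, stopping at the first mismatch."""
    if len(new_segments) != len(stored_segments):
        return False
    return all(a == b for a, b in zip(new_segments, stored_segments))


def match_parallel(new_segments, stored_segments, workers: int = 4) -> bool:
    """Submit every segment pair to a worker pool, then combine the results."""
    if len(new_segments) != len(stored_segments):
        return False
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(eq, new_segments, stored_segments))
```

The serial form can stop early on the first mismatching segment; the parallel form dispatches all pairs up front and trades that early exit for concurrency.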
5. The method of claim 1, wherein saving the non-duplicate data block comprises:
storing the data bits of the non-stored data block in a data bit area, and storing the check bits of the non-stored data block in a check bit area.
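One way to picture the separate areas of claim 5 is a store that keeps two aligned lists, so that a lookup scans only the check bit area and never touches the data bits. The class name `TwoAreaStore` and its methods are hypothetical, and in-memory lists stand in for whatever physical regions the storage system actually uses.

```python
class TwoAreaStore:
    """Keeps data bits and check bits in separate, position-aligned areas."""

    def __init__(self):
        self.data_area = []   # data bits of each saved block
        self.check_area = []  # check segments of the block at the same position

    def save(self, data: bytes, check_segments) -> int:
        """Save a non-duplicate block and return its position."""
        self.data_area.append(data)
        self.check_area.append(tuple(check_segments))
        return len(self.data_area) - 1

    def find(self, check_segments):
        """Search the check bit area only; return the matching position or None."""
        try:
            return self.check_area.index(tuple(check_segments))
        except ValueError:
            return None
```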
6. The method of claim 1, wherein the data reading comprises:
reading the stored data block;
performing error correction decoding on the read stored data block;
performing segment-by-segment verification on the multiple segments of check bits of the decoded data block;
judging whether every segment of check bits of the verified data block passes the check;
if every segment of check bits of the verified data block passes the check, outputting the data bits of the verified data block to all index positions according to the index information;
and if not every segment of check bits of the verified data block passes the check, outputting error information at the index position corresponding to the data block.
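The read path of claim 6 can be sketched in the same spirit. The assumptions are: CRC-32 per fixed-size slice stands in for the check bits, the error correction decoder is passed in as a hypothetical `ecc_decode` callable that defaults to a pass-through (the claim does not name a code), and `READ_ERROR` is an illustrative marker for the error information.

```python
import zlib

SEGMENT_SIZE = 4                    # hypothetical slice width in bytes
READ_ERROR = "ERROR: check failed"  # illustrative error marker


def verify_segments(data: bytes, check_segments) -> bool:
    """Segment-by-segment verification: every slice must match its check segment."""
    slices = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    if len(slices) != len(check_segments):
        return False
    return all(zlib.crc32(s) == c for s, c in zip(slices, check_segments))


def read_all(blocks, index, ecc_decode=lambda data, segments: (data, segments)):
    """Rebuild the logical output from stored blocks and index information.

    blocks: list of (data bits, check segments) pairs
    index:  list mapping each logical position to a stored block id
    """
    output = [None] * len(index)
    for block_id, (data, segments) in enumerate(blocks):
        data, segments = ecc_decode(data, segments)  # error correction decoding
        passed = verify_segments(data, segments)     # segment verification
        for position, ref in enumerate(index):
            if ref == block_id:
                # Good blocks fan out to every position that references them.
                output[position] = data if passed else READ_ERROR
    return output
```

A stored block that passes verification is output at every index position that refers to it, which is how deleted duplicates are restored on reading; a block that fails yields error information at those positions instead.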
7. A storage system data deduplication apparatus, applied to the storage system data deduplication method of any one of claims 1 to 6, characterized in that the storage system data deduplication apparatus comprises:
a data entry module (10), the data entry module (10) being configured to read a non-stored data block;
an extraction module (20), the extraction module (20) being configured to extract the multiple segments of check bits of the non-stored data block;
a search comparison module (30), the search comparison module (30) being configured to search and compare the multiple segments of check bits of the non-stored data block one-to-one against the multiple segments of check bits of the stored data block, to determine whether every segment of check bits of the non-stored data block is consistent with the corresponding segment of check bits of the stored data block;
a tagging module (40), the tagging module (40) being configured to mark the non-stored data block as a duplicate data block or a non-duplicate data block;
a deduplication module (50), the deduplication module (50) being configured to delete the duplicate data block;
a first data storage module (60), the first data storage module (60) being configured to store index information of the duplicate data block;
a second data storage module (70), the second data storage module (70) being configured to save the non-duplicate data block.
8. The storage system data deduplication apparatus of claim 7, further comprising a data reading device, the data reading device comprising:
a data reading module (80), the data reading module (80) being configured to read the stored data block;
an error correction decoding module (90), the error correction decoding module (90) being configured to perform error correction decoding on the read stored data block;
a segment verification module (100), the segment verification module (100) being configured to perform segment-by-segment verification on the multiple segments of check bits of the decoded data block;
a judging module (110), the judging module (110) being configured to judge whether every segment of check bits of the verified data block passes the check;
an output module (120), the output module (120) being configured to output the data bits of the verified data block to all index positions;
an error reporting module (130), the error reporting module (130) being configured to output error information at the index position corresponding to the data block.
CN202310279522.9A 2023-03-22 2023-03-22 Method and device for deleting repeated data of storage system Active CN115993939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310279522.9A CN115993939B (en) 2023-03-22 2023-03-22 Method and device for deleting repeated data of storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310279522.9A CN115993939B (en) 2023-03-22 2023-03-22 Method and device for deleting repeated data of storage system

Publications (2)

Publication Number Publication Date
CN115993939A (en) 2023-04-21
CN115993939B (en) 2023-06-09

Family

ID=85992349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310279522.9A Active CN115993939B (en) 2023-03-22 2023-03-22 Method and device for deleting repeated data of storage system

Country Status (1)

Country Link
CN (1) CN115993939B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207939A (en) * 2010-03-31 2011-10-05 联想(北京)有限公司 Multi-hardware system data processing apparatus and method for deleting duplicated data
CN102915278A (en) * 2012-09-19 2013-02-06 浪潮(北京)电子信息产业有限公司 Data deduplication method
CN105912622A (en) * 2016-04-05 2016-08-31 重庆大学 Data de-duplication method for lossless compressed files
CN105930101A (en) * 2016-05-04 2016-09-07 中国人民解放军国防科学技术大学 Weak fingerprint repeated data deletion mechanism based on flash memory solid-state disk
CN106209113A (en) * 2016-07-29 2016-12-07 中国石油大学(华东) A kind of decoding method of polarization code
CN106775452A (en) * 2016-11-18 2017-05-31 郑州云海信息技术有限公司 A kind of data monitoring and managing method and system
US20180018235A1 (en) * 2016-07-15 2018-01-18 Quantum Corporation Joint de-duplication-erasure coded distributed storage
CN111177092A (en) * 2019-12-09 2020-05-19 成都信息工程大学 Deduplication method and device based on erasure codes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PRITESHKUMAR PRAJAPATI ET AL.: "A Review on Secure Data Deduplication: Cloud Storage Security Issue", JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES, vol. 34, no. 7, XP055940137, DOI: 10.1016/j.jksuci.2020.10.021 *
贺秦禄; 边根庆; 邵必林; 张维琪: "Data deduplication technology for mobile flash memory" (移动闪存的重复数据删除技术), Journal of Xidian University (西安电子科技大学学报), no. 01 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991329A (en) * 2023-09-25 2023-11-03 深圳市明泰智能技术有限公司 Data redundancy prevention method and system for self-service terminal equipment
CN116991329B (en) * 2023-09-25 2023-12-08 深圳市明泰智能技术有限公司 Data redundancy prevention method and system for self-service terminal equipment

Also Published As

Publication number Publication date
CN115993939B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN108880556B (en) LZ 77-based lossless data compression method, error code recovery method, encoder and decoder
CN108768403B (en) LZW-based lossless data compression and decompression method, LZW encoder and decoder
CN101499806B (en) Encoding devices, decoding devices, encoding/decoding devices, and recording/reproduction devices
CN101656541B (en) Coding method and device of RS codes
US11258465B2 (en) Content aware decoding method and system
CN115993939B (en) Method and device for deleting repeated data of storage system
CN113297001B (en) RAID (redundant array of independent disks) coding and decoding method and coding and decoding circuit
CN114328000B (en) DNA storage cascade coding and decoding method for 1 type 2 type segment error correction inner code
CN113297000A (en) RAID (redundant array of independent disks) coding circuit and coding method
CN104242957A (en) Decoding processing method and decoder
JP6046403B2 (en) Encoding method and decoding method of error correction code
US10649841B2 (en) Supporting multiple page lengths with unique error correction coding via galois field dimension folding
JP7429223B2 (en) Turbo product code decoding method, device, decoder and computer storage medium
US10862512B2 (en) Data driven ICAD graph generation
KR20160075001A (en) Operating method of flash memory system
CN111464267A (en) Communication data checking method and device, computer equipment and storage medium
US10506388B1 (en) Efficient short message compression
CN113131947B (en) Decoding method, decoder and decoding device
CN114138543A (en) Data strip coding method, system, device and medium
CN112527548A (en) Flash memory controller, storage device and reading method
WO2020107301A1 (en) Encoding method, decoding method, and storage controller
CN113014267B (en) Decoding method, device, readable storage medium, chip and computer program product
CN113168360B (en) Data driven ICAD graphics generation
RU2811072C1 (en) Decoding method, decoder and decoding device
KR101906036B1 (en) Error detection method of lz78 compression data and encoder using the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Wan Chunyong

Inventor after: Luo Zhengcheng

Inventor before: Wan Chunyong

Inventor before: Luo Zhengji

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device for deleting duplicate data in storage systems

Granted publication date: 20230609

Pledgee: Xi'an innovation financing Company limited by guarantee

Pledgor: Shaanxi Zhong'an Shulian Information Technology Co.,Ltd.

Registration number: Y2024980008036