CN111258502A - Data deleting method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111258502A
Authority
CN
China
Prior art keywords
data block
data
metadata
fingerprint
deleting
Legal status
Pending
Application number
CN202010031758.7A
Other languages
Chinese (zh)
Inventor
张国军
Current Assignee
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Application filed by Sangfor Technologies Co Ltd
Priority to CN202010031758.7A
Publication of CN111258502A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625 Power saving in storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Abstract

The embodiments of the application provide a data deleting method, device, equipment and computer-readable storage medium. The method comprises: acquiring a pre-stored first data block; if a second data block identical to the first data block exists in a preset data model library, deleting the first data block, where the preset data model library comprises at least the data blocks whose reference count exceeds a count threshold; and determining the metadata of the second data block as the first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.

Description

Data deleting method, device, equipment and computer readable storage medium
Technical Field
The embodiments of the application relate to the field of computer technology, and relate to, but are not limited to, a data deleting method, device, equipment and computer-readable storage medium.
Background
With the spread of flash memory media, compression and deduplication have become key technologies in all-flash storage: they save storage space for customers and reduce storage costs. Data deduplication computes a fingerprint for each pre-stored data block with a hash algorithm such as Secure Hash Algorithm 1 (SHA-1), compares the fingerprint with those already in a fingerprint library to determine whether an identical data block exists, and, if so, deletes the pre-stored data block, thereby eliminating duplicate data. However, deduplication usually computes fingerprints with a strong hash algorithm such as SHA-1, which is computationally expensive. In addition, identifying a duplicate fingerprint requires searching a fingerprint library that may hold billions of fingerprints or more. As a result, deduplication consumes a large share of Central Processing Unit (CPU) resources and noticeably degrades storage performance.
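The conventional fingerprint-based deduplication described above can be sketched as follows. This is a minimal illustration, not the method of this application: the class name, the dict used as the fingerprint library, and the list used as the storage space are all hypothetical, and each lookup hides the cost of searching a real on-disk library.

```python
import hashlib


class FingerprintLibrary:
    """Toy fingerprint library for conventional SHA-1 deduplication.

    Hypothetical structure: maps a SHA-1 digest to [location, reference_count],
    with a Python list standing in for the data storage space.
    """

    def __init__(self):
        self.entries = {}   # fingerprint -> [location, reference_count]
        self.storage = []   # stored data blocks

    def write(self, block: bytes) -> int:
        """Store block unless an identical one exists; return its location."""
        fingerprint = hashlib.sha1(block).digest()  # strong hash, computed per block
        entry = self.entries.get(fingerprint)       # lookup in the (potentially huge) library
        if entry is not None:
            entry[1] += 1      # duplicate: reference the existing copy instead of storing it
            return entry[0]
        location = len(self.storage)
        self.storage.append(block)
        self.entries[fingerprint] = [location, 1]
        return location
```

Writing the same block twice returns the same location and leaves a single stored copy whose reference count is 2; every write, duplicate or not, pays for one SHA-1 computation and one library lookup.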
Disclosure of Invention
In view of this, embodiments of the present application provide a data deleting method, apparatus, device and computer-readable storage medium.
The technical scheme of the embodiment of the application is realized as follows:
The embodiments of the application provide a data deleting method, which comprises the following steps:
acquiring a pre-stored first data block;
if a second data block identical to the first data block exists in a preset data model library, deleting the first data block, where the preset data model library comprises at least the data blocks whose reference count exceeds a count threshold;
determining the metadata of the second data block as the first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.
An embodiment of the present application provides a data deleting device, where the device includes:
the first acquisition module is used for acquiring a pre-stored first data block;
the first deleting module is used for deleting the first data block if a second data block identical to the first data block exists in a preset data model library, where the preset data model library comprises at least the data blocks whose reference count exceeds a count threshold;
a first determining module, configured to determine metadata of the second data block as first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.
An embodiment of the present application provides a data deletion apparatus, where the apparatus at least includes:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program, when executed by the processor, implements the steps of the data deleting method.
An embodiment of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the steps of the data deleting method.
With the data deleting method, device, equipment and computer-readable storage medium provided by the embodiments of the application, data blocks whose reference count exceeds a count threshold are added to a preset data model library. When a pre-stored first data block is acquired and an identical second data block exists in the preset data model library, the first data block is deleted directly, without computing a strong-hash fingerprint or searching the full fingerprint library. This improves deduplication efficiency, reduces CPU consumption, and improves storage performance.
Drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed herein.
Fig. 1 is a schematic flow chart of an implementation of a data deletion method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an implementation process for establishing a preset data model library according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset data model library according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a pre-check performed using a key value according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another preset data model library according to an embodiment of the present application;
fig. 6 is a schematic flow chart illustrating an implementation of another data deletion method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a data deleting device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of another data deleting device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a data deletion apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Where the terms "first/second/third" appear in this application, the following applies: these terms merely distinguish similar objects and do not denote a particular ordering of them. Where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be implemented in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
The embodiment of the present application provides a data deleting method, which is applied to data deleting equipment, and the method provided by the embodiment can be implemented by a computer program, and when the computer program is executed, each step in the method provided by the embodiment is completed. In some embodiments, the computer program may be executed by a processor in a data deletion device. Fig. 1 is a schematic flow chart of an implementation of a data deletion method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
In step S101, the data deleting device acquires a pre-stored first data block.
In this embodiment of the application, the pre-stored first data block is a data block carried in a write request sent by an external device. On receiving a write request, the data deleting device may divide the data carried in the request into n data blocks (where n is a positive integer), each of which is a pre-stored first data block.
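Splitting a write request into n blocks might look like the sketch below. The source does not specify a block size or how a short final segment is handled, so the 4096-byte default and the zero padding of the last block are assumptions, and the function name is hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Split a write-request payload into fixed-size data blocks.

    Assumptions: fixed-size chunking, and the last block is zero-padded
    to block_size; the source only says the data is divided into n blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if len(block) < block_size:
            block += bytes(block_size - len(block))  # pad the short tail with zeros
        blocks.append(block)
    return blocks
```

A 10,000-byte payload yields three 4096-byte blocks, the last of which holds 1808 payload bytes followed by zeros; each block then goes through the deduplication check independently.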
Step S102, if a second data block identical to the first data block exists in a preset data model library, the data deleting device deletes the first data block.
In the embodiment of the application, after the pre-stored first data block is acquired, it is compared with the data blocks stored in the preset data model library to determine whether an identical second data block exists; if one does, the data deleting device deletes the first data block. In other embodiments, the data blocks stored in the data model library may also be called data models.
In the embodiment of the application, the preset data model library comprises at least the data blocks whose reference count exceeds a count threshold. It should be noted that every data block in the preset data model library has already been stored in the storage space. For example, the count threshold may be set to 100, and every data block referenced more than 100 times is added to the data model library as a data model.
In some embodiments, the preset data model library further includes fully repeating data blocks, that is, blocks consisting of a single repeated value. Because such blocks also occur frequently, they are likewise added to the data model library as data models. Examples of fully repeating data blocks include all-0 data blocks and all-1 data blocks.
In the embodiment of the application, when a data block whose reference count exceeds the count threshold, or a fully repeating data block, is added to the data model library, it is generally assigned a model identifier (ID). Each model ID represents a unique data block and may be a number or a name. For example, the all-0 data block corresponds to model ID 0 and the all-1 data block to model ID 1; every data block in the library is identified in this way, so that each has its own model ID.
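Fully repeating blocks can be recognized by a single scan, without any hashing. The sketch below assumes that "all 1" means every byte equals 0xFF; that interpretation, the mapping table, and both function names are hypothetical, and only the IDs 0 and 1 come from the text above.

```python
# Hypothetical mapping: the source gives all-0 -> model ID 0 and all-1 -> model ID 1;
# reading "all 1" as every byte being 0xFF is an assumption.
FULL_REPEAT_MODEL_IDS = {0x00: 0, 0xFF: 1}


def full_repeat_byte(block: bytes):
    """Return the byte value if every byte of block is identical, else None."""
    if not block:
        return None
    first = block[0]
    # bytes.count accepts an int 0-255; all bytes match iff the count equals the length
    return first if block.count(first) == len(block) else None


def full_repeat_model_id(block: bytes):
    """Return the model ID of a fully repeating block in the library, else None."""
    return FULL_REPEAT_MODEL_IDS.get(full_repeat_byte(block))
```

A block of 0x07 bytes is fully repeating but, under this mapping, has no model ID, so it would fall through to the normal library comparison.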
In the embodiment of the present application, determining whether a second data block identical to the first data block exists in a preset data model library may be implemented in the following manner:
Method one: compare the bytes of the first data block with the bytes of each data block in the data model library; the first data block is determined to be identical to a second data block when all their bytes are the same. This approach is inefficient, however, so a second method is proposed.
Method two: obtain a first key value of the first data block and determine whether a target data block with the same key value exists in the data model library; only if such a target data block exists are the bytes of the target data block and the first data block compared to determine whether they are identical.
In the embodiment of the application, if a second data block identical to the first data block exists in the preset data model library, an identical copy of the first data block is already held in the storage space, so the first data block is deleted directly.
Step S103, the data deleting device determines the metadata of the second data block as the first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.
In the embodiment of the application, although the first data block is deleted, its first metadata still needs to be saved. Because the first data block is identical to the second data block, the metadata of the second data block is determined as the first metadata of the first data block. Since the metadata of the second data block includes the model ID of the second data block, the first metadata of the first data block then also includes that model ID.
Continuing the above example, if the second data block is the all-0 data block with model ID 0 and the first data block is also all-0 data, the metadata of the second data block is determined as the first metadata of the first data block; that is, model ID 0 is recorded in the first metadata of the first data block.
In some embodiments, the metadata of the second data block further includes a type indicating that the block resides in the data model library. When the metadata of the second data block is determined as the first metadata of the first data block, this type records that the first data block was deleted through comparison with the data model library. In the embodiment of the application, deletion through the data model library is a first type and deletion through the fingerprint library is a second type; for example, the first type is denoted by the number 1 and the second type by the number 2. A type value of 1 in the metadata of the first data block indicates deletion through the data model library.
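A minimal sketch of how the type field could drive the read path, assuming a hypothetical record layout: for type 1 the reference is a model ID, and for type 2 it is assumed to be a storage location, in the spirit of step S610. The class, constants and function are illustrative names, not part of the source.

```python
from dataclasses import dataclass

# Type codes from the example above: 1 = deleted via the data model library,
# 2 = deleted via the fingerprint library.
TYPE_MODEL_LIBRARY = 1
TYPE_FINGERPRINT_LIBRARY = 2


@dataclass(frozen=True)
class BlockMetadata:
    dedup_type: int
    ref: int  # model ID for type 1; storage location for type 2 (assumed layout)


def read_block(meta: BlockMetadata, model_library: dict, storage: list) -> bytes:
    """Resolve a deleted block's metadata back to the bytes it stands for."""
    if meta.dedup_type == TYPE_MODEL_LIBRARY:
        return model_library[meta.ref]   # read the identical model instead
    if meta.dedup_type == TYPE_FINGERPRINT_LIBRARY:
        return storage[meta.ref]         # read the identical stored block instead
    raise ValueError(f"unknown dedup type {meta.dedup_type}")
```

Keeping the type next to the reference lets one metadata format serve both deletion paths, so the reader never has to guess which namespace the number belongs to.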
In the data deleting method provided by the embodiment of the application, data blocks whose reference count exceeds the count threshold are added to the preset data model library. When a pre-stored first data block is acquired, the method checks whether an identical second data block exists in the preset data model library and, if it does, deletes the first data block directly.
In some embodiments, before step S101, the method further comprises step S100: establishing the preset data model library. Fig. 2 is a schematic diagram of an implementation flow for establishing the preset data model library provided in this embodiment of the application; as shown in Fig. 2, step S100 may be implemented through the following steps:
Step S100A, the data deleting device determines the reference count of each fingerprint in the fingerprint library, where the fingerprints in the fingerprint library correspond one-to-one to the stored data blocks.
In the embodiment of the application, each fingerprint in the fingerprint library is computed from the corresponding stored data block with a hash algorithm. The data deleting device may maintain a reference count field for each fingerprint, recording how many times the data block with that fingerprint is referenced. When a data block is written for the first time, its reference count is 1; on each later write of the same data, the data is not written again, the existing copy is referenced instead, and the reference count is incremented by 1. A larger reference count means the block has been written more often, that is, it occurs frequently, so data blocks with large reference counts are added to the data model library as data models. The data deleting device can therefore obtain the reference count of every fingerprint in the fingerprint library.
In step S100B, the data deleting device determines each fingerprint whose reference count exceeds the count threshold as a target fingerprint.
In this embodiment of the application, after the data deleting device acquires the reference count of each fingerprint in the fingerprint library, it may filter the fingerprints using a count threshold. For example, with a threshold of 200, the fingerprints referenced more than 200 times are determined as target fingerprints.
Step S100C, the data deleting device adds the data block corresponding to the target fingerprint into a data model library to obtain a preset data model library.
Continuing the above example, the data blocks corresponding to the target fingerprints, that is, those referenced more than 200 times, are added to the data model library to obtain the preset data model library. In some embodiments, fully repeating data blocks may also be added to the preset data model library. Fig. 3 is a schematic diagram of a preset data model library provided in an embodiment of the present application; as shown in Fig. 3, the preset data model library includes the fully repeating data blocks and the data blocks whose reference count exceeds the count threshold, and the metadata of each data block comprises a model ID and a key value.
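Steps S100A to S100C can be sketched as a filter over the fingerprint library. The input shapes are assumptions: the fingerprint library is modelled as a dict from fingerprint to (location, reference count), the storage space as a list, and the optional seed holds fully repeating blocks. The starting ID 1001 echoes the Fig. 5 example; the function name and parameters are hypothetical.

```python
def build_model_library(fingerprints, storage, threshold=200,
                        seed=None, first_model_id=1001):
    """Build a model library (model_id -> block) from a fingerprint library.

    fingerprints: dict fingerprint -> (location, reference_count)   (assumed shape)
    storage: list of stored blocks indexed by location               (assumed shape)
    seed: optional dict model_id -> block, e.g. fully repeating blocks
    """
    library = dict(seed or {})
    next_id = first_model_id
    # Iterate in sorted fingerprint order so model-ID assignment is deterministic.
    for fingerprint in sorted(fingerprints):
        location, refs = fingerprints[fingerprint]
        if refs > threshold:  # S100B: "exceeds" read as strictly greater than
            library[next_id] = storage[location]  # S100C: add the block as a model
            next_id += 1
    return library
```

A block referenced exactly 200 times is left out, since the text says the count must exceed the threshold; the caller is assumed to pick `first_model_id` so it does not collide with seeded IDs.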
In the method provided by the embodiment of the application, data blocks whose reference count exceeds the count threshold are added to the preset data model library, so that a pre-stored first data block can first be checked against this library, which improves deduplication efficiency for frequently occurring data blocks.
In some embodiments, after step S101 and before step S102, the method further comprises pre-checking the first data block using a key value. This pre-check may be implemented by steps S401 to S403 in Fig. 4.
Step S401, the data deleting device obtains a first key value of the first data block.
In the embodiment of the application, the first key value of the first data block is generally obtained in the same way as the key value of each data block in the data model library. For example, if the key value of each data model is computed with a hash algorithm, the first key value of the first data block must be computed with the same hash algorithm; if the key value of each data block in the library is formed from the M bytes at M preset positions of that block, the first key value is likewise formed from the M bytes at the same M preset positions in the first data block. It should be noted that the key values of the data blocks in the data model library are all distinct.
For example, Fig. 5 is a schematic diagram of another preset data model library provided in the embodiment of the application. As shown in Fig. 5, the data model library contains two data blocks: data block 1, AB12C34DE, and data block 2, AB57C89DE. Suppose the key value is formed from M bytes at M preset positions, taking one byte each from the front, middle and rear of the block. If the 1st, 5th and 9th bytes are extracted, the two data models get the same key value (ACE), which is not acceptable, so the 1st, 4th and 9th bytes are extracted instead. The key value of data block 1 is then A2E and that of data block 2 is A7E. The key values are stored in the metadata of the respective data blocks: data block 1 has model ID 1001 and key value A2E, and data block 2 has model ID 1002 and key value A7E. When a first data block is received, its 1st, 4th and 9th bytes are extracted in the same way to form its key value. For example, for the first data block 123456789 the key value is 149; for AB1234CDE it is A2E; and for AB57C89DE it is A7E.
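The key construction in the Fig. 5 example can be reproduced with a few lines. This is a sketch under the example's assumptions (one byte per preset position, 1-based positions); the function names are hypothetical.

```python
def extract_key(block: bytes, positions) -> bytes:
    """Form a key from the bytes at the given 1-based positions."""
    return bytes(block[p - 1] for p in positions)


def positions_give_unique_keys(models, positions) -> bool:
    """True if every model in the library gets a distinct key."""
    keys = [extract_key(m, positions) for m in models]
    return len(set(keys)) == len(keys)


models = [b"AB12C34DE", b"AB57C89DE"]
print(positions_give_unique_keys(models, (1, 5, 9)))  # False: both keys are b"ACE"
print(positions_give_unique_keys(models, (1, 4, 9)))  # True: b"A2E" and b"A7E"
print(extract_key(b"123456789", (1, 4, 9)))           # b'149'
```

The uniqueness check mirrors the reasoning in the example: positions are rejected whenever two models would share a key, because the pre-check relies on each key identifying at most one model.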
Step S402, if a target data block whose key value equals the first key value exists in the data model library, the data deleting device compares the target data block with the first data block byte by byte.
In the embodiment of the application, since each data block in the data model library has a corresponding key value, once the first key value of the first data block is obtained, it can be determined whether a target data block with the same key value exists in the data model library. Continuing the above example, when the first data block is 123456789, the first key value is 149, and no data block in the data model library has key value 149. When the first data block is AB1234CDE, the first key value is A2E, and the data model library contains the target data block AB12C34DE with key value A2E. When such a target data block exists, it is compared with the first data block byte by byte. If the bytes of the first data block and the target data block are the same, step S403 is executed; if they differ, the fingerprint library is used to determine whether the data block is deleted or compressed.
In step S403, if the bytes of the first data block and the target data block are the same, the target data block is determined to be the second data block identical to the first data block.
In this embodiment of the application, when a target data block whose key value equals the first key value exists in the data model library, the data deleting device compares the first data block with the target data block byte by byte to determine whether every byte is the same. Continuing the above example, the first data block AB1234CDE and the target data block AB12C34DE differ in the byte comparison, so the data model library contains no second data block identical to this first data block. In another example, if the first data block is AB1234CDE and the target data block is also AB1234CDE, the two are identical, and the target data block is determined to be the second data block identical to the first data block.
In the method provided by the embodiment of the application, when a pre-stored first data block is acquired, its first key value is obtained and the data model library is checked for a data block with the same key value. If one exists, it is taken as the target data block and compared with the first data block byte by byte to determine whether the two are identical. The key value quickly rules out data blocks that cannot be in the preset data model library, and because the key values in the library are unique, at most one candidate target data block needs a byte comparison. This improves comparison efficiency and hence deduplication efficiency for frequently occurring data blocks, reduces CPU consumption, and improves storage performance.
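Steps S401 to S403 amount to a dictionary lookup followed by one byte comparison. The sketch below assumes the key index is held in memory as a dict from key to (model ID, block); the helper names are hypothetical, and `extract_key` is repeated so the block runs on its own.

```python
def extract_key(block: bytes, positions) -> bytes:
    """Form a key from the bytes at the given 1-based positions."""
    return bytes(block[p - 1] for p in positions)


def build_key_index(models, positions):
    """Map key -> (model_id, block); keys must be unique across the library."""
    index = {}
    for model_id, block in models.items():
        key = extract_key(block, positions)
        if key in index:
            raise ValueError(f"positions {positions} give duplicate key {key!r}")
        index[key] = (model_id, block)
    return index


def find_identical_model(block, index, positions):
    """S401-S403: key pre-check, then byte comparison against the single candidate."""
    candidate = index.get(extract_key(block, positions))   # S401/S402
    if candidate is None:
        return None  # no model shares the key, so none can be identical
    model_id, model_block = candidate
    return model_id if model_block == block else None      # S402/S403: full byte comparison
```

With the Fig. 5 library, 123456789 is rejected by the key alone, AB1234CDE passes the key check but fails the byte comparison, and AB57C89DE matches model 1002.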
Fig. 6 is a schematic flow chart illustrating an implementation of another data deletion method provided in the embodiment of the present application, and as shown in fig. 6, the method includes:
In step S601, the data deleting device acquires a pre-stored first data block.
Step S602, the data deleting device obtains a first key value of the first data block.
In this embodiment of the application, the first key value is determined by extracting M bytes corresponding to M preset positions in the first data block.
Step S603, the data deleting device determines whether a target data block whose key value equals the first key value exists in the data model library.
In the embodiment of the application, the key value of each data block in the data model base is also determined by extracting M bytes corresponding to M preset positions in each data block.
In this embodiment of the application, if a target data block whose key value equals the first key value exists in the data model library, an identical second data block may exist in the library, and step S604 is performed. If no such target data block exists, the data model library contains no identical second data block, and step S608 is performed.
Step S604, the data deleting device performs byte comparison on the target data block and the first data block, and determines whether each byte of the first data block is the same as that of the target data block.
In this embodiment of the application, if every byte of the first data block matches the target data block, an identical data block exists in the data model library and step S605 is performed. If the bytes differ, the data model library contains no identical second data block, and step S608 is performed.
In step S605, the data deleting device determines the target data block as a second data block identical to the first data block.
Step S606, the data deleting device deletes the first data block.
In the embodiment of the present application, since the second data block identical to the first data block already exists, in order to save the storage space, the first data block is deleted.
In step S607, the data deleting device determines the metadata of the second data block as the first metadata of the first data block.
In the embodiment of the application, after the first data block is deleted, only its metadata needs to be recorded: the metadata of the second data block is determined as the first metadata of the first data block, so the two blocks share the same metadata, and when the first data block is read, the second data block is read in its place. When step S607 is completed, the process ends.
In step S608, the data deletion apparatus determines whether a third data block identical to the first data block exists in a storage space other than the data model library.
In the embodiment of the present application, the data model library contains no second data block identical to the first data block. The fingerprint library is therefore the only way to determine whether a third data block identical to the first data block exists in the storage space outside the data model library. In the embodiment of the present application, the storage space outside the data model library may be regarded as the data storage space of a disk or a physical tape. Each data block in that storage space has exactly one corresponding fingerprint in the fingerprint library. Therefore, in step S608, the data deleting device checks whether the fingerprint library contains a fingerprint identical to the first fingerprint of the first data block. If it does, a third data block identical to the first data block exists.
In this embodiment of the application, if a fingerprint identical to the first fingerprint exists in the fingerprint database, that is, if a third data block identical to the first data block exists, step S609 is performed, and if a fingerprint identical to the first fingerprint does not exist in the fingerprint database, that is, if a third data block identical to the first data block does not exist, step S611 is performed.
In step S609, the data deleting device deletes the first data block.
In the embodiment of the application, when the third data block exists, the first data block is deleted, so that the storage space is saved.
In step S610, the data deleting device determines the metadata of the third data block as the second metadata of the first data block.
In this embodiment of the present application, the metadata of the third data block includes the storage location of the third data block. When this metadata is determined as the second metadata of the first data block, the second metadata therefore includes that storage location. When a read request for the first data block arrives, the third data block is read directly from that location. After step S610 is performed, the flow ends.
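A sketch of the fingerprint check in steps S608 to S610 follows, under these assumptions:

- The description does not name the fingerprint function, so SHA-256 from Python's `hashlib` is used here.
- The `FingerprintEntry` record is hypothetical. It holds a storage location plus the reference count described later for the fingerprint library.
- The function name `reference_third_block` is also hypothetical.

A duplicate write does not store the block again. It returns the existing location for the second metadata and increases the reference count by 1.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FingerprintEntry:
    location: int       # hypothetical physical location of the stored block
    ref_count: int = 1  # number of times the stored block is referenced


def fingerprint(block: bytes) -> bytes:
    # SHA-256 is an assumption; the description does not specify the hash.
    return hashlib.sha256(block).digest()


def reference_third_block(first_block: bytes,
                          fingerprint_library: Dict[bytes, FingerprintEntry]
                          ) -> Optional[int]:
    """S608-S610: if an identical stored block exists, reference it.

    Returns the stored block's location (to be recorded in the second
    metadata of the first data block), or None if the block is new (S611).
    """
    entry = fingerprint_library.get(fingerprint(first_block))
    if entry is None:
        return None
    entry.ref_count += 1  # the first data block is not written again
    return entry.location
```

Relying on the fingerprint alone assumes hash collisions are negligible. A system that cannot accept that risk would add a byte comparison against the stored block.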
Step S611, the data deleting device compresses the first data block to obtain a compressed first data block.
In the embodiment of the present application, the storage space contains no third data block identical to the first data block. The first data block is therefore a new data block, and it is compressed to save space.
Step S612, the data deleting device stores the compressed first data block.
In this embodiment of the present application, when the data deleting device stores the compressed first data block, the compressed first data block is stored in a storage space other than the data model library. Illustratively, the first data block is stored in a data storage space of a disk or a physical tape. After step S612 is executed, the flow ends.
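Steps S611 and S612 might look like the following sketch. The stand-ins are assumptions:

- zlib stands in for the compression algorithm, which the description does not name.
- A Python list stands in for the disk or tape storage space, and the list index serves as the physical location.
- Registering the new fingerprint with a reference count of 1 follows the first-write behaviour described later for the fingerprint library.

All names are hypothetical.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple


def store_new_block(first_block: bytes,
                    storage: List[bytes],
                    fingerprint_library: Dict[bytes, Tuple[int, int]]) -> int:
    """Compress a block that has no duplicate and store it outside the model library.

    fingerprint_library maps fingerprint -> (location, reference count).
    Returns the physical location (here, the list index) of the stored block.
    """
    compressed = zlib.compress(first_block)  # S611: compress the first data block
    location = len(storage)
    storage.append(compressed)               # S612: store the compressed block
    fp = hashlib.sha256(first_block).digest()
    fingerprint_library[fp] = (location, 1)  # first write: reference count is 1
    return location


storage: List[bytes] = []
fingerprints: Dict[bytes, Tuple[int, int]] = {}
loc = store_new_block(b"log line\n" * 512, storage, fingerprints)
print(loc, len(storage[loc]) < 4608)  # 0 True
```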
According to the data deleting method provided by the embodiment of the application, when a pre-stored first data block is obtained, a key value check is performed first, followed by byte comparison. If the first data block is not in the data model library, the fingerprint library is consulted, and the fingerprint result determines whether the block is deduplicated or compressed. This improves deletion efficiency for frequently occurring data blocks, reduces CPU consumption, and improves storage performance.
In this embodiment, when an upper-layer service writes data, the model data deleting apparatus processes the data first. The compression and deduplication apparatus then processes any data that does not match a preset data model.
Fig. 7 is a schematic structural diagram of a data deleting device according to an embodiment of the present application. As shown in fig. 7, the model data deleting apparatus 710 includes a data model library 711, a preprocessing module 712 and a model identification module 713. The compression and deduplication apparatus 720 includes a compression and deduplication module 721. The data interaction among the data model library 711, the preprocessing module 712, the model identification module 713 and the compression and deduplication module 721 is described below with reference to fig. 7.
For data written by upper-layer services, the preprocessing module 712 quickly identifies the data blocks that are not in the data model library 711. These data blocks are handed to the compression and deduplication module 721.
After preprocessing, the data blocks that may match an entry in the data model library 711 are passed to the model identification module 713. The model identification module 713 compares them byte by byte with the preset data models in the data model library 711.
In this embodiment, a data block may match a preset data model in the data model library 711 in every byte. In that case the data block is identical to that model, and model data deletion is applied. If the data block fails the byte comparison against the preset data model, the compression and deduplication module 721 compresses and deduplicates it.
In the embodiment of the application, if model data deletion is applied, only metadata is written. The block is identified by the type field and the model ID field of the metadata. The type field indicates whether the data block was handled by model data deletion or by compression and deduplication. The model ID field indicates which preset data model the data block is identical to. If compression and deduplication are applied, the physical address field in the metadata indicates the actual storage location of the data block on the physical medium.
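The metadata layout described above can be sketched as a small record with a type field, a model ID field and a physical address field, together with a read path that uses it. The description specifies only what each field indicates. The field names, the `DedupType` enum, the storage list and the use of zlib for decompression are assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class DedupType(Enum):
    MODEL = 1           # model data deletion: only metadata was written
    COMPRESS_DEDUP = 2  # handled by the compression and deduplication module


@dataclass
class BlockMetadata:
    dedup_type: DedupType
    model_id: Optional[int] = None          # which preset data model the block equals
    physical_address: Optional[int] = None  # where the compressed block is stored


def read_block(meta: BlockMetadata,
               models: Dict[int, bytes],
               storage: List[bytes]) -> bytes:
    """Serve a read request using only the block's metadata."""
    if meta.dedup_type is DedupType.MODEL:
        return models[meta.model_id]  # the preset data model is read in its place
    return zlib.decompress(storage[meta.physical_address])
```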
The data model library is the most critical part of the embodiment of the present application. The choice of data models placed in it directly determines the hit probability of model data deletion. In the embodiment of the present application, the following two types of data models are selected as preset data models. With continued reference to fig. 4, the data model library includes:
full-repetition data blocks, for example all-0 data blocks, all-1 data blocks and so on, covering every full-repetition data block;
data blocks that occur frequently in the customer environment: the data blocks with the highest reference counts are found in the compression and deduplication module and used as preset data blocks. These frequently occurring blocks are found in the customer's existing data and included in the data model library, which raises the hit probability of model data deletion. In the fingerprint library of the compression and deduplication module, each fingerprint represents a unique data block. A reference count field records how many times the data block of each fingerprint is referenced. When the data is written for the first time, the reference count is 1. On each later repeated write, the data is not written again; the original data is referenced directly and the reference count is increased by 1.
In the embodiment of the application, the fingerprint library is searched for the data blocks whose reference counts exceed the threshold, and these data blocks are incorporated into the data model library.
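One way to assemble the two kinds of preset data models is sketched below. It reads "all 0, all 1, and so on" as one block per repeated byte value, 256 blocks in total. That is an interpretation, not something the description states. The block size, the threshold, the starting model ID 1001 and the function names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 16  # hypothetical; kept small for illustration


def full_repetition_models(block_size: int) -> List[bytes]:
    """One block for each byte value repeated block_size times."""
    return [bytes([value]) * block_size for value in range(256)]


def hot_blocks(ref_counts: Dict[bytes, int],
               blocks_by_fp: Dict[bytes, bytes],
               threshold: int) -> List[bytes]:
    """Blocks whose fingerprint's reference count exceeds the threshold."""
    return [blocks_by_fp[fp] for fp, count in ref_counts.items() if count > threshold]


def build_model_library(ref_counts: Dict[bytes, int],
                        blocks_by_fp: Dict[bytes, bytes],
                        threshold: int,
                        block_size: int = BLOCK_SIZE,
                        first_id: int = 1001) -> Dict[int, bytes]:
    """Assign model IDs to full-repetition blocks followed by hot blocks."""
    models = full_repetition_models(block_size)
    seen = set(models)
    for block in hot_blocks(ref_counts, blocks_by_fp, threshold):
        if block not in seen:  # a hot block may already be a full-repetition block
            models.append(block)
            seen.add(block)
    return {first_id + i: block for i, block in enumerate(models)}
```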
The data model library contains multiple preset data models. Comparing an incoming block byte by byte against each of them in turn would be very inefficient. A preprocessing step therefore quickly rules out data blocks that cannot match any preset data model. It directly yields the single data model that might match, and only that model is compared byte by byte.
The preprocessing is based on key comparison. First, a unique key is generated for each preset data model (each data block in the data model library). When the upper-layer application writes a data block (the first data block), a key is generated for it by the same method, and the data model library is searched for a data model with the same key. If one exists, the data model corresponding to that key is obtained and passed to the next step, model identification, for byte comparison.
There are many ways to generate keys. The embodiment of the present application adopts a simple and efficient scheme: at least 1 byte is extracted from each of the front, middle and rear of the data block to form the key.
The following example shows how the preprocessing is carried out:
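The key scheme can be sketched as follows. The sketch assumes one byte is taken from each region and that the middle byte is at index `block_size // 2`; the function names are hypothetical. Building the key index also enforces the requirement that every preset data model have a unique key. The worked example below resolves exactly this kind of collision by moving the extraction positions, and the two models used here are taken from that example.

```python
from typing import Dict, Sequence, Tuple


def key_positions(block_size: int) -> Tuple[int, int, int]:
    """0-based positions of one byte from the front, middle and rear."""
    return (0, block_size // 2, block_size - 1)


def extract_key(block: bytes, positions: Sequence[int]) -> bytes:
    return bytes(block[p] for p in positions)


def build_key_index(models: Dict[int, bytes],
                    positions: Sequence[int]) -> Dict[bytes, int]:
    """Map each model's key to its model ID, rejecting duplicate keys."""
    index: Dict[bytes, int] = {}
    for model_id, data in models.items():
        key = extract_key(data, positions)
        if key in index:
            raise ValueError(f"models {index[key]} and {model_id} share key {key!r}")
        index[key] = model_id
    return index


models = {1001: b"AB12C34DE", 1002: b"AB57C89DE"}
print(key_positions(9))                    # (0, 4, 8): the 1st, 5th and 9th bytes
print(build_key_index(models, (0, 3, 8)))  # {b'A2E': 1001, b'A7E': 1002}
```

Calling `build_key_index(models, key_positions(9))` raises `ValueError`, because both models yield the key ACE.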
in the data model library, there are 2 preset data models:
data model 1: AB12C34DE;
data model 2: AB57C89DE;
First, 1 byte is extracted from each of the front, middle and rear of each data model to form its key. If the 1st, 5th and 9th bytes were extracted, the 2 data models would have the same key (ACE). The 1st, 4th and 9th bytes are therefore extracted instead. After the 1st, 4th and 9th bytes are extracted as keys, the data models in the model library are as follows:
data model 1001: AB12C34DE, key A2E;
data model 1002: AB57C89DE, key A7E;
The upper-layer application writes data blocks, and the 1st, 4th and 9th bytes of each written data block are likewise extracted as its key. Suppose the upper-layer application writes 3 data blocks (data 1, data 2 and data 3):
the key of data 1 is 149, which does not exist in the model library, so data 1 is not eligible for model data deletion;
the key of data 2 is A2E, the same as the key of data model 1001, so data 2 proceeds to model identification for byte comparison;
the key of data 3 is A7E, the same as the key of data model 1002, so data 3 proceeds to model identification for byte comparison;
Preprocessing finds the only data model in the data model library that might match, and the bytes are then compared. If the bytes are completely identical, model data deletion is performed directly. Only metadata is written, and it records only the model ID of the data block. If the bytes differ, the data is sent to the compression and deduplication module for normal compression and deduplication.
In the embodiment of the application, frequently occurring data blocks are included in the data model library, and a unique key is generated for each data model in it. For each block written by the upper-layer application, a key is calculated by the same method, and the model library is searched for the same key. If the key exists, the corresponding data model is obtained for byte comparison. When the bytes are completely identical, the data model matches, model data deletion is performed, and only metadata is written. Compared with existing data deduplication methods, model data deletion needs simpler calculation and less CPU. Applying it to frequently occurring data blocks therefore reduces CPU consumption and improves overall storage performance.
Based on the foregoing embodiments, the embodiments of the present application provide a data deleting device. Each module included in the device, and each unit included in each module, may be implemented by a processor in a computer device or by a specific logic circuit. In implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 8 is a schematic structural diagram of another data deletion apparatus provided in an embodiment of the present application, and as shown in fig. 8, the data deletion apparatus 800 includes:
a first obtaining module 801, configured to obtain a pre-stored first data block;
a first deleting module 802, configured to delete the first data block if a second data block identical to the first data block exists in a preset data model library; the preset data model library includes at least the data blocks whose reference counts exceed a count threshold;
a first determining module 803, configured to determine metadata of the second data block as first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.
In some embodiments, the data deleting apparatus 800 further includes:
the second determination module is used for determining the reference times of each fingerprint in the fingerprint library; wherein, the fingerprints in the fingerprint database correspond to the stored data blocks one by one;
a third determining module, configured to determine, as the target fingerprint, a fingerprint in which the number of times of reference exceeds a number threshold;
and the adding module is used for adding the data block corresponding to the target fingerprint into a data model base to obtain a preset data model base.
In some embodiments, the data deleting apparatus 800 further includes:
a second obtaining module, configured to obtain a first key value of the first data block;
the comparison module is used for comparing the target data block with the first data block byte by byte if the data model library contains a target data block whose key value equals the first key value;
a fourth determining module, configured to determine the target data block as a second data block that is the same as the first data block if the bytes of the first data block and the target data block are the same.
In some embodiments, the second obtaining module comprises:
the first extraction unit is used for extracting M bytes corresponding to M preset positions in the first data block, wherein M is a positive integer;
a first determining unit, configured to determine a first key value of the first data block according to the M bytes.
In some embodiments, the data deleting apparatus 800 further includes:
a first judging module, configured to judge whether a third data block identical to the first data block exists if the data model library contains no target data block whose key value equals the first key value, or no second data block identical to the first data block;
a second deleting module, configured to delete the first data block if a third data block identical to the first data block exists;
a fifth determining module, configured to determine metadata of the third data block as second metadata of the first data block, where the metadata of the third data block includes a location where the third data block is stored.
In some embodiments, the first judging module comprises:
a second determining unit for determining a first fingerprint of the first data block;
and the third determining unit is used for determining that a third data block identical to the first data block exists if the fingerprint identical to the first fingerprint exists in the fingerprint database.
In some embodiments, the data deleting apparatus 800 further includes:
the compression module is used for compressing the first data block to obtain a compressed first data block if the fingerprint library contains no fingerprint identical to the first fingerprint, that is, if no third data block identical to the first data block exists;
and the storage module is used for storing the compressed first data block.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
According to the data deleting device provided by the embodiment of the application, the data blocks whose reference counts exceed the count threshold are added to the preset data model library. When the first obtaining module obtains a pre-stored first data block and an identical second data block exists in the preset data model library, the first deleting module deletes the first data block directly. The data deleting device thus deletes frequently occurring data blocks against the data model, which improves data deletion efficiency, reduces CPU consumption and improves storage performance.
It should be noted that, in the embodiment of the present application, if the data deleting method is implemented in the form of a software functional module and is sold or used as a standalone product, the data deleting method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the data deleting method provided in the above embodiment.
An embodiment of the present application provides a data deleting device. Fig. 9 is a schematic diagram of the composition structure of the data deleting device provided in the embodiment of the present application. As shown in fig. 9, the data deleting device 900 includes a processor 901, at least one communication bus 902, a user interface 903, at least one external communication interface 904 and a memory 905. The communication bus 902 is configured to enable communication among these components. The user interface 903 may include a display screen, and the external communication interface 904 may include a standard wired interface and a wireless interface, among others. The processor 901 is configured to execute the program of the data deleting method stored in the memory to realize the steps in the data deleting method provided in the above embodiment.
The above description of the data deletion apparatus and storage medium embodiments is similar to the description of the method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the data deletion apparatus and storage medium of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an AC to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for deleting data, comprising:
acquiring a pre-stored first data block;
if a second data block identical to the first data block exists in a preset data model library, deleting the first data block; the preset data model library at least comprises data blocks whose reference counts exceed a count threshold;
determining metadata of the second data block as first metadata of the first data block, wherein the metadata of the second data block includes an identification of the second data block.
2. The method of claim 1, further comprising:
determining the number of times of reference of each fingerprint in a fingerprint library; wherein, the fingerprints in the fingerprint database correspond to the stored data blocks one by one;
determining a fingerprint whose reference count exceeds a count threshold as a target fingerprint;
and adding the data block corresponding to the target fingerprint into a data model base to obtain a preset data model base.
3. The method of claim 1, further comprising:
acquiring a first key value of the first data block;
if a target data block whose key value is identical to the first key value exists in the data model library, comparing the target data block with the first data block byte by byte;
and if the bytes of the first data block are the same as those of the target data block, determining the target data block as a second data block which is the same as the first data block.
4. The method of claim 3, wherein obtaining the first key value of the first data block comprises:
extracting M bytes corresponding to M preset positions in the first data block, wherein M is a positive integer;
and determining a first key value of the first data block according to the M bytes.
5. The method of claim 3, further comprising:
if no target data block whose key value is identical to the first key value exists in the data model library, or if no second data block identical to the first data block exists in the data model library, judging whether a third data block identical to the first data block exists in a storage space other than the data model library;
if a third data block identical to the first data block exists, deleting the first data block;
determining metadata of the third data block as second metadata of the first data block, wherein the metadata of the third data block includes a location where the third data block is stored.
6. The method of claim 5, wherein the determining whether a third data block identical to the first data block exists comprises:
determining a first fingerprint of the first data block;
and if the fingerprint identical to the first fingerprint exists in the fingerprint database, determining that a third data block identical to the first data block exists.
7. The method of claim 6, further comprising:
if the third data block which is the same as the first data block does not exist in the fingerprint database, compressing the first data block to obtain a compressed first data block;
and storing the compressed first data block.
8. An apparatus for deleting data, the apparatus comprising:
the first acquisition module is used for acquiring a pre-stored first data block;
the first deleting module is used for deleting the first data block if a second data block identical to the first data block exists in a preset data model library; the preset data model library at least comprises data blocks whose reference counts exceed a count threshold;
a first determining module, configured to determine metadata of the second data block as first metadata of the first data block, where the metadata of the second data block includes an identifier of the second data block.
9. A data deletion apparatus, characterized in that the apparatus comprises at least:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program when executed by a processor implements the steps of the data deletion method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-executable instructions configured to perform the steps of the data deletion method of any of claims 1 to 7.
CN202010031758.7A 2020-01-13 2020-01-13 Data deleting method, device, equipment and computer readable storage medium Pending CN111258502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010031758.7A CN111258502A (en) 2020-01-13 2020-01-13 Data deleting method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111258502A (en) 2020-06-09

Family

ID=70945210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010031758.7A Pending CN111258502A (en) 2020-01-13 2020-01-13 Data deleting method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111258502A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130151756A1 (en) * 2011-12-07 2013-06-13 Quantum Corporation Data de-duplication and solid state memory device
US20140325147A1 (en) * 2012-03-14 2014-10-30 Netapp, Inc. Deduplication of data blocks on storage devices
CN107229420A (en) * 2017-05-27 2017-10-03 郑州云海信息技术有限公司 Date storage method, read method, delet method and data operation system

Similar Documents

Publication Publication Date Title
US8751462B2 (en) Delta compression after identity deduplication
US9690802B2 (en) Stream locality delta compression
Breitinger et al. A fuzzy hashing approach based on random sequences and hamming distance
CN103098035B (en) Storage system
US8010502B2 (en) Methods and systems for data recovery
US20050210054A1 (en) Information management system
EP2657884B1 (en) Identifying multimedia objects based on multimedia fingerprint
KR20090075885A (en) Managing storage of individually accessible data units
US9355250B2 (en) Method and system for rapidly scanning files
CN112116436B (en) Intelligent recommendation method and device, computer equipment and readable storage medium
CN109271545B (en) Feature retrieval method and device, storage medium and computer equipment
CN109360605B (en) Genome sequencing data archiving method, server and computer readable storage medium
US20140012879A1 (en) Database management system, apparatus, and method
Pahade et al. A survey on multimedia file carving
Billard et al. Making sense of unstructured flash-memory dumps
CN106484691B (en) data storage method and device of mobile terminal
CN111061428A (en) Data compression method and device
CN111258502A (en) Data deleting method, device, equipment and computer readable storage medium
CN107943849B (en) Video file retrieval method and device
CN113312619B (en) Malicious process detection method and device based on small sample learning, electronic equipment and storage medium
CN104637496A (en) Computer system and audio comparison method
US8265428B2 (en) Method and apparatus for detection of data in a data store
CN114138552B (en) Data dynamic repeating and deleting method, system, terminal and storage medium
US11494093B2 (en) Method and apparatus for processing data of in-memory database
CN117112846B (en) Multi-information source license information management method, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination