CN108121504B - Data deleting method and device - Google Patents

Info

Publication number
CN108121504B
Authority
CN
China
Prior art keywords
data
count
reference count
upper limit
limit value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711137647.9A
Other languages
Chinese (zh)
Other versions
CN108121504A (en
Inventor
田文刚
游俊
徐林波
陈亮
张立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Huawei Technology Co Ltd
Original Assignee
Chengdu Huawei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Huawei Technology Co Ltd
Priority to CN201711137647.9A
Publication of CN108121504A
Application granted
Publication of CN108121504B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data deleting method and device, and belongs to the field of computers. The method comprises the following steps: when an operation request for first data is acquired and second data that duplicates the first data is already stored, acquiring a reference count of the second data, where the operation request is a write request or a delete request and the reference count indicates the number of times the second data is referenced; determining whether the reference count has reached a count upper limit value; and, when the reference count has reached the count upper limit value, maintaining the reference count at the count upper limit value. This addresses the problem that reference counts occupy a large amount of storage space: because the reference count no longer grows as write requests accumulate, it stays within a small range, so only a small storage space needs to be reserved for it.

Description

Data deleting method and device
Technical Field
The embodiment of the application relates to the field of computers, in particular to a data deleting method and device.
Background
Data de-duplication, also called deduplication for short, is a data storage technique. For duplicate data in a data set, deduplication keeps only one copy and deletes the other copies, thereby eliminating redundant data.
The process by which a data deleting device deduplicates data comprises the following steps: 1. dividing a file into at least one data block and calculating the fingerprint (FP) of each data block with a hash algorithm; 2. comparing the fingerprint of each data block with the stored fingerprints; 3. if the fingerprint already exists, the data block is duplicate data, so the data is not stored again and the reference count corresponding to the data is increased by 1; if the fingerprint does not exist, the data is unique, and the data and its fingerprint are stored. The reference count indicates the number of times the data is referenced.
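The three background steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method claimed by the application: the 4-byte block size, the in-memory dictionaries `store` and `ref_counts`, the initial count of 1, and all helper names are hypothetical, and SHA-1 is chosen only as one common hash algorithm. The counter in this sketch grows without bound.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for readability


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Step 1a: divide the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Step 1b: compute the block's fingerprint with a hash algorithm."""
    return hashlib.sha1(block).hexdigest()


def dedup_write(data: bytes, store: Dict[str, bytes], ref_counts: Dict[str, int]) -> None:
    """Steps 2 and 3: look up each fingerprint and store or count the block."""
    for block in split_blocks(data):
        fp = fingerprint(block)
        if fp in store:
            ref_counts[fp] += 1   # duplicate: count the reference, do not store again
        else:
            store[fp] = block     # unique: store the data and its fingerprint
            ref_counts[fp] = 1    # assumed initial count of 1


store: Dict[str, bytes] = {}
ref_counts: Dict[str, int] = {}
dedup_write(b"AAAABBBBAAAA", store, ref_counts)
```

After the call, `store` holds two blocks, and the block `b"AAAA"` has been counted twice while `b"BBBB"` has been counted once.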
As a block of data is referenced more and more often, the value of its reference count grows, so a larger storage space must be allocated for the reference count, and the storage space occupied by reference counts becomes large.
Disclosure of Invention
The application provides a data deleting method and device, which can solve the problem that the memory space occupied by reference counting is large.
In a first aspect, an embodiment of the present application provides a data deletion method, where the data deletion method includes: when an operation request of first data is acquired and second data which is repeated with the first data is stored, acquiring a reference count of the second data; determining whether the reference count reaches a count upper limit value; and when the reference count reaches the upper limit value of the count, maintaining the reference count as the upper limit value of the count. Wherein the operation request is a write request or a delete request, and the reference count is used to indicate the number of times the second data is referenced.
When the write request is obtained and second data which is repeated with the first data is stored, obtaining a reference count of the second data; determining whether the reference count reaches a count upper limit value; when the reference count reaches the upper limit value of the count, maintaining the reference count as the upper limit value of the count; the problem that the memory space occupied by reference counting is large can be solved; since the reference count is no longer increased with the increase of the write request, the storage space occupied by the reference count can be maintained in a smaller range, and therefore, only a smaller storage space needs to be set for the reference count, and the effect of saving the storage space occupied by the reference count can be achieved.
In addition, when a delete request for the first data is acquired and second data that duplicates the first data is stored, the reference count of the second data is acquired; whether the reference count has reached the count upper limit value is determined; and, when it has, the reference count is maintained at the count upper limit value. This addresses the problem of data being reclaimed by mistake because of an inaccurate reference count: once the reference count has reached the count upper limit value, it is no longer decremented in response to delete requests and therefore never falls to 0, so the data deleting device does not reclaim the corresponding data block on the grounds that its reference count is 0. Stored data is thus not lost through mistaken reclamation, and the accuracy of data block reclamation is ensured.
Optionally, when the reference count reaches the count upper limit value, the method further includes: scanning whether a block address pointing to the second data exists; when there is no block address pointing to the second data, the second data and the reference count of the second data are deleted.
Once the reference count of the second data has reached the count upper limit value, it no longer changes, whether the received operation request is a write request or a delete request for the second data. Consequently, if the data deleting device reclaimed a data block only when its reference count is 0, it could never reclaim the data block corresponding to the second data. In this embodiment, the device scans for a block address pointing to the second data. If no such block address exists, the second data is actually referenced 0 times, and the second data and its reference count are deleted. A full-disk scan for block addresses pointing to the second data thus lets the data deleting device determine whether the second data is still actually referenced, so that data blocks whose reference count has reached the count upper limit value can still be reclaimed.
Optionally, the count upper limit value is determined according to a number of bits of a storage space in the memory for storing the reference count, and a value range of the number of bits is greater than or equal to 1 bit and less than or equal to 32 bits.
By setting a smaller bit number for the memory space of the reference count, the reference count does not occupy excessive memory space, and the effect of saving the memory space occupied by the reference count can be achieved.
In addition, when the data deleting device caches the reference count in the form of metadata, the cache space occupied by the reference count can be reduced by setting a smaller number of bits for the reference count, so that more metadata can be cached in the cache.
Optionally, when the operation request is a write request, after determining whether the reference count reaches the count upper limit value, the method further includes: when the reference count does not reach the count upper limit value, the reference count is increased by 1.
Optionally, when the operation request is a delete request, after determining whether the reference count reaches the count upper limit value, the method further includes: when the reference count does not reach the count upper limit value, the reference count is decremented by 1.
Optionally, after subtracting 1 from the reference count, the method further includes: determining whether the reference count reaches 0; when the reference count reaches 0, the second data and the reference count are deleted.
In a second aspect, an embodiment of the present application provides a data deleting device, where the data deleting device includes at least one unit, and the at least one unit is configured to implement the data deleting method provided in the first aspect.
In a third aspect, an embodiment of the present application provides a data deleting device, where the data deleting device includes: one or more processors, and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for implementing the data deletion method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, in which one or more programs are stored, and when the one or more programs are executed, the method for deleting data provided in the first aspect is implemented.
Drawings
FIG. 1 is a schematic structural diagram of a data block group according to an embodiment of the present application;
FIG. 2 is a flow chart of a data deletion method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a reference count acquisition process provided by one embodiment of the present application;
FIG. 4 is a flow chart of a data deletion method provided by another embodiment of the present application;
FIG. 5 is a flow chart of a data deletion method provided by another embodiment of the present application;
FIG. 6 is a block diagram of a data deletion apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a data deletion apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and similar terms used in the description and claims of this application do not denote any order, quantity, or importance; they are used only to distinguish one element from another. Likewise, the terms "a" or "an" and the like do not denote a limitation of quantity, but rather denote the presence of at least one.
First, several terms referred to in the present application will be described.
Data de-duplication, also called deduplication for short, is a data storage technique. Deduplication is commonly used in disk-based backup systems to reduce redundant data in storage systems and improve the utilization of storage space. For duplicate data in a data set, deduplication keeps only one copy and deletes the other copies, thereby eliminating redundant data.
Scenarios where duplicate data may occur include, but are not limited to:
1. among locally stored files originating from the same host operating system, there are at least two files with the same name but different contents; and/or at least two files with the same content but different names; and/or at least two files whose contents are only partly the same;
2. among locally stored files originating from different host operating systems, there are at least two duplicate files;
3. among files stored in other storage devices and originating from different host operating systems, there are at least two duplicate files;
4. duplicate portions of disk image files in a virtual machine environment;
5. duplicate data in raw disk mapping mode in a virtual machine environment.
Optionally, the present application takes as an example that the execution subject of each embodiment is a data deleting device with deduplication capability, where the data deleting device includes but is not limited to: a mobile phone, a tablet computer, a wearable device, a smart home, a laptop portable computer, a desktop computer, and other terminals; alternatively, a server; or network devices such as base stations, routers, switches, etc.
Alternatively, the data deleting device may also be referred to as a deduplication device, or the like, and the name of the data deleting device is not limited in this embodiment.
Optionally, an internal memory is installed in the data deleting device; the internal memory stores the data acquired by the data deleting device, and the data deleting device stores the data in the internal memory by using a data deduplication technology. Alternatively, the data deleting device may be connected to an external memory installed in a separate device, and the data deleting device may store data in the external memory by using a data deduplication technology.
Alternatively, the Memory may be a Random Access Memory (RAM), a hard disk, a removable hard disk, an optical disk, or other forms of storage media known in the art, and the form of the Memory is not limited in this embodiment.
It should be added that, unless otherwise specified, every memory mentioned in the present application is a memory in which the data deleting device stores data.
Illustratively, in the present application, each partition of the memory is divided into a plurality of data block groups, each data block group including a data area and a data index area. The data area begins with an array of fingerprints and reference counts, and the rest of the data area stores, in block form, the data corresponding to the fingerprints; the data index area describes which data blocks hold the data of a stored file. Of course, each data block group may also include other contents, which are not listed here.
Optionally, the reference count and the fingerprint are both stored in memory in the form of Metadata (Metadata). Metadata (also called intermediate data and relay data) is used to describe the attributes of data, and can track the changes of data in the using process.
The fingerprint is a value obtained by applying a hash algorithm to a data block. The hash algorithm is, for example, the Message Digest Algorithm 5 (MD5) or the Secure Hash Algorithm 1 (SHA-1); of course, other hash algorithms may be used, and this embodiment does not limit this. The reference count is used to indicate the number of times the data block is referenced.
Referring to fig. 1, one partition includes n data block groups 10, and each data block group 10 includes a data area 101 and a data index area 102. The beginning of the data area 101 includes a fingerprint 1011 and a reference count 1012. Where the fingerprint 1011, reference count 1012 and the location of the data block are associated (same shaded representation). The data index area 102 is located before the data area 101, and the data index area 102 includes at least one pointer, such as: pointer 1021 in data index area 102 points to data in data block 1013 of data area 101 (the shading indicates a pointing relationship), and pointer 1022 in data index area 102 also points to data in data block 1013 of data area 101 (the shading indicates a pointing relationship). Wherein n is a positive integer.
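The layout of FIG. 1 can be mirrored by a small in-memory structure. The sketch below is only an illustration of the relationships described above: the class name `DataBlockGroup`, its field names, the use of list slots to associate a fingerprint, a reference count, and a data block, and the string pointer names are all assumptions, not the on-disk format of the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataBlockGroup:
    # Start of the data area: parallel arrays of fingerprints and reference counts.
    fingerprints: List[str] = field(default_factory=list)
    ref_counts: List[int] = field(default_factory=list)
    # Rest of the data area: the data blocks themselves.
    blocks: List[bytes] = field(default_factory=list)
    # Data index area: pointer name -> slot of the block it points to.
    index: Dict[str, int] = field(default_factory=dict)

    def add_block(self, fp: str, block: bytes, pointer: str) -> int:
        """Store a new block together with its first pointer; return its slot."""
        self.fingerprints.append(fp)
        self.ref_counts.append(1)
        self.blocks.append(block)
        slot = len(self.blocks) - 1
        self.index[pointer] = slot
        return slot

    def add_reference(self, slot: int, pointer: str) -> None:
        """Add another pointer to an already stored block."""
        self.index[pointer] = slot
        self.ref_counts[slot] += 1


group = DataBlockGroup()
slot = group.add_block("fp-1011", b"block 1013", "pointer_1021")
group.add_reference(slot, "pointer_1022")
```

Both pointers resolve to the same slot, so the block is stored once and its reference count is 2, matching the two pointers 1021 and 1022 that point to data block 1013.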
It should be noted that the storage structures between the data block, the fingerprint, and the reference count are merely illustrative, and in actual implementation, the data block, the fingerprint, and the reference count may be stored according to other storage structures, which is not limited in this embodiment.
As can be seen from FIG. 1, the more bits of storage space in the memory are used to store each reference count, the more total storage space the reference counts occupy. For example, if each data block group uses 32 bits to store the reference count, the reference counts occupy 32 × n bits in total; if each data block group uses 1 bit, they occupy n bits in total.
Referring to fig. 2, a flowchart of a data deleting method according to an embodiment of the present application is shown. The data deleting method comprises the following steps:
Step 201: When an operation request for first data is acquired and second data which is identical to the first data is stored, a reference count of the second data is acquired.
The reference count is used to indicate a number of times the second data is referenced.
Optionally, the reference count is set to 1 when the second data is first written to the memory.
In this embodiment, the operation request is a write request or a delete request. The write request is for requesting that data be written in a specified data block of the memory. The delete request is for requesting deletion of data on a specified data block in memory.
Alternatively, the operation request may be generated by the data deleting device during operation; alternatively, the operation request may be sent to the data deleting device by another device and received by the data deleting device, such as: the data deleting device is connected with the control equipment, and the control equipment generates an operation request and sends the operation request to the data deleting device. Wherein the control devices include, but are not limited to: a mobile phone, a tablet computer, a wearable device, a smart home, a laptop portable computer, a desktop computer, and other terminals; or, a server.
In one example, when the operation request is a write request and the data carried in the write request includes at least one piece of first data, the data deleting device splits the data after receiving the write request to obtain the first data of each data block, and calculates a fingerprint of each piece of first data with a hash algorithm. Then, for the fingerprint of each piece of first data, the data deleting device checks whether an identical fingerprint is already stored. If an identical fingerprint is stored, second data that duplicates the first data is stored, and the data deleting device acquires the reference count at the position associated with the position of that fingerprint. If no identical fingerprint is stored, no second data that duplicates the first data is stored, and the data deleting device writes the first data into the memory and stores the fingerprint and the reference count of the first data in the memory.
Optionally, when the number of the first data is at least two, the data block sizes of the different first data may be the same or may also be different.
Referring to fig. 3, after acquiring a write request, the data deletion apparatus splits data 30 in the write request into 3 blocks of data 301, 302, and 303. The data deleting device calculates fingerprints of the data 301, 302 and 303 respectively by using a hash algorithm, and obtains a fingerprint 304 of the data 301, a fingerprint 305 of the data 302 and a fingerprint 306 of the data 303.
The data deleting device searches the memory for a fingerprint identical to the fingerprint 304 and finds one, which indicates that data duplicating the data 301 is already stored in the memory; the device acquires the reference count corresponding to that fingerprint.
The data deleting device searches the memory for a fingerprint identical to the fingerprint 305 and finds none, which indicates that no data identical to the data 302 is stored in the memory; the device stores the data 302, the fingerprint 305, and the reference count of the data 302.
The data deleting device searches the memory for a fingerprint identical to the fingerprint 306 and finds one, which indicates that data duplicating the data 303 is already stored in the memory.
In another example, when the operation request is a delete request, data identical to the first data indicated by the delete request is already stored in the memory, and the data deleting device directly acquires the reference count of that data.
Optionally, the deletion request carries data index information of the first data. And after receiving the deletion request, the data deletion device determines corresponding data according to the data index information in the deletion request, and acquires the reference count of the associated position according to the position of the data.
Step 202: It is determined whether the reference count has reached a count upper limit value.
Wherein the count upper limit value is determined according to the number of bits of the storage space in the memory for storing the reference count. The greater the number of bits of storage space in memory used to store the reference count, the greater the count upper limit value.
Taking the storage structure shown in FIG. 1 as an example, if the number of bits of the storage space for storing the reference count in each data block group is 32 bits (i.e. 4 bytes), then the count upper limit value is 2^32 and the total memory space occupied by the reference counts is 32 × n bits. If the number of bits of the memory space for storing the reference count in each data block group is 4 bits, the count upper limit value is 2^4 and the total memory space occupied by the reference counts is 4 × n bits. It can be seen that lowering the count upper limit value reduces the total memory space occupied by the reference counts in proportion to the number of bits.
Optionally, in this application, in order to reduce the memory space occupied by the reference count, the number of bits of the memory space occupied by the reference count is kept small. Since the number of bits is small, the count upper limit value of the corresponding reference count is also small. For example, when the number of bits is 4, the count upper limit value is 2^4 = 16; when the number of bits is 16, the count upper limit value is 2^16 = 65536.
Optionally, the number of bits of the storage space occupied by the reference count is set in the data deleting device by a developer. Illustratively, the number of bits is greater than or equal to 1 bit and less than or equal to 32 bits; correspondingly, the count upper limit value is greater than or equal to 2 and less than or equal to 2^32. Of course, the number of bits can also be restricted to other ranges, such as 4 bits or more and 16 bits or less; accordingly, the count upper limit value is not less than 16 and not more than 65536. The present embodiment does not limit the value range of the number of bits.
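The arithmetic above can be checked with two short functions. This is a sketch that simply encodes the figures as the text states them (limit equal to 2 raised to the bit width, total space equal to bit width times n); the function names are hypothetical. An implementation that keeps the count in a plain unsigned field of that width would need a limit that fits in the field, which is outside what this sketch models.

```python
def count_upper_limit(bits: int) -> int:
    """Count upper limit value as given in the text: 2 to the power of the bit width."""
    if not 1 <= bits <= 32:
        raise ValueError("bit width must be between 1 and 32")
    return 2 ** bits


def total_refcount_bits(bits: int, n_groups: int) -> int:
    """Total space occupied by reference counts: bit width times number of groups."""
    return bits * n_groups


n = 1000  # hypothetical number of data block groups
wide = total_refcount_bits(32, n)
narrow = total_refcount_bits(4, n)
```

With n = 1000, moving from 32-bit to 4-bit counters reduces the space from 32000 bits to 4000 bits, a factor of 8.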
Optionally, the memory space occupied by the reference count does not change dynamically as the reference count increases, such as: the memory space occupied by the reference count is 32 bits, and even if the reference count is 1, the memory space is still 32 bits.
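A fixed-width counter array makes this concrete: the space is allocated once and stays the same whatever values the counters hold. The sketch below packs two 4-bit counters into each byte. The class `PackedCounters` and its methods are hypothetical, and the sketch caps each counter at 15, the largest value a 4-bit field can hold; that cap is an assumption of the example rather than a figure from the text.

```python
class PackedCounters:
    """Fixed-width 4-bit counters stored two per byte."""

    BITS = 4
    MAX = (1 << BITS) - 1  # 15, the largest value that fits in 4 bits

    def __init__(self, n: int) -> None:
        self._buf = bytearray((n + 1) // 2)  # size is fixed at creation

    def get(self, i: int) -> int:
        byte = self._buf[i // 2]
        return (byte >> 4) if i % 2 else (byte & 0x0F)

    def set(self, i: int, value: int) -> None:
        if not 0 <= value <= self.MAX:
            raise ValueError("value does not fit in the counter width")
        j = i // 2
        if i % 2:
            self._buf[j] = (self._buf[j] & 0x0F) | (value << 4)  # high nibble
        else:
            self._buf[j] = (self._buf[j] & 0xF0) | value         # low nibble

    def nbytes(self) -> int:
        return len(self._buf)


counters = PackedCounters(1000)
size_before = counters.nbytes()
counters.set(0, 1)
counters.set(1, PackedCounters.MAX)
size_after = counters.nbytes()
```

The array occupies 500 bytes for 1000 counters both before and after the updates, whereas 1000 32-bit counters would occupy 4000 bytes.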
Optionally, the count upper limit value is less than a maximum number of references of the data. Such as: for a certain block data 1, the data 1 may be referred at most 1000 times (maximum number of reference times), and the count upper limit value is less than 1000.
After acquiring the reference count of the second data, the data deleting device checks whether the reference count is equal to the count upper limit value. If they are equal, the reference count has reached the count upper limit value, and step 203 is executed; if the reference count is not equal to the count upper limit value, the reference count is updated according to the type of the operation request.
When the reference count does not reach the upper count limit, the data deleting device updates the reference count according to the type of the operation request, and the method comprises the following steps: when the operation request is a write request, adding 1 to the reference count of the second data; and when the operation request is a deletion request, subtracting 1 from the reference count of the second data.
Optionally, after the data deleting device decrements the reference count of the second data by 1, it determines whether the reference count has reached 0; when the reference count reaches 0, the second data and the reference count are deleted.
Optionally, the data deleting device may also delete the fingerprint of the second data when the reference count reaches 0.
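The update rule can be summarised in one function. The sketch below is a minimal illustration under stated assumptions: `apply_request`, the dictionaries `ref_counts` and `store` keyed by fingerprint, and the operation strings "write" and "delete" are hypothetical names chosen for the example. A count that has reached the limit is left unchanged for both request types; otherwise a write adds 1, a delete subtracts 1, and a count of 0 removes the data together with its count.

```python
from typing import Dict


def apply_request(ref_counts: Dict[str, int], store: Dict[str, bytes],
                  fp: str, op: str, limit: int) -> None:
    """Update the reference count of the stored block `fp` for one request."""
    if op not in ("write", "delete"):
        raise ValueError("operation must be 'write' or 'delete'")
    count = ref_counts[fp]
    if count == limit:
        return                      # limit reached: keep the count as it is
    if op == "write":
        ref_counts[fp] = count + 1
        return
    count -= 1                      # delete request below the limit
    if count == 0:
        del ref_counts[fp]          # no references left: delete the count
        store.pop(fp, None)         # and the data itself
    else:
        ref_counts[fp] = count


LIMIT = 3  # hypothetical small limit so the example saturates quickly
ref_counts = {"a": 1, "b": 1}
store = {"a": b"block a", "b": b"block b"}

for op in ("write", "write", "write", "delete"):
    apply_request(ref_counts, store, "a", op, LIMIT)
apply_request(ref_counts, store, "b", "delete", LIMIT)
```

Block "a" reaches the limit after two writes and stays there through the third write and the delete; block "b" drops to 0 and is removed.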
Step 203: When the reference count reaches the count upper limit value, the reference count is maintained at the count upper limit value.
If the reference count reaches the upper limit of the count, when the operation request is a write request, the reference count cannot be increased continuously due to the limitation of the storage space of the reference count.
If the reference count has reached the count upper limit value and the operation request is a delete request, decrementing the reference count of the second data by 1 could leave it different from the number of times the second data is actually referenced: the data deleting device may have received write requests before the delete request, and the reference count stayed at the count upper limit value when those write requests arrived. Such a mismatch could cause the data deleting device to reclaim the second data by mistake. Therefore, to improve the accuracy with which the data deleting device reclaims data blocks, when the operation request is a delete request the data deleting device does not update the reference count, which is maintained at the count upper limit value.
For example, the data deleting device receives a write request for data 1, and data 1 is the same as the stored data 2; the reference count of data 2 is increased by 1, and the updated reference count reaches the count upper limit value. The data deleting device then receives a write request for data 3, which is also the same as the stored data 2; the reference count is maintained at the count upper limit value. Next, the data deleting device receives a delete request for data 2, after which the number of times data 2 is actually referenced equals the count upper limit value. If the reference count of data 2 were decremented by 1, it would equal the count upper limit value minus 1, that is, 1 less than the number of times data 2 is actually referenced, and the reference count would be inaccurate. To avoid mistakenly reclaiming the data block because of an inaccurate reference count, the data deleting device still maintains the reference count at the count upper limit value after receiving the delete request for data 2.
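The example can be replayed with a small simulation that tracks the true number of references next to the stored counter. This is an illustrative sketch only: the function `simulate`, the flag `decrement_at_limit`, and the limit of 4 are assumptions made for the example. With the flag set, the counter is decremented even after it has reached the limit; with the flag cleared, it stays at the limit as the embodiment describes.

```python
from typing import List, Tuple


def simulate(ops: List[str], limit: int, start: int,
             decrement_at_limit: bool) -> Tuple[int, int]:
    """Return (actual references, stored counter) after applying `ops`."""
    actual = counter = start
    for op in ops:
        if op == "write":
            actual += 1
            if counter < limit:
                counter += 1        # writes stop counting at the limit
        else:                       # "delete"
            actual -= 1
            if counter < limit or decrement_at_limit:
                counter -= 1
    return actual, counter


LIMIT = 4  # hypothetical count upper limit value
ops = ["write", "write", "delete"]  # the write, write, delete sequence from the text

naive = simulate(ops, LIMIT, start=3, decrement_at_limit=True)
saturating = simulate(ops, LIMIT, start=3, decrement_at_limit=False)

# Three more deletes: one reference is still live.
more_ops = ops + ["delete"] * 3
naive_later = simulate(more_ops, LIMIT, start=3, decrement_at_limit=True)
saturating_later = simulate(more_ops, LIMIT, start=3, decrement_at_limit=False)
```

In the naive variant the counter ends 1 below the actual number of references, and after three further deletes it reads 0 while one reference is still live, which would trigger a mistaken reclaim. In the saturating variant the counter stays at the limit, so the block is never reclaimed on the basis of the counter alone.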
In summary, in the data deleting method provided in this embodiment, when the write request is obtained and the second data that is identical to the first data is already stored, the reference count of the second data is obtained; determining whether the reference count reaches a count upper limit value; when the reference count reaches the upper limit value of the count, maintaining the reference count as the upper limit value of the count; the problem that the memory space occupied by reference counting is large can be solved; since the reference count is no longer increased with the increase of the write request, the storage space occupied by the reference count can be maintained in a smaller range, and therefore, only a smaller storage space needs to be set for the reference count, and the effect of saving the storage space occupied by the reference count can be achieved.
In addition, when a deletion request for first data is acquired, and the first data and stored second data are acquired, a reference count of the second data is acquired; determining whether the reference count reaches a count upper limit value; when the reference count reaches the upper limit value of the count, maintaining the reference count as the upper limit value of the count; the problem of data recovery error caused by inaccurate reference counting can be solved; after the reference count reaches the upper limit of the count, the reference count is not reduced according to the deletion request, namely, the reference count is not reduced to 0, so that the data deletion device does not recover the data block corresponding to the data because the reference count reaches 0, the problem that the data block is recovered by mistake to cause the loss of the stored data is avoided, and the accuracy of recovering the data block can be ensured.
In addition, when the data deleting device caches the reference count in the form of metadata (for example, when reference counts are fully cached in an all-flash storage system), setting a smaller number of bits for the reference count reduces the cache space that the reference count occupies, so that more metadata can be held in the cache.
It should be added that, when the operation request is a deletion request, the data deleting device deletes the pointer pointing to the second data in the data index area according to the data index information carried in the deletion request. The data index information indicates the pointer to the second data.
Optionally, in order to ensure that the data block corresponding to the data whose reference count reaches the upper limit of the count can be recycled, the data deleting device scans whether a block address pointing to the second data exists; when there is no block address pointing to the second data, the second data and the reference count of the second data are deleted.
Optionally, the data deleting means also deletes the fingerprint of the second data when there is no block address pointing to the second data.
Optionally, the data deleting device performs full-disk scanning every preset time interval to determine whether a block address pointing to the second data exists; or, when the storage space is smaller than the preset threshold, full-disk scanning is performed to determine whether a block address pointing to the second data exists.
Optionally, the block address is a logical block address (LBA).
Optionally, the block address is used to represent a pointer in the data index area.
For example, if no pointer in fig. 1 points any longer to the data in data block 1013 in data area 101, the data deleting device determines that data block 1013 is no longer referenced and reclaims it.
When the reference count of the second data has reached the count upper limit value, it no longer changes, regardless of whether the received operation request is a write request or a deletion request for the second data. Therefore, if the data deleting device reclaimed the data block corresponding to the second data only when the reference count is 0, it might never be able to reclaim that data block. In this embodiment, the device scans for a block address pointing to the second data; when no such block address exists, the number of times the second data is actually referenced is 0, and the second data and its reference count are deleted. This solves the problem that a data block corresponding to data whose reference count has reached the count upper limit value cannot be reclaimed. Determining through a full-disk scan whether any block address points to the second data lets the data deleting device establish whether the second data is actually referenced, so that the data block corresponding to such data can be reclaimed.
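The scan-based reclamation can be sketched as follows. This is a minimal illustration under stated assumptions: the data index area is modelled as a dictionary from block address to block identifier, the count upper limit value is assumed to be 3, and `reclaim_unreferenced` and all container names are hypothetical.

```python
# Illustrative only: reclaim blocks whose saturated count can never reach 0.
COUNT_LIMIT = 3  # assumed count upper limit value


def reclaim_unreferenced(index, blocks, ref_counts, fingerprints):
    """Scan all block addresses in the index and delete saturated blocks
    that no address points to. Returns the reclaimed block ids."""
    referenced = set(index.values())          # block ids still pointed to
    reclaimed = []
    for block_id in list(blocks):
        saturated = ref_counts.get(block_id) == COUNT_LIMIT
        if saturated and block_id not in referenced:
            del blocks[block_id]              # delete the second data
            del ref_counts[block_id]          # delete its reference count
            fingerprints.pop(block_id, None)  # optionally delete its fingerprint
            reclaimed.append(block_id)
    return reclaimed


index = {"lba0": "blk_a"}                     # block address -> block id
blocks = {"blk_a": b"aaaa", "blk_b": b"bbbb"}
ref_counts = {"blk_a": 3, "blk_b": 3}         # both counts are saturated
fingerprints = {"blk_a": "fp_a", "blk_b": "fp_b"}
print(reclaim_unreferenced(index, blocks, ref_counts, fingerprints))  # ['blk_b']
```

Only saturated blocks are considered here, because blocks below the limit are assumed to be reclaimed through the ordinary count-reaches-0 path.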
In order to more clearly understand the data deleting method provided by the present application, the following describes the deduplication processes when the operation request is a write request and the operation request is a delete request respectively.
The deduplication process when the operation request is a write request is described with reference to the embodiment shown in fig. 4; the deduplication process when the operation request is a delete request is described with reference to the embodiment shown in fig. 5.
Fig. 4 shows a flowchart of a data deleting method according to an embodiment of the present application. The data deleting method includes the following steps:
Step 401, a write request for first data is obtained.
Optionally, the write request may be generated by the data deleting device during operation; alternatively, the write request may be sent to the data deleting device by another device and received by the data deleting device.
Step 402, whether second data identical to the first data is stored is determined.
The data deleting device calculates the fingerprint of the first data using a hash algorithm and then searches for a stored fingerprint identical to that fingerprint. If an identical fingerprint is stored, second data identical to the first data is stored, and step 404 is executed; if no identical fingerprint is stored, no second data identical to the first data is stored, and step 403 is executed.
Step 403, the first data is written into the memory, the fingerprint and the reference count of the first data are stored in the memory, and the process ends.
Step 404, the reference count of the second data is obtained.
Step 405, whether the reference count of the second data has reached the count upper limit value is determined.
If the count upper limit value has been reached, the reference count is not updated and is kept at the count upper limit value, and the process ends; if the count upper limit value has not been reached, step 406 is executed.
Step 406, 1 is added to the reference count of the second data to obtain an updated reference count, and the updated reference count is stored.
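Steps 401 to 406 can be sketched as a single function. The sketch makes several assumptions that are not stated in the application: SHA-256 stands in for the unspecified hash algorithm, stored data is keyed directly by its fingerprint, a newly stored block starts with a reference count of 1, and the count upper limit value is 3. The name `handle_write` is hypothetical.

```python
import hashlib

COUNT_LIMIT = 3  # assumed count upper limit value


def handle_write(data: bytes, blocks: dict, ref_counts: dict) -> str:
    """Store new data, or update the count of an identical stored block."""
    fp = hashlib.sha256(data).hexdigest()   # step 402: compute the fingerprint
    if fp not in blocks:                    # no identical data is stored
        blocks[fp] = data                   # step 403: write data and metadata
        ref_counts[fp] = 1                  # assumed initial reference count
        return fp
    count = ref_counts[fp]                  # step 404: obtain the count
    if count < COUNT_LIMIT:                 # step 405: limit not yet reached
        ref_counts[fp] = count + 1          # step 406: add 1 and store
    return fp                               # at the limit: count unchanged


blocks, ref_counts = {}, {}
for _ in range(5):
    fp = handle_write(b"same payload", blocks, ref_counts)
print(ref_counts[fp])  # prints 3: the count stopped at the limit
```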
Fig. 5 shows a flowchart of a data deleting method according to an embodiment of the present application. The data deleting method includes the following steps:
Step 501, a deletion request for first data is obtained, where the deletion request includes data index information of the first data, and the data index information indicates a pointer pointing to stored second data.
Optionally, the deletion request may be generated by the data deleting device during operation; alternatively, the deletion request may be sent to the data deleting device by another device and received by the data deleting device.
Step 502, the reference count of the second data is obtained.
Step 503, whether the reference count of the second data has reached the count upper limit value is determined.
If the count upper limit value has been reached, the reference count is not updated and is kept at the count upper limit value, and the process ends; if the count upper limit value has not been reached, step 504 is executed.
Step 504, 1 is subtracted from the reference count of the second data to obtain an updated reference count.
Step 505, whether the updated reference count is equal to 0 is determined.
If the updated reference count is 0, step 506 is executed; if the updated reference count is not 0, step 507 is executed.
Step 506, the second data, the reference count of the second data, and the fingerprint of the second data are deleted, and the process ends.
Step 507, the updated reference count is stored.
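Steps 501 to 507 can be sketched in the same way. The sketch assumes that the data index information is a block address used as a key into a dictionary-based index, that the pointer is removed when the request is handled, and that the count upper limit value is 3. The name `handle_delete` and the container layout are hypothetical.

```python
COUNT_LIMIT = 3  # assumed count upper limit value


def handle_delete(lba, index, blocks, ref_counts, fingerprints):
    """Process a deletion request addressed by `lba`; return the new count."""
    block_id = index.pop(lba)             # step 501: resolve and drop the pointer
    count = ref_counts[block_id]          # step 502: obtain the count
    if count >= COUNT_LIMIT:              # step 503: limit reached
        return count                      # keep the count at the limit
    count -= 1                            # step 504: subtract 1
    if count == 0:                        # step 505: no references remain
        del blocks[block_id]              # step 506: delete data,
        del ref_counts[block_id]          #   reference count,
        fingerprints.pop(block_id, None)  #   and fingerprint
    else:
        ref_counts[block_id] = count      # step 507: store the updated count
    return count


index = {"lba0": "blk", "lba1": "blk"}
blocks = {"blk": b"x"}
ref_counts = {"blk": 2}
fingerprints = {"blk": "fp"}
print(handle_delete("lba0", index, blocks, ref_counts, fingerprints))  # prints 1
print(handle_delete("lba1", index, blocks, ref_counts, fingerprints))  # prints 0
```

After the second call the block, its count, and its fingerprint are all gone; a block whose count equals `COUNT_LIMIT` would instead be left untouched for the scan to handle.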
Fig. 6 shows a block diagram of a data deleting device according to an embodiment of the present application. The device includes the following units: a count acquisition unit 610, a count determination unit 620, and a count management unit 630.
A count acquisition unit 610, configured to obtain a reference count of second data when an operation request for first data is obtained and second data identical to the first data is already stored; the operation request is a write request or a delete request, and the reference count indicates the number of times that the second data is referenced;
a count determination unit 620 for determining whether the reference count reaches a count upper limit value;
a count management unit 630, configured to maintain the reference count as the count upper limit when the reference count reaches the count upper limit.
In summary, when a write request for first data is obtained and second data identical to the first data is already stored, the data deleting apparatus provided in this embodiment obtains the reference count of the second data, determines whether the reference count has reached the count upper limit value, and, when it has, keeps the reference count at the count upper limit value. This addresses the problem that reference counts occupy a large amount of storage space: because the reference count no longer grows as write requests accumulate, it stays within a small range, so only a small storage space needs to be reserved for it, which saves the storage space occupied by reference counts.
In addition, when a deletion request for first data is obtained and second data identical to the first data is stored, the apparatus obtains the reference count of the second data, determines whether the reference count has reached the count upper limit value, and, when it has, keeps the reference count at the count upper limit value. This addresses the problem of erroneous data reclamation caused by an inaccurate reference count: once the reference count has reached the count upper limit value, it is no longer decremented in response to deletion requests and therefore never drops to 0, so the data deleting apparatus does not reclaim the data block corresponding to the data on the grounds that the reference count has reached 0. Stored data is thus not lost through erroneous reclamation of a data block, and data blocks are reclaimed accurately.
Optionally, when the reference count reaches the count upper limit value, the apparatus further includes: a scanning unit and a data deleting unit.
A scanning unit for scanning whether a block address pointing to the second data exists;
and a data deleting unit configured to delete the second data and the reference count of the second data when there is no block address pointing to the second data.
When the reference count of the second data has reached the count upper limit value, it no longer changes, regardless of whether the received operation request is a write request or a deletion request for the second data. Therefore, if the data deleting device reclaimed the data block corresponding to the second data only when the reference count is 0, it might never be able to reclaim that data block. The device scans for a block address pointing to the second data; when no such block address exists, the number of times the second data is actually referenced is 0, and the second data and its reference count are deleted. This solves the problem that a data block corresponding to data whose reference count has reached the count upper limit value cannot be reclaimed. Determining through a full-disk scan whether any block address points to the second data lets the data deleting device establish whether the second data is actually referenced, so that the data block corresponding to such data can be reclaimed.
Optionally, the count upper limit value is determined according to the number of bits of the storage space in the memory used for storing the reference count, and the number of bits is greater than or equal to 1 and less than or equal to 32.
Optionally, the count management unit 630 is further configured to: when the operation request is a write request and the reference count has not reached the count upper limit value, add 1 to the reference count.
Optionally, the count management unit 630 is further configured to: when the operation request is a deletion request and the reference count has not reached the count upper limit value, subtract 1 from the reference count.
Optionally, the count determination unit 620 is further configured to determine whether the reference count reaches 0 after 1 is subtracted from the reference count;
and the data deleting unit is further configured to delete the second data and the reference count when the reference count reaches 0.
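The relation between the bit width and the count upper limit value can be illustrated with a short calculation. The text only says that the limit is determined according to the number of bits; the sketch below assumes the common choice of the largest unsigned value that fits in the field, and the names `count_upper_limit` and `counter_bytes` are hypothetical.

```python
def count_upper_limit(bits: int) -> int:
    """Largest unsigned value of a `bits`-wide field (assumed limit)."""
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    return (1 << bits) - 1


def counter_bytes(num_blocks: int, bits: int) -> int:
    """Bytes needed to pack one `bits`-wide counter per block."""
    return (num_blocks * bits + 7) // 8


print(count_upper_limit(4))           # prints 15
print(counter_bytes(1_000_000, 4))    # prints 500000
print(counter_bytes(1_000_000, 32))   # prints 4000000
```

Under this assumption, a 4-bit count for one million blocks takes one eighth of the space of a 32-bit count, which is the kind of saving the narrower field is meant to provide.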
For related details, see the method embodiments above.
It should be noted that, when the device in the above embodiment implements its functions, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
An exemplary embodiment of the present application also provides a computer-readable storage medium in which one or more programs are stored, the one or more programs, when executed, being for implementing the above-described data deletion method.
An exemplary embodiment of the present application further provides a data deleting device, which includes the device provided in the embodiment shown in fig. 2 or the alternative embodiment provided based on the embodiment shown in fig. 2.
Please refer to fig. 7, which illustrates a schematic structural diagram of a data deleting apparatus according to an embodiment of the present application. For example, the data deleting device may be a server for implementing the functions of the above-described method examples. The data deleting device 700 may include: a processor 701, a memory 702, and a bus 703.
The memory 702 is coupled to the processor 701 by a bus 703.
The memory 702 is used for storing the program codes and data of the data deleting device 700, and the memory 702 may be the same as the memory for storing data by the data deleting device in the above embodiment; alternatively, it may be different.
The memory 702 stores one or more programs configured to be executed by the one or more processors 701, the one or more programs containing instructions for implementing the data deletion method described above.
Alternatively, the memory 702 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
It will be appreciated that fig. 7 merely illustrates a simplified design of data deletion apparatus 700. In practical applications, the data deleting device 700 may include any number of processors, memories, etc., and all data deleting devices that can implement the embodiments of the present application are within the scope of the embodiments of the present application.
The above description mainly introduces the scheme provided in the embodiments of the present application from the perspective of a data deleting device. It is understood that, in order to implement the functions, the data deleting device includes hardware structures and/or software modules corresponding to the respective functions. The various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present teachings.
The steps of a method or algorithm described in connection with the disclosure of the embodiments of the application may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc Read Only Memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Of course, the processor and the storage medium may also reside as discrete components in a data deleting device.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
In the present embodiments, the terms "first," "second," "third," and the like (if any) are used for distinguishing between types of objects and not necessarily for describing a particular sequential or chronological order, it being understood that the objects so used may be interchanged under appropriate circumstances such that embodiments of the present application may be practiced in other sequences than those illustrated or otherwise described herein.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method for deleting data, the method comprising:
when an operation request for first data is acquired and second data identical to the first data is stored, acquiring a reference count of the second data; the operation request is a write request or a delete request, and the reference count is used for indicating the number of times the second data is referenced;
determining whether the reference count reaches a count upper limit value, the count upper limit value being determined according to a number of bits of a storage space in a memory for storing the reference count;
when the reference count reaches the count upper limit value, maintaining the reference count as the count upper limit value.
2. The method of claim 1, wherein when the reference count reaches the count upper limit value, the method further comprises:
scanning whether a block address pointing to the second data exists;
deleting the second data and the reference count of the second data when there is no block address pointing to the second data.
3. The method of claim 1, wherein the number of bits is greater than or equal to 1 and less than or equal to 32.
4. The method according to any one of claims 1 to 3, wherein, when the operation request is the write request, after the determining whether the reference count reaches a count upper limit value, the method further comprises:
when the reference count does not reach the count upper limit value, adding 1 to the reference count.
5. The method according to any one of claims 1 to 3, wherein, when the operation request is the delete request, after the determining whether the reference count reaches a count upper limit value, the method further comprises:
when the reference count does not reach the count upper limit value, subtracting 1 from the reference count.
6. The method of claim 5, wherein after subtracting 1 from the reference count, further comprising:
determining whether the reference count reaches 0;
deleting the second data and the reference count when the reference count reaches 0.
7. An apparatus for deleting data, the apparatus comprising:
a count acquisition unit, configured to acquire a reference count of second data when an operation request for first data is acquired and the second data identical to the first data is stored; the operation request is a write request or a delete request, and the reference count is used for indicating the number of times the second data is referenced;
a count determination unit configured to determine whether the reference count reaches a count upper limit value that is determined according to a number of bits of a storage space in a memory for storing the reference count;
a count management unit configured to maintain the reference count as the count upper limit value when the reference count reaches the count upper limit value.
8. The apparatus of claim 7, wherein when the reference count reaches the count upper limit value, the apparatus further comprises:
a scanning unit for scanning whether a block address pointing to the second data exists;
a data deleting unit configured to delete the second data and the reference count of the second data when there is no block address pointing to the second data.
9. The apparatus of claim 7, wherein the number of bits is greater than or equal to 1 and less than or equal to 32.
10. The apparatus according to any one of claims 7 to 9, wherein the count management unit is further configured to:
when the operation request is the write request and the reference count does not reach the count upper limit value, add 1 to the reference count.
11. The apparatus according to any one of claims 7 to 9, wherein the count management unit is further configured to:
when the operation request is the delete request and the reference count does not reach the count upper limit value, subtract 1 from the reference count.
12. The apparatus of claim 11,
the count determination unit is further configured to determine whether the reference count reaches 0 after subtracting 1 from the reference count;
the data deleting unit is further configured to delete the second data and the reference count when the reference count reaches 0.
13. A data deleting apparatus, characterized in that the data deleting apparatus comprises: one or more processors; and a memory;
the memory stores one or more programs configured for execution by the one or more processors, the one or more programs including instructions for implementing the data deletion method of any of claims 1 to 6.
14. A computer-readable storage medium storing one or more programs which, when executed by a processor, implement the data deletion method according to any one of claims 1 to 6.
CN201711137647.9A 2017-11-16 2017-11-16 Data deleting method and device Active CN108121504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711137647.9A CN108121504B (en) 2017-11-16 2017-11-16 Data deleting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711137647.9A CN108121504B (en) 2017-11-16 2017-11-16 Data deleting method and device

Publications (2)

Publication Number Publication Date
CN108121504A CN108121504A (en) 2018-06-05
CN108121504B true CN108121504B (en) 2021-01-29

Family

ID=62227745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711137647.9A Active CN108121504B (en) 2017-11-16 2017-11-16 Data deleting method and device

Country Status (1)

Country Link
CN (1) CN108121504B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113721836A (en) * 2021-06-15 2021-11-30 荣耀终端有限公司 Data deduplication method and device
CN113885785B (en) * 2021-06-15 2022-07-26 荣耀终端有限公司 Data deduplication method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239518A (en) * 2014-09-17 2014-12-24 华为技术有限公司 Repeated data deleting method and device
WO2016200403A1 (en) * 2015-06-12 2016-12-15 Hewlett Packard Enterprise Development Lp Disk storage allocation
CN106257402A (en) * 2015-06-19 2016-12-28 Hgst荷兰公司 The equipment detected for the single pass entropy transmitted for data and method
CN106886370A (en) * 2017-01-24 2017-06-23 华中科技大学 A kind of data safety delet method and system based on SSD duplicate removal technologies

Also Published As

Publication number Publication date
CN108121504A (en) 2018-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant