CN111125033A - Space recovery method and system based on full flash memory array - Google Patents

Publication number
CN111125033A
Authority
CN
China
Prior art keywords
data block
data
storage unit
fingerprint
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811289335.4A
Other languages
Chinese (zh)
Other versions
CN111125033B (en)
Inventor
夏文
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN201811289335.4A
Publication of CN111125033A
Application granted
Publication of CN111125033B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a space recovery method and system based on a full flash memory array, which improve the efficiency of space recovery. The method in the embodiments of the application comprises: acquiring data to be compressed in a performance layer; dividing the data to be compressed into first data blocks and calculating a hash value of each first data block; matching the hash value against a deduplication fingerprint library to determine whether a matching fingerprint exists; if not, compressing the first data block, writing the compressed first data block back to a capacity layer by log-append writing with a preset length as the storage unit, adding the fingerprint of the first data block to the deduplication fingerprint library, and updating the metadata information of the first data block to a metadata area of the capacity layer; constructing a data bitmap table, which records the space occupation state of each storage unit or of a group of storage units; and scanning the data bitmap table to obtain the space occupation state of each storage unit, and performing space recovery according to that state.

Description

Space recovery method and system based on full flash memory array
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a space recycling method and system based on a full flash memory array.
Background
Generally, to save storage space, the data in a file is deduplicated and compressed when the file is stored, which reduces the space the data occupies.
Deduplication uniquely identifies a data block by computing a secure hash digest (such as a SHA-1 fingerprint) of the block, which avoids byte-by-byte comparison of data. The storage system can identify duplicate data quickly by maintaining only an index table of secure hash digests, and it saves storage space by recording only the corresponding data pointer information for duplicate content.
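The indexing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DedupIndex`, its `put` method and the in-memory dictionary are hypothetical and do not come from the application; SHA-1 is used because the text names it as an example.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return the SHA-1 digest of a block as a hex string."""
    return hashlib.sha1(block).hexdigest()


class DedupIndex:
    """Toy index (hypothetical): fingerprint -> position of the single stored copy."""

    def __init__(self):
        self.index = {}    # fingerprint -> pointer into self.stored
        self.stored = []   # unique block contents, stored once each

    def put(self, block: bytes) -> int:
        fp = fingerprint(block)
        if fp in self.index:           # duplicate: only a pointer is recorded
            return self.index[fp]
        self.stored.append(block)      # new content: store it once
        self.index[fp] = len(self.stored) - 1
        return self.index[fp]


idx = DedupIndex()
pointers = [idx.put(b) for b in (b"alpha", b"beta", b"alpha")]
```

The third block has the same content as the first, so it receives the same pointer and no additional copy is stored.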
Data compression is also a redundancy elimination technique. It removes redundant information mainly through encoding: without losing any of the original information, it transforms the original content so that repeated byte sequences are represented by codes with fewer bytes, thereby eliminating part of the redundant data and saving storage space.
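A small illustration of lossless compression of a repetitive byte sequence follows. It uses Python's standard `zlib` module purely as a convenient stand-in; the application does not prescribe this codec, and the sample data is made up.

```python
import zlib

# 2048 bytes made of a repeated four-byte sequence (made-up sample data).
original = b"ABCD" * 512

compressed = zlib.compress(original)    # repeated sequences are encoded compactly
restored = zlib.decompress(compressed)  # lossless: the original bytes come back
```

Here `restored` equals `original`, and `compressed` is much shorter than the 2048-byte input because the content is highly repetitive.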
With existing compression and deduplication techniques, file data is updated (deleted or changed) in place: the original compressed data is deleted or overwritten in the address space where it is stored. For example, suppose a compressed non-duplicate data block in the original file occupies 2K of storage. When the file data is deleted, a 2K space fragment appears; when the file data is changed and the compressed block is only 1K long after the update, a 1K space fragment appears. When these fragments are recovered later, their physical storage addresses cannot be obtained quickly, so the fragments cannot be recovered quickly and in a targeted manner, and the efficiency of space recovery is reduced.
Disclosure of Invention
The embodiment of the application provides a space recovery method based on a full flash memory array, which is used for improving the efficiency of space recovery.
A first aspect of the embodiments of the present application provides a space reclamation method based on a full flash memory array, including:
acquiring data to be compressed in the performance layer;
dividing the data to be compressed into first data blocks, and calculating a hash value of each first data block;
matching the hash value of the first data block against a deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
if no matching fingerprint exists, determining that the first data block is a non-duplicate data block, compressing the first data block, writing the compressed first data block back to the capacity layer by log-append writing with a preset length as the storage unit, and adding the fingerprint of the first data block to the deduplication fingerprint library, wherein log-append writing is an out-of-place update mode that improves the IO performance of the capacity layer;
constructing a data bitmap table, wherein the data bitmap table records the space occupation state of each storage unit or of a group of storage units;
and scanning the data bitmap table to obtain the space occupation state of each storage unit, and performing space recovery according to the space occupation state of each storage unit.
Preferably, the space occupation state of the storage unit includes a first state and a second state, the first state is invalid occupation, and the second state is valid occupation;
when the first data block in a storage unit is changed or deleted, the space occupation state of that storage unit is recorded in the data bitmap table as the first state; otherwise, it is recorded in the data bitmap table as the second state.
Preferably, scanning the data bitmap table to obtain the space occupation state of each storage unit and performing space recovery according to the space occupation state includes:
scanning the data bitmap table, and determining whether the current space occupation state of each storage unit is the first state or the second state;
and when the current space occupation state of the storage unit is the first state, recovering the space occupied by the storage unit.
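The scan-and-recover step can be illustrated with a short Python sketch. The encoding of the two states as 0 and 1, the list-based bitmap and the function name `units_to_recover` are assumptions made for this example only.

```python
INVALID, VALID = 0, 1  # assumed encoding of the first and second states


def units_to_recover(bitmap):
    """Scan the bitmap and return the indices of storage units in the first state."""
    return [unit for unit, state in enumerate(bitmap) if state == INVALID]


# Hypothetical bitmap covering eight storage units.
bitmap = [VALID, INVALID, INVALID, VALID, VALID, INVALID, VALID, VALID]
recoverable = units_to_recover(bitmap)
```

Only the bitmap is read during the scan; the stored data itself is not touched until the listed units are actually recovered.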
Preferably, the method further comprises:
performing count management on the reference count of each fingerprint in the deduplication fingerprint library;
wherein performing count management on the reference count of each fingerprint in the deduplication fingerprint library comprises:
when a matching fingerprint of the first data block exists in the deduplication fingerprint library, incrementing the reference count of the matching fingerprint;
and,
when a first data block that references the matching fingerprint in the deduplication fingerprint library is updated, decrementing the reference count of the matching fingerprint.
Preferably, the method further comprises:
when the reference count of a first fingerprint in the deduplication fingerprint library is zero, the space occupation state, in the data bitmap table, of the storage unit corresponding to the first data block that references the first fingerprint is the first state.
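The reference-count rules above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the class `FingerprintLibrary`, its method names, the 0/1 state encoding, and the choice to start a new fingerprint at a count of 1 are all assumptions made for the example.

```python
INVALID, VALID = 0, 1  # assumed encoding of the first and second states


class FingerprintLibrary:
    """Hypothetical fingerprint library with per-fingerprint reference counts."""

    def __init__(self):
        self.refs = {}    # fingerprint -> reference count
        self.units = {}   # fingerprint -> storage unit indices holding the block

    def add(self, fp, units, bitmap):
        self.refs[fp] = 1             # assumption: the first writer counts as one reference
        self.units[fp] = units
        for u in units:
            bitmap[u] = VALID

    def reference(self, fp):
        self.refs[fp] += 1            # a duplicate block matched this fingerprint

    def release(self, fp, bitmap):
        self.refs[fp] -= 1            # a referencing block was updated
        if self.refs[fp] == 0:        # no referrers left: units become invalid
            for u in self.units[fp]:
                bitmap[u] = INVALID


bitmap = [INVALID, INVALID]
lib = FingerprintLibrary()
lib.add("fp-a", [0, 1], bitmap)    # non-duplicate block stored in units 0 and 1
lib.reference("fp-a")              # a duplicate block references the same fingerprint
lib.release("fp-a", bitmap)        # count 2 -> 1, units stay valid
after_first_release = list(bitmap)
lib.release("fp-a", bitmap)        # count 1 -> 0, units are marked invalid
```

The storage units are marked invalid only when the last reference disappears, which is why a shared block survives an update to one of its referrers.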
Preferably, the method further comprises:
if the matching fingerprint exists, determining that the first data block is duplicate data, and writing metadata information of the first data block back to a metadata area of the capacity layer, wherein the metadata information comprises the correspondence among the logical address of the first data block in the data to be compressed, the matching fingerprint, and the physical address corresponding to the matching fingerprint.
Preferably, after writing the compressed first data block back to the capacity layer by log-append writing, the method further includes:
updating metadata information of the first data block to a file metadata area of the capacity layer or to the deduplication fingerprint library, wherein the metadata information includes the physical storage address of the compressed first data block and the length of the compressed first data block, so that the first data block can later be decompressed according to the metadata information.
In a second aspect, an embodiment of the present application provides a space recovery system based on a full flash memory array, where the full flash memory array includes a performance layer and a capacity layer, and the system includes:
an acquisition unit, configured to acquire data to be compressed in the performance layer;
a segmentation calculation unit, configured to divide the data to be compressed into first data blocks and calculate a hash value of each first data block;
a matching unit, configured to match the hash value of the first data block against a deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
a compression unit, configured to determine that the first data block is a non-duplicate data block when no matching fingerprint exists, compress the first data block, write the compressed first data block back to the capacity layer by log-append writing with a preset length as the storage unit, and add the fingerprint of the first data block to the deduplication fingerprint library, wherein log-append writing is an out-of-place update mode that improves the IO performance of the capacity layer;
a construction unit, configured to construct a data bitmap table, wherein the data bitmap table records the space occupation state of each storage unit or of a group of storage units;
and a scanning unit, configured to scan the data bitmap table, obtain the space occupation state of each storage unit, and perform space recovery according to the space occupation state of each storage unit.
Preferably, the space occupation state of the storage unit includes a first state and a second state, the first state is invalid occupation, and the second state is valid occupation;
when the first data block in a storage unit is changed or deleted, the space occupation state of that storage unit is recorded in the data bitmap table as the first state; otherwise, it is recorded in the data bitmap table as the second state.
Preferably, the scanning unit includes:
a scanning module, configured to scan the data bitmap table and determine whether the current space occupation state of each storage unit is the first state or the second state;
and a recovery module, configured to recover the space occupied by a storage unit when the current space occupation state of the storage unit is the first state.
Preferably, the system further comprises:
a counting unit, configured to perform count management on the reference count of each fingerprint in the deduplication fingerprint library;
the counting unit includes:
a first counting module, configured to increment the reference count of the matching fingerprint when a matching fingerprint of the first data block exists in the deduplication fingerprint library;
and,
a second counting module, configured to decrement the reference count of the matching fingerprint when a first data block that references the matching fingerprint in the deduplication fingerprint library is updated.
Preferably, when the reference count of a first fingerprint in the deduplication fingerprint library is zero, the space occupation state, in the data bitmap table, of the storage unit corresponding to the first data block that references the first fingerprint is the first state.
Preferably, the system further comprises:
a deduplication unit, configured to determine that the first data block is duplicate data when the matching fingerprint exists, and write metadata information of the first data block back to a metadata area of the capacity layer, wherein the metadata information includes the correspondence among the logical address of the first data block in the data to be compressed, the matching fingerprint, and the physical address corresponding to the matching fingerprint.
Preferably, the system further comprises:
an updating unit, configured to update metadata information of the first data block to a file metadata area of the capacity layer or to the deduplication fingerprint library, wherein the metadata information includes the physical storage address of the compressed first data block and the length of the compressed first data block, so that the first data block can later be decompressed according to the metadata information.
The application further provides a space recycling system based on the full flash memory array, which includes a processor, and the processor is used for implementing the space recycling method based on the full flash memory array provided by the first aspect of the embodiment of the application when executing the computer program stored in the memory.
The present application further provides a readable storage medium, on which a computer program is stored, where the computer program is used to implement the space reclamation method based on a full flash memory array provided in the first aspect of the embodiments of the present application when the computer program is executed by a processor.
According to the technical scheme, the embodiment of the application has the following advantages:
In the embodiments of the application, non-duplicate data blocks are compressed and then stored in the capacity layer by log-append writing, in storage units of a preset length, and a data bitmap table is constructed to record the space occupation state of each storage unit. When the first data block in a storage unit is updated, the entry for that storage unit in the data bitmap table changes from the second state to the first state. During later space recovery, the space occupation state of each storage unit in the capacity layer can be obtained by scanning the data bitmap table, and the space occupied by any storage unit in the first state can be recovered quickly and in a targeted manner. Because the data bitmap table holds little data, it can be scanned quickly, which enables fast, targeted recovery of space fragments in the capacity layer.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a space reclamation method based on a full flash memory array according to the embodiment of the present application;
FIG. 2 is a schematic diagram of a physical architecture of a full flash memory array according to an embodiment of the present application;
FIG. 3A is a schematic diagram of an out-of-place update of the compressed first data block by log-append writing in an embodiment of the present application;
FIG. 3B is a schematic diagram of the data bitmap table in an embodiment of the present application;
FIG. 4 is a schematic diagram of another embodiment of a space reclamation method based on a full flash memory array according to an embodiment of the present application;
FIG. 5A is a diagram illustrating logical addresses and physical addresses before and after data deduplication in an embodiment of the present application;
FIG. 5B is a schematic diagram of the logical organization of metadata information in the metadata area of the capacity layer in an embodiment of the present application;
FIG. 6 is a schematic diagram of another embodiment of a space reclamation method based on a full flash memory array according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a space reclamation system based on a full flash memory array in an embodiment of the present application;
FIG. 8 is a schematic diagram of another embodiment of a space reclamation system based on a full flash memory array in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a space recovery method and a space recovery system based on a full flash memory array, which are used for improving the space recovery efficiency.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Generally, to save storage space, the data in a file is deduplicated and compressed when the file is stored, which reduces the space the data occupies.
Deduplication uniquely identifies a data block by computing a secure hash digest (such as a SHA-1 fingerprint) of the block, which avoids byte-by-byte comparison of data. The storage system can identify duplicate data quickly by maintaining only an index table of secure hash digests, and it saves storage space by recording only the corresponding data pointer information for duplicate content.
Data compression is also a redundancy elimination technique. It removes redundant information mainly through encoding: without losing any of the original information, it transforms the original content so that repeated byte sequences are represented by codes with fewer bytes, thereby eliminating part of the redundant data and saving storage space.
With existing compression and deduplication techniques, the hash value of a data block to be compressed is generally compared against the fingerprints in a deduplication fingerprint library. If a matching fingerprint exists, the block is determined to be duplicate data and only its pointer information needs to be recorded; if no matching fingerprint exists, the block is determined to be a non-duplicate data block and is compressed before being stored, which saves storage space. With this form of compressed storage, file data is generally updated (deleted or changed) in place: the original compressed data is deleted or overwritten in the address space where it is stored. For example, suppose a compressed non-duplicate data block in the original file occupies 2K of storage. When the file data is deleted, a 2K space fragment appears; when the file data is changed and the compressed block is only 1K long after the update, a 1K space fragment appears. When these fragments are recovered later, their physical storage addresses cannot be obtained quickly, so the fragments cannot be recovered quickly and in a targeted manner, and the efficiency of space recovery is reduced.
Based on the problem, the application provides a space recovery method based on a full flash memory array, which is used for improving the efficiency of space recovery.
For ease of understanding, the space recovery method based on a full flash memory array is described below. Referring to FIG. 1, an embodiment of the method includes:
101. Acquiring data to be compressed in the performance layer;
Generally, for a device with a processor, the IO performance of the storage system is a main factor affecting the performance of the device. When the external storage of the device is deployed as a full flash memory array, the physical architecture of the array is divided into a capacity layer and a performance layer. The capacity layer consists of SSDs with slower IO response or ordinary hard disks, and the performance layer consists of SSDs with faster IO response; see the physical architecture of the full flash memory array shown in FIG. 2. The performance layer is also called the write cache, and the capacity layer is also called the read cache. The technical problem addressed by the present application is how to quickly recover, in a targeted manner, the space fragments that appear when data is updated while data of the performance layer is being written back to the capacity layer.
When data in the performance layer is written back to the capacity layer, the written-back data is deduplicated and compressed. Before that, the data to be compressed must be obtained from the performance layer; this data may be any kind of file data or message data of application software, and is not specifically limited here.
102. Dividing the data to be compressed into first data blocks of a preset length, and calculating a hash value of each first data block;
When data is compressed, it is generally divided into first data blocks of a preset length, where the block granularity may be 2K, 4K, 8K or another size. The hash value of each first data block is calculated after, or during, the division.
Specifically, a hash algorithm transforms an input of arbitrary length (also called a pre-image) into a fixed-length output, which is the hash value. All hash functions share a basic property: if two hash values produced by the same function differ, the original inputs differ. This property follows from the hash function being deterministic. The converse does not hold strictly, since distinct inputs can collide, but for a secure hash such collisions are unlikely enough that the hash value of each data block can generally be treated as the fingerprint of that data block.
It should be noted that the preset length used when dividing data into blocks may be set according to the actual requirements of the specific application, and is not specifically limited here.
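Step 102 can be sketched as follows, as an illustration only. The 4K block size is one of the granularities the text mentions, SHA-1 is the example digest named in the background, and the function names `split_blocks` and `fingerprint` are hypothetical.

```python
import hashlib


def split_blocks(data: bytes, block_size: int = 4096):
    """Cut data into fixed-length blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Hash value of a block, used as its fingerprint."""
    return hashlib.sha1(block).hexdigest()


# Made-up input: blocks 0 and 2 have identical content.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096 + b"C" * 100
blocks = split_blocks(data)
fingerprints = [fingerprint(b) for b in blocks]
```

Blocks 0 and 2 produce the same fingerprint, which is what later allows step 103 to recognize the second occurrence as a duplicate.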
103. Matching the hash value of the first data block against the deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists; if not, executing step 104, and if so, executing step 107.
After the hash value of the first data block, that is, its fingerprint, is obtained, it may be matched against the deduplication fingerprint library pre-stored in the capacity layer to determine whether a matching fingerprint exists in the library. If not, step 104 is executed; if so, step 107 is executed.
104. Determining that the first data block is a non-duplicate data block, compressing the first data block, writing the compressed first data block back to the capacity layer by log-append writing with a preset length as the storage unit, adding the fingerprint of the first data block to the deduplication fingerprint library, and updating the metadata information of the first data block to a file metadata area in the capacity layer, wherein the metadata information includes: the physical storage address of the compressed first data block and the length of the compressed first data block;
When no fingerprint corresponding to the first data block exists in the deduplication fingerprint library, the first data block is a non-duplicate data block and is compressed; the preferred compression algorithm is LZ4. After compression, the compressed first data block is written back to the capacity layer by log-append writing with a preset length as the storage unit, the hash value (fingerprint) of the first data block is added to the deduplication fingerprint library, and the metadata information of the first data block is updated to the file metadata area. The metadata information includes the physical storage address and the length of the compressed first data block. This allows the first data block to be handled as duplicate data the next time it appears, and allows it to be decompressed and restored later according to its metadata information.
It should be noted that the hash value (fingerprint) and the metadata information of the first data block may instead both be updated to the deduplication fingerprint library. As long as the first data block can later be decompressed and restored according to its metadata information, the location to which the metadata information is updated is not specifically limited.
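The write path of steps 103 and 104 can be sketched roughly as below. Several things here are assumptions made for illustration: `zlib` stands in for LZ4 (which is not in the Python standard library), the class `CapacityLayer` and the functions `write_back` and `read_block` are hypothetical, the log is an in-memory byte array, and alignment to storage units of a preset length is omitted for brevity.

```python
import hashlib
import zlib


class CapacityLayer:
    """Hypothetical in-memory model of the capacity layer."""

    def __init__(self):
        self.log = bytearray()     # append-only data log
        self.fingerprints = {}     # fingerprint -> (address, compressed length)
        self.file_metadata = []    # (logical index, fingerprint, address, length)


def write_back(layer, logical_index, block):
    fp = hashlib.sha1(block).hexdigest()
    if fp in layer.fingerprints:                    # duplicate: reuse the stored copy
        address, length = layer.fingerprints[fp]
    else:                                           # non-duplicate: compress and append
        payload = zlib.compress(block)              # stand-in for LZ4
        address, length = len(layer.log), len(payload)
        layer.log += payload                        # append at the tail, never overwrite
        layer.fingerprints[fp] = (address, length)
    layer.file_metadata.append((logical_index, fp, address, length))
    return address, length


def read_block(layer, address, length):
    """Decompress a block using the address and length kept in the metadata."""
    return zlib.decompress(bytes(layer.log[address:address + length]))


layer = CapacityLayer()
m0 = write_back(layer, 0, b"x" * 4096)
m1 = write_back(layer, 1, b"y" * 4096)
m2 = write_back(layer, 2, b"x" * 4096)   # same content as block 0
```

Recording the compressed length alongside the address is what makes later decompression possible, since compressed blocks in the log have variable length.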
Specifically, suppose a compressed non-duplicate data block in the original file occupies 6K of storage. If it is written back to the capacity layer with a preset length of 1K as the storage unit, the 6K block occupies six 1K storage units. The preset length may also be 2K, in which case the 6K block occupies three 2K storage units. The preset length of the storage unit is not specifically limited here.
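The arithmetic in this example can be written as a one-line helper. The function name `units_needed` is hypothetical, and rounding up when the compressed length is not an exact multiple of the unit size is an assumption; the text only gives exact multiples.

```python
K = 1024


def units_needed(compressed_len: int, unit_size: int) -> int:
    """Number of fixed-size storage units a compressed block occupies (rounded up)."""
    return -(-compressed_len // unit_size)   # ceiling division


six_k_in_1k_units = units_needed(6 * K, 1 * K)
six_k_in_2k_units = units_needed(6 * K, 2 * K)
```

With these inputs the helper reproduces the two cases in the text: six 1K units or three 2K units.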
Because log-append writing proceeds in time order, when the file corresponding to the first data block is updated, the new data block in that file is compressed and then appended in time order. That is, the new data block is stored at a new storage address (a new storage unit) in the storage medium rather than at the storage address of the original first data block; this is an out-of-place update. It avoids the mismatch between the compressed length of the new data block and the storage space of the original first data block after the file data is updated, and thus avoids wasting storage space and generating small space fragments in the storage medium, which improves the utilization of storage space. In addition, an out-of-place update requires only a write operation, whereas an in-place update requires a read operation followed by a write operation, so updating out of place by log-append writing further improves the IO performance of the capacity layer.
Furthermore, the minimum write unit of an SSD is 4K. If a minimum write unit is not full and a later write must go into it, then, because an SSD must erase before it writes, the data already stored in that unit has to be read out and erased, and the new data has to be rewritten together with the data that was read out. For this reason, when updating file data out of place by log-append writing, the compressed data may first be stored, with the preset length as the storage unit, in a log storage unit, which is written back to the capacity layer once it is full. The storage space of the log storage unit is an integer multiple of the minimum write unit of the capacity layer, that is, an integer multiple of 4K such as 8K, 12K or 16K. This matches the 4K minimum write unit of the SSD and avoids random small writes (writes shorter than the 4K minimum write unit), which further avoids wasting storage space in the storage medium.
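The buffering behaviour described here might look like the following sketch. The class `LogBuffer`, the 8K default size and the list that stands in for writes to the capacity layer are assumptions for illustration; the sketch also lets a compressed block straddle two log storage units, which the text does not specify.

```python
MIN_WRITE = 4096  # minimum write unit of the SSD, per the text


class LogBuffer:
    """Hypothetical log storage unit that is flushed only when full."""

    def __init__(self, size=2 * MIN_WRITE):
        if size % MIN_WRITE != 0:
            raise ValueError("log storage unit must be a multiple of the minimum write unit")
        self.size = size
        self.pending = bytearray()   # data not yet written to the capacity layer
        self.flushed = []            # stands in for writes to the capacity layer

    def append(self, payload: bytes):
        self.pending += payload
        while len(self.pending) >= self.size:        # full: write back one whole unit
            self.flushed.append(bytes(self.pending[:self.size]))
            del self.pending[:self.size]


buf = LogBuffer()
for chunk in (b"a" * 3000, b"b" * 3000, b"c" * 3000):
    buf.append(chunk)
```

Every write that reaches the capacity layer is exactly one log storage unit long, so no write is shorter than the 4K minimum write unit.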
It is easy to understand that a storage unit is much smaller than a log storage unit, and that the storage unit mainly serves to construct the data bitmap table in step 105. The smaller the storage unit, the more precisely the space occupation state of the data can be identified, but also the larger the data bitmap table becomes. The size of the storage unit can therefore be set according to the processor configuration in the actual application. For example, the preset length of the storage unit may be 2K, 3K or another value, but it is generally smaller than the minimum write unit (4K) of the capacity layer. The preset length of the storage unit is not specifically limited here.
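The trade-off between storage unit size and bitmap size can be made concrete with a small calculation. It assumes one bit per storage unit and a 1 TiB capacity layer, both chosen only for illustration; the function name `bitmap_bytes` is hypothetical.

```python
def bitmap_bytes(capacity_bytes: int, unit_size: int) -> int:
    """Bytes needed for a bitmap with one bit per storage unit (assumed layout)."""
    units = capacity_bytes // unit_size
    return (units + 7) // 8


TIB = 1 << 40
MIB = 1 << 20

# Bitmap size in bytes for three candidate storage unit sizes.
sizes = {unit: bitmap_bytes(TIB, unit) for unit in (512, 1024, 2048)}
```

Under these assumptions the bitmap takes 256 MiB with 512-byte units, 128 MiB with 1K units and 64 MiB with 2K units: halving the unit size doubles the bitmap.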
105. Constructing a data bitmap table, wherein the data bitmap table is used for recording the space occupation state corresponding to each storage unit or a plurality of storage units;
After the compressed first data block is stored with the preset length as the storage unit, a data bitmap table may further be constructed. The data bitmap table records the space occupation state corresponding to each storage unit (e.g., 1K) or to a plurality of storage units. The space occupation state of each storage unit is either a first state or a second state: the first state indicates invalid occupation, that is, the data stored in the storage unit is invalid data; the second state indicates valid occupation, that is, the data in the storage unit is valid data.
When the compressed first data block is written back to the capacity layer by append-only log writes with the preset length as the storage unit, if the first data block in a storage unit is changed or deleted, the space occupation state recorded for that storage unit in the data bitmap table changes from the second state to the first state; otherwise, it remains the second state. For convenience of operation, the data bitmap table may represent the first state with the digit "0" and the second state with the digit "1".
When the first data block is updated (deleted or changed), the compressed first data block is updated out of place by an append-only log write, that is, the compressed first data block is stored at a new storage address (storage unit), and the data at its original storage address becomes invalid data. Therefore, when the first data block in a storage unit is changed or deleted, the space occupation states recorded in the data bitmap table for that storage unit are all the first state, and while the first data block exists normally (is not updated), the recorded space occupation state is the second state. Fig. 3A is a schematic diagram of the out-of-place update of a compressed first data block by append-only log writes in the embodiment of the present application, and also shows the correspondence among the logical address of the data block, the matching fingerprint, and the physical address corresponding to the matching fingerprint; Fig. 3B is a schematic diagram of the data bitmap table in the embodiment of the present application.
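As an illustration of the two states and the out-of-place update, a minimal sketch is given below. The names `DataBitmap` and `update_out_of_place`, the one-block-per-storage-unit layout, and the `location` dictionary are assumptions for the example. The sketch also treats never-written units as "0", since the text defines only two states.

```python
class DataBitmap:
    """One bit per storage unit: 1 = valid (second state), 0 = invalid (first state)."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.bits = bytearray((num_units + 7) // 8)  # every unit starts as 0

    def set_valid(self, unit):
        self.bits[unit // 8] |= 1 << (unit % 8)

    def set_invalid(self, unit):
        self.bits[unit // 8] &= ~(1 << (unit % 8)) & 0xFF

    def is_valid(self, unit):
        return bool(self.bits[unit // 8] >> (unit % 8) & 1)


def update_out_of_place(bitmap, location, logical_addr, next_free_unit):
    """Write a block to a new unit and invalidate its previous unit, if any.

    `location` maps a logical address to the storage unit that currently
    holds its data. Returns the next free unit at the tail of the log.
    """
    old_unit = location.get(logical_addr)
    if old_unit is not None:
        bitmap.set_invalid(old_unit)   # second state -> first state
    bitmap.set_valid(next_free_unit)   # new copy is valid
    location[logical_addr] = next_free_unit
    return next_free_unit + 1
```

With 1K storage units, one bit per unit means that 1 GiB of capacity needs only 128 KiB of bitmap, which is why the table can be scanned quickly.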
106. Scanning the data bit diagram, acquiring the space occupation state of each storage unit, and performing space recovery according to the space occupation state of each storage unit;
To identify invalid data in the capacity layer, the space occupation state of each storage unit can be obtained quickly by scanning the data bitmap table, and space recovery is performed according to the space occupation state of each storage unit.
Specifically, the data bitmap table records the space occupation state of each storage unit. When the first data block stored in a storage unit is updated, the data in that storage unit becomes invalid data, and the state recorded for the storage unit in the data bitmap table changes from the second state to the first state. Scanning the data bitmap table therefore yields the physical addresses of the storage units in the first state, and directional space recovery is performed on those physical addresses, that is, the invalid data in those storage units is deleted.
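The scan in step 106 might look like the following sketch. It assumes a 1K storage unit and a bitmap given as a list of 0/1 values. Merging adjacent invalid units into extents is an added convenience that the text does not describe.

```python
STORAGE_UNIT = 1024  # assumed 1K storage unit


def invalid_extents(bitmap):
    """Scan the bitmap and return (physical_address, length) extents of
    consecutive storage units in the first state (0)."""
    extents = []
    start = None
    for unit, bit in enumerate(bitmap):
        if bit == 0 and start is None:
            start = unit  # an invalid run begins
        elif bit == 1 and start is not None:
            extents.append((start * STORAGE_UNIT, (unit - start) * STORAGE_UNIT))
            start = None
    if start is not None:  # the run reaches the end of the bitmap
        extents.append((start * STORAGE_UNIT, (len(bitmap) - start) * STORAGE_UNIT))
    return extents
```

For the bitmap `[1, 0, 1, 0, 0, 1]`, the function returns `[(1024, 1024), (3072, 2048)]`: one 1K extent at address 1024 and one 2K extent at address 3072.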
107. Other processes are performed.
When the first data block matches with the fingerprint in the deduplication fingerprint library, other procedures are executed, and no specific limitation is made here.
In the embodiment of the present application, non-duplicate data blocks are compressed and then stored in the capacity layer by append-only log writes with the preset length as the storage unit, and a data bitmap table is constructed to record the space occupation state of each storage unit. When the first data block in a storage unit is updated, the state recorded for that storage unit in the data bitmap table changes from the second state to the first state. During later space recovery, the space occupation state of each storage unit in the capacity layer can be obtained by scanning the data bitmap table, and the space occupied by any storage unit in the first state is quickly and directionally recovered. Because the data bitmap table holds only a small amount of data, it can be scanned quickly, which enables rapid directional recovery of space fragments in the capacity layer.
Referring to fig. 4, and building on the embodiment described in fig. 1, the following describes the case in which the fingerprint of the first data block matches a fingerprint in the deduplication fingerprint library in the capacity layer. Another embodiment of the space reclamation method in the embodiment of the present application includes:
108. if the matching fingerprint exists, determining that a first data block is repeated data, and writing back metadata information of the first data block to a metadata area of the capacity layer, wherein the metadata information comprises a corresponding relation among a logical address of the first data block in the compressed data, the matching fingerprint and a physical address of the matching fingerprint;
and when the fingerprint of the first data block is matched with the fingerprint in the duplicate removal fingerprint database, determining that the first data block is duplicate data, and writing the metadata information of the first data block back to the metadata area of the capacity layer so as to restore the first data block according to the metadata information of the first data block in the data decompression process.
Specifically, the metadata information of the first data block includes a corresponding relationship among a logical address, a matching fingerprint, and a physical address of the matching fingerprint of the first data block in the compressed data, so as to decompress and restore the first data block at a later stage. Specifically, the logical address of the first data block in the compressed data refers to the logical order of the first data block in the compressed data (as in fig. 5A, data block B5 is the first data block in file 1), and the physical address of the matching fingerprint refers to the specific physical storage address of the matching fingerprint in the capacity layer, so as to perform decompression recovery on the first data block according to the physical address at a later stage. Fig. 5A is a schematic diagram of logical addresses and physical addresses before and after data deduplication in the embodiment of the present application; fig. 5B is a schematic diagram of a data logical organization relationship of metadata information in a metadata area of a capacity layer in an embodiment of the present application, and in the data logical organization relationship diagram, it is easily understood that a plurality of data blocks may correspond to the same fingerprint, that is, a plurality of (N) logical addresses correspond to the same fingerprint, and one fingerprint corresponds to only one physical storage address of the fingerprint, so that decompression recovery is performed on the data block according to physical storage of the data block corresponding to the fingerprint at a later stage.
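The N-to-1-to-1 relation of fig. 5B (many logical addresses, one fingerprint, one physical address) can be sketched with two dictionaries. The class and method names, the `(file_id, block_index)` form of a logical address, and the use of SHA-1 as the fingerprint are assumptions for illustration; this passage does not name a hash function.

```python
import hashlib


class DedupMetadata:
    """Hypothetical metadata area: logical address -> fingerprint -> physical address."""

    def __init__(self):
        self.logical_to_fp = {}   # (file_id, block_index) -> fingerprint
        self.fp_to_physical = {}  # fingerprint -> (physical_addr, compressed_len)

    def record_unique(self, logical, fp, physical_addr, compressed_len):
        """A non-duplicate block: store where its compressed data lives."""
        self.fp_to_physical[fp] = (physical_addr, compressed_len)
        self.logical_to_fp[logical] = fp

    def record_duplicate(self, logical, fp):
        """A duplicate block: only the logical -> fingerprint link is added."""
        if fp not in self.fp_to_physical:
            raise KeyError("fingerprint has no stored data block")
        self.logical_to_fp[logical] = fp

    def locate(self, logical):
        """Resolve a logical address to the stored block for later restore."""
        return self.fp_to_physical[self.logical_to_fp[logical]]


meta = DedupMetadata()
fp = hashlib.sha1(b"block contents").hexdigest()
meta.record_unique(("file1", 0), fp, physical_addr=4096, compressed_len=700)
meta.record_duplicate(("file2", 3), fp)  # second logical address, same data
```

Both logical addresses now resolve to the same physical location, while the data itself is stored only once.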
109. Count management is performed on the number of references to fingerprints in the deduplication fingerprint library.
To identify space fragment information in the capacity layer, that is, the space fragments produced when data stored in an original storage space in the capacity layer is deleted, count management may be performed on the number of references of the fingerprints in the deduplication fingerprint library. Specifically, count management may be performed in the following two respects:
firstly, when the matched fingerprint of the first data block exists in the duplicate removal fingerprint library, performing incremental operation on the reference times of the matched fingerprint;
If a first data block in the compressed data has a matching fingerprint in the deduplication fingerprint library, an increment operation is performed on the number of references of the matching fingerprint. The increment operation is preferably an addition, that is, a "+1" operation on the number of references of the matching fingerprint. Of course, the increment operation may also be a multiplication or a hybrid operation, as long as it is positively correlated, and this is not specifically limited here.
And secondly, when the first data block which refers to the matched fingerprint in the duplicate removal fingerprint database is updated, performing decreasing operation on the reference times of the matched fingerprint.
Specifically, if the file data corresponding to a matching fingerprint in the deduplication fingerprint library is deleted or changed, a decrement operation is performed on the number of references of the matching fingerprint. The decrement operation is preferably a subtraction, that is, a "-1" operation on the number of references of the matching fingerprint when the first data block corresponding to it is deleted or updated. Of course, the decrement operation may also be a division or a hybrid operation, as long as it is negatively correlated, and this is not specifically limited here.
In this way, managing the number of references of the fingerprints in the deduplication fingerprint library makes the reference status of each fingerprint clear. When the number of references of a first fingerprint in the deduplication fingerprint library is 0, the space occupation state recorded in the data bitmap table for the storage unit corresponding to the first data block that referenced the first fingerprint is the first state, that is, invalid occupation, and the space occupied by that storage unit can be directionally recovered, that is, the invalid data in it is deleted.
In the embodiment of the present application, count management can further be performed on the number of references of the fingerprints in the deduplication fingerprint library, so that when the number of references of the first fingerprint in the deduplication fingerprint library is 0, the space occupation state recorded in the data bitmap table for the storage unit corresponding to the first data block that referenced the first fingerprint is invalid occupation, and the space occupied by that storage unit is quickly and directionally recovered.
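The counting rule can be sketched as below, using the preferred "+1" and "-1" operations. The function names, the `fp_unit` mapping from fingerprint to storage unit, and the convention that storing a non-duplicate block sets its count to 1 are assumptions for the example.

```python
def reference(refs, fp):
    """Increment the reference count of a fingerprint ("+1" operation).

    Assumption: the first call, made when a non-duplicate block is stored,
    sets the count to 1.
    """
    refs[fp] = refs.get(fp, 0) + 1
    return refs[fp]


def dereference(refs, fp, fp_unit, bitmap):
    """Decrement the reference count ("-1" operation).

    When the count reaches 0, the storage unit that holds the block is
    marked as the first state (0) so that a later bitmap scan reclaims it.
    """
    refs[fp] -= 1
    if refs[fp] == 0:
        bitmap[fp_unit[fp]] = 0
    return refs[fp]


refs = {}
fp_unit = {"fpA": 2}     # fingerprint -> storage unit holding its data
bitmap = [1, 1, 1, 1]    # 1 = valid (second state)
reference(refs, "fpA")   # block stored for the first time
reference(refs, "fpA")   # a duplicate block references the same fingerprint
dereference(refs, "fpA", fp_unit, bitmap)  # one referencing block is updated
```

After the single `dereference` call the count is 1 and the unit stays valid. A second call would bring the count to 0 and flip `bitmap[2]` to 0.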
Based on step 106 of the embodiment shown in fig. 1, when the compressed first data block is written back to the log storage unit in a storage unit with a preset length, the following steps may be further performed to achieve automatic recovery of invalid data in the log storage unit, specifically referring to fig. 6, another embodiment of the space recovery method based on the full flash memory array in this embodiment includes:
1061. scanning the data bit diagram, acquiring the number of storage units in a first state in each log storage unit, and judging whether the space occupied by the storage units in the first state is greater than a preset space occupation threshold, if so, executing a step 1062, and if not, executing a step 1063;
During data block updates, a plurality of storage units may come to be in the invalid-occupation state, that is, the data stored in them is invalid data. To recover the space of this invalid data in time, the data bitmap table may be scanned in real time or at regular intervals. During the scan, the number of storage units in the first state in each log storage unit is obtained, and it is determined whether the space occupied by the storage units in the first state is greater than the preset space occupation threshold. For example, suppose a log storage unit is 200K, each storage unit is 1K, and 100 storage units are in the first state (invalid occupation). The storage units in the first state then occupy 100K, that is, 50% of the log storage unit. If the preset space occupation threshold is 40%, then 50% is greater than 40%, that is, the invalid data in the log storage unit exceeds the threshold, and step 1062 is executed; otherwise, step 1063 is executed.
1062. Deleting an updated data block in a storage unit in a first state in a log storage unit, acquiring an un-updated data block in a storage unit in a second state in the log storage unit, migrating the un-updated data block to a new storage unit, and updating metadata information of the un-updated data block to a file metadata area, wherein the metadata information of the un-updated data block comprises a physical storage address of the un-updated data block and a compressed length of the un-updated data block;
If the proportion of space occupied by the storage units in the first state in a log storage unit is greater than the preset space occupation threshold, the log storage unit holds too much invalid data, and a space recovery operation may be performed on it: the updated data blocks in the storage units in the first state are deleted, the non-updated data blocks in the storage units in the second state are obtained and migrated to a new storage unit (that is, a new storage address), and the metadata information of the non-updated data blocks is updated to the file metadata area. The metadata information of a non-updated data block includes its physical storage address and its compressed length, so that the data can later be decompressed and restored according to this metadata information.
1063. Other processes are performed.
And if the space occupied by the storage unit in the first state in each log storage unit is not larger than the preset space occupation threshold, executing other processes, wherein the processes are not limited specifically here.
In the above embodiment, the space occupied by the storage units in the first state in each log storage unit is compared against the preset space occupation threshold of that log storage unit, and when it is greater than the threshold, the invalid data in the log storage unit is recovered automatically.
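Steps 1061 to 1063 can be sketched as a single function. The function name, the `(log_id, first_unit, compressed_len)` form of a physical address, and the list-based bitmap are assumptions for illustration. The 40% threshold and the 200K example come from the text.

```python
def reclaim_log_unit(bitmap, blocks, metadata, new_log_id, threshold=0.40):
    """Reclaim one log storage unit if its invalid share exceeds the threshold.

    bitmap   : list of 0/1, one entry per storage unit in this log unit
    blocks   : list of (block_id, first_unit, unit_count, compressed_len)
               describing the blocks physically stored in this log unit
    metadata : dict block_id -> (log_id, first_unit, compressed_len),
               updated in place for migrated blocks
    Returns True if the log unit was reclaimed (step 1062), False otherwise
    (step 1063).
    """
    invalid_ratio = bitmap.count(0) / len(bitmap)
    if not invalid_ratio > threshold:
        return False
    next_unit = 0
    for block_id, first, count, length in sorted(blocks, key=lambda b: b[1]):
        if all(bitmap[first:first + count]):
            # Non-updated block: migrate it and record its new physical
            # address together with its compressed length.
            metadata[block_id] = (new_log_id, next_unit, length)
            next_unit += count
        # Updated blocks are dropped with the old log unit.
    return True


# The example from the text: a 200K log unit of 1K storage units, 100 invalid.
bitmap = [1] * 100 + [0] * 100
blocks = [("A", 0, 100, 102390), ("B", 100, 100, 99000)]
metadata = {"A": (0, 0, 102390)}
reclaimed = reclaim_log_unit(bitmap, blocks, metadata, new_log_id=1)
```

Here the invalid share is 100/200 = 50%, which is greater than 40%, so block A is migrated to log unit 1 and the old log unit can be freed as a whole.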
With reference to fig. 7, an embodiment of a space reclamation system based on a full flash memory array in the embodiment of the present application includes:
an obtaining unit 701, configured to obtain compressed data in the performance layer;
a segmentation calculating unit 702, configured to segment the compressed data into a first data block with a preset length, and calculate a hash value of the first data block;
a matching unit 703, configured to match the hash value of the first data chunk with a duplicate removal fingerprint library in the capacity layer to determine whether there is a matching fingerprint;
a compressing unit 704, configured to determine that the first data block is a non-duplicate data block when the matching fingerprint does not exist, compress the first data block, write back the compressed first data block to the capacity layer in a manner of appending and writing a log with a preset length as a storage unit, update the fingerprint of the first data block to the deduplication fingerprint library, and update metadata information of the first data block to a file metadata area in the capacity layer, where the metadata information includes: the physical storage address of the compressed first data block and the length of the compressed first data block;
a constructing unit 705, configured to construct a data bitmap table, where the data bitmap table is used to record a space occupation state corresponding to each storage unit or multiple storage units;
the scanning unit 706 is configured to scan the data bit map, obtain a space occupation state of each storage unit, and perform space recovery according to the space occupation state of each storage unit.
It should be noted that the functions of the units in this embodiment are similar to those of the units in the embodiment described in fig. 1, and are not described again here.
In the embodiment of the present application, after the compression unit 704 compresses the non-duplicate data blocks, they are stored in the capacity layer by append-only log writes with the preset length as the storage unit, and the construction unit 705 constructs a data bitmap table to record the space occupation state of each storage unit. When the first data block in a storage unit is updated, the state recorded for that storage unit in the data bitmap table changes from the second state to the first state. During later space recovery, the space occupation state of each storage unit in the capacity layer can be obtained by scanning the data bitmap table, and the space occupied by any storage unit in the first state is quickly and directionally recovered. Because the data bitmap table holds only a small amount of data, it can be scanned quickly, which enables rapid directional recovery of space fragments in the capacity layer.
Referring to fig. 8, a space reclamation system based on a full flash memory array in the embodiment of the present application is described in detail below based on the embodiment described in fig. 7, where another embodiment of the space reclamation system based on a full flash memory array in the embodiment of the present application includes:
an acquisition unit 801 configured to acquire compressed data in the performance layer;
a segmentation calculating unit 802, configured to segment the compressed data into a first data block with a preset length, and calculate a hash value of the first data block;
a matching unit 803, configured to match the hash value of the first data chunk with a deduplication fingerprint library in the capacity layer to determine whether there is a matching fingerprint;
a compressing unit 804, configured to determine, when the matching fingerprint does not exist, that the first data block is a non-duplicate data block, compress the first data block, write back the compressed first data block to the capacity layer in a manner of appending and writing a log with a preset length as a storage unit, update the fingerprint of the first data block to the deduplication fingerprint library, and update metadata information of the first data block to a file metadata area of the capacity layer, where the metadata information includes: the physical storage address of the compressed first data block and the length of the compressed first data block;
a constructing unit 805, configured to construct a data bitmap table, where the data bitmap table is used to record a space occupation state corresponding to each storage unit or multiple storage units;
the scanning unit 806 is configured to scan the data bit map, obtain a space occupation state of each storage unit, and perform space recovery according to the space occupation state of each storage unit.
Preferably, the space occupation state of the storage unit includes a first state and a second state, the first state is invalid occupation, and the second state is valid occupation;
when the first data block in the storage unit is changed or deleted, the space occupation state of the corresponding storage unit is recorded in the data bit diagram as the first state, otherwise, the space occupation state of the corresponding storage unit is recorded in the data bit diagram as the second state.
Preferably, the compressing unit 804 includes:
a compression module 8041, configured to determine that the first data block is a non-duplicate data block when the matching fingerprint does not exist, compress the first data block, write the compressed first data block to a log storage unit by append-only log writes with the preset length as the storage unit, write the log storage unit back to the capacity layer after the log storage unit is full, where the storage space of the log storage unit is an integer multiple of the minimum write unit of the capacity layer, update the fingerprint of the first data block to the deduplication fingerprint library, and update metadata information of the first data block to a file metadata area of the capacity layer, where the metadata information includes: the physical storage address of the compressed first data block and the length of the compressed first data block;
preferably, the scanning unit 806 includes:
a scanning module 8061, configured to scan the data bit map, obtain the number of storage units in the first state in each log storage unit, and determine whether a space occupied by the storage unit in the first state is greater than a preset space occupation threshold;
the recycling module 8062 is configured to recycle a space occupied by the storage unit in the first state when the space occupied by the storage unit is greater than a preset space occupation threshold.
Preferably, the system further comprises:
a counting unit 807 for performing count management on the number of references of fingerprints in the deduplication fingerprint library;
the counting unit 807 includes:
a first counting module 8071, configured to perform an increment operation on the number of references of the matching fingerprint when a matching fingerprint of the first data block exists in the deduplication fingerprint library;
and,
a second counting module 8072, configured to perform a decrement operation on the number of references of the matching fingerprint when the first data block referencing the matching fingerprint in the deduplication fingerprint library is updated.
Preferably, when the number of times of reference of the first fingerprint in the deduplication fingerprint library is zero, the space occupation state of the storage unit corresponding to the first data block that references the first fingerprint in the data bit map is the first state.
Preferably, the system further comprises:
a deduplication unit 808, configured to determine that the first data block is duplicate data when the matching fingerprint exists, and write back metadata information of the first data block to a metadata area of the capacity layer, where the metadata information includes a correspondence relationship between a logical address of the first data block in the compressed data, the matching fingerprint, and a physical address of the matching fingerprint.
It should be noted that the functions of the units in this embodiment are similar to those described in the embodiments of fig. 1, fig. 4, and fig. 6, and are not described again here.
In the embodiment of the present application, after the compression unit 804 compresses the non-duplicate data blocks, they are stored in the capacity layer by append-only log writes with the preset length as the storage unit, and the construction unit 805 constructs a data bitmap table to record the space occupation state of each storage unit. When the first data block in a storage unit is updated, the state recorded for that storage unit in the data bitmap table changes from the second state to the first state. During later space recovery, the space occupation state of each storage unit in the capacity layer can be obtained by scanning the data bitmap table, and the space occupied by any storage unit in the first state is quickly and directionally recovered. Because the data bitmap table holds only a small amount of data, it can be scanned quickly, which enables rapid directional recovery of space fragments in the capacity layer.
Secondly, the embodiment of the present application can also perform count management on the number of references of the fingerprints in the deduplication fingerprint library, so that when the number of references of the first fingerprint in the deduplication fingerprint library is 0, the space occupation state recorded in the data bitmap table for the storage unit corresponding to the first data block that referenced the first fingerprint is invalid occupation, and the space occupied by that storage unit is quickly and directionally recovered.
The space reclamation system based on the full flash memory array in the embodiment of the present application is described above from the perspective of the modular functional entity, and the space reclamation system based on the full flash memory array in the embodiment of the present application is described below from the perspective of hardware processing:
one embodiment of a space reclamation system based on a full flash memory array in the embodiment of the present application includes:
a processor and a memory;
the memory is used for storing the computer program, and the processor is used for realizing the following steps when executing the computer program stored in the memory:
acquiring compressed data in the performance layer;
segmenting the compressed data into first data blocks with preset lengths, and calculating hash values of the first data blocks;
matching the hash value of the first data chunk with a duplicate removal fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
if the matched fingerprint does not exist, determining that the first data block is a non-duplicated data block, compressing the first data block, writing the compressed first data block back to the capacity layer in a log additional writing mode by taking a preset length as a storage unit, updating the fingerprint of the first data block into the duplicate removal fingerprint library, and updating the metadata information of the first data block to a file metadata area of the capacity layer, wherein the metadata information comprises: the physical storage address of the compressed first data block and the length of the compressed first data block;
constructing a data bitmap table, wherein the data bitmap table is used for recording the space occupation state corresponding to each storage unit or a plurality of storage units;
and scanning the data bit diagram, acquiring the space occupation state of each storage unit, and performing space recovery according to the space occupation state of each storage unit.
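The write path in the steps above (split, hash, match, compress, append, update metadata) can be sketched end to end as follows. The 8K block size, SHA-256 as the fingerprint, zlib as the compressor, and all function and variable names are assumptions for illustration. The bitmap construction and scan are omitted to keep the sketch short.

```python
import hashlib
import zlib

BLOCK_SIZE = 8 * 1024  # assumed preset length of a first data block


def ingest(data, fingerprint_library, log, metadata, file_id="file1"):
    """Deduplicate and compress `data` block by block.

    fingerprint_library : dict fingerprint -> (log_index, compressed_len)
    log                 : list acting as the append-only capacity layer
    metadata            : dict (file_id, block_index) -> fingerprint
    Returns (unique_blocks, duplicate_blocks).
    """
    unique = duplicate = 0
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in fingerprint_library:
            compressed = zlib.compress(block)
            log.append(compressed)  # append-only write
            fingerprint_library[fp] = (len(log) - 1, len(compressed))
            unique += 1
        else:
            duplicate += 1          # only metadata is written
        metadata[(file_id, start // BLOCK_SIZE)] = fp
    return unique, duplicate


def restore(metadata, fingerprint_library, log, file_id, num_blocks):
    """Rebuild the original data from the metadata and the log."""
    parts = []
    for index in range(num_blocks):
        log_index, _ = fingerprint_library[metadata[(file_id, index)]]
        parts.append(zlib.decompress(log[log_index]))
    return b"".join(parts)


library, log, metadata = {}, [], {}
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
counts = ingest(data, library, log, metadata)
```

The third block repeats the first, so only two compressed blocks reach the log, and `restore` can still rebuild all three from the metadata.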
In some embodiments of the present application, the processor may be further configured to:
the space occupation state of the storage unit comprises a first state and a second state, wherein the first state is invalid occupation, and the second state is valid occupation;
when the first data block in the storage unit is changed or deleted, the space occupation state of the corresponding storage unit is recorded in the data bit diagram as the first state, otherwise, the space occupation state of the corresponding storage unit is recorded in the data bit diagram as the second state.
In some embodiments of the present application, the processor may be further configured to:
scanning the data bit diagram, and determining that the current space occupation state of each storage unit is the first state or the second state;
and when the current space occupation state of the storage unit is the first state, recovering the space occupied by the storage unit.
In some embodiments of the present application, the processor may be further configured to:
performing count management on the number of references of the fingerprints in the deduplication fingerprint library.
In some embodiments of the present application, the processor may be further configured to:
when a matching fingerprint of the first data block exists in the deduplication fingerprint library, performing an increment operation on the number of references of the matching fingerprint;
and,
when the first data block referencing the matching fingerprint in the deduplication fingerprint library is updated, performing a decrement operation on the number of references of the matching fingerprint.
In some embodiments of the present application, the processor may be further configured to:
when the number of references of the first fingerprint in the deduplication fingerprint library is zero, the space occupation state recorded in the data bitmap table for the storage unit corresponding to the first data block that references the first fingerprint is the first state.
In some embodiments of the present application, the processor may be further configured to:
if the matching fingerprint exists, determining that the first data block is repeated data, and writing back metadata information of the first data block to a metadata area of the capacity layer, wherein the metadata information comprises a corresponding relation among a logical address of the first data block in the compressed data, the matching fingerprint and a physical address of the matching fingerprint.
In some embodiments of the present application, the processor may be further configured to:
if the matched fingerprint does not exist, determining that the first data block is a non-duplicated data block, compressing the first data block, writing back the compressed first data block to a log storage unit by taking a preset length as a storage unit, and writing back the log storage unit to the capacity layer after the log storage unit is full, wherein a storage space of the log storage unit is an integral multiple of a minimum writing unit of the capacity layer, updating the fingerprint of the first data block to the duplicate removal fingerprint library, and updating the metadata information of the first data block to a file metadata area of the capacity layer, wherein the metadata information includes: the physical storage address of the first data block after compression and the length of the first data block after compression.
It is to be understood that, when the processor in the space reclamation system based on the full flash memory array described above executes the computer program, the functions of the units in the corresponding device embodiments may also be implemented, and are not described herein again. Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the space reclamation system based on the full flash memory array. For example, the computer program may be partitioned into units in the full flash array based space reclamation system described above, and each unit may implement specific functions as described above for the corresponding full flash array based space reclamation system.
The space reclamation system based on the full flash memory array may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The system may include, but is not limited to, a processor and a memory. It will be understood by those skilled in the art that the processor and the memory are merely examples of components of the system and do not constitute a limitation on it; the system may include more or fewer components, combine certain components, or use different components. For example, the space reclamation system based on the full flash memory array may further include an input-output device, a network access device, a bus, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement the various functions of the space reclamation system based on the full flash memory array by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid-state storage device.
The present application further provides a computer-readable storage medium for implementing the functions of the space reclamation system based on a full flash memory array. A computer program is stored on the medium and, when executed by a processor, causes the processor to perform the following steps:
acquiring compressed data in the performance layer;
segmenting the compressed data into first data blocks with preset lengths, and calculating hash values of the first data blocks;
matching the hash value of the first data block against a deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
if the matching fingerprint does not exist, determining that the first data block is a non-duplicate data block, compressing the first data block, writing the compressed first data block back to the capacity layer by log append write with a preset length as the storage unit, updating the fingerprint of the first data block into the deduplication fingerprint library, and updating the metadata information of the first data block to a file metadata area of the capacity layer, wherein the metadata information comprises: the physical storage address of the compressed first data block and the length of the compressed first data block;
constructing a data bitmap table, wherein the data bitmap table is used for recording the space occupation state corresponding to each storage unit or a plurality of storage units;
and scanning the data bitmap table, acquiring the space occupation state of each storage unit, and performing space reclamation according to the space occupation state of each storage unit.
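As a purely illustrative sketch of the write-back steps listed above, the following Python fragment splits a byte string into fixed-length blocks, looks each block up in an in-memory fingerprint table, and appends only non-duplicate blocks to a log. The 4096-byte block length, the use of SHA-256 and zlib, and the names `split_blocks` and `write_back` are assumptions made for illustration; the application does not specify a hash function, a compression algorithm, or a block length, and the data read from the performance layer is treated here as an opaque byte string.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed preset block length; not fixed by the application


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into fixed-length blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def write_back(data: bytes, fingerprints: dict[str, int], log: bytearray,
               metadata: list[dict]) -> None:
    """Deduplicate, compress and append each block of `data` to `log`."""
    for index, block in enumerate(split_blocks(data)):
        digest = hashlib.sha256(block).hexdigest()  # SHA-256 is an assumption
        if digest in fingerprints:
            # Duplicate block: record only a mapping to the stored copy.
            metadata.append({"logical": index, "fingerprint": digest,
                             "physical": fingerprints[digest]})
            continue
        compressed = zlib.compress(block)  # zlib is an assumption
        physical = len(log)  # append-only: new data always goes at the tail
        log.extend(compressed)
        fingerprints[digest] = physical
        metadata.append({"logical": index, "fingerprint": digest,
                         "physical": physical, "length": len(compressed)})
```

In this sketch the dictionary `fingerprints` stands in for the fingerprint library and the list `metadata` for the file metadata area; a real capacity layer would persist both.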
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
the space occupation state of the storage unit comprises a first state and a second state, wherein the first state is invalid occupation, and the second state is valid occupation;
when the first data block in the storage unit is changed or deleted, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the first state; otherwise, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the second state.
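One possible in-memory representation of such a table, given here only as a sketch, packs one bit per storage unit into a byte array. The class name `UnitBitmap`, the encoding of the first state as 0 and the second state as 1, and the one-bit-per-unit granularity are assumptions for illustration. A consequence of this encoding is that a unit that has never been written also reads as the first state.

```python
INVALID = 0  # "first state": the unit holds changed or deleted data
VALID = 1    # "second state": the unit holds live data


class UnitBitmap:
    """One bit per storage unit, packed into a bytearray."""

    def __init__(self, unit_count: int) -> None:
        self.unit_count = unit_count
        self.bits = bytearray((unit_count + 7) // 8)  # all units start INVALID

    def set_state(self, unit: int, state: int) -> None:
        byte, offset = divmod(unit, 8)
        if state == VALID:
            self.bits[byte] |= 1 << offset
        else:
            self.bits[byte] &= ~(1 << offset) & 0xFF

    def get_state(self, unit: int) -> int:
        byte, offset = divmod(unit, 8)
        return (self.bits[byte] >> offset) & 1
```

A write path would call `set_state(unit, VALID)` when a block lands in a unit and `set_state(unit, INVALID)` when that block is later changed or deleted.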
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
scanning the data bitmap table, and determining whether the current space occupation state of each storage unit is the first state or the second state;
and when the current space occupation state of the storage unit is the first state, reclaiming the space occupied by the storage unit.
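The scan-and-reclaim pass described above might look like the following sketch. It assumes the per-unit states are available as a plain list and that reclaimed units are tracked in a free set; the function name `reclaim` and the free-set bookkeeping are hypothetical and are not taken from the application.

```python
INVALID, VALID = 0, 1  # first state / second state


def reclaim(states: list[int], free_units: set[int]) -> list[int]:
    """Scan per-unit states and return the units reclaimed in this pass."""
    reclaimed = []
    for unit, state in enumerate(states):
        if state == INVALID and unit not in free_units:
            free_units.add(unit)  # hand the unit back to the allocator
            reclaimed.append(unit)
    return reclaimed
```

Units already in the free set are skipped, so repeating the scan without intervening writes reclaims nothing further.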
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
performing count management on the reference counts of the fingerprints in the deduplication fingerprint library.
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
when a matching fingerprint of the first data block exists in the deduplication fingerprint library, performing an increment operation on the reference count of the matching fingerprint;
and when the first data block that references the matching fingerprint in the deduplication fingerprint library is updated, performing a decrement operation on the reference count of the matching fingerprint.
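The increment and decrement operations can be illustrated with a small reference counter. This is a minimal sketch: the class name `FingerprintRefs` and its method names are hypothetical, and whether the first stored copy of a block itself counts as a reference is not stated in the application, so the caller decides when to call `on_match`.

```python
from collections import Counter


class FingerprintRefs:
    """Reference counts keyed by fingerprint."""

    def __init__(self) -> None:
        self.counts = Counter()

    def on_match(self, fingerprint: str) -> int:
        """A block matched this fingerprint: increment its reference count."""
        self.counts[fingerprint] += 1
        return self.counts[fingerprint]

    def on_update(self, fingerprint: str) -> int:
        """A referencing block was updated: decrement the reference count."""
        if self.counts[fingerprint] <= 0:
            raise ValueError("fingerprint has no outstanding references")
        self.counts[fingerprint] -= 1
        return self.counts[fingerprint]
```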
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
when the reference count of a first fingerprint in the deduplication fingerprint library is zero, the space occupation state, in the data bitmap table, of the storage unit corresponding to the first data block that references the first fingerprint is the first state.
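The link between a reference count reaching zero and the state of the backing storage unit can be sketched as follows. The function name `release_reference` and the `unit_of` mapping from fingerprint to storage unit are assumptions introduced for illustration only.

```python
INVALID, VALID = 0, 1  # first state / second state


def release_reference(fingerprint, ref_counts, unit_of, states):
    """Drop one reference; mark the backing unit invalid when none remain."""
    ref_counts[fingerprint] -= 1
    if ref_counts[fingerprint] == 0:
        # No block references the stored copy any more, so the storage
        # unit that holds it becomes reclaimable.
        states[unit_of[fingerprint]] = INVALID
    return ref_counts[fingerprint]
```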
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
if the matching fingerprint exists, determining that the first data block is duplicate data, and writing back metadata information of the first data block to a metadata area of the capacity layer, wherein the metadata information comprises a correspondence among the logical address of the first data block in the compressed data, the matching fingerprint, and the physical address of the matching fingerprint.
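For a duplicate block, only a mapping record is written. A minimal sketch of such a record is shown below; the `DuplicateRecord` type, its field names, and the `record_duplicate` helper are hypothetical, and the fingerprint library is modeled as a dictionary from fingerprint to physical address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicateRecord:
    logical_address: int   # position of the block within the source data
    fingerprint: str       # matching fingerprint found in the library
    physical_address: int  # where the already-stored copy lives


def record_duplicate(logical_address, fingerprint, library, metadata_area):
    """Append a mapping record for a block whose fingerprint already exists."""
    record = DuplicateRecord(logical_address, fingerprint, library[fingerprint])
    metadata_area.append(record)
    return record
```

No block data is written in this path; a later read resolves the logical address to the physical address of the stored copy through the record.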
In some embodiments of the present application, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the following steps:
if the matching fingerprint does not exist, determining that the first data block is a non-duplicate data block, compressing the first data block, writing the compressed first data block back to a log storage unit with a preset length as the storage unit, and writing the log storage unit back to the capacity layer after the log storage unit is full, wherein the storage space of the log storage unit is an integer multiple of the minimum write unit of the capacity layer; updating the fingerprint of the first data block into the deduplication fingerprint library; and updating the metadata information of the first data block to a file metadata area of the capacity layer, wherein the metadata information includes the physical storage address of the compressed first data block and the length of the compressed first data block.
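The buffering behavior of the log storage unit can be sketched as a buffer whose capacity is an integer multiple of the minimum write unit and which is flushed only when full. The class name `LogBuffer`, the `flushed` list standing in for the capacity layer, and the choice to let a compressed block straddle two log units are all assumptions; the application may instead pad or align blocks to the preset storage-unit length.

```python
class LogBuffer:
    """Accumulates compressed blocks and flushes whole log storage units."""

    def __init__(self, min_write_unit: int, multiple: int) -> None:
        # Capacity is an integer multiple of the minimum write unit.
        self.capacity = min_write_unit * multiple
        self.pending = bytearray()
        self.flushed: list[bytes] = []  # stands in for the capacity layer

    def append(self, compressed: bytes) -> int:
        """Append a block and return its offset in the overall log."""
        offset = len(self.flushed) * self.capacity + len(self.pending)
        self.pending.extend(compressed)
        while len(self.pending) >= self.capacity:
            # The log unit is full: write it back as one large write.
            self.flushed.append(bytes(self.pending[:self.capacity]))
            del self.pending[:self.capacity]
        return offset
```

The returned offset, together with the compressed length, corresponds to the physical storage address and length that the step above records as metadata.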
It will be appreciated that the integrated units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (18)

1. A method for space reclamation based on a full flash memory array, the full flash memory array comprising a performance layer and a capacity layer, the method comprising:
acquiring compressed data in the performance layer;
dividing the compressed data into first data blocks, and calculating hash values of the first data blocks;
matching the hash value of the first data block against a deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
if the matching fingerprint does not exist, determining that the first data block is a non-duplicate data block, compressing the first data block, writing the compressed first data block back to the capacity layer by log append write with a preset length as the storage unit, and updating the fingerprint of the first data block into the deduplication fingerprint library, wherein the log append write serves as an out-of-place update mode for improving the IO performance of the capacity layer;
constructing a data bitmap table, wherein the data bitmap table is used for recording the space occupation state corresponding to each storage unit or a plurality of storage units;
and scanning the data bitmap table, acquiring the space occupation state of each storage unit, and performing space reclamation according to the space occupation state of each storage unit.
2. The method of claim 1, wherein the space occupation state of the storage unit comprises a first state and a second state, the first state being invalid occupation and the second state being valid occupation;
when the first data block in the storage unit is changed or deleted, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the first state; otherwise, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the second state.
3. The method of claim 2, wherein scanning the data bitmap table, acquiring the space occupation state of each storage unit, and performing space reclamation according to the space occupation state of each storage unit comprises:
scanning the data bitmap table, and determining whether the current space occupation state of each storage unit is the first state or the second state;
and when the current space occupation state of the storage unit is the first state, reclaiming the space occupied by the storage unit.
4. The method of claim 1, further comprising:
performing count management on the reference counts of the fingerprints in the deduplication fingerprint library;
wherein performing count management on the reference counts of the fingerprints in the deduplication fingerprint library comprises:
when a matching fingerprint of the first data block exists in the deduplication fingerprint library, performing an increment operation on the reference count of the matching fingerprint;
and when the first data block that references the matching fingerprint in the deduplication fingerprint library is updated, performing a decrement operation on the reference count of the matching fingerprint.
5. The method of claim 4, further comprising:
when the reference count of a first fingerprint in the deduplication fingerprint library is zero, the space occupation state, in the data bitmap table, of the storage unit corresponding to the first data block that references the first fingerprint is the first state.
6. The method according to any one of claims 1 to 5, further comprising:
if the matching fingerprint exists, determining that the first data block is duplicate data, and writing back metadata information of the first data block to a metadata area of the capacity layer, wherein the metadata information comprises a correspondence among the logical address of the first data block in the compressed data, the matching fingerprint, and the physical address of the matching fingerprint.
7. The method according to claim 6, wherein writing the compressed first data block back to the capacity layer by log append write with a preset length as the storage unit comprises:
writing the compressed first data block back to a log storage unit by log append write, and writing the log storage unit back to the capacity layer after the log storage unit is full, wherein the storage space of the log storage unit is an integer multiple of the minimum write unit of the capacity layer.
8. The method of claim 7, wherein after writing back the compressed first data block to the capacity layer in a log append write, the method further comprises:
updating metadata information of the first data block to a file metadata area of the capacity layer or to the deduplication fingerprint library, wherein the metadata information includes the physical storage address of the compressed first data block and the length of the compressed first data block, and is used for decompressing the first data block at a later stage.
9. A space reclamation system based on a full flash array, the full flash array including a performance layer and a capacity layer, the system comprising:
an acquisition unit configured to acquire compressed data in the performance layer;
a segmentation calculation unit configured to segment the compressed data into first data blocks, and calculate hash values of the first data blocks;
a matching unit, configured to match the hash value of the first data block against a deduplication fingerprint library in the capacity layer to determine whether a matching fingerprint exists;
a compression unit, configured to determine that the first data block is a non-duplicate data block when the matching fingerprint does not exist, compress the first data block, write the compressed first data block back to the capacity layer by log append write with a preset length as the storage unit, and update the fingerprint of the first data block into the deduplication fingerprint library, wherein the log append write serves as an out-of-place update mode for improving the IO performance of the capacity layer;
the data bitmap table is used for recording the space occupation state corresponding to each storage unit or a plurality of storage units;
and a scanning unit, configured to scan the data bitmap table, acquire the space occupation state of each storage unit, and perform space reclamation according to the space occupation state of each storage unit.
10. The system of claim 9, wherein the space occupation state of the storage unit comprises a first state and a second state, the first state being invalid occupation and the second state being valid occupation;
when the first data block in the storage unit is changed or deleted, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the first state; otherwise, the space occupation state of the corresponding storage unit is recorded in the data bitmap table as the second state.
11. The system of claim 10, wherein the scanning unit comprises:
a scanning module, configured to scan the data bitmap table and determine whether the current space occupation state of each storage unit is the first state or the second state;
and a reclamation module, configured to reclaim the space occupied by the storage unit when the current space occupation state of the storage unit is the first state.
12. The system of claim 9, further comprising:
a counting unit, configured to perform count management on the reference counts of the fingerprints in the deduplication fingerprint library;
wherein the counting unit includes:
a first counting module, configured to perform an increment operation on the reference count of the matching fingerprint when a matching fingerprint of the first data block exists in the deduplication fingerprint library;
and a second counting module, configured to perform a decrement operation on the reference count of the matching fingerprint when the first data block that references the matching fingerprint in the deduplication fingerprint library is updated.
13. The system of claim 12,
wherein, when the reference count of a first fingerprint in the deduplication fingerprint library is zero, the space occupation state, in the data bitmap table, of the storage unit corresponding to the first data block that references the first fingerprint is the first state.
14. The system of any one of claims 9 to 13, further comprising:
a deduplication unit, configured to determine that the first data block is duplicate data when the matching fingerprint exists, and write back metadata information of the first data block to a metadata area of the capacity layer, wherein the metadata information includes a correspondence among the logical address of the first data block in the compressed data, the matching fingerprint, and the physical address of the matching fingerprint.
15. The system of claim 14, wherein the compression unit comprises:
a compression module, configured to determine that the first data block is a non-duplicate data block when the matching fingerprint does not exist, compress the first data block, write the compressed first data block back to a log storage unit by log append write, and write the log storage unit back to the capacity layer after the log storage unit is full, wherein the storage space of the log storage unit is an integer multiple of the minimum write unit of the capacity layer.
16. The system of claim 15, further comprising:
an updating unit, configured to update metadata information of the first data block into a file metadata area of the capacity layer or into the deduplication fingerprint library, wherein the metadata information includes the physical storage address of the compressed first data block and the length of the compressed first data block, and is used for decompressing the first data block at a later stage.
17. A full flash array based space reclamation system comprising a processor, wherein the processor, when executing a computer program stored on a memory, is configured to implement the full flash array based space reclamation method of any of claims 1 through 8.
18. A readable storage medium having stored thereon a computer program for implementing the full flash array based space reclamation method as recited in any one of claims 1 to 8 when the computer program is executed by a processor.
CN201811289335.4A 2018-10-31 2018-10-31 Space recycling method and system based on full flash memory array Active CN111125033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811289335.4A CN111125033B (en) 2018-10-31 2018-10-31 Space recycling method and system based on full flash memory array

Publications (2)

Publication Number Publication Date
CN111125033A (en) 2020-05-08
CN111125033B (en) 2024-04-09

Family

ID=70485668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811289335.4A Active CN111125033B (en) 2018-10-31 2018-10-31 Space recycling method and system based on full flash memory array

Country Status (1)

Country Link
CN (1) CN111125033B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982180A (en) * 2012-12-18 2013-03-20 华为技术有限公司 Method and device for storing data
CN103049222A (en) * 2012-12-28 2013-04-17 中国船舶重工集团公司第七0九研究所 RAID5 (redundant array of independent disk 5) write IO optimization processing method
US20150199147A1 (en) * 2014-01-14 2015-07-16 International Business Machines Corporation Storage thin provisioning and space reclamation
CN104035729A (en) * 2014-05-22 2014-09-10 中国科学院计算技术研究所 Block device thin-provisioning method for log mapping
CN106951375A (en) * 2016-01-06 2017-07-14 北京忆恒创源科技有限公司 The method and device of snapped volume is deleted within the storage system
CN107797934A (en) * 2016-09-05 2018-03-13 北京忆恒创源科技有限公司 The method and storage device that distribution is ordered are gone in processing
CN106502587A (en) * 2016-10-19 2017-03-15 华为技术有限公司 Data in magnetic disk management method and magnetic disk control unit
CN108427538A (en) * 2018-03-15 2018-08-21 深信服科技股份有限公司 Storage data compression method, device and the readable storage medium storing program for executing of full flash array

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913657A (en) * 2020-07-10 2020-11-10 长沙景嘉微电子股份有限公司 Block data read-write method, device, system and storage medium
CN112699067A (en) * 2021-01-04 2021-04-23 瑞芯微电子股份有限公司 Instruction addressing method and device
CN112699067B (en) * 2021-01-04 2024-05-14 瑞芯微电子股份有限公司 Instruction addressing method and device
CN113064556A (en) * 2021-04-29 2021-07-02 山东英信计算机技术有限公司 BIOS data storage method, device, equipment and storage medium
CN113608687A (en) * 2021-06-30 2021-11-05 苏州浪潮智能科技有限公司 Space recovery method, device and equipment and readable storage medium
CN113836051A (en) * 2021-11-29 2021-12-24 苏州浪潮智能科技有限公司 Metadata space recovery method, device, equipment and storage medium
CN113836051B (en) * 2021-11-29 2022-03-22 苏州浪潮智能科技有限公司 Metadata space recovery method, device, equipment and storage medium
CN115543937A (en) * 2022-03-22 2022-12-30 荣耀终端有限公司 File defragmentation method and electronic device
CN115543937B (en) * 2022-03-22 2023-07-11 荣耀终端有限公司 File defragmentation method and electronic device

Also Published As

Publication number Publication date
CN111125033B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant