CN108427538A - Storage data compression method and device for an all-flash array, and readable storage medium - Google Patents


Publication number
CN108427538A
Authority
CN
China
Prior art keywords
data
data block
layer
capacity
fingerprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810214771.9A
Other languages
Chinese (zh)
Other versions
CN108427538B (en
Inventor
夏文
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN201810214771.9A
Publication of CN108427538A
Application granted
Publication of CN108427538B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0688 Non-volatile semiconductor memory arrays
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626 Reducing size or complexity of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System (AREA)

Abstract

The embodiments of the invention disclose a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the space utilization of a storage system. The method of the embodiments comprises: judging whether the current storage capacity of the performance layer is greater than a safety threshold; if not, reading a data segment of preset length from the performance layer; dividing the data segment into data blocks of preset granularity and calculating the fingerprint of each data block; querying the fingerprint database of the capacity layer and judging whether the fingerprint exists in it; and, if the fingerprint exists, determining that the data block is a duplicate data block and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block. The embodiments also provide a storage data compression device for an all-flash array, for improving the I/O performance and storage efficiency of the storage system.

Description

Storage data compression method and device for an all-flash array, and readable storage medium
Technical field
The present invention relates to the technical field of data storage, and in particular to a storage data compression method and device for an all-flash array, and a readable storage medium.
Background art
All-flash array: flash solid-state disks (SSDs) are widely used as a cache for mechanical hard disks, for example in Ceph and ZFS, mainly because SSDs offer good random I/O performance, whereas traditional mechanical hard disks handle random I/O poorly. Deploying all-flash devices in storage systems has become a popular trend, as a way to raise the overall performance of the storage system. Considering that SSDs cost far more than mechanical hard disks, and that storage systems in today's cloud computing and virtualized environments hold large amounts of duplicate and redundant data, data deduplication and compression techniques can be used to extend the logical storage space of an SSD storage system and raise the capacity utilization of the SSDs, thereby reducing SSD cost.
In general, the physical structure of an all-flash array is divided into a capacity layer (read cache) and a performance layer (write cache), which are typically, though not necessarily, built from PCIe SSDs and SATA SSDs respectively. This is mainly due to the asymmetry of SSD read and write performance: the read rate is generally far higher than the write rate, and PCIe SSDs are more durable than SATA SSDs. How to design the data storage strategy between the performance layer and the capacity layer, so as to improve the response speed of the performance layer, ensure that the performance layer has good write bandwidth and latency, and at the same time increase the storage space of the capacity layer, has therefore become a current research focus.
Summary of the invention
The embodiments of the present invention provide a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the response speed of the performance layer of the all-flash array, ensuring that the performance layer has good write bandwidth and latency, and at the same time increasing the storage space of the capacity layer, thereby improving the I/O performance of the storage system.
A first aspect of the embodiments of the present invention provides a storage data compression method for an all-flash array, the all-flash array comprising a performance layer and a capacity layer, the method comprising:
judging whether the current storage capacity of the performance layer is greater than a safety threshold;
if not, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
Preferably, the method further comprises:
judging whether the number of modifications of the data segment is greater than a first threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
Preferably, the method further comprises:
judging whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
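For illustration only, the bandwidth check described above might be sketched in Python as follows. The names `Route` and `route_by_bandwidth`, and the use of MB/s as the unit, are assumptions made for this sketch and are not part of the disclosed method.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical labels for the two write-back paths."""
    DEDUP_COMPRESS = "dedup_compress"        # split into blocks, fingerprint, deduplicate
    DIRECT_WRITE_BACK = "direct_write_back"  # write the segment to the data area as-is


def route_by_bandwidth(current_bandwidth_mbps: float,
                       bandwidth_threshold_mbps: float) -> Route:
    """Choose a path from the performance layer's current storage bandwidth."""
    # Strictly greater than the threshold: skip deduplication and write back directly.
    if current_bandwidth_mbps > bandwidth_threshold_mbps:
        return Route.DIRECT_WRITE_BACK
    # Not greater than the threshold: trigger the block-splitting step.
    return Route.DEDUP_COMPRESS
```

A bandwidth exactly equal to the threshold counts as "not greater", so the segment still goes through the block-splitting step.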
Preferably, the method further comprises:
representing the length of the deduplicated and compressed data block with a compressed-length code;
the metadata comprising the order, the logical address and the compressed-length code of the data block.
Preferably, the method further comprises:
if the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
if the fingerprint does not exist in the fingerprint database, performing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint database, the metadata information comprising the physical storage address of the compressed data block and the compressed length of the data block.
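A minimal Python sketch of the path for a block whose fingerprint is not yet in the fingerprint database is given below. It is an illustration under stated assumptions: the data area is modelled as an in-memory `bytearray`, the fingerprint database as a `dict`, and `zlib` stands in for the compression algorithm, which the text does not fix at this point. The function name `store_unique_block` is hypothetical.

```python
import hashlib
import zlib


def store_unique_block(block: bytes, data_area: bytearray, fingerprint_db: dict) -> tuple:
    """Compress a new block, append it to the data area and index its fingerprint.

    Returns (physical_address, compressed_length) as recorded in the database.
    """
    # The fingerprint is computed over the ORIGINAL (uncompressed) block.
    fingerprint = hashlib.sha1(block).hexdigest()
    compressed = zlib.compress(block)          # stand-in compressor (assumption)
    physical_address = len(data_area)          # append-only layout (assumption)
    data_area.extend(compressed)
    # Metadata of the compressed block: where it lives and how long it is.
    fingerprint_db[fingerprint] = (physical_address, len(compressed))
    return fingerprint_db[fingerprint]
```

Keeping the compressed length next to the address is what later allows the block to be located and decompressed from the data area.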
A second aspect of the present invention provides a storage data compression device for an all-flash array, the all-flash array comprising a performance layer and a capacity layer, the device comprising:
a first judging unit, configured to judge whether the current storage capacity of the performance layer is greater than a safety threshold;
a reading unit, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
a computing unit, configured to divide the data segment into data blocks of preset granularity and calculate the fingerprint of each data block;
a query and judging unit, configured to query the fingerprint database of the capacity layer and judge whether the fingerprint exists in the fingerprint database;
a deduplication unit, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
Preferably, the device further comprises:
a second judging unit, configured to judge whether the number of modifications of the data segment is greater than a first threshold;
a first trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the number of modifications is not greater than the first threshold;
a first write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the number of modifications is greater than the first threshold.
Preferably, the device further comprises:
a third judging unit, configured to judge whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
a second trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the current storage bandwidth is not greater than the bandwidth threshold;
a second write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the current storage bandwidth is greater than the bandwidth threshold.
Preferably, the deduplication and compression unit further comprises:
a marking module, configured to represent the length of the deduplicated and compressed data block with a compressed-length code;
the metadata comprising the order, the logical address and the compressed-length code of the data block.
Preferably, the device further comprises:
a third write-back unit, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the safety threshold;
a fourth write-back unit, configured to, when the fingerprint does not exist in the fingerprint database, perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint database, the metadata information comprising the physical storage address of the compressed data block and the compressed length of the data block.
The present invention also provides a computer device comprising a processor, the processor being configured to implement the following steps when executing a computer program stored in a memory:
judging whether the current storage capacity of the performance layer is greater than a safety threshold;
if not, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
The present invention also provides a readable storage medium on which a computer program is stored, the computer program, when executed, implementing the following steps:
judging whether the current storage capacity of the performance layer is greater than a safety threshold;
if not, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, the storage data compression device of the all-flash array first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the device reads a data segment of preset length from the performance layer, divides the data segment into data blocks, and calculates the fingerprint of each data block. When the fingerprint exists in the fingerprint database of the capacity layer, the device determines that the data block is duplicate data and stores the metadata information of the data block in the capacity layer of the all-flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has good write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication and compression further increases the storage space of the capacity layer, thereby improving the space utilization and storage efficiency of the storage system.
Description of the drawings
Fig. 1 is a schematic diagram of one embodiment of a storage data compression method for an all-flash array in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of a storage data compression method for an all-flash array in the embodiments of the present invention;
Fig. 3 is a schematic structural diagram of the performance layer and capacity layer of the all-flash array and of the performance-layer data and capacity-layer data;
Fig. 4 is a schematic diagram of one embodiment of a storage data compression device for an all-flash array in the embodiments of the present invention;
Fig. 5 is a schematic diagram of another embodiment of a storage data compression device for an all-flash array in the embodiments of the present invention.
Detailed description
The embodiments of the present invention provide a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the response speed of the performance layer of the all-flash array and ensuring that the performance layer has good write bandwidth and latency, thereby further improving the I/O performance of the storage system.
For ease of understanding, the technical terms used herein are explained below:
Data deduplication: data deduplication (also called duplicate data elimination) is a technique applied in storage systems to identify and eliminate redundant data globally, and has become a focus of storage system research in recent years. Deduplication uniquely identifies a data block by computing a secure hash digest of the block (for example a SHA-1 fingerprint), which avoids byte-by-byte comparison of data. The storage system only needs to maintain an index table of the secure hash digests to identify duplicate data quickly and conveniently, which gives good scalability. For duplicate content, only the corresponding data pointer information needs to be recorded, which saves storage space. Data deduplication can therefore greatly save storage space and thereby improve the resource utilization of storage devices.
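The idea of an index table of digests plus pointers for repeated content can be illustrated with the following Python sketch. It is a toy in-memory model, not the disclosed implementation; the function name `deduplicate` and the list-based "pointer" representation are assumptions.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once and keep a pointer for every input block.

    Returns (unique_blocks, pointers), where pointers[i] is the position in
    unique_blocks holding the content of the i-th input block.
    """
    index = {}          # SHA-1 fingerprint -> position in unique_blocks
    unique_blocks = []
    pointers = []
    for block in blocks:
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in index:
            # First time this content is seen: store the data itself.
            index[fingerprint] = len(unique_blocks)
            unique_blocks.append(block)
        # Duplicate or not, only a pointer is recorded for this position.
        pointers.append(index[fingerprint])
    return unique_blocks, pointers
```

The original sequence can be rebuilt as `[unique_blocks[p] for p in pointers]`, which is why recording pointer information alone is enough for repeated content.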
Data compression: data compression is another mainstream technique for eliminating redundant data. It removes redundant information mainly by encoding: on the premise that no original information is lost, it transforms the original content so that repeated byte sequences are represented by codes of fewer bytes, eliminating part of the redundant data and ultimately saving storage space. Data compression tools used in storage systems today mainly adopt compression algorithms such as LZ4 and LZO.
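As a small illustration of lossless compression of repeated byte sequences, the sketch below uses Python's standard `zlib` module. This is a substitution made only so the example runs without third-party packages: the text names LZ4 and LZO, which are not in the Python standard library.

```python
import zlib

# A buffer made of a repeated byte sequence, one 4 KB block in size.
original = b"ABCD" * 1024

compressed = zlib.compress(original)    # encode repeated sequences more compactly
restored = zlib.decompress(compressed)  # lossless: the original bytes come back

print(len(original), len(compressed))
```

The restored buffer is byte-for-byte identical to the original, and for highly repetitive input like this the compressed form is much shorter.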
All-flash array: flash solid-state disks (SSDs) are widely used as a cache for mechanical hard disks, for example in Ceph and ZFS, mainly because SSDs offer good random I/O performance, whereas traditional mechanical hard disks handle random I/O poorly. Deploying all-flash devices in storage systems has become a popular trend, as a way to raise the overall performance of the storage system. Considering that SSDs cost far more than mechanical hard disks, and that storage systems in today's cloud computing and virtualized environments hold large amounts of duplicate and redundant data, data deduplication and compression techniques can be used to extend the logical storage space of an SSD storage system and raise the capacity utilization of the SSDs, thereby reducing SSD cost.
In general, for a device with a processor, the I/O performance of the storage system is a principal factor affecting the performance of the device. When the external storage of the device is deployed as an all-flash array, the physical structure of the array is generally divided into a capacity layer and a performance layer, where the performance layer is also called the write cache and the capacity layer is also called the read cache. When the all-flash array is built from SSDs, reading is far faster than writing because of the asymmetry of SSD read and write performance, so PCIe SSDs are generally deployed as the capacity layer and SATA SSDs as the performance layer. How to set the data storage strategy between the PCIe SSDs and the SATA SSDs, so as to improve the response speed of the SATA SSDs serving as the performance layer, ensure that they have good write bandwidth and latency, and at the same time increase the storage space of the capacity layer, thereby further improving the I/O performance of the storage system, is the technical problem to be solved by the present invention.
Based on the above problem, the present invention proposes a storage data compression method for an all-flash array. For ease of understanding, referring to Fig. 1, the storage data compression method for an all-flash array in an embodiment of the present invention comprises:
101. Judge whether the current storage capacity of the performance layer is greater than the safety threshold; if not, execute step 102; if so, execute step 106;
It is well known that improving the response speed of the performance layer (write cache) of an all-flash array, so that the performance layer has good write bandwidth and latency, is one way to improve the I/O performance of a storage system. Current all-flash arrays are generally built entirely from SSDs. Because of the asymmetry of SSD read and write performance, reading is far faster than writing, so PCIe SSDs are generally deployed as the capacity layer and SATA SSDs as the performance layer; PCIe SSDs read faster than SATA SSDs and are more durable. To ensure a better response speed for the performance layer, the data in the performance layer can be written back to the capacity layer in real time when the current storage capacity of the performance layer is not greater than the safety threshold, so that the performance layer has enough space to store incoming data.
Therefore, the storage data compression device of the all-flash array (hereinafter the compression device) judges the current storage capacity of the performance layer and, when it is less than the safety threshold, writes the data in the performance layer back to the capacity layer, so that the data in the performance layer is written back in time. It should be noted that:
The safety threshold of the performance layer capacity is a critical value that ensures the data in the performance layer can be quickly written back to the capacity layer in deduplicated and compressed form. When the current storage capacity of the performance layer is less than or equal to the safety threshold, deduplication and compression are performed on the data in the performance layer. When it is greater than the safety threshold, deduplication and compression are skipped so that they do not occupy the I/O resources of the performance layer, and the data in the performance layer is written directly back to the capacity layer. The safety threshold can therefore be set with the aim of improving the I/O performance of the storage system; it may be, for example, 80% or 60% of the total storage capacity, and its size is not specifically limited here.
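The threshold comparison described above might be sketched as follows. This is an illustrative Python fragment: the function name, the string labels, and the default of 80 percent (one of the example values in the text) are assumptions. Integer arithmetic is used so that the boundary case is compared exactly.

```python
def choose_write_back_path(used_bytes: int, total_bytes: int,
                           safety_percent: int = 80) -> str:
    """Return which write-back path applies for the current performance-layer usage."""
    # used > total * percent / 100, rewritten without division to stay in integers.
    if used_bytes * 100 > total_bytes * safety_percent:
        # Above the safety threshold: skip deduplication so it does not
        # compete for the performance layer's I/O resources.
        return "direct_write_back"
    # At or below the safety threshold: deduplicate and compress first.
    return "dedup_compress"
```

For example, with a 100-unit performance layer and the 80 percent default, a usage of 80 still takes the deduplication path, while 81 triggers direct write-back.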
The compression device may judge the current storage capacity of the performance layer in real time, at regular intervals (for example every 5 minutes), or at irregular intervals. Specifically, the compression device may choose among these according to the data processing demand of the system: if the current data processing volume is large, it judges the current storage capacity in real time; if the volume is moderate, it judges at regular intervals; and if the volume is small, it judges at irregular intervals, so as to save system resources.
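One possible way to express this load-dependent checking policy is sketched below in Python. The text gives no numeric boundaries for "large", "moderate" or "small" volumes, so the `heavy` and `light` cut-offs, the use of a pending-operation count as the load measure, and the function name are all hypothetical.

```python
from typing import Optional


def check_interval_seconds(pending_ops: int,
                           heavy: int = 10_000,
                           light: int = 100) -> Optional[float]:
    """Map the current data processing volume to a capacity-check interval.

    Returns 0.0 for real-time checking, 300.0 for a fixed 5-minute timer,
    and None when checks are left to irregular, ad hoc triggering.
    """
    if pending_ops >= heavy:
        return 0.0      # large volume: check in real time
    if pending_ops > light:
        return 300.0    # moderate volume: check every 5 minutes
    return None         # small volume: no fixed schedule
```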
102. Read a data segment of preset length from the performance layer;
If the current storage capacity of the performance layer is not greater than the safety threshold, the compression device can read the data stored in the performance layer and write it back to the capacity layer. To read the data more conveniently, the compression device can read it in units of a preset length, which may be, for example, 1 MB or 2 MB. The preset length can be chosen according to the system performance of the compression device so as to facilitate data processing, and is not specifically limited here.
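Reading in fixed-length segments can be sketched as below. In this illustration the performance layer is modelled as any binary file-like object (an assumption; a real device would be accessed differently), and `SEGMENT_LENGTH` uses the 1 MB example value from the text, taken here as 1 MiB.

```python
import io
from typing import BinaryIO, Iterator

SEGMENT_LENGTH = 1024 * 1024  # 1 MiB, one of the example preset lengths


def read_segments(layer: BinaryIO,
                  segment_length: int = SEGMENT_LENGTH) -> Iterator[bytes]:
    """Yield successive data segments of at most segment_length bytes."""
    while True:
        segment = layer.read(segment_length)
        if not segment:
            return  # nothing left to read
        yield segment
```

With `io.BytesIO` standing in for the layer, `for segment in read_segments(io.BytesIO(data)): ...` visits the data one segment at a time; the final segment may be shorter than the preset length.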
103. Divide the data segment into data blocks of preset granularity, and calculate the fingerprint of each data block;
After the compression device reads a data segment of preset length from the performance layer, it deduplicates and compresses the data segment and writes the result back to the capacity layer, in order to reduce the space the data segment occupies there.
Specifically, the compression device processes the data segment of preset length as follows: it divides the data segment into data blocks of preset granularity and calculates the fingerprint of each data block, in order to judge whether the capacity layer already holds content identical to that data block. The preset granularity may be an integer multiple of 4 KB (4 KB being the minimum write unit of an SSD), for example 4 KB, 8 KB, 16 KB or 24 KB. The block size is set according to the processing speed of the system, with the aim of improving the data processing speed, and is not specifically limited here.
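Step 103 might be sketched in Python as follows, assuming fixed-size blocks and SHA-1 fingerprints. The function name and the tuple layout `(order, block, fingerprint)` are choices made for this illustration only.

```python
import hashlib

MIN_WRITE_UNIT = 4096  # 4 KB, the minimum SSD write unit named in the text


def split_and_fingerprint(segment: bytes, granularity: int = MIN_WRITE_UNIT):
    """Split a segment into fixed-size blocks and fingerprint each block.

    Returns a list of (order, block, fingerprint) tuples, where order is the
    block's position within the segment.
    """
    if granularity <= 0 or granularity % MIN_WRITE_UNIT != 0:
        raise ValueError("granularity must be a positive multiple of 4 KB")
    result = []
    for order, offset in enumerate(range(0, len(segment), granularity)):
        block = segment[offset:offset + granularity]
        result.append((order, block, hashlib.sha1(block).hexdigest()))
    return result
```

Blocks with identical content receive identical fingerprints, which is what the later fingerprint-database lookup relies on.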
The SHA-1 algorithm maps a binary value of arbitrary length to a shorter binary value of fixed length, called the hash value. A hash value is a unique and compact numerical representation of a piece of data: if even a single letter of a plaintext is changed, the resulting hash value will be different, and finding two different inputs that hash to the same value is regarded as computationally infeasible. The hash value computed for a plaintext by SHA-1 can therefore be regarded as the "fingerprint" of that plaintext. MD5 works on the same principle, so SHA-1 and MD5 are both commonly used to calculate the fingerprints of data blocks.
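The fixed-length and single-letter-change properties can be observed with Python's standard `hashlib` module. The two sample strings below are arbitrary and chosen only for this illustration.

```python
import hashlib

# Two inputs that differ in a single letter (final "y" versus "Y").
digest_a = hashlib.sha1(b"all-flash array").digest()
digest_b = hashlib.sha1(b"all-flash arraY").digest()

# An input of very different length still yields a digest of the same size.
digest_long = hashlib.sha1(b"x" * 1_000_000).digest()

print(digest_a.hex())
print(digest_b.hex())
```

SHA-1 digests are always 20 bytes long regardless of input size, and the two printed values differ even though the inputs differ by one character.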
104. Query the fingerprint database of the capacity layer and judge whether the fingerprint exists in the fingerprint database; if it exists, execute step 105; if not, execute step 106;
It can be understood that, in order to enlarge the storage space of the capacity layer as much as possible, the capacity layer in this embodiment stores duplicate data in compressed form, where duplicate data means data blocks with identical content. The specific compression process can be understood as follows:
When the capacity layer already holds content identical to the data block, that is, when the fingerprint database in the capacity layer contains a fingerprint identical to the fingerprint of the data block, the compression device deletes the data block and stores the order of the data block in the original data segment, its logical address and its length in the metadata. This records the position of the data block in the original data segment and facilitates later recovery of the data block.
Therefore, after obtaining the fingerprint of the data block, the compression device needs to query the fingerprint database of the capacity layer to judge whether the fingerprint exists in it, executing step 105 when the fingerprint exists and step 106 when it does not;
105. Determine that the data block is duplicate data, and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block;
To enlarge the storage space of the capacity layer, data in the capacity layer is stored in compressed form in this embodiment. Specifically, when the compression device determines that the fingerprint database in the capacity layer contains the fingerprint of the data block, it determines that the data block is duplicate data and writes the metadata information of the data block back to the metadata area of the capacity layer. Fig. 3 is a schematic structural diagram of the performance layer and capacity layer of the all-flash array and of the performance-layer data and capacity-layer data; specifically, the metadata information comprises the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
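Steps 104 and 105 might be sketched together as follows. This Python fragment is a simplified model: the fingerprint database is a `dict` mapping a SHA-1 fingerprint to `(physical_address, length)`, the metadata area is a plain list, and the names `BlockMetadata` and `record_if_duplicate` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BlockMetadata:
    order: int             # position of the block within its data segment
    physical_address: int  # where the already-stored copy lives
    length: int            # length of the stored block


def record_if_duplicate(order: int, block: bytes,
                        fingerprint_db: Dict[str, Tuple[int, int]],
                        metadata_area: List[BlockMetadata]) -> Optional[BlockMetadata]:
    """If the block's fingerprint is known, record metadata only and drop the data."""
    fingerprint = hashlib.sha1(block).hexdigest()
    hit = fingerprint_db.get(fingerprint)
    if hit is None:
        return None  # not a duplicate: handled by another step
    physical_address, length = hit
    meta = BlockMetadata(order, physical_address, length)
    metadata_area.append(meta)  # only metadata is written back, not the block
    return meta
```

Because a duplicate contributes only a small metadata record, the data area of the capacity layer does not grow when repeated content is written back.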
106, other flows are executed.
In the present embodiment, it is more than in secure threshold or capacity layer in the current storage capacity of performance layer and the number is not present According to block fingerprint when, then execute other flows, be not particularly limited herein.
In the embodiment of the present invention, the storage data compression device of full flash array first judges the current storage capacity of performance layer Whether secure threshold is more than, and when memory capacity is not more than secure threshold, the data segment of preset length in reading performance layer, and Data segment is divided into data block, calculates the fingerprint of data block, further in the fingerprint base of capacity layer there are when the fingerprint, really The fixed data block is duplicate data, and the metadata of the data block is stored to the capacity layer of full flash array, because of the compression Device is when the memory capacity in performance layer is not up to secure threshold, in real time by the write back data in performance layer to capacity layer, from And the memory space of bigger is provided for the performance layer of full flash array, the response speed of full flash array performance layer is improved, It ensures that full flash array performance layer has preferably write-in bandwidth and time delay, while will be stored to complete after the data block duplicate removal of performance layer The capacity layer of flash array further improves the memory space of capacity layer, to improve the space availability ratio of storage system And storage efficiency.
It can be understood that in certain scenarios, for example the shared data of a large company such as the company address book, staff turnover or changes in contact details cause the data in the address book to be updated frequently. If the address book is stored in the capacity layer, where data are stored in deduplicated form, later updates give rise to deduplication reference problems and to changes in the compressed length of the updated data. The data must then be updated out of place, which causes data fragmentation and increases the difficulty and overhead of subsequent space reclamation. To address this problem, an embodiment of the present invention proposes a storage data compression method for an all-flash array. Referring to Fig. 2, another embodiment of the storage data compression method for an all-flash array in the embodiments of the present invention includes:
201. Judge whether the current storage capacity of the performance layer is greater than the safety threshold; if not, execute step 202; if so, execute step 209.
202. Read a data segment of preset length from the performance layer.
It should be noted that steps 201 to 202 in this embodiment are similar to steps 101 to 102 in the embodiment described in Fig. 1, and details are not repeated here.
203. Judge whether the modification count of the data segment is greater than a first threshold; if not, execute step 204; if so, execute step 209.
To reduce the data fragmentation caused by frequent later updates, after reading the data segment of preset length from the performance layer the compression device judges the modification count of the data segment. If the modification count is greater than the first threshold, the data segment is judged to be frequently modified data and step 209 is executed; if the modification count is not greater than the first threshold, the data segment is judged to be infrequently modified data and step 204 is executed.
Specifically, whether data are of the frequently modified type can be determined by setting up a timer and counting the number of times the data segment is written or read within a preset time period recorded by the timer (for example 5 minutes or 10 minutes). If the number of times the data segment is written or read is greater than the first threshold (for example 10 times), the data segment is judged to be frequently modified data; otherwise it is judged to be infrequently modified data.
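One way to realize the timer-based count described above is a sliding window of access timestamps per data segment. The sketch below is an assumption-laden illustration: the class name `ModificationTracker`, the per-segment deque, and the caller-supplied timestamps are hypothetical; only the 5-minute window and the threshold of 10 come from the examples in the text.

```python
from collections import defaultdict, deque


class ModificationTracker:
    """Counts accesses to each data segment within a sliding time window."""

    def __init__(self, window_seconds: float = 300.0, first_threshold: int = 10):
        self.window_seconds = window_seconds    # e.g. 5 minutes
        self.first_threshold = first_threshold  # e.g. 10 accesses
        self._events = defaultdict(deque)       # segment id -> access timestamps

    def record(self, segment_id: str, now: float) -> None:
        """Record one write (or read) of the segment at time `now` (seconds)."""
        self._events[segment_id].append(now)

    def is_frequently_modified(self, segment_id: str, now: float) -> bool:
        """True when the access count inside the window exceeds the first threshold."""
        events = self._events[segment_id]
        # Discard timestamps that have fallen out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.first_threshold
```

A segment for which `is_frequently_modified` returns true would be written back directly (step 209) rather than deduplicated.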
204. Judge whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold; if not, execute step 205; if so, execute step 209.
During data processing, to further reduce the impact that the CPU and I/O resources consumed by deduplication and compression have on the data services of the original storage system, the compression device can additionally check the current storage bandwidth of the performance layer, that is, judge whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold. If it is not, step 205 is executed; if it is, step 209 is executed.
It should be noted that there is no strict order between steps 203 and 204 in this embodiment: step 203 may be executed before step 204, or step 204 before step 203. Alternatively, steps 203 and 204 may be executed selectively according to the system configuration. For example, if the IO performance of the storage system is poor, both step 203 and step 204 may be executed; if the IO performance of the storage system is strong, only one of step 203 and step 204 may be executed.
Memory bandwidth refers to the amount of information that can be accessed in the memory per unit time, that is, the number of bits or bytes read from or written to the memory per unit time. It is a technical indicator of data transfer rate (unit: bps, bits per second, or Bytes/s, bytes per second). The memory bandwidth determines how quickly a memory-centered machine can obtain information. If the bandwidth is denoted $B_m$, the storage cycle is $t_m$, and $n$ units of data (bits or bytes) are read or written in each cycle, then the bandwidth is $B_m = n / t_m$.
For example, if the storage cycle is 500 ns and 16 bits can be accessed in each storage cycle, the bandwidth is 16 bits / 500 ns = 32 Mbit/s.
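The arithmetic of this example can be checked with a few lines of code. The function name and the choice of nanoseconds as the cycle unit are assumptions made for illustration; the calculation itself is simply the amount of data per cycle divided by the cycle time.

```python
def memory_bandwidth_bits_per_second(bits_per_cycle: int, cycle_ns: int) -> float:
    """Bandwidth = data transferred per storage cycle / duration of the cycle."""
    if cycle_ns <= 0:
        raise ValueError("cycle_ns must be positive")
    # One second is 1_000_000_000 ns, so scale before dividing.
    return bits_per_cycle * 1_000_000_000 / cycle_ns


bandwidth = memory_bandwidth_bits_per_second(bits_per_cycle=16, cycle_ns=500)
print(f"{bandwidth / 1e6:.0f} Mbit/s")  # prints: 32 Mbit/s
```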
Therefore, to reduce the impact that the CPU and I/O resources consumed by deduplication and compression have on the data services of the original storage system, after the modification count of the data segment has been judged, the current storage bandwidth of the performance layer can also be judged, that is, whether the current storage bandwidth of the cache device (the performance layer) is greater than the bandwidth threshold. If it is not greater than the bandwidth threshold, the performance layer can spare I/O resources for deduplication and compression; if it is greater than the bandwidth threshold, step 209 is executed so that deduplication and compression do not take CPU and I/O resources away from the data services of the original storage system.
Specifically, the bandwidth threshold of the cache device can be its peak bandwidth, or 80% or 60% of its peak bandwidth; the size of the bandwidth threshold is not specifically limited here.
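The threshold comparison of step 204 might look like the sketch below. The function names and the default fraction of 0.8 are illustrative assumptions (the text allows the peak itself, 80%, 60%, or any other value), and how the current bandwidth is measured is left outside the example.

```python
def bandwidth_threshold(peak_bandwidth: float, fraction: float = 0.8) -> float:
    """Derive the threshold as a fraction of the cache device's peak bandwidth."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return peak_bandwidth * fraction


def can_deduplicate(current_bandwidth: float, peak_bandwidth: float,
                    fraction: float = 0.8) -> bool:
    """True when current bandwidth does not exceed the threshold (go to step 205)."""
    return current_bandwidth <= bandwidth_threshold(peak_bandwidth, fraction)
```

A lower fraction leaves more headroom for foreground I/O at the cost of deduplicating less often.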
205. Divide the data segment into data blocks of a preset granularity, and calculate the fingerprint of each data block.
206. Query the fingerprint library of the capacity layer and judge whether the fingerprint exists in the fingerprint library; if so, execute step 207; if not, execute step 208.
It should be noted that steps 205 to 206 in this embodiment are similar to steps 103 to 104 in the embodiment described in Fig. 1, and details are not repeated here.
207. Determine that the data block is duplicate data, and write the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
If the fingerprint of the data block exists in the capacity layer, the content of the data block is already present in the capacity layer. The metadata information of the data block is therefore written back to the metadata area of the capacity layer; the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block. It should be noted that, to save storage space in the metadata area of the capacity layer, the length of the data block after deduplication and compression can be represented by a compressed-length code. For example, if the length of the data block is 16 KB before compression and 12 KB after deduplication and compression, the length can be aligned in units of 4 KB, with the codes 0, 1, 2 and 3 representing 4 KB, 8 KB, 12 KB and 16 KB respectively. The metadata area can then record the code 2 to represent the compressed length of the data block instead of recording 12 KB. Because data in the capacity layer are recorded in binary, the value 12 KB occupies far more bits than the code 2, so representing the compressed length with a compressed-length code saves space in the metadata area of the capacity layer and thereby improves the IO performance of the capacity layer.
It can be understood that when the compressed length of the data block is represented by a compressed-length code, the metadata includes the order of the data block within the data segment, the physical storage address of the data block, and the compressed-length code.
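The compressed-length code described above can be sketched as a pair of encode and decode helpers. The 4 KB alignment unit and the mapping of codes 0 to 3 follow the example in the text; rounding lengths that are not a multiple of 4 KB up to the next unit is an assumption, as are the function names.

```python
ALIGN_BYTES = 4 * 1024  # alignment unit from the example: 4 KB


def encode_length(compressed_len: int, align: int = ALIGN_BYTES) -> int:
    """Map a compressed length to its code: 4 KB -> 0, 8 KB -> 1, 12 KB -> 2, 16 KB -> 3."""
    if compressed_len <= 0:
        raise ValueError("compressed_len must be positive")
    # Assumption: unaligned lengths are rounded up to the next alignment unit.
    units = (compressed_len + align - 1) // align
    return units - 1


def decode_length(code: int, align: int = ALIGN_BYTES) -> int:
    """Recover the aligned length in bytes from a compressed-length code."""
    return (code + 1) * align
```

With a 16 KB maximum block size the code fits in two bits, whereas storing the byte count 12288 directly needs at least fourteen bits.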
208. Execute a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint library, where the metadata information includes the physical storage address of the compressed data block and the length of the compressed data block.
After step 206, if the fingerprint of the data block does not exist in the fingerprint library of the capacity layer, the data block is new to the capacity layer. A compression operation is then executed on the data block, the compressed data block is stored in the data area of the capacity layer, and the metadata information of the compressed data block and the fingerprint of the original data block (the data block before compression) are updated into the fingerprint library. The metadata information includes the physical storage address of the compressed data block and the length of the compressed data block, so that the content of the original data block can later be restored according to the metadata information.
Specifically, for the compression of the data block, reference may be made to the Huffman compression algorithm or the LZ compression algorithms described in the prior art, and details are not repeated here.
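Step 208 can be illustrated with Python's standard `zlib` module, whose DEFLATE format combines LZ77 matching with Huffman coding. The sketch is hypothetical in every structural detail: a `bytearray` stands in for the data area, a dictionary keyed by fingerprint stands in for the fingerprint library, SHA-1 is an assumed fingerprint function, and the function names are invented for the example.

```python
import hashlib
import zlib


def write_back_new_block(block: bytes, data_area: bytearray,
                         fingerprint_library: dict) -> dict:
    """Compress a new block, append it to the data area, and index it by fingerprint."""
    fp = hashlib.sha1(block).hexdigest()  # fingerprint of the ORIGINAL block
    compressed = zlib.compress(block)
    address = len(data_area)              # assumed: append-only data area
    data_area.extend(compressed)
    metadata = {"address": address, "length": len(compressed)}
    fingerprint_library[fp] = metadata
    return metadata


def restore_block(metadata: dict, data_area: bytearray) -> bytes:
    """Restore the original block contents from its metadata."""
    start = metadata["address"]
    end = start + metadata["length"]
    return zlib.decompress(bytes(data_area[start:end]))
```

Keeping the address and compressed length together is what lets the original content be recovered later from the metadata alone.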
209. Write the data in the performance layer directly back to the data area of the capacity layer.
In step 201, if the current storage capacity of the performance layer is greater than the safety threshold, deduplication and compression of the data are abandoned and the data in the performance layer are written directly back to the data area of the capacity layer. This avoids deduplication and compression occupying the I/O resources of the performance layer and allows the data to be written back quickly, freeing storage space in the performance layer.
In step 203, if the modification count of the data segment is greater than the first threshold, the data segment is written directly back to the data area of the capacity layer. This reduces the data fragmentation caused by subsequent updates and thereby the difficulty and IO overhead of subsequent space reclamation.
In step 204, to further reduce the impact that the CPU and I/O resources consumed by deduplication and compression have on the data services of the original storage system, when the compression device judges that the current storage bandwidth of the performance layer is greater than the bandwidth threshold, it abandons deduplication and compression of the data in the performance layer and writes the data directly back to the data area of the capacity layer.
In this embodiment of the present invention, the storage data compression device of the all-flash array first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the device reads a data segment of preset length from the performance layer, divides the data segment into data blocks, and calculates the fingerprint of each data block. When the fingerprint exists in the fingerprint library of the capacity layer, the device determines that the data block is duplicate data and stores the metadata of the data block in the capacity layer of the all-flash array. Because the compression device writes data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not yet reached the safety threshold, more storage space is made available to the performance layer, the response speed of the performance layer is improved, and the performance layer is guaranteed better write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication and compression saves storage space in the capacity layer, which improves the space utilization and storage efficiency of the storage system.
In addition, the compression device in this embodiment abandons deduplication and compression of the data in the performance layer when the modification count of the data segment is greater than the first threshold and/or the current storage bandwidth of the performance layer is greater than the bandwidth threshold. This reduces the I/O resources of the performance layer consumed by deduplication and compression and further improves the IO performance of the storage system.
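The three checks of Fig. 2 (steps 201, 203 and 204) all lead to the same fallback, direct write-back in step 209. A compact sketch of that decision is given below; the `SegmentState` fields, the string labels and the fixed order of the checks are assumptions made for illustration (the text notes that steps 203 and 204 may run in either order or selectively).

```python
from dataclasses import dataclass


@dataclass
class SegmentState:
    used_capacity: int          # current storage capacity of the performance layer
    safety_threshold: int
    modification_count: int     # accesses to the segment in the timer window
    first_threshold: int
    current_bandwidth: float    # current storage bandwidth of the performance layer
    bandwidth_threshold: float


def choose_path(state: SegmentState) -> str:
    """Decide between deduplication/compression and direct write-back."""
    if state.used_capacity > state.safety_threshold:         # step 201 -> 209
        return "direct_write_back"
    if state.modification_count > state.first_threshold:     # step 203 -> 209
        return "direct_write_back"
    if state.current_bandwidth > state.bandwidth_threshold:  # step 204 -> 209
        return "direct_write_back"
    return "dedup_and_compress"                              # steps 205 to 208
```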
The storage data compression method for an all-flash array in the embodiments of the present invention has been described above. The storage data compression device for an all-flash array in the embodiments of the present invention is described below. Referring to Fig. 4, the storage data compression device for an all-flash array in an embodiment of the present invention includes:
First judging unit 401, configured to judge whether the current storage capacity of the performance layer is greater than the safety threshold;
Reading unit 402, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
Computing unit 403, configured to divide the data segment into data blocks of a preset granularity and calculate the fingerprint of each data block;
Query judging unit 404, configured to query the fingerprint library of the capacity layer and judge whether the fingerprint exists in the fingerprint library;
Deduplication unit 405, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
It should be noted that the function of each unit in this embodiment is similar to the function of the storage data compression device for an all-flash array in the embodiment described in Fig. 1, and details are not repeated here.
In this embodiment of the present invention, the first judging unit 401 first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the reading unit 402 reads a data segment of preset length from the performance layer, and the data segment is divided into data blocks whose fingerprints are calculated. When the fingerprint exists in the fingerprint library of the capacity layer, the deduplication unit 405 determines that the data block is duplicate data, and the deduplicated data block and its metadata are stored in the capacity layer of the all-flash array. Because the compression device writes data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not yet reached the safety threshold, more storage space is made available to the performance layer, the response speed of the performance layer is improved, and the performance layer is guaranteed better write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication and compression saves storage space in the capacity layer, which improves the IO performance of the storage system.
Based on the embodiment described in Fig. 4, the storage data compression device for an all-flash array in the embodiments of the present invention is described in detail below. Referring to Fig. 5, another embodiment of the storage data compression device for an all-flash array in the embodiments of the present invention includes:
First judging unit 501, configured to judge whether the current storage capacity of the performance layer is greater than the safety threshold;
Reading unit 502, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
Computing unit 503, configured to divide the data segment into data blocks of a preset granularity and calculate the fingerprint of each data block;
Query judging unit 504, configured to query the fingerprint library of the capacity layer and judge whether the fingerprint exists in the fingerprint library;
Deduplication unit 505, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
Preferably, the device further includes:
Second judging unit 506, configured to judge whether the modification count of the data segment is greater than the first threshold;
First triggering unit 507, configured to trigger the step of dividing the data segment into data blocks of a preset granularity when the modification count is not greater than the first threshold;
First write-back unit 508, configured to write the data segment directly back to the data area of the capacity layer when the modification count is greater than the first threshold.
Preferably, the device further includes:
Third judging unit 509, configured to judge whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
Second triggering unit 510, configured to trigger the step of dividing the data segment into data blocks of a preset granularity when the current storage bandwidth is not greater than the bandwidth threshold;
Second write-back unit 511, configured to write the data segment directly back to the data area of the capacity layer when the current storage bandwidth is greater than the bandwidth threshold.
Preferably, the deduplication unit 505 further includes:
Marking module 5051, configured to represent the length of the deduplicated data block with a compressed-length code;
in which case the metadata includes the order of the data block within the data segment, the physical storage address of the data block, and the compressed-length code.
Preferably, the device further includes:
Third write-back unit 512, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the safety threshold;
Fourth write-back unit 513, configured to, when the fingerprint does not exist in the fingerprint library, execute a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint library, where the metadata information includes the physical storage address of the compressed data block and the length of the compressed data block.
In this embodiment of the present invention, the first judging unit 501 first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the reading unit 502 reads a data segment of preset length from the performance layer, and the data segment is divided into data blocks whose fingerprints are calculated. When the fingerprint exists in the fingerprint library of the capacity layer, the deduplication unit 505 determines that the data block is duplicate data, and the metadata of the data block is stored in the capacity layer of the all-flash array. Because the compression device writes data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not yet reached the safety threshold, more storage space is made available to the performance layer, the response speed of the performance layer is improved, and the performance layer is guaranteed better write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication and compression saves storage space in the capacity layer, which improves the space utilization and storage efficiency of the storage system.
In addition, when the modification count of the data segment is greater than the first threshold and/or the current storage bandwidth of the performance layer is greater than the bandwidth threshold, the compression device in this embodiment abandons deduplication and compression of the data in the performance layer through the first write-back unit 508, the second write-back unit 511 and the third write-back unit 512. This reduces the I/O resources of the performance layer consumed by deduplication and compression and further improves the IO performance of the storage system.
The storage data compression device for an all-flash array in the embodiments of the present invention has been described above from the perspective of modular functional entities. The computer device in the embodiments of the present invention is described below from the perspective of hardware processing:
The computer device is used to implement the functions of the storage data compression device for an all-flash array. One embodiment of the computer device in the embodiments of the present invention includes:
a processor and a memory;
The memory is used to store a computer program. When the processor executes the computer program stored in the memory, the following steps can be implemented:
Judging whether the current storage capacity of the performance layer is greater than the safety threshold;
If not, reading a data segment of preset length from the performance layer;
Dividing the data segment into data blocks of a preset granularity, and calculating the fingerprint of each data block;
Querying the fingerprint library of the capacity layer and judging whether the fingerprint exists in the fingerprint library;
If the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
Judging whether the modification count of the data segment is greater than the first threshold;
If not, triggering the step of dividing the data segment into data blocks of a preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
Judging whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
If not, triggering the step of dividing the data segment into data blocks of a preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, the processor can also be specifically used to implement the following steps:
Representing the length of the deduplicated data block with a compressed-length code;
in which case the metadata includes the order of the data block within the data segment, the physical storage address of the data block, and the compressed-length code.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
If the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
If the fingerprint does not exist in the fingerprint library, executing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint library, where the metadata information includes the physical storage address of the compressed data block and the length of the compressed data block.
It can be understood that when the processor in the computer device described above executes the computer program, the functions of the units in the corresponding device embodiments above can also be implemented, and details are not repeated here. Illustratively, the computer program can be divided into one or more modules/units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the storage data compression device for an all-flash array. For example, the computer program can be divided into the units of the storage data compression device for an all-flash array described above, and each unit can implement the specific function described for the corresponding storage data compression device.
The computer device can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the processor and the memory are only an example of the computer device and do not constitute a limitation on it; the computer device may include more or fewer components, combine certain components, or use different components. For example, the computer device may also include input and output devices, network access devices, buses and the like.
The processor can be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor can be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the entire computer device through various interfaces and lines.
The memory can be used to store the computer program and/or modules. The processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory can mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function, and the data storage area can store data created according to the use of the terminal. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The present invention also provides a computer-readable storage medium used to implement the functions of the storage data compression device for an all-flash array. A computer program is stored on the medium, and when the computer program is executed by a processor, the processor can be used to execute the following steps:
Judging whether the current storage capacity of the performance layer is greater than the safety threshold;
If not, reading a data segment of preset length from the performance layer;
Dividing the data segment into data blocks of a preset granularity, and calculating the fingerprint of each data block;
Querying the fingerprint library of the capacity layer and judging whether the fingerprint exists in the fingerprint library;
If the fingerprint exists, executing a deduplication operation on the data block, and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
Judging whether the modification count of the data segment is greater than the first threshold;
If not, triggering the step of dividing the data segment into data blocks of a preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
Judging whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
If not, triggering the step of dividing the data segment into data blocks of a preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be specifically used to execute the following steps:
Representing the length of the deduplicated and compressed data block with a compressed-length code;
in which case the metadata includes the order of the data block within the data segment, the physical storage address of the data block, and the compressed-length code.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
If the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
If the fingerprint does not exist in the fingerprint library, updating the fingerprint of the data block into the fingerprint library, executing a compression operation on the data block, and writing the compressed data block and the corresponding metadata information back to the data area and the metadata area of the capacity layer respectively, where the metadata information includes the order of the data block within the data segment, the physical storage address of the compressed data block, and the length of the compressed data block.
It can be understood that if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a corresponding computer-readable storage medium. Based on this understanding, all or part of the flows in the methods of the corresponding embodiments above can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the method embodiments above. The computer program includes computer program code, which can be in source code form, object code form, an executable file, or certain intermediate forms. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on.
It should be noted that the content that the computer-readable medium includes can be according to legislation and patent practice in jurisdiction It is required that carrying out increase and decrease appropriate, such as in certain jurisdictions, do not wrapped according to legislation and patent practice, computer-readable medium Include electric carrier signal and telecommunication signal.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided herein, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, and other media capable of storing program code.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A storage data compression method for a full flash array, the full flash array comprising a performance layer and a capacity layer, characterized in that the method comprises:
judging whether the current storage capacity of the performance layer is greater than a secure threshold;
if it is not greater, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint base of the capacity layer, and judging whether the fingerprint exists in the fingerprint base;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
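The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `BlockMeta`, `CapacityLayer`, `split_blocks` and `process_segment`, the 4096-byte granularity, and the use of SHA-1 as the fingerprint function are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockMeta:
    order: int    # position of the block within its data segment
    address: int  # physical storage address in the capacity layer
    length: int   # length of the stored block in bytes


@dataclass
class CapacityLayer:
    # Hypothetical in-memory stand-ins for the fingerprint base and metadata area.
    fingerprints: dict = field(default_factory=dict)  # fingerprint -> (address, length)
    metadata: list = field(default_factory=list)


def split_blocks(segment: bytes, granularity: int):
    """Divide a data segment into blocks of the preset granularity."""
    return [segment[i:i + granularity] for i in range(0, len(segment), granularity)]


def process_segment(used_capacity, secure_threshold, segment, capacity, granularity=4096):
    """Return the (order, block) pairs that are not duplicates, or None when bypassed."""
    if used_capacity > secure_threshold:
        return None  # above the threshold: deduplication is skipped (see claim 5)
    pending = []
    for order, block in enumerate(split_blocks(segment, granularity)):
        fp = hashlib.sha1(block).digest()  # assumed fingerprint function
        hit = capacity.fingerprints.get(fp)
        if hit is not None:
            # Duplicate block: only metadata is written back, pointing at the stored copy.
            address, length = hit
            capacity.metadata.append(BlockMeta(order, address, length))
        else:
            pending.append((order, block))
    return pending
```

Blocks returned in `pending` are the ones whose fingerprints were not found; in the method they go on to the compression path described in claim 5.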
2. The method according to claim 1, characterized in that, after reading the data segment of preset length from the performance layer, the method further comprises:
judging whether the number of modifications of the data segment is greater than a first threshold;
if it is not greater, triggering the step of dividing the data segment into data blocks of preset granularity;
if it is greater, writing the data segment directly back to the data area of the capacity layer.
3. The method according to claim 1, characterized in that, after reading the data segment of preset length from the performance layer, the method further comprises:
judging whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
if it is not greater, triggering the step of dividing the data segment into data blocks of preset granularity;
if it is greater, writing the data segment directly back to the data area of the capacity layer.
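The two bypass conditions of claims 2 and 3 amount to simple threshold checks performed before chunking. The sketch below shows each check on its own and, as an assumption of this example only, a combined helper; the claims present the two checks as separate alternatives, and all function and parameter names here are hypothetical.

```python
def passes_modification_check(modification_count: int, first_threshold: int) -> bool:
    """Claim 2: frequently modified segments are written back without deduplication."""
    return modification_count <= first_threshold


def passes_bandwidth_check(current_bandwidth: float, bandwidth_threshold: float) -> bool:
    """Claim 3: when the performance layer is busy, segments are written back directly."""
    return current_bandwidth <= bandwidth_threshold


def should_chunk(modification_count, first_threshold, current_bandwidth, bandwidth_threshold):
    """Assumed combination of both checks: chunk only if neither bypass condition holds."""
    return (passes_modification_check(modification_count, first_threshold)
            and passes_bandwidth_check(current_bandwidth, bandwidth_threshold))
```

A segment for which `should_chunk` returns `False` would be written directly to the data area of the capacity layer, skipping fingerprinting and compression.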
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
representing the length of the deduplicated and compressed data block with a compressed length code;
the metadata comprising the order of the data block within the data segment, the physical storage address of the data block, and the compressed length code of the data block.
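The claim does not spell out how the compressed length code is formed. One plausible scheme, shown here purely as an assumption, stores the length as a count of fixed-size units rounded up, so that a small integer can stand in for a byte length. The 512-byte unit and the function names are hypothetical.

```python
def encode_length(length: int, unit: int = 512) -> int:
    """Encode a compressed length as a number of fixed-size units, rounded up."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (length + unit - 1) // unit


def decode_length(code: int, unit: int = 512) -> int:
    """Recover the allocated length (an upper bound on the true length) from a code."""
    return code * unit
```

Under this assumed scheme the decoded value is the space reserved for the block rather than its exact compressed size, which trades a little capacity for smaller metadata entries.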
5. The method according to claim 4, characterized in that the method further comprises:
if the current storage capacity is greater than the secure threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
if the fingerprint does not exist, performing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the metadata information of the compressed data block and the fingerprint of the data block into the fingerprint base, the metadata information comprising: the physical storage address of the compressed data block and the length of the data block after compression.
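The path taken when a fingerprint is not found can be sketched as below. This is an illustration under stated assumptions: `zlib` stands in for whatever compressor the system uses, SHA-1 stands in for the fingerprint function, the data area is modeled as an append-only `bytearray`, and `store_unique_block` is a hypothetical name.

```python
import hashlib
import zlib


def store_unique_block(block: bytes, data_area: bytearray, fingerprints: dict) -> dict:
    """Compress a non-duplicate block, append it to the data area, and record its fingerprint."""
    fp = hashlib.sha1(block).digest()   # assumed fingerprint function
    compressed = zlib.compress(block)   # assumed compressor
    address = len(data_area)            # next free offset in the modeled data area
    data_area.extend(compressed)
    meta = {"address": address, "length": len(compressed)}
    # The fingerprint base maps the fingerprint to the block's metadata,
    # so a later duplicate can be resolved without storing the data again.
    fingerprints[fp] = meta
    return meta
```

Reading the block back is then a matter of slicing `data_area` at `meta["address"]` for `meta["length"]` bytes and decompressing the result.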
6. A storage data compression device for a full flash array, the full flash array comprising a performance layer and a capacity layer, characterized in that the device comprises:
a first judging unit, configured to judge whether the current storage capacity of the performance layer is greater than a secure threshold;
a reading unit, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the secure threshold;
a computing unit, configured to divide the data segment into data blocks of preset granularity and calculate the fingerprint of each data block;
a query judging unit, configured to query the fingerprint base of the capacity layer and judge whether the fingerprint exists in the fingerprint base;
a deduplication unit, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the order of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
7. The device according to claim 6, characterized in that the device further comprises:
a second judging unit, configured to judge whether the number of modifications of the data segment is greater than a first threshold;
a first trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the number of modifications is not greater than the first threshold;
a first write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the number of modifications is greater than the first threshold.
8. The device according to claim 6, characterized in that the device further comprises:
a third judging unit, configured to judge whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
a second trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the current storage bandwidth is not greater than the bandwidth threshold;
a second write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the current storage bandwidth is greater than the bandwidth threshold.
9. The device according to any one of claims 6 to 8, characterized in that the deduplication compression unit further comprises:
a marking module, configured to represent the length of the deduplicated and compressed data block with a compressed length code;
the metadata comprising the order of the data block within the data segment, the physical storage address of the data block, and the compressed length code of the data block.
10. The device according to claim 9, characterized in that the device further comprises:
a third write-back unit, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the secure threshold;
a fourth write-back unit, configured to, when the fingerprint does not exist in the fingerprint base, perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the data block into the fingerprint base, the metadata information comprising: the physical storage address of the compressed data block and the length of the data block after compression.
11. A computer device comprising a processor, characterized in that the processor, when executing a computer program stored in a memory, implements the storage data compression method for a full flash array according to any one of claims 1 to 5.
12. A readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed, implements the storage data compression method for a full flash array according to any one of claims 1 to 5.
CN201810214771.9A 2018-03-15 2018-03-15 Storage data compression method and device of full flash memory array and readable storage medium Active CN108427538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810214771.9A CN108427538B (en) 2018-03-15 2018-03-15 Storage data compression method and device of full flash memory array and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810214771.9A CN108427538B (en) 2018-03-15 2018-03-15 Storage data compression method and device of full flash memory array and readable storage medium

Publications (2)

Publication Number Publication Date
CN108427538A true CN108427538A (en) 2018-08-21
CN108427538B CN108427538B (en) 2021-06-04

Family

ID=63158230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810214771.9A Active CN108427538B (en) 2018-03-15 2018-03-15 Storage data compression method and device of full flash memory array and readable storage medium

Country Status (1)

Country Link
CN (1) CN108427538B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445713A (en) * 2018-11-09 2019-03-08 郑州云海信息技术有限公司 A kind of storage state recording method, system and the associated component of metadata volume
CN109814809A (en) * 2019-01-14 2019-05-28 杭州宏杉科技股份有限公司 Data compression method and apparatus
CN110018792A (en) * 2019-04-10 2019-07-16 苏州浪潮智能科技有限公司 One kind is to rule data processing method, device, electronic equipment and storage medium
CN110209640A (en) * 2019-06-06 2019-09-06 四川长虹电器股份有限公司 The method of switching at runtime lz4 compression algorithm type under cell phone system operating status
CN110377226A (en) * 2019-06-10 2019-10-25 平安科技(深圳)有限公司 Compression method, device and storage medium based on storage engines bluestore
CN110618789A (en) * 2019-08-14 2019-12-27 华为技术有限公司 Method and device for deleting repeated data
CN111079917A (en) * 2018-10-22 2020-04-28 北京地平线机器人技术研发有限公司 Tensor data block access method and device
CN111124940A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Space recovery method and system based on full flash memory array
CN111124259A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111125033A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Space recovery method and system based on full flash memory array
CN111124939A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111198857A (en) * 2018-10-31 2020-05-26 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111831480A (en) * 2020-06-17 2020-10-27 华中科技大学 Layered coding method and device based on duplicate removal system and duplicate removal system
CN112306974A (en) * 2019-07-30 2021-02-02 深信服科技股份有限公司 Data processing method, device, equipment and storage medium
CN113467699A (en) * 2020-03-30 2021-10-01 华为技术有限公司 Method and device for improving available storage capacity
CN113590051A (en) * 2021-09-29 2021-11-02 阿里云计算有限公司 Data storage and reading method and device, electronic equipment and medium
CN114003169A (en) * 2021-08-02 2022-02-01 固存芯控半导体科技(苏州)有限公司 Data compression method for SSD
CN114866483A (en) * 2022-03-25 2022-08-05 新华三大数据技术有限公司 Data compression flow control method and device and electronic equipment

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005141A1 (en) * 2006-06-29 2008-01-03 Ling Zheng System and method for retrieving and using block fingerprints for data deduplication
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication
CN102982122A (en) * 2012-11-13 2013-03-20 浪潮电子信息产业股份有限公司 Repeating data deleting method suitable for mass storage system
CN103473266A (en) * 2013-08-09 2013-12-25 记忆科技(深圳)有限公司 Solid state disk and method for deleting repeating data thereof
CN103502957A (en) * 2012-12-28 2014-01-08 华为技术有限公司 Data processing method and device
WO2014037767A1 (en) * 2012-09-05 2014-03-13 Indian Institute Of Technology, Kharagpur Multi-level inline data deduplication
CN103873506A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 Data block duplication removing system in storage cluster and method thereof
CN103914516A (en) * 2014-02-25 2014-07-09 深圳市中博科创信息技术有限公司 Method and system for layer-management of storage system
CN104462388A (en) * 2014-12-10 2015-03-25 上海爱数软件有限公司 Redundant data cleaning method based on cascade storage media
US20150088945A1 (en) * 2013-09-25 2015-03-26 Nec Laboratories America, Inc. Adaptive compression supporting output size thresholds
CN105094709A (en) * 2015-08-27 2015-11-25 浪潮电子信息产业股份有限公司 Dynamic data compression method for solid-state disc storage system
CN105787037A (en) * 2016-02-25 2016-07-20 浪潮(北京)电子信息产业有限公司 Repeated data deleting method and device
CN106055271A (en) * 2016-05-17 2016-10-26 浪潮(北京)电子信息产业有限公司 Method and device for de-repetition selection of repeated data based on cloud computing
US20170192712A1 (en) * 2015-12-30 2017-07-06 Nutanix, Inc. Method and system for implementing high yield de-duplication for computing applications
CN107193498A (en) * 2017-05-25 2017-09-22 山东浪潮商用系统有限公司 A kind of method and device that data are carried out with deduplication processing
CN107682016A (en) * 2017-09-26 2018-02-09 深信服科技股份有限公司 A kind of data compression method, data decompression method and related system

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005141A1 (en) * 2006-06-29 2008-01-03 Ling Zheng System and method for retrieving and using block fingerprints for data deduplication
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication
WO2014037767A1 (en) * 2012-09-05 2014-03-13 Indian Institute Of Technology, Kharagpur Multi-level inline data deduplication
CN102982122A (en) * 2012-11-13 2013-03-20 浪潮电子信息产业股份有限公司 Repeating data deleting method suitable for mass storage system
CN103873506A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 Data block duplication removing system in storage cluster and method thereof
CN103502957A (en) * 2012-12-28 2014-01-08 华为技术有限公司 Data processing method and device
CN103473266A (en) * 2013-08-09 2013-12-25 记忆科技(深圳)有限公司 Solid state disk and method for deleting repeating data thereof
US20150088945A1 (en) * 2013-09-25 2015-03-26 Nec Laboratories America, Inc. Adaptive compression supporting output size thresholds
CN103914516A (en) * 2014-02-25 2014-07-09 深圳市中博科创信息技术有限公司 Method and system for layer-management of storage system
CN104462388A (en) * 2014-12-10 2015-03-25 上海爱数软件有限公司 Redundant data cleaning method based on cascade storage media
CN105094709A (en) * 2015-08-27 2015-11-25 浪潮电子信息产业股份有限公司 Dynamic data compression method for solid-state disc storage system
US20170192712A1 (en) * 2015-12-30 2017-07-06 Nutanix, Inc. Method and system for implementing high yield de-duplication for computing applications
CN105787037A (en) * 2016-02-25 2016-07-20 浪潮(北京)电子信息产业有限公司 Repeated data deleting method and device
CN106055271A (en) * 2016-05-17 2016-10-26 浪潮(北京)电子信息产业有限公司 Method and device for de-repetition selection of repeated data based on cloud computing
CN107193498A (en) * 2017-05-25 2017-09-22 山东浪潮商用系统有限公司 A kind of method and device that data are carried out with deduplication processing
CN107682016A (en) * 2017-09-26 2018-02-09 深信服科技股份有限公司 A kind of data compression method, data decompression method and related system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
夏文: "数据备份系统中冗余数据的高性能消除技术研究", 《中国博士学位论文全文数据库 信息科技辑》 *
韩帅军: "面向归档存储的重复数据删除优化方法研究", 《中国优秀硕士学位论文全文数据库 信息科学辑》 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079917B (en) * 2018-10-22 2023-08-11 北京地平线机器人技术研发有限公司 Tensor data block access method and device
CN111079917A (en) * 2018-10-22 2020-04-28 北京地平线机器人技术研发有限公司 Tensor data block access method and device
CN111124939A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111125033B (en) * 2018-10-31 2024-04-09 深信服科技股份有限公司 Space recycling method and system based on full flash memory array
CN111124940B (en) * 2018-10-31 2022-03-22 深信服科技股份有限公司 Space recovery method and system based on full flash memory array
CN111198857A (en) * 2018-10-31 2020-05-26 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111124940A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Space recovery method and system based on full flash memory array
CN111124259A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Data compression method and system based on full flash memory array
CN111125033A (en) * 2018-10-31 2020-05-08 深信服科技股份有限公司 Space recovery method and system based on full flash memory array
CN109445713A (en) * 2018-11-09 2019-03-08 郑州云海信息技术有限公司 A kind of storage state recording method, system and the associated component of metadata volume
CN109814809B (en) * 2019-01-14 2022-03-11 杭州宏杉科技股份有限公司 Data compression method and device
CN109814809A (en) * 2019-01-14 2019-05-28 杭州宏杉科技股份有限公司 Data compression method and apparatus
CN110018792A (en) * 2019-04-10 2019-07-16 苏州浪潮智能科技有限公司 One kind is to rule data processing method, device, electronic equipment and storage medium
CN110209640A (en) * 2019-06-06 2019-09-06 四川长虹电器股份有限公司 The method of switching at runtime lz4 compression algorithm type under cell phone system operating status
CN110377226A (en) * 2019-06-10 2019-10-25 平安科技(深圳)有限公司 Compression method, device and storage medium based on storage engines bluestore
WO2020248493A1 (en) * 2019-06-10 2020-12-17 平安科技(深圳)有限公司 Compression method and device based on storage engine bluestore, and storage medium
CN112306974A (en) * 2019-07-30 2021-02-02 深信服科技股份有限公司 Data processing method, device, equipment and storage medium
CN110618789A (en) * 2019-08-14 2019-12-27 华为技术有限公司 Method and device for deleting repeated data
CN113467699A (en) * 2020-03-30 2021-10-01 华为技术有限公司 Method and device for improving available storage capacity
CN113467699B (en) * 2020-03-30 2023-08-22 华为技术有限公司 Method and device for improving available storage capacity
CN111831480A (en) * 2020-06-17 2020-10-27 华中科技大学 Layered coding method and device based on duplicate removal system and duplicate removal system
CN111831480B (en) * 2020-06-17 2024-04-19 华中科技大学 Layered coding method and device based on deduplication system and deduplication system
CN114003169A (en) * 2021-08-02 2022-02-01 固存芯控半导体科技(苏州)有限公司 Data compression method for SSD
CN114003169B (en) * 2021-08-02 2024-04-16 固存芯控半导体科技(苏州)有限公司 Data compression method for SSD
CN113590051A (en) * 2021-09-29 2021-11-02 阿里云计算有限公司 Data storage and reading method and device, electronic equipment and medium
CN114866483A (en) * 2022-03-25 2022-08-05 新华三大数据技术有限公司 Data compression flow control method and device and electronic equipment
CN114866483B (en) * 2022-03-25 2023-10-03 新华三大数据技术有限公司 Data compression flow control method and device and electronic equipment

Also Published As

Publication number Publication date
CN108427538B (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN108427538A (en) Storage data compression method, device and the readable storage medium storing program for executing of full flash array
CN108427539A (en) Offline duplicate removal compression method, device and the readable storage medium storing program for executing of buffer memory device data
CN108415669A (en) The data duplicate removal method and device of storage system, computer installation and storage medium
CN105204781B (en) Compression method, device and equipment
CN107046812B (en) Data storage method and device
CN103098035B (en) Storage system
EP3316150B1 (en) Method and apparatus for file compaction in key-value storage system
WO2018033035A1 (en) Solid-state drive control device and solid-state drive data access method based on learning
CN103870514B (en) Data de-duplication method and device
CN110377226B (en) Compression method and device based on storage engine bluestore and storage medium
CN105824881B (en) A kind of data de-duplication data placement method based on load balancing
CN103353850B (en) Virtual machine thermal migration memory processing method, device and system
CN107506153A (en) A kind of data compression method, data decompression method and related system
CN111125033B (en) Space recycling method and system based on full flash memory array
CN107682016A (en) A kind of data compression method, data decompression method and related system
CN103152430B (en) A kind of reduce the cloud storage method that data take up room
CN110347643B (en) Method and device for cloning NTFS (New technology File System) volume between disks
CN102970043A (en) GZIP (GNUzip)-based hardware compressing system and accelerating method thereof
CN110941514B (en) Data backup method, data recovery method, computer equipment and storage medium
CN106569750A (en) Data compression method and device
CN111124940B (en) Space recovery method and system based on full flash memory array
CN110083487A (en) A kind of reference data block fragment removing method and system based on data locality
CN111124939A (en) Data compression method and system based on full flash memory array
CN111061428B (en) Data compression method and device
CN103810297A (en) Writing method, reading method, writing device and reading device on basis of re-deleting technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant