CN108427538A - Storage data compression method and device for an all-flash array, and readable storage medium - Google Patents
- Publication number
- CN108427538A CN201810214771.9A
- Authority
- CN
- China
- Prior art keywords
- data
- data block
- layer
- capacity
- fingerprint
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0688—Non-volatile semiconductor memory arrays
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0626—Reducing size or complexity of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System (AREA)
Abstract
An embodiment of the present invention discloses a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the space utilization of a storage system. The method includes: judging whether the current storage capacity of the performance layer is greater than a safety threshold; if not, reading a data segment of preset length from the performance layer; dividing the data segment into data blocks of preset granularity and calculating the fingerprint of each data block; querying the fingerprint database of the capacity layer to judge whether the fingerprint exists in it; and, if the fingerprint exists, determining that the data block is a duplicate data block and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block. An embodiment of the present invention further provides a storage data compression device for an all-flash array, for improving the I/O performance and storage efficiency of the storage system.
Description
Technical Field
The present invention relates to the technical field of data storage, and in particular to a storage data compression method and device for an all-flash array, and a readable storage medium.
Background
All-flash array: Flash solid-state disks (SSDs) are widely used as caches for mechanical hard disks, for example in Ceph and ZFS, mainly because SSDs have good random I/O performance, whereas traditional mechanical hard disks handle random I/O poorly. Deploying all-flash devices in storage systems has become a popular trend as a way to raise the overall performance of the storage system. Considering that SSDs are much more expensive than current mechanical hard disks, and that storage systems in today's cloud computing and virtualized environments hold large amounts of duplicate, redundant data, data deduplication and compression can extend the logical storage space of an SSD storage system, raise SSD capacity utilization, and thereby reduce SSD cost.
In general, the physical structure of an all-flash array is divided into a capacity layer (read cache) and a performance layer (write cache), typically (but not necessarily) built from PCIe SSDs and SATA SSDs respectively. This is mainly determined by the asymmetry of SSD read and write performance: read speed is generally far higher than write speed, and PCIe SSDs are more durable than SATA SSDs. How to design the data storage strategy between the performance layer and the capacity layer, so as to improve the response speed of the performance layer, ensure that the performance layer has good write bandwidth and latency, and at the same time increase the storage space of the capacity layer, has therefore become a current research focus.
Summary of the Invention
Embodiments of the present invention provide a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the response speed of the performance layer of the all-flash array, ensuring that the performance layer has good write bandwidth and latency, and at the same time increasing the storage space of the capacity layer, thereby improving the I/O performance of the storage system.
A first aspect of the embodiments of the present invention provides a storage data compression method for an all-flash array, the all-flash array including a performance layer and a capacity layer, the method including:
Judging whether the current storage capacity of the performance layer is greater than a safety threshold;
If not, reading a data segment of preset length from the performance layer;
Dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
Querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
If the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
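The steps of the first aspect can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `BlockMeta`, `dedup_segment`, and `BLOCK_SIZE` are hypothetical, the fingerprint database is modeled as an in-memory dictionary mapping a fingerprint to a physical address, and SHA-1 is assumed as the fingerprint algorithm.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed preset granularity, in bytes


@dataclass
class BlockMeta:
    order: int    # position of the block within its data segment
    address: int  # physical storage address in the capacity layer
    length: int   # length of the block in bytes


def dedup_segment(segment, fingerprint_db, used, safe_threshold):
    """Split one data segment into blocks and classify each block.

    Returns None when the performance layer is above the safety threshold
    (the caller then takes another path); otherwise returns a pair
    (duplicates, unique): metadata for blocks whose fingerprint is already
    known, and (order, block) pairs for blocks that are not yet stored.
    """
    if used > safe_threshold:
        return None
    duplicates, unique = [], []
    for order, offset in enumerate(range(0, len(segment), BLOCK_SIZE)):
        block = segment[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha1(block).hexdigest()
        if fp in fingerprint_db:
            # Duplicate: only metadata needs to be written back.
            duplicates.append(BlockMeta(order, fingerprint_db[fp], len(block)))
        else:
            unique.append((order, block))
    return duplicates, unique
```

Only the metadata of a duplicate block is recorded in this sketch; what happens to the unique blocks is left to the caller.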
Preferably, the method further includes:
Judging whether the modification count of the data segment is greater than a first threshold;
If not, triggering the step of dividing the data segment into data blocks of preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
Preferably, the method further includes:
Judging whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
If not, triggering the step of dividing the data segment into data blocks of preset granularity;
If so, writing the data segment directly back to the data area of the capacity layer.
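The two optional checks above can be read as a routing decision for each data segment. The sketch below is a hedged illustration: `choose_write_path` is a hypothetical name, the default bandwidth threshold is an arbitrary placeholder, and applying both checks in sequence is an assumption, since the text presents them as separate preferred options.

```python
def choose_write_path(modify_count, bandwidth,
                      first_threshold=10, bandwidth_threshold=500.0):
    """Return "direct" to write the segment straight to the capacity layer,
    or "dedup" to continue with block splitting and fingerprinting."""
    if modify_count > first_threshold:
        return "direct"  # frequently modified segment
    if bandwidth > bandwidth_threshold:
        return "direct"  # performance layer is already busy
    return "dedup"
```

For example, `choose_write_path(3, 120.0)` selects the deduplication path, while a segment modified 11 times is written back directly.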
Preferably, the method further includes:
Representing the length of the deduplicated and compressed data block with a compressed-length code;
The metadata includes the order, the logical address, and the compressed-length code of the data block.
Preferably, the method further includes:
If the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
If the fingerprint does not exist in the fingerprint database, performing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the fingerprint database with the metadata information of the compressed data block and the fingerprint of the original data block, where the metadata information includes the physical storage address of the compressed data block and the compressed length of the data block.
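The fingerprint-miss path can be sketched as below. This is an assumption-laden toy, not the patented implementation: `write_unique_block` is a hypothetical name, the data area is modeled as an append-only `bytearray`, and `zlib` from the Python standard library stands in for whatever compression algorithm the storage system actually uses.

```python
import hashlib
import zlib


def write_unique_block(block, data_area, fingerprint_db):
    """Compress a block that has no fingerprint match, append it to the data
    area, and record (address, compressed length) under the fingerprint of
    the ORIGINAL, uncompressed block."""
    compressed = zlib.compress(block)
    address = len(data_area)  # toy layout: next free offset
    data_area.extend(compressed)
    fp = hashlib.sha1(block).hexdigest()
    fingerprint_db[fp] = (address, len(compressed))
    return address, len(compressed)
```

Keying the database by the fingerprint of the original block, rather than of the compressed bytes, is what lets a later identical block be recognized before any compression work is done.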
A second aspect of the present invention provides a storage data compression device for an all-flash array, the all-flash array including a performance layer and a capacity layer, the device including:
A first judging unit, configured to judge whether the current storage capacity of the performance layer is greater than a safety threshold;
A reading unit, configured to read a data segment of preset length from the performance layer when the capacity is not greater than the safety threshold;
A computing unit, configured to divide the data segment into data blocks of preset granularity and calculate the fingerprint of each data block;
A query judging unit, configured to query the fingerprint database of the capacity layer and judge whether the fingerprint exists in the fingerprint database;
A deduplication unit, configured to determine, when the fingerprint exists, that the data block is duplicate data, and to write the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
Preferably, the device further includes:
A second judging unit, configured to judge whether the modification count of the data segment is greater than a first threshold;
A first trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the count is not greater than the first threshold;
A first write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the count is greater than the first threshold.
Preferably, the device further includes:
A third judging unit, configured to judge whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
A second trigger unit, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the bandwidth is not greater than the bandwidth threshold;
A second write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the bandwidth is greater than the bandwidth threshold.
Preferably, the deduplication and compression unit further includes:
A marking module, configured to represent the length of the deduplicated and compressed data block with a compressed-length code;
The metadata includes the order, the logical address, and the compressed-length code of the data block.
Preferably, the device further includes:
A third write-back unit, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the safety threshold;
A fourth write-back unit, configured to, when the fingerprint does not exist in the fingerprint database, perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the fingerprint database with the metadata information of the compressed data block and the fingerprint of the original data block, where the metadata information includes the physical storage address of the compressed data block and the compressed length of the data block.
The present invention also provides a computer apparatus including a processor, where the processor, when executing a computer program stored in a memory, implements the following steps:
Judging whether the current storage capacity of the performance layer is greater than a safety threshold;
If not, reading a data segment of preset length from the performance layer;
Dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
Querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
If the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
The present invention also provides a readable storage medium on which a computer program is stored, where the computer program, when executed, implements the following steps:
Judging whether the current storage capacity of the performance layer is greater than a safety threshold;
If not, reading a data segment of preset length from the performance layer;
Dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
Querying the fingerprint database of the capacity layer, and judging whether the fingerprint exists in the fingerprint database;
If the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, the storage data compression device of the all-flash array first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When the storage capacity is not greater than the safety threshold, it reads a data segment of preset length from the performance layer, divides the data segment into data blocks, and calculates the fingerprint of each data block. When the fingerprint exists in the fingerprint database of the capacity layer, it determines that the data block is duplicate data and stores the metadata information of the data block in the capacity layer of the all-flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has good write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication and compression further increases the storage space of the capacity layer, thereby improving the space utilization and storage efficiency of the storage system.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of one embodiment of a storage data compression method for an all-flash array in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of a storage data compression method for an all-flash array in the embodiments of the present invention;
Fig. 3 is a schematic structural diagram of the performance layer and capacity layer of the all-flash array, and of the performance layer data and capacity layer data;
Fig. 4 is a schematic diagram of one embodiment of a storage data compression device for an all-flash array in the embodiments of the present invention;
Fig. 5 is a schematic diagram of another embodiment of a storage data compression device for an all-flash array in the embodiments of the present invention.
Detailed Description
Embodiments of the present invention provide a storage data compression method and device for an all-flash array, and a readable storage medium, for improving the response speed of the performance layer of the all-flash array and ensuring that the performance layer has good write bandwidth and latency, thereby further improving the I/O performance of the storage system.
For ease of understanding, the technical terms used in the text are explained as follows:
Data deduplication: Data deduplication is a technique applied in storage systems to identify and eliminate redundant data globally, and it has become a focus of storage system research in recent years. Data deduplication uniquely identifies a data block by computing a secure hash digest of the block (such as a SHA-1 fingerprint), which avoids character-by-character comparison of data. The storage system only needs to maintain an index table of secure hash digests to identify duplicate data quickly and conveniently, which scales well. For duplicate data content, only the corresponding data pointer information needs to be recorded to save storage space. Data deduplication can therefore greatly save storage space and thereby improve the resource utilization of the storage device.
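A minimal sketch of the idea described above, assuming an in-memory index: each distinct block is stored once under its SHA-1 digest, and the logical sequence is kept as a list of digests that act as pointers. The function names `deduplicate` and `restore` are hypothetical.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once and keep one pointer per logical block."""
    store = {}     # fingerprint -> block content, stored only once
    pointers = []  # fingerprint of every logical block, in original order
    for block in blocks:
        fp = hashlib.sha1(block).digest()
        store.setdefault(fp, block)
        pointers.append(fp)
    return store, pointers


def restore(store, pointers):
    """Rebuild the original block sequence from the pointers."""
    return [store[fp] for fp in pointers]
```

With three logical blocks of which two are identical, `store` holds two entries while `pointers` still has three, and `restore` returns the original sequence.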
Data compression: Data compression is another mainstream technique for eliminating redundant data. It removes redundant information by encoding: on the premise that no original information is lost, the original content is transformed so that repeated byte sequences are represented with fewer bytes, which eliminates part of the redundant data and ultimately saves storage space. Data compression tools used in storage systems today mainly adopt compression algorithms such as LZ4 and LZO.
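The lossless property described above can be illustrated with a round trip. Note the substitution: LZ4 and LZO are not part of the Python standard library, so this sketch uses `zlib` (DEFLATE) purely as a stand-in; the sample data is made up.

```python
import zlib

# Highly repetitive input, so repeated byte sequences can be encoded compactly.
original = b"abcabcabc" * 500

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the input,
# and the compressed form is smaller than the original.
is_lossless = restored == original
is_smaller = len(compressed) < len(original)
```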
All-flash array: Flash solid-state disks (SSDs) are widely used as caches for mechanical hard disks, for example in Ceph and ZFS, mainly because SSDs have good random I/O performance, whereas traditional mechanical hard disks handle random I/O poorly. Deploying all-flash devices in storage systems has become a popular trend as a way to raise the overall performance of the storage system. Considering that SSDs are much more expensive than current mechanical hard disks, and that storage systems in today's cloud computing and virtualized environments hold large amounts of duplicate, redundant data, data deduplication and compression can extend the logical storage space of an SSD storage system, raise SSD capacity utilization, and thereby reduce SSD cost.
In general, for a device with a processor, the I/O performance of the storage system is a principal factor affecting the performance of the device system. When the external storage of the device is deployed as an all-flash array, the physical structure of the all-flash array is generally divided into a capacity layer and a performance layer, where the performance layer is also called the write cache and the capacity layer is also called the read cache. When the all-flash array consists of SSDs, read speed is far higher than write speed because of the asymmetry of SSD read and write performance, so PCIe SSDs are generally deployed as the capacity layer and SATA SSDs as the performance layer. How to set the data storage strategy between the PCIe SSDs and the SATA SSDs so as to improve the response speed of the SATA SSDs serving as the performance layer, ensure that the SATA SSDs have good write bandwidth and latency, and at the same time increase the storage space of the capacity layer, thereby further improving the I/O performance of the storage system, is the technical problem to be solved by the present invention.
Based on the above problem, the present invention proposes a storage data compression method for an all-flash array. For ease of understanding, referring to Fig. 1, the storage data compression method for an all-flash array in the embodiments of the present invention includes:
101. Judge whether the current storage capacity of the performance layer is greater than the safety threshold; if not, execute step 102; if so, execute step 106.
It is well known that improving the response speed of the performance layer (write cache) of an all-flash array, so that the performance layer has good write bandwidth and latency, is one way to improve the I/O performance of a storage system. Current all-flash arrays generally consist entirely of SSDs, and because of the asymmetry of SSD read and write performance, read speed is far higher than write speed. PCIe SSDs are therefore generally deployed as the capacity layer and SATA SSDs as the performance layer, since PCIe SSDs read faster than SATA SSDs and are more durable. To ensure a better response speed for the performance layer, the data in the performance layer can be written back to the capacity layer in real time while the current storage capacity of the performance layer is not greater than the safety threshold, so that the performance layer always has enough space to store incoming data.
The storage data compression device of the all-flash array (hereinafter referred to as the compression device) therefore judges the current storage capacity of the performance layer, and when the current storage capacity of the performance layer is less than the safety threshold, writes the data in the performance layer back to the capacity layer, so that the data in the performance layer is written back in time. It should be noted that:
The safety threshold of the performance layer capacity is a critical value that ensures the data in the performance layer can be flushed quickly to the capacity layer in deduplicated and compressed form. When the current storage capacity of the performance layer is less than or equal to the safety threshold, deduplication and compression are performed on the data in the performance layer. When the current storage capacity of the performance layer is greater than the safety threshold, deduplication and compression of the data in the performance layer are abandoned, so that they do not consume the I/O resources of the performance layer, and the data in the performance layer is written back directly to the capacity layer. The safety threshold can therefore be set with the aim of improving the I/O performance of the storage system; it may be, for example, 80% or 60% of the total storage capacity, and its size is not specifically limited here.
The compression device may judge the current storage capacity of the performance layer in real time, periodically (for example, every 5 minutes), or at irregular intervals. Specifically, the compression device may also choose according to the data processing demand of the system: if the current data processing volume is large, it judges the current storage capacity in real time; if the current data processing volume is moderate, it judges the current storage capacity periodically; and if the current data processing volume is small, it judges the current storage capacity at irregular intervals, so as to save system resources.
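The load-dependent checking policy above can be sketched as a simple selector. The function name `choose_check_mode` and the numeric cut-offs are hypothetical; the text only distinguishes large, moderate, and small processing volumes without giving values.

```python
def choose_check_mode(pending_io, high=1000, low=100):
    """Pick how often the performance-layer capacity is checked.

    pending_io is an assumed measure of the current data processing volume;
    high and low are placeholder cut-offs.
    """
    if pending_io >= high:
        return "real-time"  # large volume: check continuously
    if pending_io >= low:
        return "periodic"   # moderate volume: e.g. every 5 minutes
    return "irregular"      # small volume: check occasionally
```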
102. Read a data segment of preset length from the performance layer.
If the current storage capacity of the performance layer is not greater than the safety threshold of the performance layer, the compression device can read the data stored in the performance layer and write it back to the capacity layer. To make reading more convenient, the compression device can read the data in the performance layer in units of a preset length, which may be, for example, 1 MB or 2 MB. The preset length can be chosen according to the system performance of the compression device with the aim of facilitating data processing, and is not specifically limited here.
103. Divide the data segment into data blocks of preset granularity, and calculate the fingerprint of each data block.
After the compression device reads a data segment of preset length from the performance layer, it deduplicates and compresses the data segment and writes the deduplicated, compressed data segment back to the capacity layer, so that the data segment occupies less space once it is in the capacity layer.
Specifically, the compression device processes the data segment of preset length as follows: it divides the data segment into data blocks of preset granularity and calculates the fingerprint of each data block, in order to judge whether data content identical to the data block already exists in the capacity layer. The preset granularity of a data block may be an integer multiple of 4 KB (because 4 KB is the minimum write unit of an SSD), that is, 4 KB, 8 KB, 16 KB, 24 KB, and so on. The block size is set on the basis of the processing speed of the system, with the aim of improving data processing speed, and is not specifically limited here.
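Splitting a segment into blocks whose size is an integer multiple of 4 KB can be sketched as follows. The name `split_segment` and the 8 KB default are hypothetical choices for illustration; the last block may be shorter when the segment length is not a multiple of the granularity.

```python
SSD_WRITE_UNIT = 4096  # 4 KB, the minimum SSD write unit named in the text


def split_segment(segment, granularity=8192):
    """Split a data segment into blocks of the given granularity (bytes)."""
    if granularity <= 0 or granularity % SSD_WRITE_UNIT != 0:
        raise ValueError("granularity must be a positive multiple of 4 KB")
    return [segment[i:i + granularity]
            for i in range(0, len(segment), granularity)]
```

A 1 MB segment split at 8 KB granularity yields 128 blocks.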
The SHA-1 algorithm maps a binary value of arbitrary length to a shorter binary value of fixed length, and this shorter value is called the hash value. A hash value is a unique and compact numerical representation of a piece of data: if even one letter of a piece of plaintext is changed, the resulting hash value will be different. Finding two different inputs that hash to the same value is regarded as computationally infeasible, so the hash value that the SHA-1 algorithm produces for a piece of plaintext can be regarded as the "fingerprint" of that plaintext. The MD5 algorithm works on the same principle as SHA-1, and SHA-1 and MD5 are therefore commonly used to calculate the fingerprints of data blocks.
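A short sketch of fingerprint calculation using the `hashlib` module from the Python standard library. The helper name `fingerprint` is hypothetical; `hashlib.new` is a real standard-library call that accepts the algorithm name and the data.

```python
import hashlib


def fingerprint(block, algorithm="sha1"):
    """Return the hexadecimal digest of a data block."""
    return hashlib.new(algorithm, block).hexdigest()


# The digest length is fixed regardless of the input length:
# SHA-1 yields 160 bits (40 hex characters), MD5 yields 128 bits (32).
sha1_fp = fingerprint(b"hello world")
md5_fp = fingerprint(b"hello world", "md5")

# Changing a single letter of the input produces a different fingerprint.
changed_fp = fingerprint(b"hello worle")
```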
104. Query the fingerprint database of the capacity layer and judge whether the fingerprint exists in the fingerprint database; if so, execute step 105; if not, execute step 106.
It can be understood that, to expand the storage space of the capacity layer as much as possible, the capacity layer in this embodiment stores data in compressed form, in which data blocks with identical content are stored only once. The specific process can be understood as follows:
When data content identical to the above data block exists in the capacity layer, that is, when a fingerprint identical to the above fingerprint exists in the fingerprint database of the capacity layer, the compression device deletes the data block and stores its order in the original data segment, its logical address, and its length in the metadata. The metadata records the position of the data block in the original data segment and makes later recovery of the data block convenient.
Therefore, after obtaining the fingerprint of the data block, the compression device needs to query the fingerprint database of the capacity layer to judge whether the fingerprint exists in it, executing step 105 when the fingerprint exists in the fingerprint database and step 106 when it does not.
105. Determine that the data block is duplicate data, and write the metadata information of the data block back to the metadata area of the capacity layer, where the metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
To expand the storage space of the capacity layer, the data in the capacity layer is stored in compressed form in this embodiment. Specifically, when the compression device judges that the fingerprint of the above data block exists in the fingerprint database of the capacity layer, it determines that the data block is duplicate data and writes the metadata information of the data block back to the metadata area of the capacity layer. Fig. 3 is a schematic structural diagram of the performance layer and capacity layer of the all-flash array and of the performance layer data and capacity layer data. The metadata information specifically includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
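One possible on-disk encoding of the three metadata fields is sketched below. The layout is an assumption made only for illustration (the text does not specify field widths): a 4-byte order, an 8-byte physical address, and a 4-byte length, packed little-endian without padding using the standard `struct` module.

```python
import struct

# Assumed layout: order (uint32), physical address (uint64), length (uint32).
META_FORMAT = "<IQI"
META_SIZE = struct.calcsize(META_FORMAT)  # 16 bytes per record


def pack_meta(order, address, length):
    """Encode one metadata record as fixed-size bytes."""
    return struct.pack(META_FORMAT, order, address, length)


def unpack_meta(raw):
    """Decode a record back into (order, address, length)."""
    return struct.unpack(META_FORMAT, raw)
```

A fixed-size record keeps the metadata area easy to index: record `n` starts at byte offset `n * META_SIZE`.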
106. Execute other flows.
In this embodiment, when the current storage capacity of the performance layer is greater than the safety threshold, or when the fingerprint of the data block does not exist in the capacity layer, other flows are executed, which are not specifically limited here.
In the embodiments of the present invention, the storage data compression device of the all-flash array first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When the storage capacity is not greater than the safety threshold, it reads a data segment of preset length from the performance layer, divides the data segment into data blocks, and calculates the fingerprint of each data block. When the fingerprint exists in the fingerprint database of the capacity layer, it determines that the data block is duplicate data and stores the metadata of the data block in the capacity layer of the all-flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has good write bandwidth and latency. At the same time, storing the data blocks of the performance layer in the capacity layer after deduplication further increases the storage space of the capacity layer, thereby improving the space utilization and storage efficiency of the storage system.
It can be understood that in some scenarios, for example common data (such as the company address book) in a large company, the data in the address book is updated frequently because of staff turnover or changes in contact details. If the company address book is stored in the capacity layer, the data is stored in deduplicated form, so later updates cause deduplication reference problems and change the compressed length of the updated data. The data must then be updated out of place, which brings data fragmentation and in turn increases the difficulty and overhead of subsequent space reclamation. To address this problem, the embodiment of the present invention proposes a storage data compression method for a full flash array. Referring to Fig. 2, another embodiment of the storage data compression method of the full flash array in the embodiment of the present invention includes:
201. Judge whether the current storage capacity of the performance layer is greater than the safety threshold; if not, execute step 202; if so, execute step 209;
202. Read a data segment of preset length from the performance layer;
It should be noted that steps 201 to 202 in the present embodiment are similar to steps 101 to 102 in the embodiment described in Fig. 1, and details are not repeated here.
203. Judge whether the modification count of the data segment is greater than the first threshold; if not, execute step 204; if so, execute step 209;
To reduce the data fragmentation caused by frequent later updates, after reading the data segment of preset length from the performance layer, the compression device judges the modification count of the data segment. If the modification count is greater than the first threshold, the data segment is judged to be frequently modified data and step 209 is executed; if the modification count is not greater than the first threshold, the data segment is judged to be infrequently modified data and step 204 is executed.
Specifically, whether data is frequently modified can be determined by setting up a timer and counting the number of times the data segment is written or read within the preset time period recorded by the timer (for example, 5 minutes or 10 minutes). If the number of times the data segment is written or read is greater than the first threshold (for example, 10 times), the data segment is judged to be frequently modified data; otherwise it is infrequently modified data.
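The counting described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the class name `AccessCounter`, the sliding-window bookkeeping, and the default values (a 300-second window and a first threshold of 10, taken from the examples in the text) are all assumptions.

```python
from collections import defaultdict, deque


class AccessCounter:
    """Hypothetical helper: counts accesses per data segment in a time window."""

    def __init__(self, window_seconds=300.0, first_threshold=10):
        self.window_seconds = window_seconds    # e.g. 5 minutes, as in the text
        self.first_threshold = first_threshold  # e.g. 10 accesses, as in the text
        self._events = defaultdict(deque)       # segment id -> access timestamps

    def record(self, segment_id, now):
        """Record one write (or read) of the segment at time `now` in seconds."""
        self._events[segment_id].append(now)

    def count(self, segment_id, now):
        """Return the number of accesses inside the window ending at `now`."""
        events = self._events[segment_id]
        # Discard timestamps that have fallen out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events)

    def is_frequently_modified(self, segment_id, now):
        """True when the count exceeds the first threshold (step 203 leads to step 209)."""
        return self.count(segment_id, now) > self.first_threshold
```

A segment for which `is_frequently_modified` returns True would be written back directly (step 209); otherwise the flow continues with step 204.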
204. Judge whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold; if not, execute step 205; if so, execute step 209;
During data processing, to further reduce the impact on the data services of the original storage system caused by the CPU and I/O resources that deduplication and compression occupy, the compression device may also judge the current storage bandwidth of the performance layer, that is, judge whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold; if not, step 205 is executed; if so, step 209 is executed.
It should be noted that there is no strict order between steps 203 and 204 in the present embodiment: step 203 may be executed first and then step 204, or step 204 first and then step 203. Alternatively, steps 203 and 204 may be executed selectively according to the system configuration: if the IO performance of the storage system is poor, both step 203 and step 204 may be executed; if the IO performance of the storage system is strong, only one of step 203 and step 204 may be executed.
Memory bandwidth refers to the amount of information the memory accesses per unit time, that is, the number of bits or bytes the memory reads or writes per unit time. It is a technical indicator of the data transfer rate (unit: bps, bits per second, or Bytes/s, bytes per second). The memory bandwidth determines the speed at which a memory-centric machine obtains information. Let the bandwidth be denoted $B_m$; if the storage cycle is $t_m$ and $n$ bytes are read or written per cycle, then the bandwidth is
$B_m = n / t_m$
For example, if the storage cycle is 500 ns and 16 bits can be accessed in each storage cycle, the bandwidth is 16 bit / 500 ns = 32 Mbit/s.
Therefore, to reduce the impact on the data services of the original storage system caused by the CPU and I/O resources that deduplication and compression occupy, after the modification count of the data segment has been judged, the current storage bandwidth of the performance layer may be further judged, that is, whether the current storage bandwidth of the cache device is greater than the bandwidth threshold. If it is not greater than the bandwidth threshold, the performance layer can provide I/O resources for deduplication and compression; if it is greater than the bandwidth threshold, step 209 is executed, so that deduplication and compression do not take CPU and I/O resources away from the data services of the original storage system.
Specifically, the bandwidth threshold of the cache device may be the bandwidth peak of the cache device, or 80% or 60% of the bandwidth peak; the size of the bandwidth threshold is not specifically limited here.
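The bandwidth calculation and the threshold check of step 204 can be illustrated with a short sketch. The function names and the default fraction of 0.8 are assumptions made for illustration; the text only states that the threshold may be the peak or a fraction of it, such as 80% or 60%.

```python
def memory_bandwidth(amount_per_cycle, cycle_seconds):
    """Bandwidth = amount transferred per storage cycle / storage cycle time."""
    return amount_per_cycle / cycle_seconds


def bandwidth_allows_dedup(current_bandwidth, peak_bandwidth, fraction=0.8):
    """Step 204 (sketch): deduplicate only if the current bandwidth
    does not exceed the threshold, here a fraction of the peak."""
    threshold = fraction * peak_bandwidth
    return current_bandwidth <= threshold


# Worked example from the text: 16 bits per 500 ns storage cycle.
bits_per_second = memory_bandwidth(16, 500e-9)  # about 32e6 bit/s
```

When `bandwidth_allows_dedup` returns False, the flow falls through to the direct write-back of step 209.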
205. Divide the data segment into data blocks of preset granularity, and calculate the fingerprint of each data block;
206. Query the fingerprint base of the capacity layer and judge whether the fingerprint exists in the fingerprint base; if so, execute step 207; if not, execute step 208;
It should be noted that steps 205 to 206 in the present embodiment are similar to steps 103 to 104 in the embodiment described in Fig. 1, and details are not repeated here.
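Steps 205 and 206 can be sketched as follows. The sketch makes several assumptions that the text does not fix: a granularity of 4096 bytes, SHA-1 as the fingerprint function, and a Python set standing in for the fingerprint base of the capacity layer.

```python
import hashlib


def split_into_blocks(segment: bytes, granularity: int = 4096):
    """Step 205 (sketch): cut the data segment into fixed-size blocks."""
    return [segment[i:i + granularity] for i in range(0, len(segment), granularity)]


def fingerprint(block: bytes) -> str:
    """Fingerprint of a block; SHA-1 is an assumed choice of hash."""
    return hashlib.sha1(block).hexdigest()


def classify_blocks(segment: bytes, fingerprint_base, granularity: int = 4096):
    """Step 206 (sketch): return (order, fingerprint, is_duplicate) per block."""
    results = []
    for order, block in enumerate(split_into_blocks(segment, granularity)):
        fp = fingerprint(block)
        results.append((order, fp, fp in fingerprint_base))
    return results
```

Blocks flagged as duplicates would go to step 207 (metadata only), and the others to step 208 (compress and store).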
207. Determine that the data block is duplicate data, and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information including the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block;
If the fingerprint of the data block exists in the capacity layer, the content of the data block is already present in the capacity layer, so only the metadata information of the data block is written back to the metadata area of the capacity layer. The metadata information includes the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block. It should be noted that, to save storage space in the metadata area of the capacity layer, the length of the data block after deduplication and compression may be represented by a compressed-length code. For example, if the length of a data block is 16 KB before compression and 12 KB after deduplication and compression, the length can be aligned in units of 4 KB, with the codes 0, 1, 2 and 3 representing 4 KB, 8 KB, 12 KB and 16 KB respectively. The compressed length can then be recorded in the metadata area as the number 2 rather than as 12 KB. Because the capacity layer records data in binary, the value 12 KB occupies far more bits than the number 2, so representing the compressed length by a compressed-length code saves space in the metadata area of the capacity layer and thereby improves the IO performance of the capacity layer.
It can be understood that when the compressed length of the data block is represented by a compressed-length code, the metadata includes the order of the data block in the data segment, the physical storage address of the data block, and the compressed-length code.
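The compressed-length code described above can be sketched in a few lines. The mapping of codes 0 to 3 onto 4 KB, 8 KB, 12 KB and 16 KB follows the example in the text; rounding a length up to the next 4 KB boundary and the function names are assumptions.

```python
ALIGNMENT = 4 * 1024      # 4 KB alignment unit from the example in the text
MAX_LENGTH = 16 * 1024    # 16 KB uncompressed block from the example


def encode_length(compressed_len: int) -> int:
    """Map a compressed length to a code: 0 -> 4 KB, 1 -> 8 KB, 2 -> 12 KB, 3 -> 16 KB."""
    if not 0 < compressed_len <= MAX_LENGTH:
        raise ValueError("length must be between 1 byte and 16 KB")
    # Round up to the next multiple of 4 KB (assumed), then shift to a 0-based code.
    return (compressed_len + ALIGNMENT - 1) // ALIGNMENT - 1


def decode_length(code: int) -> int:
    """Recover the aligned length in bytes from the code."""
    if code not in (0, 1, 2, 3):
        raise ValueError("code must be 0, 1, 2 or 3")
    return (code + 1) * ALIGNMENT
```

With four possible values, the code fits in two bits of the metadata record, which is the saving the text refers to.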
208. Perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint base, the metadata information including: the physical storage address of the compressed data block and the compressed length of the data block;
After step 206, if the fingerprint of the data block does not exist in the fingerprint base of the capacity layer, the data block is a new data block for the capacity layer. A compression operation is then performed on the data block, the compressed data block is stored in the data area of the capacity layer, and the metadata information of the compressed data block and the fingerprint of the original data block (the data block before compression) are updated into the fingerprint base. The metadata information includes the physical storage address of the compressed data block and the compressed length of the data block, so that the content of the original data block can later be restored according to the metadata information.
Specifically, for the decompression process of the data block, reference may be made to the Huffman compression algorithm or the LZ compression algorithm described in the prior art, and details are not repeated here.
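Step 208 and the later restoration of a block can be sketched as below. This is an illustration under stated assumptions: Python's `zlib` (DEFLATE, which combines LZ77 and Huffman coding) stands in for the compression algorithm, a `bytearray` stands in for the data area of the capacity layer, a dict stands in for the fingerprint base, and the offset in the `bytearray` plays the role of the physical storage address.

```python
import hashlib
import zlib


def store_new_block(block: bytes, data_area: bytearray, fingerprint_base: dict) -> dict:
    """Step 208 (sketch): compress a new block, append it to the data area,
    and record its metadata under the fingerprint of the original block."""
    compressed = zlib.compress(block)
    address = len(data_area)                  # assumed append-only data area
    data_area.extend(compressed)
    metadata = {"address": address, "length": len(compressed)}
    fingerprint_base[hashlib.sha1(block).hexdigest()] = metadata
    return metadata


def restore_block(metadata: dict, data_area: bytearray) -> bytes:
    """Restore the original block from its address and compressed length."""
    start = metadata["address"]
    end = start + metadata["length"]
    return zlib.decompress(bytes(data_area[start:end]))
```

The address and compressed length are exactly the two metadata fields the text lists, and they are sufficient to locate and decompress the block later.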
209. Write the data in the performance layer directly back to the data area of the capacity layer.
In step 201, if the current storage capacity of the performance layer is greater than the safety threshold, then to reduce the I/O resources of the performance layer occupied by deduplication and compression, and to write the data in the performance layer back to the capacity layer quickly so as to free storage space in the performance layer, deduplication and compression of the data are abandoned and the data in the performance layer is written directly back to the data area of the capacity layer.
In step 203, if the modification count of the data segment is greater than the first threshold, then to reduce the data fragmentation caused by subsequent updates, the data segment is written directly back to the data area of the capacity layer, which reduces the difficulty and IO overhead of subsequent space reclamation.
In step 204, to further reduce the impact on the data services of the original storage system caused by the CPU and I/O resources that deduplication and compression occupy, when the compression device judges that the current storage bandwidth of the performance layer is greater than the bandwidth threshold, it abandons deduplication and compression of the data in the performance layer and writes the data in the performance layer directly back to the data area of the capacity layer.
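The three checks that lead to step 209 can be summarized in a small decision function. The function name, parameter names, and the two return labels are hypothetical; the comparison directions follow steps 201, 203 and 204 as described above.

```python
def choose_action(used_capacity, safety_threshold,
                  modification_count, first_threshold,
                  current_bandwidth, bandwidth_threshold):
    """Return "write_back_directly" (step 209) or
    "dedup_and_compress" (steps 205 to 208)."""
    if used_capacity > safety_threshold:         # step 201
        return "write_back_directly"
    if modification_count > first_threshold:     # step 203
        return "write_back_directly"
    if current_bandwidth > bandwidth_threshold:  # step 204
        return "write_back_directly"
    return "dedup_and_compress"
```

Since the text allows steps 203 and 204 to run in either order or selectively, the fixed order used here is only one possible arrangement.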
In the embodiment of the present invention, the storage data compression device of the full flash array first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the device reads a data segment of preset length from the performance layer, divides the data segment into data blocks, and calculates the fingerprint of each data block. If the fingerprint exists in the fingerprint base of the capacity layer, the device determines that the data block is duplicate data and stores the metadata of the data block in the capacity layer of the full flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has better write bandwidth and latency. At the same time, the data blocks of the performance layer are deduplicated and compressed before being stored in the capacity layer, which further saves storage space in the capacity layer and thereby improves the space utilization and storage efficiency of the storage system.
Secondly, the compression device in the present embodiment also abandons deduplication and compression of the data in the performance layer when the modification count of the data segment in the performance layer is greater than the first threshold and/or the current storage bandwidth of the performance layer is greater than the bandwidth threshold. This reduces the I/O resources of the performance layer occupied by deduplication and compression and further improves the IO performance of the storage system.
The storage data compression method of the full flash array in the embodiment of the present invention has been described above; the storage data compression device of the full flash array in the embodiment of the present invention is described below. Referring to Fig. 4, the storage data compression device of the full flash array in the embodiment of the present invention includes:
a first judging unit 401, configured to judge whether the current storage capacity of the performance layer is greater than the safety threshold;
a reading unit 402, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
a computing unit 403, configured to divide the data segment into data blocks of preset granularity and calculate the fingerprint of each data block;
an inquiry judging unit 404, configured to query the fingerprint base of the capacity layer and judge whether the fingerprint exists in the fingerprint base;
a deduplication unit 405, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information including the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
It should be noted that the function of each unit in the present embodiment is similar to the corresponding function of the storage data compression device of the full flash array in the embodiment described in Fig. 1, and details are not repeated here.
In the embodiment of the present invention, the first judging unit 401 first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the reading unit 402 reads a data segment of preset length from the performance layer, the data segment is divided into data blocks, and the fingerprint of each data block is calculated. If the fingerprint exists in the fingerprint base of the capacity layer, the deduplication unit 405 determines that the data block is duplicate data, and the deduplicated data block and its metadata are stored in the capacity layer of the full flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has better write bandwidth and latency. At the same time, the data blocks of the performance layer are deduplicated and compressed before being stored in the capacity layer, which further saves storage space in the capacity layer and thereby improves the IO performance of the storage system.
Based on the embodiment described in Fig. 4, the storage data compression device of the full flash array in the embodiment of the present invention is described in detail below. Referring to Fig. 5, another embodiment of the storage data compression device of the full flash array in the embodiment of the present invention includes:
a first judging unit 501, configured to judge whether the current storage capacity of the performance layer is greater than the safety threshold;
a reading unit 502, configured to read a data segment of preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
a computing unit 503, configured to divide the data segment into data blocks of preset granularity and calculate the fingerprint of each data block;
an inquiry judging unit 504, configured to query the fingerprint base of the capacity layer and judge whether the fingerprint exists in the fingerprint base;
a deduplication unit 505, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information including the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
Preferably, the device further includes:
a second judging unit 506, configured to judge whether the modification count of the data segment is greater than the first threshold;
a first trigger unit 507, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the modification count is not greater than the first threshold;
a first write-back unit 508, configured to write the data segment directly back to the data area of the capacity layer when the modification count is greater than the first threshold.
Preferably, the device further includes:
a third judging unit 509, configured to judge whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
a second trigger unit 510, configured to trigger the step of dividing the data segment into data blocks of preset granularity when the current storage bandwidth is not greater than the bandwidth threshold;
a second write-back unit 511, configured to write the data segment directly back to the data area of the capacity layer when the current storage bandwidth is greater than the bandwidth threshold.
Preferably, the deduplication unit 505 further includes:
a mark module 5051, configured to represent the length of the deduplicated data block with a compressed-length code;
the metadata includes the order of the data block in the data segment, the physical storage address of the data block, and the compressed-length code.
Preferably, the device further includes:
a third write-back unit 512, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the safety threshold;
a fourth write-back unit 513, configured to, when the fingerprint does not exist in the fingerprint base, perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint base, the metadata information including: the physical storage address of the compressed data block and the compressed length of the data block.
In the embodiment of the present invention, the first judging unit 501 first judges whether the current storage capacity of the performance layer is greater than the safety threshold. When it is not, the reading unit 502 reads a data segment of preset length from the performance layer, the data segment is divided into data blocks, and the fingerprint of each data block is calculated. If the fingerprint exists in the fingerprint base of the capacity layer, the deduplication unit 505 determines that the data block is duplicate data, and the metadata of the data block is stored in the capacity layer of the full flash array. Because the compression device writes the data in the performance layer back to the capacity layer in real time while the storage capacity of the performance layer has not reached the safety threshold, it frees more storage space for the performance layer, improves the response speed of the performance layer, and ensures that the performance layer has better write bandwidth and latency. At the same time, the data blocks of the performance layer are deduplicated and compressed before being stored in the capacity layer, which further saves storage space in the capacity layer and thereby improves the space utilization and storage efficiency of the storage system.
Secondly, when the modification count of the data segment in the performance layer is greater than the first threshold and/or the current storage bandwidth of the performance layer is greater than the bandwidth threshold, the compression device in the present embodiment abandons deduplication and compression of the data in the performance layer through the first write-back unit 508, the second write-back unit 511 and the third write-back unit 512. This reduces the I/O resources of the performance layer occupied by deduplication and compression and further improves the IO performance of the storage system.
The storage data compression device of the full flash array in the embodiment of the present invention has been described above from the perspective of modular functional entities; the computer device in the embodiment of the present invention is described below from the perspective of hardware processing:
The computer device is used to implement the functions of the storage data compression device of the full flash array. One embodiment of the computer device in the embodiment of the present invention includes:
a processor and a memory;
the memory is used to store a computer program, and when the processor executes the computer program stored in the memory, the following steps can be implemented:
judging whether the current storage capacity of the performance layer is greater than the safety threshold;
if not, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint base of the capacity layer, and judging whether the fingerprint exists in the fingerprint base;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information including the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
judging whether the modification count of the data segment is greater than the first threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
judging whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, the processor can also be specifically used to implement the following steps:
representing the length of the deduplicated data block with a compressed-length code;
the metadata includes the order of the data block in the data segment, the physical storage address of the data block, and the compressed-length code.
In some embodiments of the present invention, the processor can also be used to implement the following steps:
if the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
if the fingerprint does not exist in the fingerprint base, performing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the metadata information of the compressed data block and the fingerprint of the original data block into the fingerprint base, the metadata information including: the physical storage address of the compressed data block and the compressed length of the data block.
It can be understood that when the processor in the computer device described above executes the computer program, the functions of the units in the corresponding device embodiments above can also be implemented, and details are not repeated here. Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program in the storage data compression device of the full flash array. For example, the computer program may be divided into the units of the storage data compression device of the full flash array described above, and each unit can implement the specific functions described for the corresponding storage data compression device of the full flash array.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the processor and memory are only an example of a computer device and do not constitute a limitation on it; the computer device may include more or fewer components, combine certain components, or use different components. For example, the computer device may also include input/output devices, network access devices, buses and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the entire computer device through various interfaces and lines.
The memory can be used to store the computer program and/or modules. The processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function, and the data storage area can store data created according to the use of the terminal. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
The present invention also provides a computer-readable storage medium for implementing the functions of the storage data compression device of the full flash array. A computer program is stored on the medium, and when the computer program is executed by a processor, the processor can be used to execute the following steps:
judging whether the current storage capacity of the performance layer is greater than the safety threshold;
if not, reading a data segment of preset length from the performance layer;
dividing the data segment into data blocks of preset granularity, and calculating the fingerprint of each data block;
querying the fingerprint base of the capacity layer, and judging whether the fingerprint exists in the fingerprint base;
if the fingerprint exists, performing a deduplication operation on the data block, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information including the order of the data block in the data segment, the physical storage address of the data block, and the length of the data block.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
judging whether the modification count of the data segment is greater than the first threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
judging whether the current storage bandwidth of the performance layer is greater than the bandwidth threshold;
if not, triggering the step of dividing the data segment into data blocks of preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be specifically used to execute the following steps:
representing the length of the deduplicated and compressed data block with a compressed-length code;
the metadata includes the order of the data block in the data segment, the physical storage address of the data block, and the compressed-length code.
In some embodiments of the present invention, when the computer program stored on the computer-readable storage medium is executed by the processor, the processor can also be used to execute the following steps:
if the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
if the fingerprint does not exist in the fingerprint base, updating the fingerprint of the data block into the fingerprint base, performing a compression operation on the data block, and writing the compressed data block and the corresponding metadata information back to the data area and the metadata area of the capacity layer respectively, the metadata information including: the internal order of the data block, the physical storage address of the compressed data block, and the compressed length of the data block.
It can be understood that if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a corresponding computer-readable storage medium. Based on this understanding, all or part of the flow in the corresponding embodiment methods above can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the method embodiments above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (12)
1. A storage data compression method for an all-flash array, the all-flash array comprising a performance layer and a capacity layer, characterized in that the method comprises:
determining whether the current storage capacity of the performance layer is greater than a safety threshold;
if not, reading a data segment of a preset length from the performance layer;
dividing the data segment into data blocks of a preset granularity, and calculating a fingerprint of the data block;
querying a fingerprint database of the capacity layer to determine whether the fingerprint exists in the fingerprint database;
if the fingerprint exists, determining that the data block is duplicate data, and writing the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the sequence of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
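The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the claims do not name a hash function, a block size, or a data structure for the fingerprint database, so SHA-256, a 4096-byte block, a plain dictionary, and the names `BlockMeta`, `fingerprint`, `split_blocks` and `dedup_segment` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical "preset granularity"; not specified in the claims


@dataclass
class BlockMeta:
    sequence: int  # position of the block within the data segment
    address: int   # physical storage address in the capacity layer
    length: int    # length of the stored block in bytes


def fingerprint(block: bytes) -> str:
    # SHA-256 is an assumption; the claims only require "a fingerprint".
    return hashlib.sha256(block).hexdigest()


def split_blocks(segment: bytes, block_size: int = BLOCK_SIZE):
    # Divide the data segment into blocks of the preset granularity.
    return [segment[i:i + block_size] for i in range(0, len(segment), block_size)]


def dedup_segment(segment, fingerprint_db, used_capacity, safety_threshold):
    """Return (duplicate metadata, blocks missing from the fingerprint database).

    Returns None when the performance layer is above the safety threshold,
    in which case claim 1 does not start deduplication at all.
    """
    if used_capacity > safety_threshold:
        return None
    duplicates, unique = [], []
    for seq, block in enumerate(split_blocks(segment)):
        hit = fingerprint_db.get(fingerprint(block))
        if hit is not None:
            # Duplicate block: only metadata needs to be written back.
            address, length = hit
            duplicates.append(BlockMeta(seq, address, length))
        else:
            unique.append((seq, block))
    return duplicates, unique
```

Here `fingerprint_db` maps a fingerprint to an `(address, length)` pair; a real fingerprint database would live in the capacity layer rather than in process memory.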
2. The method according to claim 1, characterized in that, after reading the data segment of the preset length from the performance layer, the method further comprises:
determining whether the number of modifications of the data segment is greater than a first threshold;
if not, triggering the step of dividing the data segment into data blocks of the preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
3. The method according to claim 1, characterized in that, after reading the data segment of the preset length from the performance layer, the method further comprises:
determining whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
if not, triggering the step of dividing the data segment into data blocks of the preset granularity;
if so, writing the data segment directly back to the data area of the capacity layer.
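The two gating checks of claims 2 and 3 share the same shape: a value is compared with a threshold, and the segment is either deduplicated or written back directly. A minimal sketch follows; the `Route` enum and both function names are hypothetical and are not taken from the patent.

```python
from enum import Enum


class Route(Enum):
    DEDUP = "dedup"                          # continue with block division and fingerprinting
    DIRECT_WRITE_BACK = "direct_write_back"  # write the segment straight to the data area


def route_by_modifications(modification_count: int, first_threshold: int) -> Route:
    # Claim 2: a frequently modified segment skips deduplication.
    if modification_count > first_threshold:
        return Route.DIRECT_WRITE_BACK
    return Route.DEDUP


def route_by_bandwidth(current_bandwidth: float, bandwidth_threshold: float) -> Route:
    # Claim 3: when the performance layer's storage bandwidth is already
    # above the threshold, the segment is written back without deduplication.
    if current_bandwidth > bandwidth_threshold:
        return Route.DIRECT_WRITE_BACK
    return Route.DEDUP
```

Both comparisons are strict, matching the "greater than" wording of the claims: a value exactly equal to the threshold still goes through deduplication.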
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
representing the length of the deduplicated and compressed data block with a compressed-length code;
the metadata comprising the sequence of the data block within the data segment, the physical storage address of the data block, and the compressed-length code of the data block.
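Claim 4 does not define how the compressed-length code is formed. One possible reading, shown here purely as an assumption, is to store the length as a count of fixed-size units so that it fits in a single byte of the metadata record. The 512-byte unit, the record layout `"<HQB"` and all function names below are hypothetical.

```python
import struct

UNIT = 512            # hypothetical allocation unit in bytes; not stated in the claims
META_FORMAT = "<HQB"  # sequence (2 bytes), address (8 bytes), length code (1 byte)


def encode_length(length: int, unit: int = UNIT) -> int:
    # Round the compressed length up to a whole number of units.
    if length <= 0:
        raise ValueError("length must be positive")
    code = -(-length // unit)  # ceiling division
    if code > 255:
        raise ValueError("length does not fit in a one-byte code")
    return code


def decode_length(code: int, unit: int = UNIT) -> int:
    # Returns the rounded-up length, which may exceed the exact compressed length.
    return code * unit


def pack_meta(sequence: int, address: int, length: int) -> bytes:
    return struct.pack(META_FORMAT, sequence, address, encode_length(length))


def unpack_meta(record: bytes):
    sequence, address, code = struct.unpack(META_FORMAT, record)
    return sequence, address, decode_length(code)
```

With this layout each record takes 11 bytes. The trade-off is that the decoded length is an upper bound rather than the exact compressed size, so the compressed format must be able to tolerate trailing padding.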
5. The method according to claim 4, characterized in that the method further comprises:
if the current storage capacity is greater than the safety threshold, writing the data in the performance layer directly back to the data area of the capacity layer;
if the fingerprint does not exist, performing a compression operation on the data block, writing the compressed data block back to the data area of the capacity layer, and updating the metadata information of the compressed data block and the fingerprint of the data block into the fingerprint database, the metadata information comprising: the compressed physical storage address of the data block and the compressed length of the data block.
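Taken together with claim 1, the two branches of claim 5 give a complete write-back path, which can be sketched as follows. The in-memory `CapacityLayer` class, the use of SHA-256 and zlib, and the 4096-byte block size are all assumptions made for illustration; the claims name no specific hash or compression algorithm. Recording a metadata entry for newly compressed blocks in the metadata area is likewise an assumption of this sketch.

```python
import hashlib
import zlib


class CapacityLayer:
    """Toy in-memory stand-in for the capacity layer (hypothetical)."""

    def __init__(self):
        self.data_area = bytearray()
        self.metadata_area = []   # (sequence, address, length) records
        self.fingerprint_db = {}  # fingerprint -> (address, compressed length)

    def append(self, payload: bytes) -> int:
        # Append to the data area and return the start address.
        address = len(self.data_area)
        self.data_area += payload
        return address


def write_back(segment, layer, used_capacity, safety_threshold, block_size=4096):
    if used_capacity > safety_threshold:
        # Performance layer is above the safety threshold:
        # write the data back directly, with no deduplication or compression.
        layer.append(segment)
        return "direct"
    for seq, start in enumerate(range(0, len(segment), block_size)):
        block = segment[start:start + block_size]
        fp = hashlib.sha256(block).hexdigest()
        hit = layer.fingerprint_db.get(fp)
        if hit is None:
            # New block: compress it, store it, and register its fingerprint.
            compressed = zlib.compress(block)
            hit = (layer.append(compressed), len(compressed))
            layer.fingerprint_db[fp] = hit
        # Duplicate or new, the block is described by one metadata record.
        layer.metadata_area.append((seq, *hit))
    return "dedup_compress"
```

For a segment made of blocks A, B, A, the data area ends up holding one compressed copy each of A and B, while the metadata area holds three records, the first and third pointing at the same address.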
6. A storage data compression device for an all-flash array, the all-flash array comprising a performance layer and a capacity layer, characterized in that the device comprises:
a first judging unit, configured to determine whether the current storage capacity of the performance layer is greater than a safety threshold;
a reading unit, configured to read a data segment of a preset length from the performance layer when the current storage capacity is not greater than the safety threshold;
a computing unit, configured to divide the data segment into data blocks of a preset granularity and calculate a fingerprint of the data block;
a query judging unit, configured to query a fingerprint database of the capacity layer and determine whether the fingerprint exists in the fingerprint database;
a deduplication unit, configured to, when the fingerprint exists, determine that the data block is duplicate data and write the metadata information of the data block back to the metadata area of the capacity layer, the metadata information comprising the sequence of the data block within the data segment, the physical storage address of the data block, and the length of the data block.
7. The device according to claim 6, characterized in that the device further comprises:
a second judging unit, configured to determine whether the number of modifications of the data segment is greater than a first threshold;
a first trigger unit, configured to trigger the step of dividing the data segment into data blocks of the preset granularity when the number of modifications is not greater than the first threshold;
a first write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the number of modifications is greater than the first threshold.
8. The device according to claim 6, characterized in that the device further comprises:
a third judging unit, configured to determine whether the current storage bandwidth of the performance layer is greater than a bandwidth threshold;
a second trigger unit, configured to trigger the step of dividing the data segment into data blocks of the preset granularity when the current storage bandwidth is not greater than the bandwidth threshold;
a second write-back unit, configured to write the data segment directly back to the data area of the capacity layer when the current storage bandwidth is greater than the bandwidth threshold.
9. The device according to any one of claims 6 to 8, characterized in that the deduplication and compression unit further comprises:
a marking module, configured to represent the length of the deduplicated and compressed data block with a compressed-length code;
the metadata comprising the sequence of the data block within the data segment, the physical storage address of the data block, and the compressed-length code of the data block.
10. The device according to claim 9, characterized in that the device further comprises:
a third write-back unit, configured to write the data in the performance layer directly back to the data area of the capacity layer when the current storage capacity is greater than the safety threshold;
a fourth write-back unit, configured to, when the fingerprint does not exist in the fingerprint database, perform a compression operation on the data block, write the compressed data block back to the data area of the capacity layer, and update the metadata information of the compressed data block and the fingerprint of the data block into the fingerprint database, the metadata information comprising: the compressed physical storage address of the data block and the compressed length of the data block.
11. A computer device, comprising a processor, characterized in that the processor, when executing a computer program stored in a memory, implements the storage data compression method for an all-flash array according to any one of claims 1 to 5.
12. A readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed, implements the storage data compression method for an all-flash array according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810214771.9A CN108427538B (en) | 2018-03-15 | 2018-03-15 | Storage data compression method and device of full flash memory array and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810214771.9A CN108427538B (en) | 2018-03-15 | 2018-03-15 | Storage data compression method and device of full flash memory array and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108427538A (en) | 2018-08-21 |
CN108427538B (en) | 2021-06-04 |
Family
ID=63158230
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810214771.9A Active CN108427538B (en) | 2018-03-15 | 2018-03-15 | Storage data compression method and device of full flash memory array and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427538B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109445713A (en) * | 2018-11-09 | 2019-03-08 | 郑州云海信息技术有限公司 | A kind of storage state recording method, system and the associated component of metadata volume |
CN109814809A (en) * | 2019-01-14 | 2019-05-28 | 杭州宏杉科技股份有限公司 | Data compression method and apparatus |
CN110018792A (en) * | 2019-04-10 | 2019-07-16 | 苏州浪潮智能科技有限公司 | One kind is to rule data processing method, device, electronic equipment and storage medium |
CN110209640A (en) * | 2019-06-06 | 2019-09-06 | 四川长虹电器股份有限公司 | The method of switching at runtime lz4 compression algorithm type under cell phone system operating status |
CN110377226A (en) * | 2019-06-10 | 2019-10-25 | 平安科技(深圳)有限公司 | Compression method, device and storage medium based on storage engines bluestore |
CN110618789A (en) * | 2019-08-14 | 2019-12-27 | 华为技术有限公司 | Method and device for deleting repeated data |
CN111079917A (en) * | 2018-10-22 | 2020-04-28 | 北京地平线机器人技术研发有限公司 | Tensor data block access method and device |
CN111124940A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Space recovery method and system based on full flash memory array |
CN111124259A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111125033A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Space recovery method and system based on full flash memory array |
CN111124939A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111198857A (en) * | 2018-10-31 | 2020-05-26 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111831480A (en) * | 2020-06-17 | 2020-10-27 | 华中科技大学 | Layered coding method and device based on duplicate removal system and duplicate removal system |
CN112306974A (en) * | 2019-07-30 | 2021-02-02 | 深信服科技股份有限公司 | Data processing method, device, equipment and storage medium |
CN113467699A (en) * | 2020-03-30 | 2021-10-01 | 华为技术有限公司 | Method and device for improving available storage capacity |
CN113590051A (en) * | 2021-09-29 | 2021-11-02 | 阿里云计算有限公司 | Data storage and reading method and device, electronic equipment and medium |
CN114003169A (en) * | 2021-08-02 | 2022-02-01 | 固存芯控半导体科技(苏州)有限公司 | Data compression method for SSD |
CN114866483A (en) * | 2022-03-25 | 2022-08-05 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080005141A1 (en) * | 2006-06-29 | 2008-01-03 | Ling Zheng | System and method for retrieving and using block fingerprints for data deduplication |
CN102831222A (en) * | 2012-08-24 | 2012-12-19 | 华中科技大学 | Differential compression method based on data de-duplication |
CN102982122A (en) * | 2012-11-13 | 2013-03-20 | 浪潮电子信息产业股份有限公司 | Repeating data deleting method suitable for mass storage system |
CN103473266A (en) * | 2013-08-09 | 2013-12-25 | 记忆科技(深圳)有限公司 | Solid state disk and method for deleting repeating data thereof |
CN103502957A (en) * | 2012-12-28 | 2014-01-08 | 华为技术有限公司 | Data processing method and device |
WO2014037767A1 (en) * | 2012-09-05 | 2014-03-13 | Indian Institute Of Technology, Kharagpur | Multi-level inline data deduplication |
CN103873506A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block duplication removing system in storage cluster and method thereof |
CN103914516A (en) * | 2014-02-25 | 2014-07-09 | 深圳市中博科创信息技术有限公司 | Method and system for layer-management of storage system |
CN104462388A (en) * | 2014-12-10 | 2015-03-25 | 上海爱数软件有限公司 | Redundant data cleaning method based on cascade storage media |
US20150088945A1 (en) * | 2013-09-25 | 2015-03-26 | Nec Laboratories America, Inc. | Adaptive compression supporting output size thresholds |
CN105094709A (en) * | 2015-08-27 | 2015-11-25 | 浪潮电子信息产业股份有限公司 | Dynamic data compression method for solid-state disc storage system |
CN105787037A (en) * | 2016-02-25 | 2016-07-20 | 浪潮(北京)电子信息产业有限公司 | Repeated data deleting method and device |
CN106055271A (en) * | 2016-05-17 | 2016-10-26 | 浪潮(北京)电子信息产业有限公司 | Method and device for de-repetition selection of repeated data based on cloud computing |
US20170192712A1 (en) * | 2015-12-30 | 2017-07-06 | Nutanix, Inc. | Method and system for implementing high yield de-duplication for computing applications |
CN107193498A (en) * | 2017-05-25 | 2017-09-22 | 山东浪潮商用系统有限公司 | A kind of method and device that data are carried out with deduplication processing |
CN107682016A (en) * | 2017-09-26 | 2018-02-09 | 深信服科技股份有限公司 | A kind of data compression method, data decompression method and related system |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080005141A1 (en) * | 2006-06-29 | 2008-01-03 | Ling Zheng | System and method for retrieving and using block fingerprints for data deduplication |
CN102831222A (en) * | 2012-08-24 | 2012-12-19 | 华中科技大学 | Differential compression method based on data de-duplication |
WO2014037767A1 (en) * | 2012-09-05 | 2014-03-13 | Indian Institute Of Technology, Kharagpur | Multi-level inline data deduplication |
CN102982122A (en) * | 2012-11-13 | 2013-03-20 | 浪潮电子信息产业股份有限公司 | Repeating data deleting method suitable for mass storage system |
CN103873506A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block duplication removing system in storage cluster and method thereof |
CN103502957A (en) * | 2012-12-28 | 2014-01-08 | 华为技术有限公司 | Data processing method and device |
CN103473266A (en) * | 2013-08-09 | 2013-12-25 | 记忆科技(深圳)有限公司 | Solid state disk and method for deleting repeating data thereof |
US20150088945A1 (en) * | 2013-09-25 | 2015-03-26 | Nec Laboratories America, Inc. | Adaptive compression supporting output size thresholds |
CN103914516A (en) * | 2014-02-25 | 2014-07-09 | 深圳市中博科创信息技术有限公司 | Method and system for layer-management of storage system |
CN104462388A (en) * | 2014-12-10 | 2015-03-25 | 上海爱数软件有限公司 | Redundant data cleaning method based on cascade storage media |
CN105094709A (en) * | 2015-08-27 | 2015-11-25 | 浪潮电子信息产业股份有限公司 | Dynamic data compression method for solid-state disc storage system |
US20170192712A1 (en) * | 2015-12-30 | 2017-07-06 | Nutanix, Inc. | Method and system for implementing high yield de-duplication for computing applications |
CN105787037A (en) * | 2016-02-25 | 2016-07-20 | 浪潮(北京)电子信息产业有限公司 | Repeated data deleting method and device |
CN106055271A (en) * | 2016-05-17 | 2016-10-26 | 浪潮(北京)电子信息产业有限公司 | Method and device for de-repetition selection of repeated data based on cloud computing |
CN107193498A (en) * | 2017-05-25 | 2017-09-22 | 山东浪潮商用系统有限公司 | A kind of method and device that data are carried out with deduplication processing |
CN107682016A (en) * | 2017-09-26 | 2018-02-09 | 深信服科技股份有限公司 | A kind of data compression method, data decompression method and related system |
Non-Patent Citations (2)
Title |
---|
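Xia Wen (夏文): "Research on High-Performance Elimination Techniques for Redundant Data in Data Backup Systems" (数据备份系统中冗余数据的高性能消除技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Han Shuaijun (韩帅军): "Research on Deduplication Optimization Methods for Archival Storage" (面向归档存储的重复数据删除优化方法研究), China Master's Theses Full-text Database, Information Science * |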
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079917B (en) * | 2018-10-22 | 2023-08-11 | 北京地平线机器人技术研发有限公司 | Tensor data block access method and device |
CN111079917A (en) * | 2018-10-22 | 2020-04-28 | 北京地平线机器人技术研发有限公司 | Tensor data block access method and device |
CN111124939A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111125033B (en) * | 2018-10-31 | 2024-04-09 | 深信服科技股份有限公司 | Space recycling method and system based on full flash memory array |
CN111124940B (en) * | 2018-10-31 | 2022-03-22 | 深信服科技股份有限公司 | Space recovery method and system based on full flash memory array |
CN111198857A (en) * | 2018-10-31 | 2020-05-26 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111124940A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Space recovery method and system based on full flash memory array |
CN111124259A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN111125033A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Space recovery method and system based on full flash memory array |
CN109445713A (en) * | 2018-11-09 | 2019-03-08 | 郑州云海信息技术有限公司 | A kind of storage state recording method, system and the associated component of metadata volume |
CN109814809B (en) * | 2019-01-14 | 2022-03-11 | 杭州宏杉科技股份有限公司 | Data compression method and device |
CN109814809A (en) * | 2019-01-14 | 2019-05-28 | 杭州宏杉科技股份有限公司 | Data compression method and apparatus |
CN110018792A (en) * | 2019-04-10 | 2019-07-16 | 苏州浪潮智能科技有限公司 | One kind is to rule data processing method, device, electronic equipment and storage medium |
CN110209640A (en) * | 2019-06-06 | 2019-09-06 | 四川长虹电器股份有限公司 | The method of switching at runtime lz4 compression algorithm type under cell phone system operating status |
CN110377226A (en) * | 2019-06-10 | 2019-10-25 | 平安科技(深圳)有限公司 | Compression method, device and storage medium based on storage engines bluestore |
WO2020248493A1 (en) * | 2019-06-10 | 2020-12-17 | 平安科技(深圳)有限公司 | Compression method and device based on storage engine bluestore, and storage medium |
CN112306974A (en) * | 2019-07-30 | 2021-02-02 | 深信服科技股份有限公司 | Data processing method, device, equipment and storage medium |
CN110618789A (en) * | 2019-08-14 | 2019-12-27 | 华为技术有限公司 | Method and device for deleting repeated data |
CN113467699A (en) * | 2020-03-30 | 2021-10-01 | 华为技术有限公司 | Method and device for improving available storage capacity |
CN113467699B (en) * | 2020-03-30 | 2023-08-22 | 华为技术有限公司 | Method and device for improving available storage capacity |
CN111831480A (en) * | 2020-06-17 | 2020-10-27 | 华中科技大学 | Layered coding method and device based on duplicate removal system and duplicate removal system |
CN111831480B (en) * | 2020-06-17 | 2024-04-19 | 华中科技大学 | Layered coding method and device based on deduplication system and deduplication system |
CN114003169A (en) * | 2021-08-02 | 2022-02-01 | 固存芯控半导体科技(苏州)有限公司 | Data compression method for SSD |
CN114003169B (en) * | 2021-08-02 | 2024-04-16 | 固存芯控半导体科技(苏州)有限公司 | Data compression method for SSD |
CN113590051A (en) * | 2021-09-29 | 2021-11-02 | 阿里云计算有限公司 | Data storage and reading method and device, electronic equipment and medium |
CN114866483A (en) * | 2022-03-25 | 2022-08-05 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
CN114866483B (en) * | 2022-03-25 | 2023-10-03 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108427538B (en) | 2021-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427538A (en) | Storage data compression method, device and the readable storage medium storing program for executing of full flash array | |
CN108427539A (en) | Offline duplicate removal compression method, device and the readable storage medium storing program for executing of buffer memory device data | |
CN108415669A (en) | The data duplicate removal method and device of storage system, computer installation and storage medium | |
CN105204781B (en) | Compression method, device and equipment | |
CN107046812B (en) | Data storage method and device | |
CN103098035B (en) | Storage system | |
EP3316150B1 (en) | Method and apparatus for file compaction in key-value storage system | |
WO2018033035A1 (en) | Solid-state drive control device and solid-state drive data access method based on learning | |
CN103870514B (en) | Data de-duplication method and device | |
CN110377226B (en) | Compression method and device based on storage engine bluestore and storage medium | |
CN105824881B (en) | A kind of data de-duplication data placement method based on load balancing | |
CN103353850B (en) | Virtual machine thermal migration memory processing method, device and system | |
CN107506153A (en) | A kind of data compression method, data decompression method and related system | |
CN111125033B (en) | Space recycling method and system based on full flash memory array | |
CN107682016A (en) | A kind of data compression method, data decompression method and related system | |
CN103152430B (en) | A kind of reduce the cloud storage method that data take up room | |
CN110347643B (en) | Method and device for cloning NTFS (New technology File System) volume between disks | |
CN102970043A (en) | GZIP (GNUzip)-based hardware compressing system and accelerating method thereof | |
CN110941514B (en) | Data backup method, data recovery method, computer equipment and storage medium | |
CN106569750A (en) | Data compression method and device | |
CN111124940B (en) | Space recovery method and system based on full flash memory array | |
CN110083487A (en) | A kind of reference data block fragment removing method and system based on data locality | |
CN111124939A (en) | Data compression method and system based on full flash memory array | |
CN111061428B (en) | Data compression method and device | |
CN103810297A (en) | Writing method, reading method, writing device and reading device on basis of re-deleting technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||