CN103389946A - Fragmentization removal method and system - Google Patents

Publication number
CN103389946A
Authority
CN
China
Prior art keywords
bandwidth
data object
access
read
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102983262A
Other languages
Chinese (zh)
Other versions
CN103389946B (en
Inventor
严得辰
刘立坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201310298326.2A (patent CN103389946B)
Publication of CN103389946A
Application granted
Publication of CN103389946B
Legal status: Active

Abstract

The invention provides a defragmentation method. The method first determines the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, where the expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function that takes the data object length and the access skip step length of that access operation as variables. Then, starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is below a bandwidth threshold, the data objects of those access operations are written in order to a new contiguous storage space. The method identifies fragmented data more accurately, improves read performance with little impact on current system performance, and adapts better to a changing environment.

Description

Defragmentation method and system
Technical field
The invention belongs to the field of computer storage and relates in particular to a method for handling data fragmentation.
Background
Fragmentation means that file data which is logically contiguous ends up scattered across the storage medium once it is stored, so that accessing the data causes a large number of random read/write operations. For the disk-based storage systems that are currently mainstream, fragmentation causes read/write bandwidth to decline steadily. A defragmentation method is therefore needed to limit the impact of fragmentation and reduce random read/write accesses.
Existing defragmentation methods typically take an access prediction sequence as input, identify fragmented data according to that sequence, and write the fragmented data to a new storage area. The access prediction sequence is a prediction of the most likely future access order. Many prediction methods exist, but they fall into two basic kinds: (1) static prediction, for example prediction based on the logical storage order of the data; (2) dynamic prediction, for example prediction based on the current access order.
Defragmentation has to answer two main questions: (1) which data objects are fragmented, that is, if the current storage system is accessed according to the access prediction sequence, which data objects cause the larger drops in read/write bandwidth; (2) which data should be written to a new storage location, so that future random read/write accesses are reduced while current system performance is affected as little as possible. Existing defragmentation methods have two main problems. First, they do not identify fragmented data accurately enough and so cannot improve performance effectively. Second, they move too much data during defragmentation, which causes a significant drop in system performance while the operation runs.
Summary of the invention
The object of the invention is therefore to overcome the above defects of the prior art and to provide a new defragmentation method.
The object of the invention is achieved through the following technical solutions.
In one aspect, the invention provides a defragmentation method comprising:
Step 1) determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, where the expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function that takes the data object length and the access skip step length of that access operation as variables;
Step 2) starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is below a bandwidth threshold, writing the data objects of those access operations in order to a new contiguous storage space.
In the above method, the fitted bandwidth function in step 1) may be a bandwidth function expression with data object length and access skip step length as variables, obtained by a fitting method from benchmark read/write bandwidth data measured, on the storage system to be defragmented, at different data object lengths and access skip step lengths.
In the above method, the fitted bandwidth function may be obtained through the following steps:
Step a) on the storage system to be defragmented, with data object length $x$ and access skip step length $y$ as variables, measure benchmark read/write bandwidth data for a set of different values of $x$ and $y$;
Step b) select one of $x$ and $y$ as the primary variable; the other is the secondary variable;
Step c) find the measured value of the primary variable that is closest to the given value of the primary variable;
Step d) find the interval between measured values that contains the secondary variable;
Step e) from the benchmark read/write bandwidth data measured at the different values of $x$ and $y$, fit the read/write bandwidth $f(x, y)$ for arbitrary $x$ and $y$.
In the above method, in step c), when the primary variable is $x$, find $x_i$ such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimal; when the primary variable is $y$, find $y_i$ such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimal.
In step d), when the secondary variable is $y$, find $y_j$ such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, find $x_j$ such that $x \in [x_j, x_{j+1}]$.
In step e), when the secondary variable is $y$, let
$$f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \left| f_{ij} - f_{i,j+1} \right| + g(\Delta x),$$
where $g(\Delta x)$ is a further correction to the fitting result. When the secondary variable is $x$, let
$$f(x, y) = f_{ij} + \frac{h(x) - h(x_j)}{h(x_{j+1}) - h(x_j)} \left| f_{ij} - f_{i+1,j} \right| + g(\Delta y),$$
where $g(\Delta y)$ is a further correction to the fitting result.
The above method may further comprise step 3): starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is above the bandwidth threshold, skipping those access operations and continuing with the access operations that follow.
The above method may further comprise step 4): for a subsequence $s_i, \ldots, s_{i+m}$ of consecutive access operations in the access prediction sequence that was not handled by step 2) or step 3), if the overall bandwidth of the subsequence is below a certain threshold and its hypothetical contiguous bandwidth is above a certain threshold, writing the data object of each access operation in the subsequence to a new contiguous storage space and changing the access entry of those data objects to the new location. Here the hypothetical contiguous bandwidth is the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously. The overall bandwidth of the subsequence $s_i, \ldots, s_{i+m}$ is
$$f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}},$$
where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
In the above method, the bandwidth threshold may be determined from the maximum bandwidth or the contiguous bandwidth of the storage system to be defragmented, where the contiguous bandwidth is the fitted bandwidth given by the fitted bandwidth function when the access skip step length is 0.
In another aspect, the invention also provides a defragmentation system comprising:
a bandwidth determining device, configured to determine the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, where the expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function that takes the data object length and the access skip step length of that access operation as variables;
a defragmentation device, configured to, starting from some access operation in the access prediction sequence, write the data objects of more than a predetermined number of consecutive access operations in order to a new contiguous storage space if the expected read/write bandwidth of the data object of each of those access operations is below a bandwidth threshold.
In the above system, the fitted bandwidth function may be a bandwidth function expression with data object length and access skip step length as variables, obtained by a fitting method from benchmark read/write bandwidth data measured, on the storage system to be defragmented, at different data object lengths and access skip step lengths.
In the above system, the defragmentation device may also be configured to, starting from some access operation in the access prediction sequence, skip more than a predetermined number of consecutive access operations whose data objects each have an expected read/write bandwidth above the bandwidth threshold, and continue with the access operations that follow.
In the above system, the defragmentation device may also be configured to handle a subsequence $s_i, \ldots, s_{i+m}$ of consecutive access operations that does not contain more than the predetermined number of consecutive access operations whose expected read/write bandwidths are all above or all below the bandwidth threshold: if the overall bandwidth of the subsequence is below a certain threshold and its hypothetical contiguous bandwidth is above a certain threshold, the data object of each access operation in the subsequence is written to a new contiguous storage space and the access entry of those data objects is changed to the new location. Here the hypothetical contiguous bandwidth is the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously. The overall bandwidth of the subsequence $s_i, \ldots, s_{i+m}$ is
$$f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}},$$
where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
Compared with the prior art, the defragmentation method provided by the invention identifies fragmented data more accurately, improves read performance with little impact on current system performance, and adapts better to a changing environment.
Brief description of the drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of a defragmentation method according to one embodiment of the invention;
Fig. 2 is a schematic flow chart of a defragmentation method according to another embodiment of the invention.
Detailed description
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
An access prediction sequence is simply a series of access operations. Read/write bandwidth depends mainly on the size of the data object accessed by each access operation and on the distance the access position moves before or after that data object is accessed (also called the access skip step length). In general, data objects with lower read/write bandwidth in the access prediction sequence are more fragmented, but read/write bandwidth is an after-the-fact metric that cannot be obtained in advance. The embodiments of the invention therefore use a read/write bandwidth fitting method to calculate the expected read/write bandwidth of the data object accessed by each access operation in the access prediction sequence (hereinafter sometimes called the fitted bandwidth of the access operation), and judge the degree of fragmentation from this metric.
Fig. 1 shows a schematic flow chart of a defragmentation method according to one embodiment of the invention. The method comprises: step 1) determining the expected read/write bandwidth of the data object of each access operation in the access prediction sequence; step 2) starting from some access operation in the access prediction sequence, if the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are below the bandwidth threshold, writing the data objects of those access operations in order to a new contiguous storage space.
Referring to Fig. 1, more specifically, step 1) determines the expected read/write bandwidth of the data object of each access operation in the access prediction sequence. The expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function that takes the data object length and the access skip step length of that access operation as variables. According to one embodiment of the invention, step 1) may comprise the following steps:
Step 11) Receive the access operations of the access prediction sequence one by one and store them in a queue $S: \langle s_1, \ldots, s_n \rangle$. The information stored in queue $S$ includes, but is not limited to, the start position $o_i$ and the length $l_i$ of the data object accessed by each access operation, obtained by some means (such as metadata access or index lookup). The access prediction sequence stored in queue $S$ may be a prediction sequence obtained directly (by static or dynamic prediction), or a preprocessed prediction sequence (for example, one with repeated accesses removed). For example, if the $k$-th received access operation $s_k$ accesses the same data object as one of $s_1, \ldots, s_{k-1}$ in queue $S$, $s_k$ is not added to queue $S$.
Step 12) Calculate the fitted bandwidth $f_i$ of each access operation $s_i$ in queue $S$.
As mentioned above, the fitted bandwidth is the bandwidth obtained from the fitted bandwidth function with the data object length and access skip step length of the access operation as variables. The fitted bandwidth function is a bandwidth function expression in those two variables, obtained by a fitting method from a set of benchmark read/write bandwidth data measured, on the storage system to be defragmented, at different data object lengths and access skip step lengths. In one embodiment of the invention, the expression of the fitted bandwidth function is obtained through the following steps:
Step a) On the storage system to be defragmented, with data object length $x$ and access skip step length $y$ as variables, measure benchmark read/write bandwidth data for a set of different values of $x$ and $y$, as shown in Table 1. Here $x_i$ and $y_j$ denote the different values of $x$ and $y$, and $f_{ij}$ is the read/write bandwidth actually measured at those values. The different values of $x$ and $y$ follow some increment pattern, including but not limited to an arithmetic progression or a geometric progression.
Table 1
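| Parameter | $y_0$ | $y_1$ | ... | $y_n$ |
| --- | --- | --- | --- | --- |
| $x_0$ | $f_{00}$ | $f_{01}$ | ... | $f_{0n}$ |
| $x_1$ | $f_{10}$ | $f_{11}$ | ... | $f_{1n}$ |
| ... | ... | ... | ... | ... |
| $x_m$ | $f_{m0}$ | $f_{m1}$ | ... | $f_{mn}$ |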
Step b) Select one of $x$ and $y$ as the primary variable; the other is the secondary variable.
Step c) Find the measured value of the primary variable that is closest to the given value (for example, one of $x_0, \ldots, x_m$). That is, when the primary variable is $x$, find $x_i$ such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimal; when the primary variable is $y$, find $y_i$ such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimal. The function $h$ includes, but is not limited to, a linear function (such as $h(x) = x$) or a logarithmic function (such as $h(x) = \log_2 x$).
Step d) Find the interval of the secondary variable. That is, when the secondary variable is $y$, find $y_j$ such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, find $x_j$ such that $x \in [x_j, x_{j+1}]$.
Step e) From the benchmark data, fit the read/write bandwidth $f(x, y)$ for arbitrary $x$ and $y$ by taking a proportionally placed value within the bandwidth range that corresponds to the secondary variable's interval. That is, when the secondary variable is $y$, let
$$f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \left| f_{ij} - f_{i,j+1} \right| + g(\Delta x),$$
where $g(\Delta x)$ is a further correction to the fitting result. When the secondary variable is $x$, let
$$f(x, y) = f_{ij} + \frac{h(x) - h(x_j)}{h(x_{j+1}) - h(x_j)} \left| f_{ij} - f_{i+1,j} \right| + g(\Delta y),$$
where $g(\Delta y)$ is a further correction to the fitting result.
Steps a) to e) yield the expression of the fitted bandwidth function, which has two variables: the data object length $x$ and the access skip step length $y$. The data object length may be a physical length in bytes or a logical length in a user-defined unit. The access skip step length is the distance the access position moves before or after the data object is accessed; it may be a physical distance in bytes or a logical distance in a user-defined unit.
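Steps b) to e) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's exact expression: it treats $x$ as the primary variable, defaults to the linear choice $h(v) = v$, omits the correction term $g$, and uses plain linear interpolation between the two bracketing measurements in place of the expression in step e). The names `fitted_bandwidth`, `xs`, `ys`, and `table` are hypothetical.

```python
import bisect


def fitted_bandwidth(x, y, xs, ys, table, h=lambda v: v):
    """Estimate bandwidth at (x, y) from measurements table[i][j] = f(xs[i], ys[j]).

    xs and ys are the measured grid values in ascending order.
    Assumptions (not from the patent): x is the primary variable, the
    correction term g is omitted, and plain linear interpolation is used.
    """
    # Step c: nearest measured value of the primary variable, compared in h-space.
    i = min(range(len(xs)), key=lambda k: abs(h(xs[k]) - h(x)))
    # Step d: interval [ys[j], ys[j + 1]] that contains y, clamped to the grid.
    j = bisect.bisect_right(ys, y) - 1
    j = max(0, min(j, len(ys) - 2))
    # Step e: take a proportionally placed value inside that interval.
    t = (h(y) - h(ys[j])) / (h(ys[j + 1]) - h(ys[j]))
    t = max(0.0, min(1.0, t))
    return table[i][j] + t * (table[i][j + 1] - table[i][j])
```

With the identity default for `h`, the grid may include a skip step of 0, so `fitted_bandwidth(x, 0, xs, ys, table)` returns the measured sequential value for the nearest `x`. A logarithmic `h` would need separate handling of a zero skip step.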
For example, the measured benchmark data may be read bandwidth data. For the data object length $x$ and the access skip step length $y$, values may be taken as increasing integer powers of 2, that is, $x_i = 2^i$ and $y_j = 2^j$. For each pair $x_i$, $y_j$, a large file is read from beginning to end several times and the mean is used as $f_{ij}$. For each $x_i$, the benchmark bandwidth for sequential reading (access skip step length $y = 0$) also has to be measured; this is the bandwidth when there is no fragmentation (it may be called the contiguous bandwidth). Here the primary variable can be chosen as the accessed data size $x$ and the secondary variable as the skip step length $y$. Next, find the measured value closest to the primary variable (one of $x_0, \ldots, x_m$) and the interval of the secondary variable. Let $h(x) = \log_2 x$ and $\Delta x = i - \log_2 x$; the absolute value of $\Delta x$ is minimal when $i = [\log_2 x + 0.5]$.
Choose $j$ so that the interval of the secondary variable is $[2^j, 2^{j+1}]$; when $y = 0$ the interval is $[0, 1]$. Let $g(\Delta x) = \Delta x \times (f_{ij} - f_{i,j+1})$. As described above, the fitted read bandwidth function is then
$$f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \left| f_{ij} - f_{i,j+1} \right| + g(\Delta x).$$
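As a worked illustration of the index selection in this example, take the hypothetical values $x = 5$ and $y = 3$ (chosen here for illustration, not taken from the patent), and read $[\cdot]$ as the integer part:

```latex
\begin{aligned}
i &= [\log_2 5 + 0.5] = [2.82\ldots] = 2, & x_i &= 2^2 = 4,\\
\Delta x &= i - \log_2 5 \approx 2 - 2.32 = -0.32,\\
j &= 1, & y &\in [2^1, 2^2] = [2, 4],\\
\frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} &= \frac{\log_2 3 - 1}{2 - 1} \approx 0.585 .
\end{aligned}
```

The fitted value for this point would then combine the measurements $f_{2,1}$ and $f_{2,2}$ using this proportion, plus the correction $g(\Delta x)$.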
Then, using the fitted bandwidth function obtained above, calculate the fitted bandwidth of each access operation $s_i$ in queue $S$: $f_i = f(l_i, |o_i - o_{i-1} - l_{i-1}|)$. The fitted bandwidth is thus the value computed by the fitted bandwidth function from the length of the data object ($l_i$) and the distance the access position moves before or after the object is accessed ($|o_i - o_{i-1} - l_{i-1}|$). Processing continues with step 2) when, for example, the access prediction sequence has been fully received, a delay upper bound is reached (a threshold set to keep defragmentation from introducing too much delay), or the receive queue $S$ is full.
Continuing with Fig. 1, step 2) starts from some access operation in the access prediction sequence: if the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are all below the bandwidth threshold, the data objects of those access operations are written in order to a new contiguous storage space. In this way, the method uses the fitted bandwidths calculated above to choose which data objects to defragment, so as to minimize the negative effect on current write bandwidth. If the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are all above the bandwidth threshold, those access operations are skipped and processing continues with the access operations that follow.
The bandwidth threshold may be a static threshold, determined from the maximum bandwidth of the storage system to be defragmented; for example, it may be set to a percentage of the storage system's maximum bandwidth, such as 60% or more. The bandwidth threshold may also be a dynamic threshold, determined from the contiguous bandwidth, which is calculated from the fitted function as the fitted bandwidth $f(x, 0)$ at an access skip step length of 0; for example, it may be set to a percentage of the contiguous bandwidth, such as 70% or more.
To defragment the system more effectively while avoiding the impact of write operations on overall system performance, in yet another embodiment step 2) divides the access prediction sequence into three classes of consecutive subsequences according to three conditions and handles each class differently. As described above, queue $S: \langle s_1, \ldots, s_n \rangle$ holds the received access prediction sequence. The receive queue $S$ is scanned from head to tail and the access prediction sequence is divided into subsequences according to the following three conditions:
Condition 1: $s_i, \ldots, s_{i+m}$ satisfies $f_{i+1} \ge T_{f_{i+1}}, \ldots, f_{i+m} \ge T_{f_{i+m}}$, $m \ge T_m$, and either $f_{i+m+1} < T_{f_{i+m+1}}$ or $i + m = n$. This means that, starting from some access operation in the access prediction sequence (for example $s_i$), the fitted bandwidths of more than a certain number ($T_m$) of consecutive access operations are all above the bandwidth threshold. The data objects of these access operations are only slightly fragmented, so this class of subsequence can be ignored: the data objects of $s_i, \ldots, s_{i+m}$ need not be written to a new contiguous storage location, no processing is required, and the scan continues with the next subsequence. Here $T_{f_i}$ is the bandwidth threshold for the fitted bandwidth of $s_i$, and $T_m$ is a preset integer threshold. As described above, $T_{f_i}$ may be a static threshold, for example 60% or more of the maximum bandwidth of the storage system to be defragmented, or a dynamic threshold, for example 70% or more of the contiguous bandwidth, where the contiguous bandwidth is the fitted bandwidth $f(x, 0)$ at an access skip step length of 0.
Condition 2: $s_i, \ldots, s_{i+m}$ satisfies $f_i < T_{f_i}, \ldots, f_{i+m} < T_{f_{i+m}}$, $m \ge T'_m$, and either $f_{i+m+1} \ge T_{f_{i+m+1}}$ or $i + m = n$. This means that, starting from some access operation in the access prediction sequence (for example $s_i$), the fitted bandwidths of more than a certain number ($T'_m$) of consecutive access operations are all below the bandwidth threshold. The data objects of these access operations are heavily fragmented, so this class of subsequence needs to be defragmented: the data objects of $s_i, \ldots, s_{i+m}$ must be written to new contiguous storage. Here $T_{f_i}$ is the bandwidth threshold for the fitted bandwidth of $s_i$, and $T'_m$ is a preset integer threshold.
For a subsequence $s_i, \ldots, s_{i+m}$ that satisfies condition 2, the data objects of $s_i, \ldots, s_{i+m}$ can be written in order to a new contiguous storage space, and the access entry of those data objects changed to the new location. The data objects to be written may be read from a local storage location or obtained from other devices over the network.
Condition 3: all cases other than condition 1 and condition 2. For example, $s_i, \ldots, s_{i+m}$ satisfies neither condition 1 nor condition 2, and either $i + m = n$ or $s_{i+m+1}, \ldots, s_{i+m+k}$ satisfies condition 1 or condition 2.
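The three-way division above can be sketched as follows. This is a simplified illustration rather than the procedure of the patent: it applies the threshold test to every queue entry (condition 1 as stated exempts the first entry $s_i$ of the run), and it reads "more than a certain number" as a run length of at least `min_run`. The function name `partition` and the labels `"skip"`, `"rewrite"`, and `"mixed"` (for conditions 1, 2, and 3) are hypothetical.

```python
from itertools import groupby


def partition(bandwidths, thresholds, min_run):
    """Split queue positions into (label, start, end) runs; end is exclusive.

    "rewrite": long run of fitted bandwidths below threshold (condition 2).
    "skip":    long run of fitted bandwidths at or above threshold (condition 1).
    "mixed":   everything else, with adjacent short runs merged (condition 3).
    """
    # True marks an entry whose fitted bandwidth is below its threshold.
    flags = [f < t for f, t in zip(bandwidths, thresholds)]
    result = []
    pos = 0
    for is_low, group in groupby(flags):
        length = len(list(group))
        if length >= min_run:
            label = "rewrite" if is_low else "skip"
        else:
            label = "mixed"
        if label == "mixed" and result and result[-1][0] == "mixed":
            # Extend the previous mixed subsequence instead of starting a new one.
            result[-1] = ("mixed", result[-1][1], pos + length)
        else:
            result.append((label, pos, pos + length))
        pos += length
    return result
```

Only the `"mixed"` subsequences need the further overall-bandwidth test; `"skip"` runs are left in place and `"rewrite"` runs are written to new contiguous storage.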
For a subsequence that satisfies neither condition 1 nor condition 2, for example $s_i, \ldots, s_{i+m}$, calculate the overall bandwidth of the sequence, that is, the bandwidth produced by executing all the access operations in the sequence rather than the bandwidth of any single access operation. If the overall bandwidth is above a certain threshold (threshold a), the data objects accessed by the sequence are not heavily fragmented and the sequence need not be processed. Otherwise, if the overall bandwidth is below threshold a, calculate the hypothetical contiguous bandwidth of the sequence, which is the bandwidth that would result if the data objects of the access operations in the sequence were stored contiguously. If the overall bandwidth is below threshold a and the hypothetical contiguous bandwidth is above a certain threshold (threshold b), writing the data objects of the sequence to a contiguous storage region would effectively improve bandwidth, so those data objects are processed accordingly: the data objects of the subsequence $s_i, \ldots, s_{i+m}$ are written to a new contiguous storage space. If the overall bandwidth is below threshold a and the hypothetical contiguous bandwidth is below threshold b, writing the data objects to a contiguous storage region would not improve bandwidth, so the sequence is not processed. Thresholds a and b may be set as described for the bandwidth threshold above.
The overall bandwidth and the hypothetical contiguous bandwidth are calculated as follows.
The overall bandwidth of $s_i, \ldots, s_{i+m}$ is
$$f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}}.$$
Obtain the start position $o$ of a contiguous storage space into which the data objects of $s_i, \ldots, s_{i+m}$ can be written, and calculate the hypothetical contiguous bandwidth $f' = f(l_i + \cdots + l_{i+m}, |o - o'_{i-1} - l_{i-1}|)$. If $f < T_f$ and $f' > T'_f$, write the data objects of $s_i, \ldots, s_{i+m}$ in order to the new contiguous storage space and change the access entry of those data objects to the new location. Here $o'_{i-1}$ and $l_{i-1}$ are the latest start position (the object may already have been written to a new location) and the size of the data object of the predicted access $s_{i-1}$. $T_f$ and $T'_f$ may be set as described for the bandwidth threshold above. The data objects to be written may be read from a local storage location or obtained from other devices over the network.
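The overall-bandwidth formula and the decision rule for condition 3 can be sketched as follows. The function and parameter names are hypothetical. The hypothetical contiguous bandwidth $f'$ is passed in as a number (`contiguous_bw`), because computing it requires the fitted bandwidth function of the target storage system.

```python
def overall_bandwidth(lengths, bandwidths):
    """Total data length divided by total transfer time (sum of l_k / f_k)."""
    total_time = sum(l / f for l, f in zip(lengths, bandwidths))
    return sum(lengths) / total_time


def should_rewrite(lengths, bandwidths, contiguous_bw, t_f, t_f_prime):
    """Return True when f < T_f and f' > T'_f for a condition-3 subsequence."""
    f = overall_bandwidth(lengths, bandwidths)
    return f < t_f and contiguous_bw > t_f_prime
```

Because the overall bandwidth is total length over total time, a single slow access lowers it more than an arithmetic mean of the individual bandwidths would suggest.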
When queue $S$ has been processed, it is emptied and the next access prediction sequence is received.
For a better understanding of the above defragmentation method, the defragmentation operation is described in more detail below with reference to Table 2 and Fig. 2. Table 2 gives a concrete access prediction sequence received by a defragmentation operation, listing the sequence number $i$ of each access operation in the sequence together with the length $l_i$ and the start position $o_i$ of its data object.
Table 2
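| Sequence number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $l_i$ | 3 | 2 | 5 | 6 | 4 | 3 | 3 | 5 | 4 | 3 | 2 | 3 | 3 | 3 |
| $o_i$ | 10 | 20 | 30 | 50 | 3 | 10 | 7 | 35 | 40 | 13 | 16 | 20 | 7 | 23 |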
Wherein, the length of setting receiving queue S is 12, during original state, supposes that the last stored access location is 0, and concrete implementation is:
1. perform step 201, receive successively the accessing operation 1~14 in the access forecasting sequence, accessing operation of every reception goes to step 202.
2. perform step 202, determine whether it is the accessing operation of repetition; If it is go to step 201; If not, go to step 203.
Particularly, if the accessing operation institute visit data object that has received in the accessing operation of current reception and formation is identical, ignore the accessing operation of current reception, go to step 201, to avoid the access bandwidth of the same data object of double counting, for example, receive accessing operation 6,13 o'clock, accessing operation 6,13 and 1,7 visit data object identical (reference position is identical with data length).
3. perform step 203, calculate the match bandwidth of this accessing operation.
Particularly, obtain the step-length of beating of data object of tail entry of the relative formation S of accessing operation institute visit data object of current reception, the match bandwidth function that data object size and the step-length input of beating of obtaining are obtained as mentioned, and with data object size, data object reference position and the information recording /s such as match bandwidth that calculate arrive the formation afterbody, for example, in accessing operation 1, data object size is 3, the step-length of beating is 10 (from last visit position 0, jumping to data object reference position 10), and the match bandwidth is f 1=f (3,10), in accessing operation 2, data object size is 2, the step-length of beating is 7 (from the data object end position 13 of accessing operation 1, jumping to data object reference position 20), match bandwidth f 2=f (2,7).
4. Perform step 204: judge whether the queue-processing condition is met. For example, when the queue is full or the delay upper bound is reached (so that defragmentation does not introduce excessive delay), process the queue from head to tail; otherwise continue receiving the access prediction sequence.
Steps 201 to 204 are repeated until the access prediction sequence has been fully received, or until the queue is full and the processing condition is reached. The resulting state of queue S is shown in Table 3 (accesses 6 and 13 are repeats and are ignored on receipt):
Table 3
Queue entry  s_1      s_2      s_3      s_4      s_5      s_6
l_i          3        2        5        6        4        3
o_i          10       20       30       50       3        7
f_i          f(3,10)  f(2,7)   f(5,8)   f(6,15)  f(4,53)  f(3,0)
Queue entry  s_7      s_8      s_9      s_10     s_11     s_12
l_i          5        4        3        2        3        3
o_i          35       40       12       15       20       23
f_i          f(5,25)  f(4,0)   f(3,32)  f(2,0)   f(3,3)   f(3,0)
5. Perform step 205: scan queue S and divide it into subsequences according to the conditions described above.
Specifically, scan queue entries s_1 to s_12 from head to tail and divide them into subsequences according to three conditions. Condition 1: s_i, ..., s_{i+m} satisfy f_{i+1} ≥ T_{f,i+1}, ..., f_{i+m} ≥ T_{f,i+m}, with m ≥ T_m, and either f_{i+m+1} < T_{f,i+m+1} or m = 12. Condition 2: s_{i+1}, ..., s_{i+m} satisfy f_i < T_{f,i}, ..., f_{i+m} < T_{f,i+m}, with m ≥ T'_m, and either f_{i+m+1} ≥ T_{f,i+m+1} or m = 12. Condition 3: all remaining cases (neither condition 1 nor condition 2 is satisfied, and either m = 12 or s_{m+1}, ..., s_{m+k} satisfy condition 1 or condition 2). Here T_{f,i} = 0.75 × f(l_i, 0), and T_m and T'_m both take the integer value 4. After the scan, the comparison results are as shown in Table 4: s_1 to s_5 have small fitted bandwidths and satisfy condition 2; s_6 to s_8 have fitted bandwidths of mixed size and satisfy condition 3; s_9 to s_12 have large fitted bandwidths (except s_9) and satisfy condition 1. The subsequence division results are marked accordingly in the queue.
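Table 4
[Figures BDA00003520751100101 and BDA00003520751100111: table images giving the comparison of each f_i with its threshold T_{f,i} for s_1 to s_12 and the resulting subsequence marks]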
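The per-entry comparison f_i < T_{f,i} with T_{f,i} = 0.75 × f(l_i, 0) can be sketched as follows. The real fitted bandwidth function is obtained from measurements and is not reproduced here, so `toy_fit` and its `penalty` parameter are purely hypothetical stand-ins, chosen only so that the resulting flags agree with the description of Table 4 in the text.

```python
def toy_fit(length, skip, penalty=0.25):
    """Hypothetical stand-in for the fitted bandwidth function f(l, skip).

    Bandwidth is normalised so that a contiguous access (skip = 0) gives 1.0
    and it falls as the skip step length grows. This is NOT the patent's
    fitted function, only a placeholder with the same qualitative shape.
    """
    return length / (length + penalty * skip)


def is_low_bandwidth(length, skip, ratio=0.75, fit=toy_fit):
    """True when f(l, skip) is below the threshold ratio * f(l, 0)."""
    return fit(length, skip) < ratio * fit(length, 0)


# (l_i, skip step length) pairs, i.e. the arguments of f_i in Table 3
entries = [(3, 10), (2, 7), (5, 8), (6, 15), (4, 53), (3, 0),
           (5, 25), (4, 0), (3, 32), (2, 0), (3, 3), (3, 0)]

flags = ["low" if is_low_bandwidth(l, d) else "high" for l, d in entries]
print(flags)
```

With this placeholder, s_1 to s_5 are all low, s_6 to s_8 are mixed, and s_10 to s_12 are high while s_9 is low, which matches the three groups named in the text. Grouping the flags into subsequences under conditions 1 to 3 is left out because the sketch does not attempt to settle how m is counted.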
6. Perform step 206: determine which condition the subsequence satisfies. If condition 1 is satisfied, go to step 209; if condition 2 is satisfied, go to step 208; if condition 3 is satisfied, go to step 207.
Specifically, according to the marks made in step 205, s_1 to s_5 satisfy condition 2; go to step 208.
7. Perform step 208: write the data objects of s_1 to s_5 to a new storage space.
Specifically, find the start position of a contiguous storage space that can hold the data objects of s_1 to s_5 and write them there. Suppose start position 100 is found in this example; the data objects of s_1 to s_5 are then written to storage ranges 100~102, 103~104, 105~109, 110~115 and 116~119 respectively, and the access entries of these data objects are changed to the new storage positions.
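The new storage ranges follow directly from the data object lengths and the chosen start position. A minimal sketch, with the hypothetical helper name `layout_contiguous`:

```python
def layout_contiguous(lengths, start):
    """Place data objects back to back starting at `start`.

    Returns the inclusive (first, last) range of each object and the
    first free position after the last object.
    """
    ranges = []
    pos = start
    for length in lengths:
        ranges.append((pos, pos + length - 1))
        pos += length
    return ranges, pos


# Lengths of the data objects of s_1 to s_5, written from position 100
ranges, next_free = layout_contiguous([3, 2, 5, 6, 4], 100)
print(ranges, next_free)
```

The returned `next_free` value, 120, is the position at which the next rewritten subsequence can start.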
8. Perform step 209: remove s_1 to s_5 from the queue (the queue head pointer now points to s_6).
9. Perform step 210 (the queue is not empty at this point; go to step 206).
10. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marks made in step 205, s_6 to s_8 satisfy condition 3; go to step 207.
11. Perform step 207: judge whether subsequence s_6 to s_8 meets the write condition.
Specifically, obtain the start position o of a storage space that can hold the data objects of s_6 to s_8, and calculate the overall bandwidth f of the subsequence:
[Figure BDA00003520751100112: formula for the overall bandwidth f]
Then calculate the hypothetical write bandwidth f' = f(l_6 + l_7 + l_8, |o - o'_5 - l_5|). In this example o = 120 (the end position of s_5 in the new storage space), o'_5 = 116 (the new write position of the data object of s_5) and l_5 = 4, so f' = f(12, 0). The thresholds are T_f = T'_f = 0.75 × f(l_6 + l_7 + l_8, 0) = 0.75 × f(12, 0) (in general, for s_i to s_{i+m} the thresholds are T_f = T'_f = 0.75 × f(l_i + ... + l_{i+m}, 0)). Here f < T_f and f' > T'_f both hold, so go to step 208.
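The arguments of the hypothetical write bandwidth and the final test of step 207 can be checked with a few lines. This is a sketch only: the helper names are hypothetical, and the overall bandwidth f is taken as an input because its formula is given in a figure that is not reproduced in the text.

```python
def hypothetical_args(lengths, new_start, prev_new_start, prev_length):
    """Arguments (total length, skip step) of f' for a subsequence.

    new_start:      candidate start position o for the subsequence
    prev_new_start: new write position of the preceding data object
    prev_length:    length of the preceding data object
    """
    total_length = sum(lengths)
    skip = abs(new_start - prev_new_start - prev_length)
    return total_length, skip


def meets_write_condition(overall_bw, hypothetical_bw, threshold):
    """Step 207 test with T_f = T'_f: f < T_f and f' > T'_f."""
    return overall_bw < threshold and hypothetical_bw > threshold


# s_6 to s_8 have lengths 3, 5 and 4; o = 120, o'_5 = 116, l_5 = 4
print(hypothetical_args([3, 5, 4], 120, 116, 4))
```

The result (12, 0) corresponds to f' = f(12, 0) in the text. Since the threshold is 0.75 × f(12, 0), the second half of the write condition holds whenever f(12, 0) is positive.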
12. Perform step 208: write the data objects of s_6 to s_8 to a new contiguous storage space.
Specifically, write the data objects of s_6 to s_8 in order to storage ranges 120~122, 123~127 and 128~131 respectively, and change the access entries of these data objects to the new storage positions.
13. Perform step 209: remove subsequence s_6 to s_8 from the queue (the queue head pointer now points to s_9).
14. Perform step 210 (the queue is not empty at this point; go to step 206).
15. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marks made in step 205, s_9 to s_12 satisfy condition 1; go to step 209.
16. Perform step 209: remove s_9 to s_12 from the queue (the queue head pointer points to s_1 again).
17. Perform step 210 (the queue is empty at this point; go to step 201).
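The loop over steps 206 to 210 can be summarised as a small dispatch routine. The sketch below is illustrative only: `process_queue` and the callback `passes_step_207` are hypothetical names, and it assumes that a condition 3 subsequence failing the write condition is simply removed from the queue, which the walkthrough above does not exercise.

```python
from collections import deque


def process_queue(marked, passes_step_207):
    """Walk the marked subsequences from head to tail.

    marked: list of (label, condition) with condition in {1, 2, 3}.
    passes_step_207: callable deciding the write condition for condition 3.
    Returns the action taken for each subsequence.
    """
    queue = deque(marked)
    actions = []
    while queue:                                   # step 210: queue not empty
        label, condition = queue[0]                # step 206: read the mark
        if condition == 2:
            actions.append((label, "rewrite"))     # step 208
        elif condition == 3 and passes_step_207(label):
            actions.append((label, "rewrite"))     # step 207, then step 208
        else:
            actions.append((label, "skip"))        # straight to step 209
        queue.popleft()                            # step 209: remove from queue
    return actions


marked = [("s1-s5", 2), ("s6-s8", 3), ("s9-s12", 1)]
print(process_queue(marked, lambda label: True))
```

For the example queue this yields a rewrite for s_1 to s_5 and s_6 to s_8 and a skip for s_9 to s_12, in the same order as steps 6 to 17.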
The inventors also tested the above method in a content storage system using a backup workload from a real environment. The test results show that, as the data volume grows, the read bandwidth improves by 12% to 60% while data redundancy is kept at 1% to 2%, which largely eliminates the continuous decline of the read bandwidth. In the same system, test results for a two-week data synchronization workload show that the read bandwidth improves by a factor of 5 to 8, avoiding the read-bandwidth degradation caused by data fragmentation.
Although the present invention has been described by way of preferred embodiments, it is not limited to the embodiments described here, and also covers various changes and modifications made without departing from the scope of the invention.

Claims (11)

1. A fragmentization removal method, the method comprising:
Step 1) determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function whose variables are the data object length and the access skip step length corresponding to the access operation;
Step 2) starting from some access operation in the access prediction sequence, for a run of consecutive access operations whose number exceeds a predetermined number and for each of which the expected read/write bandwidth of the data object is less than a bandwidth threshold, writing the data objects of these access operations in order to a new contiguous storage space.
2. The method according to claim 1, wherein in step 1) the fitted bandwidth function is a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access skip step length as variables, based on benchmark read/write bandwidth data measured at different data object lengths and access skip step lengths.
3. The method according to claim 2, wherein the fitted bandwidth function is obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length x and access skip step length y as variables, measuring benchmark read/write bandwidth data at a set of different values of x and y;
Step b) selecting a master variable from x and y, the other being the secondary variable;
Step c) finding the measured master-variable value closest to the master variable;
Step d) finding the interval in which the secondary variable lies;
Step e) based on the benchmark read/write bandwidth data measured at the different values of x and y, fitting the read/write bandwidth f(x, y) for arbitrary values of x and y.
4. The method according to claim 3, wherein in step c), when the master variable is x, x_i is found such that the absolute value of Δx = h(x_i) - h(x) is minimal; when the master variable is y, y_i is found such that the absolute value of Δy = h(y_i) - h(y) is minimal;
in step d), when the secondary variable is y, y_j is found such that y ∈ [y_j, y_{j+1}]; when the secondary variable is x, x_j is found such that x ∈ [x_j, x_{j+1}];
in step e), when the secondary variable is y, let
$$f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \left| f_{i,j} - f_{i,j+1} \right| + g(\Delta x),$$
where g(Δx) denotes a further correction to the fitting result; when the secondary variable is x, let
$$f(x, y) = f_{i,j} + \frac{h(x) - h(x_j)}{h(x_{j+1}) - h(x_j)} \left| f_{i,j} - f_{i+1,j} \right| + g(\Delta y),$$
where g(Δy) denotes a further correction to the fitting result.
5. The method according to any one of claims 1 to 4, further comprising step 3): starting from some access operation in the access prediction sequence, for a run of consecutive access operations whose number exceeds a predetermined number and for each of which the expected read/write bandwidth of the data object is greater than the bandwidth threshold, skipping these access operations and continuing to process the following access operations.
6. The method according to claim 5, further comprising step 4): for a subsequence s_i, ..., s_{i+m} of consecutive access operations in the access prediction sequence that has not been processed by step 2) or step 3), if the overall bandwidth of the subsequence is less than a certain threshold and the hypothetical continuous bandwidth of the subsequence is greater than a certain threshold, writing the data object of each access operation in the subsequence to a new contiguous storage space and changing the access entries of these data objects to the new positions; wherein the hypothetical continuous bandwidth denotes the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of subsequence s_i, ..., s_{i+m} is
[Figure FDA00003520751000021: formula for the overall bandwidth]
where l_i is the length of the data object of s_i, f_i is the expected read/write bandwidth of the data object of s_i, and m is the number of access operations in the subsequence.
7. The method according to claim 5, wherein the bandwidth threshold is determined from the maximum bandwidth value or the continuous bandwidth of the storage system to be defragmented, the continuous bandwidth being the fitted bandwidth obtained from the fitted bandwidth function when the access skip step length is 0.
8. A fragmentization removal system, the system comprising:
a bandwidth determination device, configured to determine the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function whose variables are the data object length and the access skip step length corresponding to the access operation;
a defragmentation device, configured to, starting from some access operation in the access prediction sequence, for a run of consecutive access operations whose number exceeds a predetermined number and for each of which the expected read/write bandwidth of the data object is less than a bandwidth threshold, write the data objects of these access operations in order to a new contiguous storage space.
9. The system according to claim 8, wherein the fitted bandwidth function is a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access skip step length as variables, based on benchmark read/write bandwidth data measured at different data object lengths and access skip step lengths.
10. The system according to claim 8 or 9, wherein the defragmentation device is further configured to, starting from some access operation in the access prediction sequence, for a run of consecutive access operations whose number exceeds a predetermined number and for each of which the expected read/write bandwidth of the data object is greater than the bandwidth threshold, skip these access operations and continue to process the following access operations.
11. The system according to claim 10, wherein the defragmentation device is further configured to handle a subsequence s_i, ..., s_{i+m} of consecutive access operations that does not satisfy the condition that the expected read/write bandwidths of the data objects of more than the predetermined number of consecutive access operations are all greater than, or all less than, the bandwidth threshold: if the overall bandwidth of the subsequence is less than a certain threshold and the hypothetical continuous bandwidth of the subsequence is greater than a certain threshold, the data object of each access operation in the subsequence is written to a new contiguous storage space and the access entries of these data objects are changed to the new positions; wherein the hypothetical continuous bandwidth denotes the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of subsequence s_i, ..., s_{i+m} is
[Figure FDA00003520751000031: formula for the overall bandwidth]
where l_i is the length of the data object of s_i, f_i is the expected read/write bandwidth of the data object of s_i, and m is the number of access operations in the subsequence.
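The fitting steps a) to e) of claims 3 and 4 can be illustrated with a simplified sketch. It is not the exact expression of claim 4: it takes x as the master variable, assumes h is the identity and g is zero, and uses ordinary linear interpolation between the two measurements that bracket y. The function name `fit_bandwidth` and the benchmark numbers are made up for illustration.

```python
from bisect import bisect_right


def fit_bandwidth(x, y, xs, ys, table):
    """Estimate bandwidth at length x and skip step length y.

    xs, ys: sorted measured lengths and skip step lengths (step a).
    table[i][j]: benchmark bandwidth measured at (xs[i], ys[j]).
    Assumptions: x is the master variable, h is the identity, g is zero.
    """
    # Step c): measured length closest to x
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    # Step d): interval [ys[j], ys[j + 1]] containing y, clamped to the grid
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    # Step e): linear interpolation between the bracketing measurements
    t = (y - ys[j]) / (ys[j + 1] - ys[j])
    t = min(max(t, 0.0), 1.0)
    return table[i][j] + t * (table[i][j + 1] - table[i][j])


# Made-up benchmark grid: rows are lengths, columns are skip step lengths
xs = [1, 4, 16]
ys = [0, 8, 64]
table = [[10, 4, 1],
         [40, 20, 5],
         [100, 60, 20]]

print(fit_bandwidth(5, 4, xs, ys, table))
```

Here length 5 snaps to the measured length 4, and skip step 4 lies halfway between the measured values 0 and 8, so the estimate is halfway between 40 and 20.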
CN201310298326.2A 2013-07-16 2013-07-16 Fragmentization removal method and system Active CN103389946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310298326.2A CN103389946B (en) 2013-07-16 2013-07-16 Fragmentization removal method and system

Publications (2)

Publication Number Publication Date
CN103389946A true CN103389946A (en) 2013-11-13
CN103389946B CN103389946B (en) 2016-08-10

Family

ID=49534224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310298326.2A Active CN103389946B (en) 2013-07-16 2013-07-16 Fragmentization removal method and system

Country Status (1)

Country Link
CN (1) CN103389946B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant