CN103389946B - Defragmentation method and system - Google Patents


Info

Publication number
CN103389946B
Authority
CN
China
Prior art keywords
bandwidth
access
data object
variable
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310298326.2A
Other languages
Chinese (zh)
Other versions
CN103389946A (en
Inventor
严得辰
刘立坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201310298326.2A priority Critical patent/CN103389946B/en
Publication of CN103389946A publication Critical patent/CN103389946A/en
Application granted granted Critical
Publication of CN103389946B publication Critical patent/CN103389946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a defragmentation method. The method first determines the expected read/write bandwidth of the data object of each access operation in an access prediction sequence; the expected read/write bandwidth is a fitted bandwidth obtained from a fitted bandwidth function whose variables are the data object length and the access jump step corresponding to the access operation. Then, starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth below a bandwidth threshold, the data objects of these access operations are written sequentially to a new contiguous storage space. The method identifies fragmented data accurately, improves read performance while affecting current system performance as little as possible, and adapts well to changing environments.

Description

Defragmentation method and system
Technical field
The invention belongs to the field of computer storage and in particular relates to a method for handling data fragmentation.
Background
Fragmentation means that file data which is relatively contiguous logically becomes relatively scattered in its storage locations once it has been written to the storage medium, which causes a large number of random read/write operations when the data is accessed. For today's mainstream disk-based storage systems, fragmentation causes read/write bandwidth to decline continuously. A defragmentation method is therefore needed to reduce the impact of fragmentation and the number of random read/write accesses.
Existing defragmentation methods generally take an access prediction sequence as input, identify fragmented data according to that sequence, and write the fragmented data to a new storage area, thereby achieving defragmentation. The access prediction sequence is a prediction of the most likely future access order. Many prediction methods exist, but in essence they fall into two kinds: (1) static prediction, for example prediction based on the logical storage order of the data; (2) dynamic prediction, for example prediction based on the current access sequence.
Defragmentation mainly has to solve two problems: (1) which data objects are fragmented data, i.e. if the current storage system is accessed according to the access prediction sequence, accesses to which data objects cause a larger drop in read/write bandwidth; (2) which data should be written to a new storage location, so that future random read/write accesses are reduced while the current performance of the system is affected as little as possible. The main problems of existing defragmentation methods are the following: first, fragmented data is not identified accurately enough, so performance cannot be improved effectively; second, too much data has to be moved during defragmentation, which causes a significant drop in system performance while the operation is running.
Summary of the invention
Therefore, the object of the invention is to overcome the above defects of the prior art and to provide a new defragmentation method.
The object of the invention is achieved through the following technical solutions:
In one aspect, the invention provides a defragmentation method, the method comprising:
Step 1) determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function with the data object length and the access jump step corresponding to the access operation as variables;
Step 2) starting from some access operation in the access prediction sequence, if the expected read/write bandwidth of the data object of each of more than a predetermined number of consecutive access operations is less than a bandwidth threshold, writing the data objects of these access operations sequentially to a new contiguous storage space.
In the above method, in step 1), the fitted bandwidth function may be a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access jump step as variables, from benchmark read/write bandwidth data measured under different data object lengths and access jump steps.
In the above method, the fitted bandwidth function may be obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length $x$ and access jump step $y$ as variables, measuring benchmark read/write bandwidth data for a set of different $x$ and $y$;
Step b) selecting one of $x$ and $y$ as the primary variable, the other being the secondary variable;
Step c) finding the benchmark value of the primary variable that is closest to the primary variable;
Step d) finding the interval that contains the secondary variable;
Step e) fitting, from the benchmark read/write bandwidth data measured for the different $x$ and $y$, the read/write bandwidth $f(x, y)$ for arbitrary values of $x$ and $y$.
In the above method, in step c), when the primary variable is $x$, $x_i$ is found such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimal; when the primary variable is $y$, $y_i$ is found such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimal;
in step d), when the secondary variable is $y$, $y_j$ is found such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, $x_j$ is found such that $x \in [x_j, x_{j+1}]$;
in step e), when the secondary variable is $y$, let $f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \lvert f_{ij} - f_{i,j+1} \rvert + g(\Delta x)$, where $g(\Delta x)$ denotes a further correction to the fitting result; when the secondary variable is $x$, let $f(x, y) = f_{ij} + \frac{h(x) - h(x_j)}{h(x_{j+1}) - h(x_j)} \lvert f_{ij} - f_{i+1,j} \rvert + g(\Delta y)$, where $g(\Delta y)$ denotes a further correction to the fitting result.
The above method may further comprise step 3): starting from some access operation in the access prediction sequence, if the expected read/write bandwidth of the data object of each of more than a predetermined number of consecutive access operations is greater than the bandwidth threshold, skipping these access operations and continuing with the subsequent access operations.
The above method may further comprise step 4): for a subsequence $s_i, \dots, s_{i+m}$ of consecutive access operations in the access prediction sequence that is not processed by step 2) or step 3), if the overall bandwidth of the subsequence is less than a certain threshold and the hypothetical contiguous bandwidth of the subsequence is greater than a certain threshold, writing the data object of each access operation in the subsequence to a new contiguous storage space and changing the access entries of these data objects to the new locations; wherein the hypothetical contiguous bandwidth denotes the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of the subsequence $s_i, \dots, s_{i+m}$ is $f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}}$, where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
In the above method, the bandwidth threshold may be determined from the maximum bandwidth value or the contiguous bandwidth of the storage system to be defragmented, the contiguous bandwidth being the fitted bandwidth obtained from the fitted bandwidth function when the access jump step is 0.
In another aspect, the invention also provides a defragmentation system, the system comprising:
a bandwidth determination device for determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function with the data object length and the access jump step corresponding to the access operation as variables;
a defragmentation device for, starting from some access operation in the access prediction sequence, if the expected read/write bandwidth of the data object of each of more than a predetermined number of consecutive access operations is less than a bandwidth threshold, writing the data objects of these access operations sequentially to a new contiguous storage space.
In the above system, the fitted bandwidth function may be a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access jump step as variables, from benchmark read/write bandwidth data measured under different data object lengths and access jump steps.
In the above system, the defragmentation device may further be used, starting from some access operation in the access prediction sequence, if the expected read/write bandwidth of the data object of each of more than a predetermined number of consecutive access operations is greater than the bandwidth threshold, to skip these access operations and continue with the subsequent access operations.
In the above system, the defragmentation device may further be used for a subsequence $s_i, \dots, s_{i+m}$ of consecutive access operations that does not satisfy the condition that the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are all greater than, or all less than, the bandwidth threshold: if the overall bandwidth of the subsequence is less than a certain threshold and the hypothetical contiguous bandwidth of the subsequence is greater than a certain threshold, the data object of each access operation in the subsequence is written to a new contiguous storage space and the access entries of these data objects are changed to the new locations; wherein the hypothetical contiguous bandwidth denotes the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of the subsequence $s_i, \dots, s_{i+m}$ is $f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}}$, where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
Compared with the prior art, the defragmentation method provided by the invention identifies fragmented data accurately, improves read performance while affecting current system performance as little as possible, and adapts well to changing environments.
Brief description of the drawings
Embodiments of the invention are further described below with reference to the drawings, in which:
Fig. 1 is a schematic flowchart of a defragmentation method according to one embodiment of the invention;
Fig. 2 is a schematic flowchart of a defragmentation method according to another embodiment of the invention.
Detailed description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.
An access prediction sequence is in effect a series of access operations, and read/write bandwidth depends mainly on the size of the data object accessed by each access operation and on the distance by which the access position changes before or after that data object is accessed (also called the access jump step). In general, data objects in the access prediction sequence with lower read/write bandwidth are more fragmented. However, read/write bandwidth is an a posteriori metric and cannot be obtained in advance. Embodiments of the invention therefore use a read/write bandwidth fitting method to calculate the expected read/write bandwidth of the data object accessed by each access operation in the access prediction sequence (also referred to below as the fitted bandwidth of the access operation), and judge the degree of fragmentation on the basis of this metric.
Fig. 1 shows a schematic flowchart of a defragmentation method according to one embodiment of the invention. The method comprises: step 1) determining the expected read/write bandwidth of the data object of each access operation in the access prediction sequence; step 2) starting from some access operation in the access prediction sequence, if the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are less than the bandwidth threshold, writing the data objects of these access operations sequentially to a new contiguous storage space.
Referring to Fig. 1, more specifically, step 1) determines the expected read/write bandwidth of the data object of each access operation in the access prediction sequence. The expected read/write bandwidth is the fitted bandwidth obtained from the fitted bandwidth function with the data object length and the access jump step of the access operation as variables. According to one embodiment of the invention, step 1) may comprise the following steps:
Step 11) Receive the access operations in the access prediction sequence one by one and save them in a queue S: <$s_1, \dots, s_n$>. The information saved in queue S includes, but is not limited to, the start position $o_i$ and the length $l_i$ of the data object accessed by each access operation, obtained in some way (e.g. metadata access or index lookup). The access prediction sequence saved in queue S may be the prediction sequence as obtained directly (by static or dynamic prediction), or a pre-processed prediction sequence (e.g. one with duplicates removed). For example, if the k-th received access operation $s_k$ accesses the same data object as one of $s_1, \dots, s_{k-1}$ in queue S, $s_k$ is not added to queue S.
Step 12) Calculate the fitted bandwidth $f_i$ corresponding to each access operation $s_i$ in queue S.
As mentioned above, the fitted bandwidth is the bandwidth obtained from the fitted bandwidth function with the data object length and the access jump step of the access operation as variables. The fitted bandwidth function is a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access jump step as variables, from a set of benchmark read/write bandwidth data measured under different data object lengths and access jump steps. In one embodiment of the invention, the expression of the fitted bandwidth function is obtained through the following steps:
Step a) In the storage system to be defragmented, with data object length $x$ and access jump step $y$ as variables, measure benchmark read/write bandwidth data for a set of different $x$ and $y$, as shown in Table 1, where $x_i$ and $y_j$ denote the different values of $x$ and $y$ and $f_{ij}$ is the corresponding read/write bandwidth obtained by actual measurement. The different values of $x$ and $y$ may be spaced in any increasing manner, including but not limited to arithmetic and geometric progressions.
Table 1
Parameter   y_0    y_1    ...   y_n
x_0         f_00   f_01   ...   f_0n
x_1         f_10   f_11   ...   f_1n
...         ...    ...    ...   ...
x_m         f_m0   f_m1   ...   f_mn
Step b) Select one of $x$ and $y$ as the primary variable; the other is the secondary variable.
Step c) Find the benchmark value of the primary variable that is closest to the primary variable (e.g. one of $x_0, \dots, x_m$): when the primary variable is $x$, find $x_i$ such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimal; when the primary variable is $y$, find $y_i$ such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimal. The function $h$ includes but is not limited to a linear function (e.g. $h(x) = x$) or a logarithmic function (e.g. $h(x) = \log_2 x$).
Step d) Find the interval that contains the secondary variable: when the secondary variable is $y$, find $y_j$ such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, find $x_j$ such that $x \in [x_j, x_{j+1}]$.
Step e) Fit the read/write bandwidth $f(x, y)$ for arbitrary $x$ and $y$ from the benchmark data: a reasonable value is taken proportionally, within the bandwidth interval that corresponds to the secondary variable's interval, as the fitting result. That is, when the secondary variable is $y$, let $f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \lvert f_{ij} - f_{i,j+1} \rvert + g(\Delta x)$, where $g(\Delta x)$ denotes a further correction to the fitting result. When the secondary variable is $x$, let $f(x, y) = f_{ij} + \frac{h(x) - h(x_j)}{h(x_{j+1}) - h(x_j)} \lvert f_{ij} - f_{i+1,j} \rvert + g(\Delta y)$, where $g(\Delta y)$ denotes a further correction to the fitting result.
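The second case of step e) (primary variable y, secondary variable x) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the name `fitted_bandwidth`, the benchmark grid in the usage note, and the defaults h(v) = v and g = 0 are all chosen for the example. The interval of x is indexed by the first table index so that it matches the term f_{i+1,j} in the formula.

```python
from bisect import bisect_right

def fitted_bandwidth(x, y, xs, ys, table, h=lambda v: v, g=lambda delta: 0.0):
    """Fitted bandwidth for length x and jump step y (secondary variable x).

    xs, ys : sorted benchmark values of x and y (hypothetical grid)
    table  : table[i][j] is the measured bandwidth at (xs[i], ys[j])
    h, g   : scaling and correction functions; both defaults are assumptions
    """
    # Step c): benchmark y value closest to y under h.
    j = min(range(len(ys)), key=lambda k: abs(h(ys[k]) - h(y)))
    delta_y = h(ys[j]) - h(y)
    # Step d): index i with xs[i] <= x <= xs[i + 1], clamped to the grid.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    # Step e): proportional value inside the bandwidth interval, plus correction.
    frac = (h(x) - h(xs[i])) / (h(xs[i + 1]) - h(xs[i]))
    return table[i][j] + frac * abs(table[i][j] - table[i + 1][j]) + g(delta_y)
```

With a hypothetical grid `xs = [1, 2, 4]`, `ys = [0, 1, 2]` and `table = [[10, 5, 2], [20, 10, 4], [40, 20, 8]]`, the call `fitted_bandwidth(3, 1, xs, ys, table)` lands halfway between the measured values 10 and 20. A logarithmic h such as `math.log2` can be passed instead when all grid values are positive.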
Steps a) to e) above yield the expression of the fitted bandwidth function, which has two variables: data object length $x$ and access jump step $y$. The data object length may be a physical length in bytes or a logical length in a user-defined unit. The access jump step is the distance by which the access position changes before or after the data object is accessed; this distance may be a physical distance in bytes or a logical distance in a user-defined unit.
For example, the measured benchmark data may be read bandwidth data. Data object length $x$ and access jump step $y$ may take integer powers of 2 in increasing order, i.e. $x_i = 2^i$, $y_j = 2^j$. For each pair $x_i$, $y_j$ a large file is read from beginning to end, and the average bandwidth is taken as $f_{ij}$. For each $x_i$, benchmark bandwidth data for sequential reading (i.e. access jump step $y = 0$) must also be measured, which is the bandwidth when no fragmentation exists (it may be called the contiguous bandwidth). Here the primary variable may be chosen as the access data size $x$ and the secondary variable as the jump step $y$. The value closest to the primary variable (one of $x_0, \dots, x_m$) and the interval of the secondary variable are then found. Let $h(x) = \log_2 x$ and $\Delta x = i - \log_2 x$; then $\lvert \Delta x \rvert$ is minimal when $i = \lfloor \log_2 x + 0.5 \rfloor$. The interval of the secondary variable is $[2^j, 2^{j+1}]$; when $y = 0$ the interval is $[0, 1]$. Let $g(\Delta x) = \Delta x \times (f_{ij} - f_{i,j+1})$; then, as described above, the read bandwidth fitting function $f(x, y) = f_{i,j+1} + \frac{h(y) - h(y_j)}{h(y_{j+1}) - h(y_j)} \lvert f_{ij} - f_{i,j+1} \rvert + g(\Delta x)$ is obtained.
Then, using the fitted bandwidth function obtained, the fitted bandwidth $f_i = f(l_i, \lvert o_i - o_{i-1} - l_{i-1} \rvert)$ of each access operation $s_i$ in queue S is calculated. This fitted bandwidth is thus the value computed by the fitted bandwidth function from the length of the data object (e.g. $l_i$) and the distance by which the access position changes before or after the object is accessed (e.g. $\lvert o_i - o_{i-1} - l_{i-1} \rvert$). When, for example, the access prediction sequence has been received completely, the delay upper bound is reached (a threshold set to prevent defragmentation from introducing too much delay), or receiving queue S is full, processing proceeds to step 2).
Continuing with Fig. 1, in step 2), starting from some access operation in the access prediction sequence, if the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are all less than the bandwidth threshold, the data objects of these access operations are written sequentially to a new contiguous storage space. In this way the method uses the fitted bandwidth calculated above to select which data objects to defragment, so as to minimize the negative impact on the current write bandwidth. If the expected read/write bandwidths of the data objects of more than a predetermined number of consecutive access operations are all greater than the bandwidth threshold, these access operations are skipped and processing continues with the subsequent access operations.
The bandwidth threshold may be a static threshold, determined from the maximum bandwidth value of the storage system to be defragmented; for example, it may be set to a percentage of the storage system's maximum bandwidth, such as 60% or more of the maximum bandwidth. The bandwidth threshold may also be a dynamic threshold, determined from the contiguous bandwidth, which is calculated from the fitting function as the fitted bandwidth $f(x, 0)$ for an access jump step of 0; for example, it may be set to a percentage of the contiguous bandwidth, such as 70% or more of the contiguous bandwidth.
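The two kinds of threshold can be written down in a few lines of Python. This is only a sketch: the function names and the `fit` callable (standing in for the fitted bandwidth function f(x, y)) are hypothetical, and the ratios 0.6 and 0.7 are the example percentages mentioned above rather than required values.

```python
def static_threshold(max_bandwidth, ratio=0.6):
    """Static threshold: a fixed share of the storage system's maximum bandwidth."""
    return ratio * max_bandwidth

def dynamic_threshold(fit, length, ratio=0.7):
    """Dynamic threshold: a share of the contiguous bandwidth f(length, 0).

    fit is any callable fit(x, y) playing the role of the fitted
    bandwidth function; it is an assumption of this sketch.
    """
    return ratio * fit(length, 0)
```

The dynamic variant yields a different threshold for each data object length, since the contiguous bandwidth itself depends on the length.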
To reduce system fragmentation more effectively while avoiding the impact of write operations on overall system performance, in a further embodiment step 2) divides the access prediction sequence into three classes of contiguous subsequences according to three conditions and handles each class differently. As described above, queue S: <$s_1, \dots, s_n$> holds the received access prediction sequence. Receiving queue S is scanned from head to tail and the access prediction sequence is divided into subsequences according to the following three conditions:
Condition 1: $s_i, \dots, s_{i+m}$ satisfy $f_{i+1} \ge T_{f_{i+1}}, \dots, f_{i+m} \ge T_{f_{i+m}}$, $m \ge T_m$, and $f_{i+m+1} < T_{f_{i+m+1}}$ or $i + m = n$. This indicates that, starting from some access operation (e.g. $s_i$) in the access prediction sequence, the fitted bandwidths of more than a certain number ($T_m$) of consecutive access operations are all greater than the bandwidth threshold; in other words, the data objects of these access operations are only slightly fragmented, and this kind of subsequence can be ignored. The data objects corresponding to $s_i, \dots, s_{i+m}$ need not be written to a new contiguous storage location; no processing is required and identification continues with the next subsequence. Here $T_{f_i}$ denotes the bandwidth threshold for the fitted bandwidth of $s_i$, and $T_m$ is a preset integer threshold. As described above, the bandwidth threshold $T_{f_i}$ may be a static threshold, e.g. 60% or more of the maximum bandwidth of the storage system to be defragmented, or a dynamic threshold, e.g. 70% or more of the contiguous bandwidth, where the contiguous bandwidth is the fitted bandwidth $f(x, 0)$ for an access jump step of 0.
Condition 2: $s_i, \dots, s_{i+m}$ satisfy $f_i < T_{f_i}, \dots, f_{i+m} < T_{f_{i+m}}$, $m \ge T'_m$, and $f_{i+m+1} \ge T_{f_{i+m+1}}$ or $i + m = n$. This indicates that, starting from some access operation (e.g. $s_i$) in the access prediction sequence, the fitted bandwidths of more than a certain number (e.g. $T'_m$) of consecutive access operations are all less than the bandwidth threshold; in other words, the data objects of these access operations are highly fragmented, and this kind of subsequence needs to be defragmented. The data objects corresponding to $s_i, \dots, s_{i+m}$ need to be written to new contiguous storage. Here $T_{f_i}$ denotes the bandwidth threshold for the fitted bandwidth of $s_i$, and $T'_m$ is a preset integer threshold.
For a subsequence $s_i, \dots, s_{i+m}$ that satisfies condition 2, the data objects corresponding to $s_i, \dots, s_{i+m}$ can be written sequentially to a new contiguous storage space and the access entries of these data objects changed to the new locations. The data objects to be written may be read from local storage or obtained from other devices over the network.
Condition 3: all remaining cases other than condition 1 and condition 2. For example, $s_i, \dots, s_{i+m}$ do not satisfy condition 1 or condition 2 and at the same time $i + m = n$, or $s_{i+m+1}, \dots, s_{i+m+k}$ satisfy condition 1 or condition 2.
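One possible reading of the three conditions is sketched below in Python. It is a simplification under stated assumptions, not the exact rule of the embodiment: a run qualifies when its length is at least the integer threshold, neighbouring short runs are merged into a single condition 3 segment, and the function name, labels and bandwidth values are hypothetical.

```python
from itertools import groupby

def partition(fitted, thresholds, t_high=4, t_low=4):
    """Split queue entries into (label, start, end) segments, end exclusive.

    label is "skip" (condition 1), "defragment" (condition 2)
    or "mixed" (condition 3).
    """
    # True where the fitted bandwidth reaches the entry's own threshold.
    flags = [f >= t for f, t in zip(fitted, thresholds)]
    segments, pos = [], 0
    for high, run in groupby(flags):
        n = len(list(run))
        if high and n >= t_high:
            label = "skip"
        elif not high and n >= t_low:
            label = "defragment"
        else:
            label = "mixed"
        if label == "mixed" and segments and segments[-1][0] == "mixed":
            # Extend the previous condition 3 segment instead of opening a new one.
            segments[-1] = ("mixed", segments[-1][1], pos + n)
        else:
            segments.append((label, pos, pos + n))
        pos += n
    return segments
```

For the made-up bandwidths `[1]*5 + [9, 1, 9, 1] + [9]*4` with every threshold set to 5, the sketch returns a "defragment" segment for entries 0 to 4, a "mixed" segment for entries 5 to 8 and a "skip" segment for entries 9 to 12.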
For a subsequence that does not satisfy condition 1 or condition 2, such as $s_i, \dots, s_{i+m}$, the overall bandwidth of the sequence is calculated, that is, the bandwidth produced by executing all access operations in the sequence as a whole rather than the bandwidth of any single access operation. If the overall bandwidth of the sequence is greater than a certain threshold (e.g. threshold a), the data objects accessed by the sequence are not highly fragmented and the sequence does not need to be processed. If, on the other hand, the overall bandwidth is less than threshold a, the hypothetical contiguous bandwidth of the sequence is calculated, i.e. the bandwidth that would result if the data objects of the access operations in the sequence were stored contiguously. If the overall bandwidth is less than threshold a and the hypothetical contiguous bandwidth is greater than a certain threshold (e.g. threshold b), writing the data objects of the sequence to a contiguous storage area would effectively improve bandwidth, so the data objects of the sequence are processed accordingly, i.e. the data objects corresponding to the subsequence $s_i, \dots, s_{i+m}$ are written to a new contiguous storage space. If the overall bandwidth is less than threshold a and the hypothetical contiguous bandwidth is less than threshold b, writing the data objects of the sequence to a contiguous storage area would not improve bandwidth, and the sequence is not processed. Threshold a and threshold b may be set with reference to the bandwidth threshold described above.
The overall bandwidth and the hypothetical contiguous bandwidth are calculated as follows:
The overall bandwidth of $s_i, \dots, s_{i+m}$ is $f = \frac{l_i + \cdots + l_{i+m}}{\frac{l_i}{f_i} + \cdots + \frac{l_{i+m}}{f_{i+m}}}$;
The start position $o$ of the contiguous storage space into which the data objects corresponding to $s_i, \dots, s_{i+m}$ can be written is obtained, and the hypothetical contiguous bandwidth $f' = f(l_i + \cdots + l_{i+m}, \lvert o - o'_{i-1} - l_{i-1} \rvert)$ is calculated. If $f < T_f$ and $f' > T'_f$, the data objects corresponding to $s_i, \dots, s_{i+m}$ are written sequentially to a new contiguous storage space and the access entries of these data objects are changed to the new locations. Here $o'_{i-1}$ and $l_{i-1}$ are the latest start position (which may already have been written to a new location) and the size of the data object corresponding to $s_{i-1}$ in the prediction sequence; $T_f$ and $T'_f$ may be set with reference to the bandwidth threshold described above; the data objects to be written may be read from local storage or obtained from other devices over the network.
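The overall bandwidth is a length-weighted harmonic mean: total data divided by total transfer time. A short Python sketch of the formula and of the decision for a condition 3 subsequence follows. The function names and the numeric values in the usage note are hypothetical, and the hypothetical contiguous bandwidth f' is passed in as a number rather than computed from a fitted bandwidth function.

```python
def overall_bandwidth(lengths, bandwidths):
    """f = (l_i + ... + l_{i+m}) / (l_i/f_i + ... + l_{i+m}/f_{i+m})."""
    total_time = sum(l / f for l, f in zip(lengths, bandwidths))
    return sum(lengths) / total_time

def should_relocate(lengths, bandwidths, contiguous_bw, t_f, t_f_prime):
    """Relocate only if the subsequence is slow now (f < T_f) and would be
    fast once stored contiguously (f' > T'_f)."""
    return overall_bandwidth(lengths, bandwidths) < t_f and contiguous_bw > t_f_prime
```

For three objects of lengths 3, 5 and 4 with made-up fitted bandwidths 30, 10 and 40, the overall bandwidth is 12 / 0.7, roughly 17.1; the slow middle object dominates because it accounts for most of the transfer time.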
When queue S has been processed, it is emptied and reception of the next access prediction sequence continues.
For a better understanding of the above defragmentation method, the defragmentation process is illustrated in more detail below with reference to Table 2 and Fig. 2. Table 2 gives a concrete access prediction sequence to be received by the defragmentation process, containing the sequence number i of each access operation in the sequence, the length $l_i$ of the corresponding data object and the start position $o_i$ of that data object.
Table 2
Sequence number   1    2    3    4    5    6    7    8    9    10   11   12   13   14
l_i               3    2    5    6    4    3    3    5    4    3    2    3    3    3
o_i               10   20   30   50   3    10   7    35   40   13   16   20   7    23
Wherein, a length of the 12 of receiving queue S are set, during original state, it is assumed that it is 0 that last stored accesses position, specifically Execution process is:
1. perform step 201, receive the access operation 1~14 accessed in forecasting sequence successively, often receive one and access behaviour Go to step 202.
2. perform step 202, it is determined whether be the access operation repeated;If it is step 201 is gone to;If it is not, Then go to step 203.
Specifically, if currently received access operation with queue in received access operation accessed data object phase With, then ignore currently received access and operate, go to step 201, to avoid the access band of the same data object of double counting Width, such as, when receiving access operation 6,13, accesses operation 6,13 and 1,7 accessed data objects identical (original position and data Length is identical).
3. perform step 203, calculate the matching bandwidth of this access operation.
Specifically, the currently received data pair accessing operation the accessed data object tail entry relative to queue S are obtained The step-length of beating of elephant, by data object size and the step-length input such as matching bandwidth function obtained above of beating of acquisition, and will Data object size, the information such as data object original position and calculated matching bandwidth recorded queue tail, such as, right In accessing in operation 1, data object size is 3, and step-length of beating is 10 (to be jumped to data object initiate from last visit position 0 Position 10), a width of f of matching band1=f (3,10), accesses in operation 2, and data object size is 2, and step-length of beating is 7 (from access The data object end position 13 of operation 1 jumps to data object original position 20), matching bandwidth f2=f (2,7).
4. Perform step 204: determine whether the queue-processing condition is met. For example, when the queue is full or the delay upper bound is reached (to keep defragmentation from introducing too much delay), process the queue from head to tail; otherwise, continue receiving the access prediction sequence.
Steps 201~204 are performed repeatedly until the access prediction sequence has been fully received, or until the queue is full and the processing condition is reached. The state of queue S is then as shown in Table 3 (accesses 6 and 13 are repeated accesses and were ignored on receipt):
Table 3
Queue entry s1 s2 s3 s4 s5 s6
l_i 3 2 5 6 4 3
o_i 10 20 30 50 3 7
f_i f(3,10) f(2,7) f(5,8) f(6,15) f(4,53) f(3,0)
Queue entry s7 s8 s9 s10 s11 s12
l_i 5 4 3 2 3 3
o_i 35 40 12 15 20 23
f_i f(5,25) f(4,0) f(3,32) f(2,0) f(3,3) f(3,0)
5. Perform step 205: scan queue S and divide it into subsequences according to the conditions.
Specifically, scan the queue entries s1~s12 from head to tail and divide them into subsequences according to three conditions. Condition 1: s_i, …, s_{i+m} satisfy f_{i+1} ≥ T_{f_{i+1}}, …, f_{i+m} ≥ T_{f_{i+m}}, with m ≥ T_m, and either f_{i+m+1} < T_{f_{i+m+1}} or m = 12. Condition 2: s_{i+1}, …, s_{i+m} satisfy f_i < T_{f_i}, …, f_{i+m} < T_{f_{i+m}}, with m ≥ T'_m, and either f_{i+m+1} ≥ T_{f_{i+m+1}} or m = 12. Condition 3: the remaining cases not covered by condition 1 or condition 2 (neither condition 1 nor condition 2 is satisfied, and either m = 12 or s_{m+1}, …, s_{m+k} satisfy condition 1 or condition 2). Here T_{f_i} = 0.75 × f(l_i, 0), and T_m and T'_m both take the integer value 4. After the scan, the comparison results are as shown in Table 4: the fitted bandwidths of s1~s5 are small, satisfying condition 2; the fitted bandwidths of s6~s8 are mixed, satisfying condition 3; and the fitted bandwidths of s9~s12 are large (except for s9), satisfying condition 1. The subsequence division results are marked accordingly in the queue.
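One possible reading of the subsequence division can be sketched as follows. Several things here are assumptions and not part of the patent: `f` is a hypothetical stand-in for the fitted bandwidth function (peak bandwidth normalised to 1, constant `c` chosen arbitrarily); condition 1 is read as exempting the first entry s_i of a run from the threshold test, which is consistent with s9 being grouped with s10~s12; condition 2 is read as requiring every entry of the run, including the first, to be below its threshold; and the run length compared with T_m = T'_m = 4 is the number of entries in the run. The names `is_high` and `partition` are likewise hypothetical.

```python
def f(length, jump, c=0.25):
    """Hypothetical stand-in for the fitted bandwidth function (peak = 1)."""
    return length / (length + c * jump)


def is_high(length, jump):
    """True when f_i >= T_fi, with T_fi = 0.75 * f(l_i, 0) as in the example."""
    return f(length, jump) >= 0.75 * f(length, 0)


def partition(args, t_m=4):
    """Split (length, jump) pairs into (condition, first, last) triples, 1-based."""
    high = [is_high(length, jump) for length, jump in args]
    n = len(args)
    out = []
    leftover_start = None   # first index of a pending condition-3 run
    i = 0
    while i < n:
        j = i + 1           # condition 1 (assumed): entries after s_i all high
        while j < n and high[j]:
            j += 1
        k = i               # condition 2 (assumed): s_i and successors all low
        while k < n and not high[k]:
            k += 1
        if j - i >= t_m:
            label, end = 1, j
        elif k - i >= t_m:
            label, end = 2, k
        else:
            if leftover_start is None:
                leftover_start = i
            i += 1
            continue
        if leftover_start is not None:      # close the pending condition-3 run
            out.append((3, leftover_start + 1, i))
            leftover_start = None
        out.append((label, i + 1, end))
        i = end
    if leftover_start is not None:
        out.append((3, leftover_start + 1, n))
    return out


# (l_i, jump step) pairs of s1~s12, i.e. the arguments of f in Table 3.
ARGS = [(3, 10), (2, 7), (5, 8), (6, 15), (4, 53), (3, 0),
        (5, 25), (4, 0), (3, 32), (2, 0), (3, 3), (3, 0)]
print(partition(ARGS))
```

Under these assumptions the sketch yields the same division as described above: s1~s5 under condition 2, s6~s8 under condition 3, and s9~s12 under condition 1. A different bandwidth function or a different reading of the index ranges could give a different division.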
Table 4
6. Perform step 206: determine which condition the subsequence satisfies. If it satisfies condition 1, go to step 209; if it satisfies condition 2, go to step 208; if it satisfies condition 3, go to step 207.
Specifically, according to the marking result of step 205, s1~s5 satisfy condition 2, so go to step 208.
7. Perform step 208: write the data objects of s1~s5 to a new storage space.
Specifically, find the starting position of a contiguous storage space that can hold the data objects of s1~s5 and write them there. Assuming that starting position 100 is found in this example, the data objects of s1~s5 are written to the storage ranges 100~102, 103~104, 105~109, 110~115, and 116~119 respectively, and the access entries of these data objects are changed to the new storage positions.
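The layout arithmetic of step 208 can be checked with a short sketch, assuming the data objects are packed back to back from the chosen starting position. The function name `relocate` is hypothetical, and the sketch only computes target ranges; it does not move data or update access entries.

```python
def relocate(lengths, start):
    """Return inclusive (first, last) ranges for objects packed from `start`."""
    ranges = []
    pos = start
    for length in lengths:
        ranges.append((pos, pos + length - 1))  # object occupies `length` units
        pos += length                           # next object follows immediately
    return ranges, pos


# Lengths l_1..l_5 of s1~s5, written from starting position 100.
ranges, next_free = relocate([3, 2, 5, 6, 4], start=100)
print(ranges, next_free)
```

The next free position after s1~s5 is 120, which is where the example later places the data objects of s6~s8.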
8. Perform step 209: remove s1~s5 from the queue (the queue head pointer now points to s6).
9. Perform step 210 (the queue is not empty, so go to step 206).
10. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marking result of step 205, s6~s8 satisfy condition 3, so go to step 207.
11. Perform step 207: determine whether the subsequence s6~s8 satisfies the write condition.
Specifically, obtain the starting position o of a storage space that can hold the data objects of s6~s8, calculate the overall bandwidth f (the formula is not reproduced in this text), and calculate the assumed write bandwidth f' = f(l_6 + l_7 + l_8, |o − o'_5 − l_5|). Here o = 120 (the end position of s5), o'_5 = 116 (the newly written position of the data object of s5), and l_5 = 4, so f' = f(12, 0). The thresholds are T_f = T'_f = 0.75 × f(l_6 + l_7 + l_8, 0) = 0.75 × f(12, 0) (for s_i~s_{i+m} the corresponding thresholds are T_f = T'_f = 0.75 × f(l_i + … + l_{i+m}, 0)). In this case f < T_f and f' > T'_f are both satisfied, so go to step 208.
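The write condition of step 207 can be illustrated with a sketch under two loudly stated assumptions. First, the overall bandwidth formula is not legible in this text, so the sketch assumes the natural form "total length divided by total transfer time", that is, the sum of l_k divided by the sum of l_k / f_k; the patent's actual expression may differ. Second, `f` is a hypothetical stand-in for the fitted bandwidth function. The names `overall_bandwidth` and `should_rewrite` are also hypothetical.

```python
def f(length, jump, c=0.25):
    """Hypothetical stand-in for the fitted bandwidth function (peak = 1)."""
    return length / (length + c * jump)


def overall_bandwidth(args):
    """Assumed form: total length divided by total transfer time."""
    total_len = sum(length for length, _ in args)
    total_time = sum(length / f(length, jump) for length, jump in args)
    return total_len / total_time


def should_rewrite(args, new_jump, ratio=0.75):
    """Check f < T_f and f' > T'_f with T_f = T'_f = ratio * f(sum of l, 0)."""
    total_len = sum(length for length, _ in args)
    threshold = ratio * f(total_len, 0)
    current = overall_bandwidth(args)      # f: bandwidth in the current layout
    assumed = f(total_len, new_jump)       # f': bandwidth if stored contiguously
    return current < threshold and assumed > threshold


S6_TO_S8 = [(3, 0), (5, 25), (4, 0)]       # (l_i, jump step) of s6~s8
new_jump = abs(120 - 116 - 4)              # |o - o'_5 - l_5| from the example
print(should_rewrite(S6_TO_S8, new_jump))
```

With this stand-in, the single long jump of s7 pulls the overall bandwidth of s6~s8 below the threshold, while the contiguous layout at position 120 has jump step 0, so the subsequence qualifies for rewriting as in the example.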
12. Perform step 208: write the data objects of s6~s8 to a new contiguous storage space.
Specifically, the data objects of s6~s8 are written in order to the storage ranges 120~122, 123~127, and 128~131 respectively, and the access entries of these data objects are changed to the new storage positions.
13. Perform step 209: remove the subsequence s6~s8 from the queue (the queue head pointer now points to s9).
14. Perform step 210 (the queue is not empty, so go to step 206).
15. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marking result of step 205, s9~s12 satisfy condition 1, so go to step 209.
16. Perform step 209: remove s9~s12 from the queue (the queue head pointer is pointed to s1).
17. Perform step 210 (the queue is now empty, so go to step 201).
The inventors also tested the above method in a content-addressable storage system, using backup workloads from a real environment. The test results show that, as the data volume grows, read bandwidth improves by 12%~60% while data redundancy is kept to 1%~2%, essentially eliminating the continuous downward trend of read bandwidth. In the same system, test results using a two-week data synchronization workload show that read bandwidth improves by a factor of 5~8, avoiding the severe decline in read bandwidth caused by data fragmentation.
Although the present invention has been described by means of preferred embodiments, the present invention is not limited to the embodiments described here, and also covers various changes and variations made to them.

Claims (7)

1. A defragmentation method, the method comprising:
Step 1) determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, wherein the expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function with the data object length and the access jump step corresponding to the access operation as variables, and the fitted bandwidth function is a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access jump step as variables, from benchmark read/write bandwidth data measured under different data object lengths and access jump steps;
Step 2) starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth of their data object less than a bandwidth threshold, writing the data objects of these access operations sequentially to a new contiguous storage space;
wherein the fitted bandwidth function is obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length x and access jump step y as variables, measuring benchmark read/write bandwidth data under a set of different x and y;
Step b) selecting one of x and y as the primary variable, the other being the secondary variable;
Step c) finding the measured primary-variable value closest to the primary variable;
Step d) finding the interval in which the secondary variable lies;
Step e) fitting, from the benchmark read/write bandwidth data measured under different x and y, the read/write bandwidth f(x, y) for arbitrary values of x and y;
wherein, in step c), when the primary variable is x, x_i is found such that the absolute value of Δx = h(x_i) − h(x) is minimal; when the primary variable is y, y_i is found such that the absolute value of Δy = h(y_i) − h(y) is minimal;
in step d), when the secondary variable is y, y_j is found such that y ∈ [y_j, y_{j+1}]; when the secondary variable is x, x_j is found such that x ∈ [x_j, x_{j+1}];
in step e), when the secondary variable is y, f(x, y) is given by a fitting expression (not reproduced in this text) in which g(Δx) denotes a further correction to the fitting result; when the secondary variable is x, f(x, y) is given by the corresponding expression (not reproduced in this text) in which g(Δy) denotes a further correction to the fitting result.
2. The method according to claim 1, further comprising step 3): starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth of their data object greater than the bandwidth threshold, skipping these access operations and continuing with the subsequent access operations.
3. The method according to claim 2, further comprising step 4): for a subsequence s_i, …, s_{i+m} of the access prediction sequence that is composed of multiple consecutive access operations and has not been processed by step 2) or step 3), if the overall bandwidth of the subsequence is less than a certain threshold and the assumed continuous bandwidth of the subsequence is greater than a certain threshold, writing the data object of each access operation in the subsequence to a new contiguous storage space and changing the access entries of these data objects to the new positions; wherein the assumed continuous bandwidth denotes the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously; and the overall bandwidth of the subsequence s_i, …, s_{i+m} is given by an expression (not reproduced in this text) in which l_i is the length of the data object of s_i, f_i is the expected read/write bandwidth of the data object of s_i, and m is the number of access operations in the subsequence.
4. The method according to claim 2, wherein the bandwidth threshold is determined according to the maximum bandwidth value or the continuous bandwidth of the storage system to be defragmented, the continuous bandwidth being the fitted bandwidth obtained from the fitted bandwidth function when the access jump step is 0.
5. A defragmentation system, the system comprising:
a bandwidth determining device, configured to determine the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, wherein the expected read/write bandwidth is the fitted bandwidth obtained from a fitted bandwidth function with the data object length and the access jump step corresponding to the access operation as variables, and the fitted bandwidth function is a bandwidth function expression obtained by a fitting method, in the storage system to be defragmented, with data object length and access jump step as variables, from benchmark read/write bandwidth data measured under different data object lengths and access jump steps;
a defragmentation device, configured to, starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth of their data object less than a bandwidth threshold, write the data objects of these access operations sequentially to a new contiguous storage space;
wherein the fitted bandwidth function is obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length x and access jump step y as variables, measuring benchmark read/write bandwidth data under a set of different x and y;
Step b) selecting one of x and y as the primary variable, the other being the secondary variable;
Step c) finding the measured primary-variable value closest to the primary variable;
Step d) finding the interval in which the secondary variable lies;
Step e) fitting, from the benchmark read/write bandwidth data measured under different x and y, the read/write bandwidth f(x, y) for arbitrary values of x and y;
wherein, in step c), when the primary variable is x, x_i is found such that the absolute value of Δx = h(x_i) − h(x) is minimal; when the primary variable is y, y_i is found such that the absolute value of Δy = h(y_i) − h(y) is minimal;
in step d), when the secondary variable is y, y_j is found such that y ∈ [y_j, y_{j+1}]; when the secondary variable is x, x_j is found such that x ∈ [x_j, x_{j+1}];
in step e), when the secondary variable is y, f(x, y) is given by a fitting expression (not reproduced in this text) in which g(Δx) denotes a further correction to the fitting result; when the secondary variable is x, f(x, y) is given by the corresponding expression (not reproduced in this text) in which g(Δy) denotes a further correction to the fitting result.
6. The system according to claim 5, wherein the defragmentation device is further configured to, starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth of their data object greater than the bandwidth threshold, skip these access operations and continue with the subsequent access operations.
7. The system according to claim 6, wherein the defragmentation device is further configured to, for a subsequence s_i, …, s_{i+m} composed of multiple consecutive access operations that does not satisfy the requirement that more than a predetermined number of consecutive access operations have data-object expected read/write bandwidths all greater than or all less than the bandwidth threshold, if the overall bandwidth of the subsequence is less than a certain threshold and the assumed continuous bandwidth of the subsequence is greater than a certain threshold, write the data object of each access operation in the subsequence to a new contiguous storage space and change the access entries of these data objects to the new positions; wherein the assumed continuous bandwidth denotes the bandwidth that would result if the data objects of the access operations in the subsequence were stored contiguously; and the overall bandwidth of the subsequence s_i, …, s_{i+m} is given by an expression (not reproduced in this text) in which l_i is the length of the data object of s_i, f_i is the expected read/write bandwidth of the data object of s_i, and m is the number of access operations in the subsequence.
CN201310298326.2A 2013-07-16 2013-07-16 Defragmentation method and system Active CN103389946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310298326.2A CN103389946B (en) 2013-07-16 2013-07-16 Defragmentation method and system

Publications (2)

Publication Number Publication Date
CN103389946A CN103389946A (en) 2013-11-13
CN103389946B true CN103389946B (en) 2016-08-10

Family

ID=49534224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310298326.2A Active CN103389946B (en) 2013-07-16 2013-07-16 Defragmentation method and system

Country Status (1)

Country Link
CN (1) CN103389946B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108282378B (en) * 2017-01-05 2021-11-09 阿里巴巴集团控股有限公司 Method and device for monitoring network flow
CN108021512A (en) * 2017-11-22 2018-05-11 深圳忆联信息系统有限公司 A kind of solid state hard disc mapping management process and solid state hard disc

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5930828A (en) * 1997-03-26 1999-07-27 Executive Software International Real-time apparatus and method for minimizing disk fragmentation in a computer system
CN101460932A (en) * 2006-06-08 2009-06-17 Nxp股份有限公司 Device for remote defragmentation of an embedded device
CN102929884A (en) * 2011-08-10 2013-02-13 阿里巴巴集团控股有限公司 Method and device for compressing virtual hard disk image file

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3629216B2 (en) * 2001-03-08 2005-03-16 株式会社東芝 Disk storage system having defragmentation function, and defragmentation method in the same system
GB0517305D0 (en) * 2005-08-24 2005-10-05 Ibm Method and apparatus for the defragmentation of a file system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant