CN103389946B - Defragmentation method and system - Google Patents
Defragmentation method and system
- Publication number
- CN103389946B (application number CN201310298326.2A)
- Authority
- CN
- China
- Prior art keywords
- bandwidth
- access
- data object
- variable
- read
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a defragmentation method. The method first determines the expected read/write bandwidth of the data object of each access operation in an access prediction sequence. This expected read/write bandwidth is a fitted bandwidth obtained from a bandwidth fitting function whose variables are the length of the data object corresponding to the access operation and the access jump distance. Then, starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth below a bandwidth threshold, the data objects of these access operations are written sequentially into a new contiguous storage space. The method identifies fragmented data accurately, improves read performance while having little impact on current system performance, and adapts well to changing environments.
Description
Technical field
The invention belongs to the field of computer storage, and particularly relates to a method for handling data fragmentation.
Background
Fragmentation means that file data which is logically contiguous becomes relatively scattered in its storage locations after being stored on the storage medium, which causes a large number of random read/write operations when the data is accessed. For current mainstream disk-based storage systems, fragmentation causes read/write bandwidth to decline steadily. A defragmentation method is therefore needed to reduce the impact of fragmentation and cut down random read/write accesses.
Existing defragmentation methods typically take an access prediction sequence as input, identify fragmented data according to this sequence, and write the fragmented data to a new storage area, thereby achieving defragmentation. The access prediction sequence is a prediction of the most likely future access order. There are many prediction methods, but they fall essentially into two kinds: (1) static prediction, e.g. based on the logical storage order of the data; (2) dynamic prediction, e.g. based on the current access sequence.
Defragmentation mainly has to solve two problems: (1) which data objects are fragmented data, i.e. if the current storage system is accessed in the order of the access prediction sequence, which data objects' accesses cause a large drop in read/write bandwidth; (2) which data should be written to new storage locations, so that future random read/write accesses are reduced while the current performance of the system is affected as little as possible. The main problems of existing defragmentation methods are the following: first, fragmented data is not identified accurately enough, so performance cannot be improved effectively; second, too much data has to be moved during defragmentation, which causes a significant drop in system performance while the operation runs.
Summary of the invention
Therefore, an object of the invention is to overcome the above defects of the prior art and to provide a new defragmentation method.
The object of the invention is achieved through the following technical solutions:
In one aspect, the invention provides a defragmentation method, the method comprising:
Step 1) determine the expection read/write bandwidth of each data object accessing operation in access forecasting sequence, described pre-
Taking read/write bandwidth is with data object length corresponding to this access operation and access step-length of beating as variable, based on matching bandwidth
Function and the matching bandwidth that obtains;
Step 2) starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth below a bandwidth threshold, writing the data objects of these access operations sequentially into a new contiguous storage space.
In the above method, the bandwidth fitting function used in step 1) may be a bandwidth expression obtained by curve fitting, in the storage system to be defragmented, from benchmark read/write bandwidth data measured at different data object lengths and access jump distances, with the data object length and the access jump distance as variables.
In the above method, the bandwidth fitting function may be obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length x and access jump distance y as variables, measuring benchmark read/write bandwidth data for a set of different values of x and y;
Step b) selecting one of x and y as the primary variable, the other being the secondary variable;
Step c) finding the measured value of the primary variable closest to the primary variable;
Step d) finding the interval containing the secondary variable;
Step e) fitting, from the benchmark read/write bandwidth data measured at the different x and y, the read/write bandwidth f(x, y) for arbitrary values of x and y.
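In the above method, in step c), when the primary variable is x, x_i is found such that the absolute value of Δx = h(x_i) − h(x) is minimal; when the primary variable is y, y_i is found such that the absolute value of Δy = h(y_i) − h(y) is minimal.
In step d), when the secondary variable is y, y_j is found such that y ∈ [y_j, y_{j+1}]; when the secondary variable is x, x_j is found such that x ∈ [x_j, x_{j+1}].
In step e), when the secondary variable is y, f(x, y) is the value at the corresponding proportion of the bandwidth interval between the benchmark bandwidths measured at y_j and y_{j+1}, and g(Δx) represents a further correction to the fitting result; when the secondary variable is x, f(x, y) is obtained in the same way over the interval between the benchmark bandwidths measured at x_j and x_{j+1}, and g(Δy) represents a further correction to the fitting result.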
In the above method, the method may further include step 3): starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have an expected read/write bandwidth above the bandwidth threshold, skipping these access operations and continuing with the following access operations.
In the above method, the method may further include step 4): for a subsequence s_i, ..., s_{i+m} of consecutive access operations in the access prediction sequence that is handled by neither step 2) nor step 3), if the overall bandwidth of the subsequence is below a certain threshold and the hypothetical continuous bandwidth of the subsequence is above a certain threshold, writing the data object of each access operation in the subsequence into a new contiguous storage space and changing the access entries of these data objects to the new positions. The hypothetical continuous bandwidth is the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously. The overall bandwidth of the subsequence s_i, ..., s_{i+m} is f = (l_i + … + l_{i+m}) / (l_i/f_i + … + l_{i+m}/f_{i+m}), where l_k is the length of the data object of s_k, f_k is the expected read/write bandwidth of the data object of s_k, and the sums run over the access operations of the subsequence.
In the above method, the bandwidth threshold may be determined from the maximum bandwidth of the storage system to be defragmented or from the continuous bandwidth, the continuous bandwidth being the fitted bandwidth obtained from the bandwidth fitting function when the access jump distance is 0.
In another aspect, the invention also provides a defragmentation system, the system comprising:
a bandwidth determination unit, configured to determine the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a bandwidth fitting function whose variables are the length of the data object corresponding to the access operation and the access jump distance;
a defragmentation unit, configured to, starting from some access operation in the access prediction sequence, when more than a predetermined number of consecutive access operations each have an expected read/write bandwidth below a bandwidth threshold, write the data objects of these access operations sequentially into a new contiguous storage space.
In the above system, the bandwidth fitting function may be a bandwidth expression obtained by curve fitting, in the storage system to be defragmented, from benchmark read/write bandwidth data measured at different data object lengths and access jump distances, with the data object length and the access jump distance as variables.
In the above system, the defragmentation unit may further be configured to, starting from some access operation in the access prediction sequence, when more than a predetermined number of consecutive access operations each have an expected read/write bandwidth above the bandwidth threshold, skip these access operations and continue with the following access operations.
In the above system, the defragmentation unit may further be configured to handle a subsequence s_i, ..., s_{i+m} of consecutive access operations that does not consist of more than a predetermined number of consecutive access operations whose expected read/write bandwidths are all above, or all below, the bandwidth threshold: if the overall bandwidth of the subsequence is below a certain threshold and the hypothetical continuous bandwidth of the subsequence is above a certain threshold, the data object of each access operation in the subsequence is written into a new contiguous storage space and the access entries of these data objects are changed to the new positions. The hypothetical continuous bandwidth is the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously. The overall bandwidth of the subsequence s_i, ..., s_{i+m} is f = (l_i + … + l_{i+m}) / (l_i/f_i + … + l_{i+m}/f_{i+m}), where l_k is the length of the data object of s_k, f_k is the expected read/write bandwidth of the data object of s_k, and the sums run over the access operations of the subsequence.
Compared with the prior art, the defragmentation method provided by the invention identifies fragmented data accurately, improves read performance while having little impact on current system performance, and adapts well to changing environments.
Brief description of the drawings
Embodiments of the invention are further described below with reference to the drawings, in which:
Fig. 1 is a schematic flowchart of a defragmentation method according to one embodiment of the invention;
Fig. 2 is a schematic flowchart of a defragmentation method according to another embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
An access prediction sequence is in fact a series of access operations, and read/write bandwidth depends mainly on the size of the data object accessed by each access operation and on the distance by which the access position changes before or after that data object is accessed (also called the access jump distance). Generally, the data objects in the prediction sequence with lower read/write bandwidth are more heavily fragmented. However, actual read/write bandwidth is an a posteriori measure and cannot be obtained in advance. Therefore, embodiments of the invention use a read/write bandwidth fitting method to compute the expected read/write bandwidth of the data object accessed by each access operation in the access prediction sequence (hereafter also called the fitted bandwidth of the access operation), and judge the degree of fragmentation from this metric.
Fig. 1 shows a schematic flowchart of a defragmentation method according to one embodiment of the invention. The method includes: step 1) determining the expected read/write bandwidth of the data object of each access operation in the access prediction sequence; step 2) starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations all have expected read/write bandwidths below a bandwidth threshold, writing the data objects of these access operations sequentially into a new contiguous storage space.
Referring to Fig. 1, more specifically, step 1) determines the expected read/write bandwidth of the data object of each access operation in the access prediction sequence. The expected read/write bandwidth is the fitted bandwidth obtained from the bandwidth fitting function, with the data object length corresponding to the access operation and the access jump distance as variables. According to one embodiment of the invention, step 1) may include the following steps:
Step 11) receive the access operations of the access prediction sequence one by one and save them in queue S = <s_1, ..., s_n>. The information saved in queue S includes, but is not limited to, the start position o_i of the data object accessed by each access operation and the length l_i of that data object, obtained in some way (e.g. metadata access or index lookup). The access prediction sequence saved in queue S may be a directly obtained prediction sequence (static or dynamic prediction), or a preprocessed prediction sequence (e.g. one with repetitions removed). For example, if the k-th received access operation s_k accesses the same data object as one of s_1, ..., s_{k-1} in queue S, s_k is not added to queue S.
Step 12) compute the fitted bandwidth f_i corresponding to each access operation s_i in queue S.
As mentioned above, the fitted bandwidth is the bandwidth obtained from the bandwidth fitting function, with the data object length and the access jump distance of the access operation as variables. The bandwidth fitting function is a bandwidth expression obtained by curve fitting, in the storage system to be defragmented, from a set of benchmark read/write bandwidth data measured at different data object lengths and access jump distances. In one embodiment of the invention, the expression of the bandwidth fitting function is obtained through the following steps:
Step a) in the storage system to be defragmented, with data object length x and access jump distance y as variables, measure benchmark read/write bandwidth data for a set of different values of x and y, as shown in Table 1. Here x_i and y_j denote different values of x and y, and f_ij is the read/write bandwidth actually measured for them. The different values of x and y may be generated by any incremental scheme, including but not limited to arithmetic and geometric progressions.
Table 1
Parameter | y0 | y1 | ... | yn |
x0 | f00 | f01 | ... | f0n |
x1 | f10 | f11 | ... | f1n |
... | ... | ... | ... | ... |
xm | fm0 | fm1 | ... | fmn |
Step b) select one of x and y as the primary variable; the other is the secondary variable.
Step c) find the measured value closest to the primary variable (e.g. one of x_0 ~ x_m): when the primary variable is x, find x_i such that the absolute value of Δx = h(x_i) − h(x) is minimal; when the primary variable is y, find y_i such that the absolute value of Δy = h(y_i) − h(y) is minimal. The function h includes but is not limited to a linear function (e.g. h(x) = x) or a logarithmic function (e.g. h(x) = log2 x).
Step d) find the interval containing the secondary variable: when the secondary variable is y, find y_j such that y ∈ [y_j, y_{j+1}]; when the secondary variable is x, find x_j such that x ∈ [x_j, x_{j+1}].
Step e) fit, from the benchmark data, the read/write bandwidth f(x, y) for arbitrary values of x and y: according to where the secondary variable lies within its interval, take the value at the same proportion of the corresponding bandwidth interval as the fitting result. That is, when the secondary variable is y, f(x, y) is taken proportionally between the benchmark bandwidths at y_j and y_{j+1}, and g(Δx) represents a further correction to the fitting result; when the secondary variable is x, f(x, y) is taken proportionally between the benchmark bandwidths at x_j and x_{j+1}, and g(Δy) represents a further correction to the fitting result.
Steps a) to e) yield the expression of the bandwidth fitting function, which has two variables: data object length x and access jump distance y. The data object length may be a physical length in bytes or a logical length in a user-defined unit. The access jump distance is the distance by which the access position changes before or after the data object is accessed; it may be a physical distance in bytes or a logical distance in a user-defined unit.
For example, the measured benchmark data may be read bandwidth data. Data object length x and access jump distance y may take increasing integer powers of 2, i.e. x_i = 2^i and y_j = 2^j. For each pair (x_i, y_j), a large file is read from beginning to end and the average is taken as f_ij. For each x_i, the benchmark bandwidth of a sequential read (i.e. access jump distance y = 0) must also be measured; this is the bandwidth when there is no fragmentation, which may be called the continuous bandwidth. Here the data size x may be chosen as the primary variable and the jump distance y as the secondary variable. Then find the measured value closest to the primary variable (one of x_0 ~ x_m) and the interval of the secondary variable. Let h(x) = log2 x and Δx = i − log2 x; the absolute value of Δx is minimal when i = ⌊log2 x + 0.5⌋. The interval of the secondary variable is [2^j, 2^{j+1}] with j = ⌊log2 y⌋, and [0, 1] when y = 0. Letting g(Δx) = Δx × (f_ij − f_{i,j+1}), the read bandwidth fitting function is then obtained as described above.
Then, using the obtained bandwidth fitting function, compute the fitted bandwidth f_i = f(l_i, |o_i − o_{i−1} − l_{i−1}|) for each access operation s_i in queue S. In other words, the fitted bandwidth is the value computed by the bandwidth fitting function from the length of the data object (e.g. l_i) and the change in access position before or after the object is accessed (e.g. |o_i − o_{i−1} − l_{i−1}|). Processing continues with step 2) when, for example, the access prediction sequence has been fully received, a delay upper bound is reached (a threshold set to prevent defragmentation from introducing too much delay), or the receiving queue S is full.
Continuing with Fig. 1, in step 2), starting from some access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations all have expected read/write bandwidths below the bandwidth threshold, the data objects of these access operations are written sequentially into a new contiguous storage space. In this way, the method uses the fitted bandwidths computed above to select which data objects to defragment, so as to minimize the negative impact on the current write bandwidth. If more than a predetermined number of consecutive access operations all have expected read/write bandwidths above the bandwidth threshold, these access operations are skipped and processing continues with the following access operations.
The bandwidth threshold may be a static threshold determined from the maximum bandwidth of the storage system to be defragmented; for example, it may be set to a percentage of the storage system's maximum bandwidth, such as 60% or more. The bandwidth threshold may also be a dynamic threshold determined from the continuous bandwidth, which is computed from the fitting function as the fitted bandwidth f(x, 0) at an access jump distance of 0; for example, the threshold may be set to a percentage of the continuous bandwidth, such as 70% or more.
To reduce system fragmentation more effectively while avoiding the impact of write operations on overall system performance, in one embodiment step 2) divides the access prediction sequence into three classes of contiguous subsequences according to three conditions and processes each class differently. As described above, queue S = <s_1, ..., s_n> holds the received access prediction sequence; queue S is scanned from head to tail and the sequence is divided into subsequences according to the following three conditions:
Condition 1: s_i, ..., s_{i+m} satisfy f_{i+1} ≥ Tf_{i+1}, ..., f_{i+m} ≥ Tf_{i+m}, m ≥ T_m, and f_{i+m+1} < Tf_{i+m+1} or i + m = n.
This means that, starting from some access operation (e.g. s_i) in the access prediction sequence, more than a certain number (T_m) of consecutive access operations all have fitted bandwidths above the bandwidth threshold; that is, the data objects of these access operations are only lightly fragmented, and this kind of subsequence can be ignored. The data objects corresponding to s_i, ..., s_{i+m} need not be written to new contiguous storage locations; s_i, ..., s_{i+m} require no processing, and identification continues with the next subsequence. Here Tf_i is the bandwidth threshold for the fitted bandwidth of s_i, and T_m is a preset integer threshold. As described above, the bandwidth threshold Tf_i may be a static threshold, e.g. 60% or more of the maximum bandwidth of the storage system to be defragmented, or a dynamic threshold, e.g. 70% or more of the continuous bandwidth, where the continuous bandwidth is the fitted bandwidth f(x, 0) at an access jump distance of 0.
Condition 2: s_i, ..., s_{i+m} satisfy f_i < Tf_i, ..., f_{i+m} < Tf_{i+m}, m ≥ T'_m, and f_{i+m+1} ≥ Tf_{i+m+1} or i + m = n. This means that, starting from some access operation (e.g. s_i) in the access prediction sequence, more than a certain number (e.g. T'_m) of consecutive access operations all have fitted bandwidths below the bandwidth threshold; that is, the data objects of these access operations are heavily fragmented, and this kind of subsequence needs to be defragmented. The data objects corresponding to s_i, ..., s_{i+m} must be written to new contiguous storage. Here Tf_i is the bandwidth threshold for the fitted bandwidth of s_i, and T'_m is a preset integer threshold.
For a subsequence s_i, ..., s_{i+m} satisfying condition 2, the data objects corresponding to s_i, ..., s_{i+m} can be written sequentially to a new contiguous storage space, and the access entries of these data objects changed to the new positions. The data objects to be written may be read from local storage or obtained from other devices over the network.
Condition 3: all cases other than conditions 1 and 2. For example, s_i, ..., s_{i+m} satisfy neither condition 1 nor condition 2, and either i + m = n or s_{i+m+1}, ..., s_{i+m+k} satisfy condition 1 or condition 2.
For a subsequence that satisfies neither condition 1 nor condition 2, e.g. s_i, ..., s_{i+m}, compute the overall bandwidth of the sequence, i.e. the bandwidth produced by executing all access operations in the sequence rather than the bandwidth of a single access operation. If the overall bandwidth of the sequence exceeds a certain threshold (e.g. threshold a), the data objects accessed by the sequence are not heavily fragmented and the sequence need not be processed. Conversely, if the overall bandwidth is below threshold a, compute the hypothetical continuous bandwidth of the sequence, i.e. the bandwidth that would result if the data objects of the access operations in the sequence were stored contiguously. If the overall bandwidth is below threshold a and the hypothetical continuous bandwidth exceeds a certain threshold (e.g. threshold b), writing the data objects of the sequence to a contiguous storage area would effectively improve bandwidth, so the data objects corresponding to s_i, ..., s_{i+m} are written to a new contiguous storage space. If the overall bandwidth is below threshold a and the hypothetical continuous bandwidth is below threshold b, writing the data objects to a contiguous storage area would not improve bandwidth, so the sequence is not processed. Thresholds a and b can be set in the same way as the bandwidth threshold described above.
The overall bandwidth and the hypothetical continuous bandwidth are computed as follows:
The overall bandwidth of s_i, ..., s_{i+m} is f = (l_i + … + l_{i+m}) / (l_i/f_i + … + l_{i+m}/f_{i+m}).
Obtain the start position o of a contiguous storage space into which the data objects corresponding to s_i, ..., s_{i+m} can be written, and compute the hypothetical continuous bandwidth f' = f(l_i + … + l_{i+m}, |o − o'_{i−1} − l_{i−1}|). If f < T_f and f' > T'_f, write the data objects corresponding to s_i, ..., s_{i+m} sequentially to the new contiguous storage space and change the access entries of these data objects to the new positions. Here o'_{i−1} and l_{i−1} are the latest start position (which may already be a newly written position) and the size of the data object corresponding to s_{i−1} in the prediction sequence; T_f and T'_f can be set in the same way as the bandwidth threshold above; and the data objects to be written may be read from local storage or obtained from other devices over the network.
When queue S has been fully processed, it is emptied and the next access prediction sequence is received.
To make the above defragmentation method easier to understand, the defragmentation process is illustrated in more detail below with reference to Table 2 and Fig. 2. Table 2 gives a concrete access prediction sequence to be received by the defragmentation process, listing for each access operation its sequence number i, the length l_i of the corresponding data object, and the start position o_i of that data object.
Table 2
Sequence number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
li | 3 | 2 | 5 | 6 | 4 | 3 | 3 | 5 | 4 | 3 | 2 | 3 | 3 | 3 |
oi | 10 | 20 | 30 | 50 | 3 | 10 | 7 | 35 | 40 | 13 | 16 | 20 | 7 | 23 |
The length of receiving queue S is set to 12, and in the initial state the last stored access position is assumed to be 0. The execution proceeds as follows:
1. Perform step 201: receive access operations 1 to 14 of the access prediction sequence in turn; after each access operation is received, go to step 202.
2. Perform step 202: determine whether this is a repeated access operation; if so, go to step 201; if not, go to step 203.
Specifically, if the data object accessed by the currently received access operation is the same as that of an access operation already in the queue, the current access operation is ignored and processing returns to step 201, so that the access bandwidth of the same data object is not counted twice. For example, when access operations 6 and 13 are received, they access the same data objects as access operations 1 and 7, respectively (same start position and data length).
3. Perform step 203: compute the fitted bandwidth of this access operation.
Specifically, obtain the jump distance of the data object accessed by the currently received access operation relative to the data object of the tail entry of queue S, input the data object size and the jump distance into the bandwidth fitting function obtained above, and record the data object size, the data object start position, the computed fitted bandwidth and related information at the tail of the queue. For example, for access operation 1, the data object size is 3 and the jump distance is 10 (a jump from the last access position 0 to the data object start position 10), so the fitted bandwidth is f_1 = f(3, 10); for access operation 2, the data object size is 2 and the jump distance is 7 (a jump from the end position 13 of the data object of access operation 1 to the data object start position 20), so the fitted bandwidth is f_2 = f(2, 7).
4. Perform step 204: determine whether the queue processing condition is met. For example, when the queue is full or the delay upper bound is reached (to avoid defragmentation introducing too much delay), process the queue from head to tail; otherwise, continue receiving the access prediction sequence.
Steps 201 to 204 are executed repeatedly until the access prediction sequence has been fully received or the queue meets the processing condition. The resulting state of queue S is shown in Table 3 (accesses 6 and 13 are repeated accesses and are not added to the queue):
Table 3
Queue entry | s1 | s2 | s3 | s4 | s5 | s6 |
li | 3 | 2 | 5 | 6 | 4 | 3 |
oi | 10 | 20 | 30 | 50 | 3 | 7 |
fi | f(3, 10) | f(2, 7) | f(5, 8) | f(6, 15) | f(4, 53) | f(3, 0) |
Queue entry | s7 | s8 | s9 | s10 | s11 | s12 |
li | 5 | 4 | 3 | 2 | 3 | 3 |
oi | 35 | 40 | 12 | 15 | 20 | 23 |
fi | f(5, 25) | f(4, 0) | f(3, 32) | f(2, 0) | f(3, 3) | f(3, 0) |
5. Perform step 205: scan queue S and divide it into subsequences according to the conditions.
Specifically, scan queue entries s_1 to s_12 from head to tail and divide them into subsequences according to the three conditions: condition 1, s_i, ..., s_{i+m} satisfy f_{i+1} ≥ Tf_{i+1}, ..., f_{i+m} ≥ Tf_{i+m}, m ≥ T_m, and f_{i+m+1} < Tf_{i+m+1} or i + m = 12; condition 2, s_i, ..., s_{i+m} satisfy f_i < Tf_i, ..., f_{i+m} < Tf_{i+m}, m ≥ T'_m, and f_{i+m+1} ≥ Tf_{i+m+1} or i + m = 12; condition 3, all remaining cases (neither condition 1 nor condition 2 is satisfied, and either i + m = 12 or s_{i+m+1}, ..., s_{i+m+k} satisfy condition 1 or condition 2). Here Tf_i = 0.75 × f(l_i, 0), and T_m and T'_m are both set to 4. The comparison results after the scan are shown in Table 4: the fitted bandwidths of s_1 to s_5 are small, satisfying condition 2; the fitted bandwidths of s_6 to s_8 vary, satisfying condition 3; and the fitted bandwidths of s_9 to s_12 are large (except that of s_9), satisfying condition 1. The subsequence division results are marked in the queue accordingly.
Table 4
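The three-way split of step 205 can be sketched as a single greedy scan. This is an interpretation under stated assumptions: the function names are hypothetical, each entry is reduced to a boolean `high` flag ($f_i \ge T_{f_i}$), and $T_m = T'_m = 4$ is applied as a minimum number of entries in the run ($s_i$ included), which is one reading of the conditions that reproduces the worked example. The flags in the usage line are likewise hypothetical values consistent with the description (s1–s5 small, s6–s8 mixed, s9–s12 large except s9).

```python
from typing import List, Optional, Tuple


def is_high(f_i: float, f_contiguous: float, ratio: float = 0.75) -> bool:
    """f_i >= T_{f_i}, where T_{f_i} = ratio * f(l_i, 0) is passed in as f_contiguous."""
    return f_i >= ratio * f_contiguous


def _extend(high: List[bool], i: int, want_high: bool) -> int:
    # Grow the run from s_i while the following entries have the wanted status.
    j = i
    while j + 1 < len(high) and high[j + 1] == want_high:
        j += 1
    return j


def divide(high: List[bool], min_run: int = 4) -> List[Tuple[int, int, int]]:
    """Return (first, last, condition) triples with 0-based inclusive indices."""
    n = len(high)
    out: List[Tuple[int, int, int]] = []
    pending: Optional[int] = None  # start of a condition-3 run
    i = 0
    while i < n:
        cond = 0
        j = _extend(high, i, True)       # condition 1: f_{i+1}..f_{i+m} high
        if j - i + 1 >= min_run:
            cond = 1
        elif not high[i]:
            j = _extend(high, i, False)  # condition 2: f_i..f_{i+m} low
            if j - i + 1 >= min_run:
                cond = 2
        if cond == 0:                    # neither: s_i joins a condition-3 run
            if pending is None:
                pending = i
            i += 1
            continue
        if pending is not None:          # close the condition-3 run before s_i
            out.append((pending, i - 1, 3))
            pending = None
        out.append((i, j, cond))
        i = j + 1
    if pending is not None:
        out.append((pending, n - 1, 3))
    return out


high = [False] * 5 + [True, False, True, False, True, True, True]
print(divide(high))  # [(0, 4, 2), (5, 7, 3), (8, 11, 1)]
```

With 0-based indices the result reads as s1–s5 under condition 2, s6–s8 under condition 3, and s9–s12 under condition 1, matching the marks described above.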
6. Perform step 206: determine which condition the subsequence satisfies. If it satisfies condition 1, go to step 209; if condition 2, go to step 208; if condition 3, go to step 207.
Specifically, according to the marking results of step 205, s1–s5 satisfy condition 2, so go to step 208.
7. Perform step 208: write the data objects of s1–s5 to new storage space.
Specifically, find the start position of a contiguous storage space large enough for the data objects of s1–s5 and write them there. Assuming start position 100 is found in this example, the data objects of s1–s5 are written to storage space 100–102, 103–104, 105–109, 110–115, and 116–119 respectively, and the access entries of these data objects are updated to the new storage locations.
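Step 208's placement is simple arithmetic: objects are laid end to end from the chosen start position, and each access entry is redirected to its new offset. The sketch below uses a hypothetical `write_contiguously` helper and a plain dict standing in for the access-entry index; it computes placements only and leaves out the actual data copy.

```python
from typing import Dict, List, Tuple


def write_contiguously(entries: List[Tuple[str, int]], start: int,
                       index: Dict[str, int]) -> List[Tuple[int, int]]:
    """Lay objects end to end from `start` and repoint their access entries.

    entries: (object id, length) in queue order.
    Returns the inclusive (first, last) position range of each object.
    """
    ranges: List[Tuple[int, int]] = []
    pos = start
    for obj_id, length in entries:
        ranges.append((pos, pos + length - 1))
        index[obj_id] = pos  # access entry now points at the new location
        pos += length        # next object starts right after this one
    return ranges


index: Dict[str, int] = {}
print(write_contiguously([("s1", 3), ("s2", 2), ("s3", 5), ("s4", 6), ("s5", 4)], 100, index))
```

With lengths 3, 2, 5, 6, 4 and start 100 this yields the ranges 100–102 through 116–119 listed above, and s5's entry points at 116, the value later used as $o'_5$.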
8. Perform step 209: remove s1–s5 from the queue (point the queue head pointer to s6).
9. Perform step 210 (the queue is not empty, so go to step 206).
10. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marking results of step 205, s6–s8 satisfy condition 3, so go to step 207.
11. Perform step 207: determine whether subsequence s6–s8 meets the write condition.
Specifically, obtain the start position $o$ of a storage space that can hold the data objects of s6–s8, compute the overall bandwidth $f$ of the subsequence, and compute the hypothetical write bandwidth $f' = f(l_6 + l_7 + l_8, |o - o'_5 - l_5|)$. Here $o = 120$ (the end position of s5), $o'_5 = 116$ (the new write position of the data object of s5), and $l_5 = 4$, so $f' = f(12, 0)$. The thresholds are $T_f = T'_f = 0.75 \times f(l_6 + l_7 + l_8, 0) = 0.75 \times f(12, 0)$ (in general, for $s_i, \ldots, s_{i+m}$, $T_f = T'_f = 0.75 \times f(l_i + \cdots + l_{i+m}, 0)$). In this case $f < T_f$ and $f' > T'_f$, so go to step 208.
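The write decision of step 207 can be sketched as follows. The overall-bandwidth formula is not reproduced in this text, so `overall_bandwidth` assumes it is total length divided by total estimated transfer time, $\sum l_k / \sum (l_k / f_k)$. That choice, the helper names, and the toy bandwidth model `toy_f` are assumptions for illustration only.

```python
from typing import Callable, List, Tuple

Bandwidth = Callable[[int, int], float]  # f(length, jump_step)


def toy_f(length: int, jump: int) -> float:
    """Hypothetical model: 100 units/s transfer plus a fixed 1 s seek for any jump."""
    seek = 0.0 if jump == 0 else 1.0
    return length / (length / 100.0 + seek)


def overall_bandwidth(items: List[Tuple[int, int]], f: Bandwidth) -> float:
    """ASSUMED form: total length / total estimated time over (l_k, jump_k) items."""
    total_len = sum(l for l, _ in items)
    total_time = sum(l / f(l, jump) for l, jump in items)
    return total_len / total_time


def should_write(items: List[Tuple[int, int]], o: int, prev_new_start: int,
                 prev_len: int, f: Bandwidth = toy_f, ratio: float = 0.75) -> bool:
    total_len = sum(l for l, _ in items)
    # Hypothetical bandwidth f' if the run were stored contiguously at o.
    f_hyp = f(total_len, abs(o - prev_new_start - prev_len))
    threshold = ratio * f(total_len, 0)  # T_f = T'_f = 0.75 * f(sum of l, 0)
    return overall_bandwidth(items, f) < threshold and f_hyp > threshold


# s6-s8: lengths 3, 5, 4 with jumps 0, 25, 0; o = 120, o'_5 = 116, l_5 = 4.
print(should_write([(3, 0), (5, 25), (4, 0)], o=120, prev_new_start=116, prev_len=4))
```

Under `toy_f` the single 25-unit jump dominates the run's estimated time, so the overall bandwidth falls well below $0.75 \times f(12, 0)$ while $f' = f(12, 0)$ stays above it, and the check passes as in the worked example. A run with no jumps would not be rewritten.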
12. Perform step 208: write the data objects of s6–s8 to a new contiguous storage space.
Specifically, write the data objects of s6–s8 sequentially into storage space 120–122, 123–127, and 128–131 respectively, and update the access entries of these data objects to the new storage locations.
13. Perform step 209: remove subsequence s6–s8 from the queue (point the queue head pointer to s9).
14. Perform step 210 (the queue is not empty, so go to step 206).
15. Perform step 206: determine which condition the subsequence satisfies.
Specifically, according to the marking results of step 205, s9–s12 satisfy condition 1, so go to step 209.
16. Perform step 209: remove s9–s12 from the queue (point the queue head pointer back to s1).
17. Perform step 210 (the queue is now empty, so go to step 201).
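The control flow of steps 206–210 amounts to a dispatch over the marked subsequences. In this sketch, `process_queue` and its `condition3_ok` callback are hypothetical. The text only describes the condition-3 path that leads to step 208, so treating a failed step-207 check as "remove without writing" (step 209) is an assumption.

```python
from typing import Callable, List, Tuple

Subsequence = Tuple[int, int, int]  # (first entry, last entry, condition 1/2/3)


def process_queue(subsequences: List[Subsequence],
                  condition3_ok: Callable[[int, int], bool]) -> List[Tuple[str, int, int]]:
    """Walk the marked subsequences from head to tail (steps 206-210)."""
    actions: List[Tuple[str, int, int]] = []
    for first, last, cond in subsequences:     # step 206: read the mark
        if cond == 2:
            write = True                        # low bandwidth: go to step 208
        elif cond == 3:
            write = condition3_ok(first, last)  # step 207 decides
        else:
            write = False                       # condition 1: go straight to step 209
        actions.append(("write" if write else "skip", first, last))
        # Step 209 removes the subsequence; step 210 repeats while entries remain.
    return actions


print(process_queue([(1, 5, 2), (6, 8, 3), (9, 12, 1)], lambda first, last: True))
```

With the marks from the example and a passing step-207 check, s1–s5 and s6–s8 are rewritten and s9–s12 are left in place.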
The inventors also tested the above method in a content-addressable storage system using backup workloads from a real environment. The results show that, as the data volume grows, read bandwidth improves by 12% to 60% while data redundancy is kept within 1% to 2%, essentially eliminating the continuous decline in read bandwidth. In the same system, tests with a two-week data synchronization workload show that read bandwidth improves by a factor of 5 to 8, avoiding the severe read-bandwidth decline caused by data fragmentation.
Although the present invention has been described by means of preferred embodiments, it is not limited to the embodiments described here and also covers various changes and variations made to them.
Claims (7)
1. A defragmentation method, the method comprising:
Step 1): determining the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function whose variables are the length of the data object corresponding to the access operation and the access jump step; the fitted bandwidth function is a bandwidth function expression obtained by a fitting method in the storage system to be defragmented, with data object length and access jump step as variables, based on benchmark read/write bandwidth data measured under different data object lengths and access jump steps;
Step 2): starting from a given access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is below a bandwidth threshold, sequentially writing the data objects of these access operations to a new contiguous storage space;
wherein the fitted bandwidth function is obtained through the following steps:
Step a): in the storage system to be defragmented, with data object length $x$ and access jump step $y$ as variables, measuring benchmark read/write bandwidth data for a set of different values of $x$ and $y$;
Step b): selecting one of $x$ and $y$ as the primary variable, the other being the secondary variable;
Step c): finding the measured value closest to the primary variable;
Step d): finding the interval containing the secondary variable;
Step e): based on the measured benchmark read/write bandwidth data for different $x$ and $y$, fitting the read/write bandwidth $f(x, y)$ for arbitrary values of $x$ and $y$;
wherein, in step c), when the primary variable is $x$, $x_i$ is found such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimized; when the primary variable is $y$, $y_i$ is found such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimized;
in step d), when the secondary variable is $y$, $y_j$ is found such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, $x_j$ is found such that $x \in [x_j, x_{j+1}]$;
in step e), when the secondary variable is $y$, $f(x, y)$ is given by a fitting formula [not reproduced in this text], where $g(\Delta x)$ denotes a further correction to the fitting result; when the secondary variable is $x$, $f(x, y)$ is given by a fitting formula [not reproduced in this text], where $g(\Delta y)$ denotes a further correction to the fitting result.
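Steps a) to e) describe a table-lookup fit: snap the primary variable to the nearest measured value, bracket the secondary variable between two measured points, and estimate between them. The fitting formulas themselves are not reproduced here, so the sketch below rests on assumptions: `make_fitted_bandwidth` is hypothetical, $x$ (data object length, assumed positive) is fixed as the primary variable, $h$ defaults to $\log_2$, the estimate between $y_j$ and $y_{j+1}$ is linear, and the correction $g(\Delta x)$ defaults to zero.

```python
import bisect
import math
from typing import Callable, Dict, List, Tuple


def make_fitted_bandwidth(xs: List[float], ys: List[float],
                          bench: Dict[Tuple[float, float], float],
                          h: Callable[[float], float] = math.log2,
                          g: Callable[[float], float] = lambda dx: 0.0,
                          ) -> Callable[[float, float], float]:
    """Hypothetical f(x, y): x (length) is primary, y (jump step) is secondary.

    bench maps every measured (x, y) grid point to its benchmark bandwidth (step a).
    ys must contain at least two measured jump steps.
    """
    xs, ys = sorted(xs), sorted(ys)

    def f(x: float, y: float) -> float:
        # Step c): measured x_i whose h-value is closest to h(x).
        xi = min(xs, key=lambda v: abs(h(v) - h(x)))
        dx = h(xi) - h(x)
        # Step d): interval [y_j, y_{j+1}] containing y (clamped to the grid).
        y = min(max(y, ys[0]), ys[-1])
        j = min(bisect.bisect_right(ys, y) - 1, len(ys) - 2)
        y0, y1 = ys[j], ys[j + 1]
        b0, b1 = bench[(xi, y0)], bench[(xi, y1)]
        # Step e): linear estimate in y (assumed) plus the correction g(dx).
        t = (y - y0) / (y1 - y0)
        return b0 + t * (b1 - b0) + g(dx)

    return f


bench = {(4, 0): 80.0, (4, 100): 20.0, (16, 0): 200.0, (16, 100): 50.0}
f = make_fitted_bandwidth([4, 16], [0, 100], bench)
print(f(16, 25))  # 162.5
```

On this 2 × 2 grid a query at $x = 12$ snaps to 16, because $\log_2 12$ is closer to $\log_2 16$ than to $\log_2 4$; a logarithmic $h$ suits lengths measured on a power-of-two scale, but the source does not specify $h$.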
2. The method according to claim 1, further comprising step 3): starting from a given access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is above the bandwidth threshold, skipping these access operations and continuing to process the subsequent access operations.
3. The method according to claim 2, further comprising step 4): for a subsequence $s_i, \ldots, s_{i+m}$ of consecutive access operations in the access prediction sequence that was not processed by step 2) or step 3), if the overall bandwidth of the subsequence is below a certain threshold and the hypothetical continuous bandwidth of the subsequence is above a certain threshold, writing the data object of each access operation in the subsequence to a new contiguous storage space and changing the access entries of these data objects to the new location; wherein the hypothetical continuous bandwidth denotes the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of the subsequence $s_i, \ldots, s_{i+m}$ is given by a formula [not reproduced in this text], where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
4. The method according to claim 2, wherein the bandwidth threshold is determined from the maximum bandwidth value or the continuous bandwidth of the storage system to be defragmented, the continuous bandwidth being the fitted bandwidth obtained from the fitted bandwidth function when the access jump step is 0.
5. A defragmentation system, the system comprising:
a bandwidth determining device, configured to determine the expected read/write bandwidth of the data object of each access operation in an access prediction sequence, the expected read/write bandwidth being a fitted bandwidth obtained from a fitted bandwidth function whose variables are the length of the data object corresponding to the access operation and the access jump step; the fitted bandwidth function is a bandwidth function expression obtained by a fitting method in the storage system to be defragmented, with data object length and access jump step as variables, based on benchmark read/write bandwidth data measured under different data object lengths and access jump steps;
a defragmentation unit, configured to, starting from a given access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is below a bandwidth threshold, sequentially write the data objects of these access operations to a new contiguous storage space;
wherein the fitted bandwidth function is obtained through the following steps:
Step a): in the storage system to be defragmented, with data object length $x$ and access jump step $y$ as variables, measuring benchmark read/write bandwidth data for a set of different values of $x$ and $y$;
Step b): selecting one of $x$ and $y$ as the primary variable, the other being the secondary variable;
Step c): finding the measured value closest to the primary variable;
Step d): finding the interval containing the secondary variable;
Step e): based on the measured benchmark read/write bandwidth data for different $x$ and $y$, fitting the read/write bandwidth $f(x, y)$ for arbitrary values of $x$ and $y$;
wherein, in step c), when the primary variable is $x$, $x_i$ is found such that the absolute value of $\Delta x = h(x_i) - h(x)$ is minimized; when the primary variable is $y$, $y_i$ is found such that the absolute value of $\Delta y = h(y_i) - h(y)$ is minimized;
in step d), when the secondary variable is $y$, $y_j$ is found such that $y \in [y_j, y_{j+1}]$; when the secondary variable is $x$, $x_j$ is found such that $x \in [x_j, x_{j+1}]$;
in step e), when the secondary variable is $y$, $f(x, y)$ is given by a fitting formula [not reproduced in this text], where $g(\Delta x)$ denotes a further correction to the fitting result; when the secondary variable is $x$, $f(x, y)$ is given by a fitting formula [not reproduced in this text], where $g(\Delta y)$ denotes a further correction to the fitting result.
6. The system according to claim 5, wherein the defragmentation unit is further configured to, starting from a given access operation in the access prediction sequence, if more than a predetermined number of consecutive access operations each have a data object whose expected read/write bandwidth is above the bandwidth threshold, skip these access operations and continue processing the subsequent access operations.
7. The system according to claim 6, wherein the defragmentation unit is further configured to: for a subsequence $s_i, \ldots, s_{i+m}$ of consecutive access operations that does not meet the condition that more than a predetermined number of consecutive data objects all have expected read/write bandwidths above, or all below, the bandwidth threshold, if the overall bandwidth of the subsequence is below a certain threshold and the hypothetical continuous bandwidth of the subsequence is above a certain threshold, write the data object of each access operation in the subsequence to a new contiguous storage space and change the access entries of these data objects to the new location; wherein the hypothetical continuous bandwidth denotes the bandwidth that would be obtained if the data objects of the access operations in the subsequence were stored contiguously; the overall bandwidth of the subsequence $s_i, \ldots, s_{i+m}$ is given by a formula [not reproduced in this text], where $l_i$ is the length of the data object of $s_i$, $f_i$ is the expected read/write bandwidth of the data object of $s_i$, and $m$ is the number of access operations in the subsequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310298326.2A CN103389946B (en) | 2013-07-16 | 2013-07-16 | Go flaking method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103389946A CN103389946A (en) | 2013-11-13 |
CN103389946B true CN103389946B (en) | 2016-08-10 |
Family ID: 49534224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310298326.2A Active CN103389946B (en) | 2013-07-16 | 2013-07-16 | Go flaking method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103389946B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108282378B (en) * | 2017-01-05 | 2021-11-09 | 阿里巴巴集团控股有限公司 | Method and device for monitoring network flow |
CN108021512A (en) * | 2017-11-22 | 2018-05-11 | 深圳忆联信息系统有限公司 | A kind of solid state hard disc mapping management process and solid state hard disc |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5930828A (en) * | 1997-03-26 | 1999-07-27 | Executive Software International | Real-time apparatus and method for minimizing disk fragmentation in a computer system |
CN101460932A (en) * | 2006-06-08 | 2009-06-17 | Nxp股份有限公司 | Device for remote defragmentation of an embedded device |
CN102929884A (en) * | 2011-08-10 | 2013-02-13 | 阿里巴巴集团控股有限公司 | Method and device for compressing virtual hard disk image file |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3629216B2 (en) * | 2001-03-08 | 2005-03-16 | 株式会社東芝 | Disk storage system having defragmentation function, and defragmentation method in the same system |
GB0517305D0 (en) * | 2005-08-24 | 2005-10-05 | Ibm | Method and apparatus for the defragmentation of a file system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |