CN107766170B - Differential log type erasure code updating method for single storage pool


Publication number
CN107766170B
CN107766170B (application CN201610710868.XA)
Authority
CN
China
Prior art keywords
data
check
block
update
storage space
Prior art date
Legal status
Active
Application number
CN201610710868.XA
Other languages
Chinese (zh)
Other versions
CN107766170A (en)
Inventor
陈付
陕振
张淑萍
Current Assignee
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications filed Critical Beijing Institute of Computer Technology and Applications
Priority to CN201610710868.XA
Publication of CN107766170A
Application granted
Publication of CN107766170B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Abstract

The invention relates to a differential log type erasure code updating method for a single storage pool, and belongs to the technical field of computers. The invention combines the ideas of full-overwrite updating of data blocks and delta-log (check partial log) updating of parity blocks with the PDN-P update scheme, and proposes a new data update scheme, PDN-PDS.

Description

Differential log type erasure code updating method for single storage pool
Technical Field
The invention relates to the technical field of computers, in particular to a differential log type erasure code updating method for a single storage pool.
Background
Erasure coding techniques have been widely applied in data storage systems to achieve high fault tolerance, but they impose a heavy burden on data update performance, especially in distributed block storage systems where data update operations are frequent. This section first introduces the update principle of CRS erasure codes and four conventional update methods: two typical erasure code update modes (DUM and PUM) and two improved partial-update modes (PUM-P and PDN-P).
Erasure code data update complexity refers to the average number of parity blocks affected by modifying, updating, or overwriting a data block. For example, for CRS(6+3, 3), each data block is protected by 3 parity blocks, so the optimal update complexity is 3. Update complexity can significantly affect the update performance of erasure-coded systems, especially for small-block updates. When a storage cluster employs erasure codes, the update burden is compounded because the update process involves disk I/O, transmission bandwidth, and CPU computation. Assume the storage system uses CRS(k+m, m) coding: a data segment is divided into k data blocks, and m parity blocks are then generated by CRS encoding. The generation of the parity blocks can be described with reference to fig. 1, where each parity block is

Pi = Ci1×D1 + Ci2×D2 + … + Cik×Dk, i ∈ {1,2,…,m}

When data block Dj is updated to Dj′, the data block Dj and all parity blocks P1, P2, …, Pm need to be updated to Dj′, P1′, P2′, …, Pm′ respectively, while the other data blocks are not modified.
Four common data update methods are described below:
(1) DUM (data blocks by a specific Updating manager) method.
Fig. 2 shows the data flow of the DUM method. The DUM update process is identical to the data encoding process: the update manager node reads the original data blocks of the stripe that are not being updated, and regenerates all new parity blocks, together with the updated data block, by means of the Cauchy generator matrix. Suppose data block D1 is to be updated to D1′. The update manager reads, over the network, all data blocks in the stripe except D1, and re-encodes using the new data block D1′. The m new parity blocks P1′, P2′, …, Pm′ are then transmitted in parallel, together with the new data block D1′, to the respective m check nodes and the data node of D1. In the DUM, to update one data block, k-1 data blocks must be read from the disks of k-1 data nodes, m parity blocks written to the disks of m check nodes, one new data block written to one data node, and m+k blocks transmitted over the network. The operations involved in the DUM are:

Pi′ = Ci1×D1′ + Ci2×D2 + … + Cik×Dk, i ∈ {1,2,…,m}
(2) PUM (parity blocks by a specific Updating manager) method.
For small-block updates, a natural idea is to regenerate the new parity blocks from the existing parity blocks and the data block to be updated, which saves disk I/O and network transmission bandwidth compared with the DUM. The theoretical formula of the PUM is derived as follows. Suppose data block Dx is modified into Dx′; then each new parity block is:

Pi′ = Ci1×D1 + … + Cix×Dx′ + … + Cik×Dk
    = (Ci1×D1 + … + Cix×Dx + … + Cik×Dk) - Cix×Dx + Cix×Dx′
    = Pi + Cix×(Dx′ - Dx), i ∈ {1,2,…,m}  (7)
Fig. 3 shows the data flow of the PUM, taking the update of data block D1 as an example: the update manager reads all parity blocks P1, P2, …, Pm in the stripe and the data block D1; the new parity blocks P1′, P2′, …, Pm′ are then calculated by equation (7) above; finally, all updated parity blocks and the new data block D1′ are sent to the respective check nodes and data node. In contrast to the DUM, the PUM reads only the m parity blocks to be updated instead of k-1 data blocks. When m < k-2, the disk I/O and network transmission bandwidth of the PUM are less than those of the DUM; in addition, the calculation of Pi′ in the PUM involves only one subtraction, one multiplication, and one addition. To update one data block, the PUM needs to read m+1 blocks from m check nodes and one data node, write m+1 blocks to local hard disks, and transmit 2m+2 blocks over the network.
(3) PUM-P (Parity blocks by an Updating Manager and the Parity nodes) method.
In the PUM, the update manager performs all computations and sends the updated parity blocks Pi′ to the check nodes. FIG. 4 shows the data flow of PUM-P: in PUM-P, the update manager calculates Pi* = Ci1×(D1′ - D1), i ∈ {1,2,…,m}, and sends each Pi* over the network to the corresponding check node; each check node then reads its parity block Pi directly from the local hard disk into memory, incurring no network transmission load, and finally performs an addition at the check node to obtain the new parity block Pi′ = Pi* + Pi. To update one data block, PUM-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+2 blocks over the network. In contrast to the PUM, PUM-P reads only one data block over the network, because the check nodes can read their parity blocks locally and perform the addition to compute the new parity blocks. Thus, the overall I/O burden of PUM-P is lower than that of the PUM.
FIG. 5 illustrates the data flow of PDN-P: building on PUM-P, PDN-P moves the calculation of Pi* = Ci1×(D1′ - D1) from the update manager to the data node being updated (D1), thereby saving the burden of transmitting D1 over the network. The provisional calculation results Pi* are sent directly to the corresponding check nodes. To update one data block, PDN-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+1 blocks over the network. That is, the update network traffic of PDN-P is lower than that of PUM-P.
Formula (7) describes the linear coding property of erasure codes. Using this linearity, the three improvements PUM, PUM-P, and PDN-P were proposed on the basis of the DUM method: they avoid reading and summing all data blocks, and instead compute the new parity blocks from the change in the updated data block. Although these improvements reduce the amount of computation, the network transmission bandwidth and disk I/O pressure remain large, and data update efficiency needs further improvement.
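The per-update costs quoted above can be tabulated with a short helper. The function below is purely illustrative (not part of the patented method); it simply encodes, for each of the four schemes, the block read, block write, and network transfer counts stated in the text, as functions of k and m:

```python
def update_costs(k, m):
    """Blocks (read, written, sent over the network) per single-block update,
    as stated in the text for each conventional scheme."""
    return {
        #          reads    writes   network
        "DUM":   (k - 1,   m + 1,   m + k),
        "PUM":   (m + 1,   m + 1,   2 * m + 2),
        "PUM-P": (m + 1,   m + 1,   m + 2),
        "PDN-P": (m + 1,   m + 1,   m + 1),
    }

# Example for CRS(6+3, 3):
for name, (r, w, n) in update_costs(6, 3).items():
    print(f"{name:6s} read={r} write={w} net={n}")
```

For CRS(6+3, 3) this reproduces the trend described in the text: each refinement leaves reads and writes unchanged after PUM while steadily lowering network traffic.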
For CRS coding, the formula Pi′ = Cix×(Dx′ - Dx) + Pi, i ∈ {1,2,…,m}, can be further generalized to partial updates within a data block. Suppose a word at offset o of the old data block Dx is updated; accordingly, the word at offset o of the old parity block Pi needs to be updated. This can be expressed as:

Pi′(o) = Pi(o) + Cix×(Dx′(o) - Dx(o))  (8)

Here Cix is the coefficient used to generate the check data; Pi′(o) and Pi(o) denote the words at offset o of the new parity block Pi′ and the old parity block Pi, respectively; Dx′(o) and Dx(o) denote the words at offset o of the new data block Dx′ and the old data block Dx. That is, multiplying the delta of a data block update by the coefficient Cix yields the delta of the parity block update.
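The word-level delta rule of equation (8) can be checked numerically. The sketch below is an illustration, not the patented method: it implements multiplication in GF(2^8) with the polynomial x^8+x^4+x^3+x^2+1 (0x11d) commonly used in Reed-Solomon codes (addition and subtraction in GF(2^w) are both XOR), and verifies that updating a parity word via the delta Cix×(Dx′(o) - Dx(o)) matches a full re-encode. The coefficients and data words are arbitrary examples.

```python
def gf_mul(a, b, poly=0x11d):
    """Carry-less multiplication in GF(2^8), reduced modulo x^8+x^4+x^3+x^2+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

# One parity word over k = 3 data words: P(o) = C1*D1(o) + C2*D2(o) + C3*D3(o),
# where + is XOR in GF(2^8).
C = [3, 7, 9]            # illustrative generator coefficients Ci1..Ci3
D = [0x55, 0xAA, 0x0F]   # old data words at some offset o

def encode(C, D):
    p = 0
    for c, d in zip(C, D):
        p ^= gf_mul(c, d)
    return p

P_old = encode(C, D)

# Update D2 (x = 1): the word Dx(o) becomes Dx'(o).
D_new = D.copy()
D_new[1] = 0x5A

# Full re-encode vs. delta update P'(o) = P(o) + Cix*(Dx'(o) - Dx(o)).
P_full = encode(C, D_new)
delta = D[1] ^ D_new[1]                 # subtraction in GF(2^w) is XOR
P_delta = P_old ^ gf_mul(C[1], delta)

assert P_full == P_delta
```

The assertion holds by the distributivity of GF multiplication over XOR, which is exactly the linearity that equation (8) exploits.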
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is: how to design an erasure code data updating method that improves the update efficiency of the parity blocks, reduces disk seeks when reading multiple updates, and saves storage space on the check nodes.
(II) Technical scheme
In order to solve the above technical problem, the present invention provides a delta log erasure code updating method for a single storage pool, which comprises the following steps:
S1, the main object storage device receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method, and then sends the update offset position o and the updated data Di′(o) to the node of Di; i ∈ {1,2,…,k}, where i denotes the data block sequence number and k the number of data blocks;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by the main object storage device, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o); then, under the check-partial-log strategy, an overwrite update is performed on the data block Di, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged, where ō denotes the region outside the updated portion; finally, the update offset position o and the data block update delta Δi(o) are sent to the first parity block P1, i.e. a single storage pool is set aside only on the node of parity block P1 to store the data delta logs; the data delta logs are stored using the adaptive management algorithm for a single storage pool, which dynamically adjusts and predicts the storage space size of each updated parity block in the storage pool;
S3, when parity update fusion is required, P1 sends all data update deltas Δi(o) in the storage pool to the other parity blocks P2, P3, …, Pm, where m denotes the number of parity blocks, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data.
Preferably, the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
wherein the shrink operation is performed less frequently than the fusion operation, controlled by the condition Cond0; the set of parity blocks having check partial logs on a node is denoted S; for each time interval t and each parity block p ∈ S, rt(p) denotes the storage space size and ut(p) the storage space usage; at the end of each time interval t, an exponentially weighted moving average is used to measure ut(p):

ut(p) = α × use(p)/rt(p) + (1 - α) × ut-1(p)

Here use(p) denotes the amount of storage space used in this time interval, rt(p) is the current storage space size of parity block p, and α is a smoothing parameter;
according to whether the usage reaches 90% and whether condition Cond0 is satisfied, the unnecessary storage space size c(p) of parity block p is determined in the shrink-space calculation; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:

c(p) = ⌊(rt(p) - use(p)) / ChunkSize⌋ × ChunkSize

where ChunkSize denotes the size of an allocated storage data block, so that the storage space of parity block p in the next time interval t+1 is:

rt+1(p) = rt(p) - c(p)
if only storage space of one data block size remains before shrinking, no shrink is performed;
when the usage of the single storage pool of one parity block reaches 90%, a fusion operation is performed to fuse all check delta logs into the parity blocks and reclaim all the storage space, i.e. the parity block then has no corresponding storage space; storage space is reallocated when a new check delta log is generated.
Preferably, in step S1, the data positioning method is a scalable-hashing-based replication (RUSH) method.
(III) Advantageous effects
The invention combines the ideas of full-overwrite updating of data blocks and delta-log (check partial log) updating of parity blocks with the PDN-P update scheme, and proposes a new data update scheme, PDN-PDS.
Drawings
Fig. 1 is a diagram of check block generation for CRS encoding;
FIG. 2 is a diagram of a DUM update method;
FIG. 3 is a diagram of a PUM update method;
FIG. 4 is a diagram of a PUM-P update method;
figure 5 is a diagram of a PDN-P update method;
fig. 6 is a diagram of a single storage pool PDN-PDS method of an embodiment of the present invention;
fig. 7 is a comparative example between the conventional updating method and the updating method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
With the linear coding property of the erasure code (equation (8)), a delta approach can be employed for data updating, eliminating redundant network traffic by transmitting only data deltas of the same size as the modified data range. It first determines the modification range of the data block and then calculates for this data block the difference, which is the amount of change between the old and new data within the modification range of the data block, multiplied by the corresponding coefficient to obtain the check difference. The range of modified data and the calculated data delta and check delta are then sent to the corresponding data node and all other check nodes, respectively, for updating. The transmission of complete data blocks and check blocks is not required, and only the range of modified data, the data difference and the check difference are required to be transmitted, so that the network transmission bandwidth is reduced, and the method is very suitable for a cluster storage environment.
TABLE 1 Log update classification

                            Data delta not logged    Data delta logged
Check delta not logged      Full overwrite           Data partial log
Check delta logged          Check partial log        Full log
In addition, update schemes can be classified according to whether the data delta and the check delta are logged (see Table 1). The advantage of logging is higher update efficiency: the data block or parity block need not be read and fused with the delta at update time. The advantage of not logging (direct overwrite) is that reading a new data block or parity block after an update is faster, since the log need not be fused with the original data to obtain the new data. The check-partial-log approach fits the access pattern of erasure code updates very well: data blocks are read often but updated rarely, while parity blocks are read rarely but updated often. Therefore, the present invention improves PDN-P using the check-partial-log idea.
Specifically, a new data update method, PDN-PDS (PDN-P with Delta logging and Single Storage Pool), is proposed by combining full-overwrite updating of data blocks and check-partial-log (delta log) updating of parity blocks with the PDN-P update method. Fig. 6 depicts the data flow of the PDN-PDS; the update procedure is detailed below.
S1, the main OSD (Object Storage Device, the node of D1) receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method RUSH (scalable-hashing-based replication), and then sends the update position information o and the updated data Di′(o) to the node of Di;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by D1, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o). Then an overwrite update is performed on the data block under the check-partial-log strategy, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged (ō denotes the region outside the updated portion). Finally, the update offset position o and the data block update delta Δi(o) are sent to the node of parity block P1; that is, a single storage pool is set aside only on the node of parity block P1 to store the data delta logs, and this pool adopts an efficient data delta log storage scheme, namely the adaptive management algorithm for a single storage pool (see the table below).
S3, when parity update fusion is required, P1 sends all data deltas Δi(o) in the storage pool to P2, P3, …, Pm, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

where m denotes the number of parity blocks, Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data.
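The S1-S3 flow can be sketched in a few lines. The toy model below is an illustration under simplifying assumptions, not the patented implementation: the class and function names are hypothetical, and all generator coefficients are taken as 1 so that GF arithmetic reduces to XOR (as in simple XOR parity). It shows the essential mechanics: the data block is overwritten immediately, the delta is only logged in P1's single storage pool, and the parity blocks are touched only at fusion time.

```python
class ParityNode:
    def __init__(self, size):
        self.block = bytearray(size)   # parity block Pj
        self.pool = []                 # single delta-log storage pool (used on P1 only)

class DataNode:
    def __init__(self, size):
        self.block = bytearray(size)   # data block Di

def s1_s2_update(data_node, p1, offset, new_bytes):
    """S1/S2: overwrite the data block in place, log (o, delta) in P1's pool."""
    old = data_node.block[offset:offset + len(new_bytes)]
    delta = bytes(a ^ b for a, b in zip(old, new_bytes))    # Δ(o) = D'(o) - D(o), XOR
    data_node.block[offset:offset + len(new_bytes)] = new_bytes
    p1.pool.append((offset, delta))                         # deferred parity work

def s3_merge(p1, other_parities):
    """S3: fuse every logged delta into P1 and forward it to P2..Pm."""
    for offset, delta in p1.pool:
        for p in [p1] + other_parities:
            for j, d in enumerate(delta):
                p.block[offset + j] ^= d                    # Pj'(o) = Pj(o) + C*Δ, C = 1
    p1.pool.clear()
```

Between an update and the next merge, the parity blocks are stale by design; reading a parity block in that window would first trigger the fusion, as described for the merge operation below.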
PDN-PDS uses a single storage pool: the node of the first parity block P1 uses one storage pool to store and manage all data delta logs Δi(o) corresponding to this parity block, while the other m-1 parity blocks need no storage management for check delta logs (here P1 is generic, since EC groups are distributed across nodes, i.e. the same node may simultaneously act as P1 of one group and Pj, j ∈ {2,3,…,m}, of another). However, using only one storage pool may reduce the reliability of parity data: if the P1 node and a data block Di node fail simultaneously, the preceding data updates are lost entirely. A further measure can therefore be taken, namely narrowing the time window within which the parity nodes perform data delta log fusion; within a suitable time window (e.g. 15 or 30 minutes), it is rare for a data block Di node and the parity block P1 node to fail simultaneously. Thus, single-pool PDN-PDS saves storage space and improves update efficiency with little impact on data reliability. The data delta logs Δi(o) generated by multiple data block updates for parity block P1 are stored in a free area on the check node. However, this approach easily causes fragmentation of the log storage space and space wasted by holes after garbage collection; moreover, when the parity block and the check delta logs are fusion-updated, the scattered check delta logs may impose a heavy disk seek burden, resulting in low read efficiency for the new parity block. Therefore, an efficient data delta log storage scheme, the adaptive management algorithm for a single storage pool, needs to be designed.
Adaptive management algorithm for single storage pool:
in order to save the storage space and improve the storage efficiency of the storage pool, the invention designs a working load-aware storage pool adaptive management scheme, which can dynamically adjust and predict the size of the storage space of each updated check block in the storage pool. This solution has three main components: (1) predicting a storage space size of each parity block in a next time segment using the measured workload pattern; (2) shrinking the storage space and releasing unused space back to the storage pool; (3) and fusing the check difference quantity in the storage space with the corresponding check block. In order to avoid small unusable holes in the storage space recycled by the shrink operation, the size of the storage space and the shrink size need to be set to be multiples of the size of the parity chunks, which ensures that the whole data chunk or parity chunk can be placed in the recycled storage space.
TABLE 2 adaptive management algorithm for single pool
The algorithm of Table 2 describes the basic flow of an adaptive management scheme for unified storage pools:
first, a default initial storage size (2 or 4 times the data block size) is set that is large enough to hold all of the check delta logs.
Then, the contraction and fusion operations are performed periodically on each node:
the condition Cond0 is added to make the frequency of the contraction operation lower than that of the fusion operation, and the purpose of adding this condition is to avoid too many fusion operations caused by too small storage space due to frequent contraction operations, which would cause fusion update operations of all the check difference logs of the check block, and this operation is costly. The check block set with check part log on one node is represented as S, and for each time interval t and each check block p ∈ S, r is usedt(p) denotes the size of the memory space, ut(p) represents the usage of the storage space. Intuitively, ut(p) represents the percentage of used storage space. At the end of each time segment t, u is measured using an exponential weighted smoothed average in obtaining storage pool utilization getUtilityt(p):
Figure BDA0001087670260000122
Here use (p) represents the amount of memory space that has been used in this time segment, rt(p) is the current storage size of the parity block p, and α is the smoothing parameter (usually 0.3).
Depending on whether the usage reaches 90% and whether condition Cond0 is satisfied, an unnecessary storage space size c(p) is determined: it is the reclaimable storage space of the corresponding parity block p computed by computeShrinkSize. The no-longer-used space c(p) is then shrunk aggressively, rounded down to a multiple of the chunk size:

c(p) = ⌊(rt(p) - use(p)) / ChunkSize⌋ × ChunkSize

ChunkSize denotes the allocated storage data block size. The doShrink function in row 8 of Table 2 then attempts to shrink a space of size c(p) from the existing storage space rt(p). Thus, the storage space size of parity block p in the next time interval t+1 is:

rt+1(p) = rt(p) - c(p)  (11)
if only one block size of storage space is left before puncturing, no puncturing will be performed.
When the usage of a parity block's single storage pool reaches 90%, a doMerge fusion operation is performed to fuse all check delta logs into the parity block and reclaim all the storage space; the parity block then has no corresponding storage space, and storage space is reallocated when a new check delta log is generated. doMerge is not on the update path, so the I/O access performance of the system is only marginally affected. In addition, a doMerge operation may also be triggered when a parity block needs to be read (a degraded read or failure recovery). The trigger frequency of doMerge is generally low, for two reasons: 1) the initial space of each updated parity block is large and typical updates are very small, so 90% usage is not reached quickly; 2) degraded reads and failure recovery do trigger doMerge, but both failure cases are rare relative to normal reads and writes.
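The per-interval control loop described above can be sketched as follows. This is a hedged sketch under stated assumptions, not the algorithm of Table 2: the function names follow those mentioned in the text (getUtility, computeShrinkSize, doMerge), while the ChunkSize value, the Cond0 flag being passed in externally, and the PoolState fields are illustrative assumptions.

```python
ALPHA = 0.3          # smoothing parameter alpha (typical value from the text)
THRESHOLD = 0.90     # 90% usage triggers doMerge; shrink only applies below it
CHUNK = 4096         # ChunkSize: allocated data block size (assumed value)

class PoolState:
    """Per-parity-block state of the single storage pool (hypothetical layout)."""
    def __init__(self, initial_size):
        self.r = initial_size    # r_t(p): current pool size
        self.u = 0.0             # u_t(p): smoothed usage
        self.used = 0            # use(p): bytes used in the current interval

def get_utility(p):
    """u_t(p) = alpha*use(p)/r_t(p) + (1-alpha)*u_{t-1}(p)"""
    p.u = ALPHA * p.used / p.r + (1 - ALPHA) * p.u
    return p.u

def compute_shrink_size(p):
    """c(p): unused space, rounded down to a multiple of ChunkSize."""
    return ((p.r - p.used) // CHUNK) * CHUNK

def do_merge(p):
    """Fuse all delta logs into the parity block; reclaim the whole pool."""
    p.r, p.u, p.used = 0, 0.0, 0     # pool reallocated when a new log arrives

def end_of_interval(p, cond0):
    """Run once per time interval t for each parity block p in S."""
    u = get_utility(p)
    if u >= THRESHOLD:
        do_merge(p)
    elif cond0:                       # shrink runs less often than merge
        c = compute_shrink_size(p)
        if p.r - c >= CHUNK:          # keep at least one chunk of space
            p.r -= c                  # r_{t+1}(p) = r_t(p) - c(p)
```

Note the ordering choice: merging takes priority over shrinking in an interval, since a pool at 90% usage has nothing worth shrinking.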
The single-storage-pool PDN-PDS is obtained by combining PDN-P with the delta-based check-partial-log update idea; an EC(4, 2) encoding example is used.
The incoming data stream in fig. 7 describes the sequence of operations: (1) write the first data segment containing data blocks a and b; (2) update part a′ of a; (3) write another new data segment containing data blocks c and d; (4) finally update part b′ of data block b. It can be seen that:
(1) PDN-P overwrites the data block for each data update, and at the same time overwrites the parity blocks on check nodes 1 and 2 with the corresponding check updates;
(2) PDN-PDS likewise overwrites the data block for each data update, but uses a single storage pool behind the parity block of check node 1 to store all data delta logs (Δa + Δb).
Compared with PDN-P, the PDN-PDS of the present invention uses only one storage pool, in which the data delta logs are placed together; this improves the update efficiency of the parity blocks, reduces the multiple disk seeks in the read-update process, and saves storage space on the check nodes, while certain measures must be taken to further ensure data reliability.
The comparison considers the following five aspects:
(1) calculation amount: updating a data block, updating a check block and managing a storage pool;
(2) transmission bandwidth: the network transmits the data block difference quantity;
(3) disk I/O: data block disk read/write and check block disk read/write;
(4) storage space: difference logs of data nodes and difference logs of check nodes;
(5) data reliability: fault tolerance of data.
Three levels, "high", "medium", and "low", are used in Table 3 below to qualitatively compare the complexity of the update schemes. The advantage of PDN-PDS with a single storage pool is that computation is reduced, transmission bandwidth and disk I/O are further reduced, and less disk space is used; the cost is that the data-reliability risk becomes "high", meaning the data reliability of PDN-PDS is slightly reduced: the latest update data is lost only when, within a (configurable) time period, an updated data block node and the first parity block node fail at the same time. However, single-node failures account for 99.75% of all failure repairs, so such loss of update data is rare, and the parity-update fusion period can be shortened appropriately to further reduce this possibility.
TABLE 3 update complexity contrast for improved schemes
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (2)

1. A differential log erasure code updating method for a single storage pool, comprising the steps of:
S1, the main object storage device receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method, and then sends the update offset position o and the updated data Di′(o) to the node of Di; i ∈ {1,2,…,k}, where i denotes the data block sequence number and k the number of data blocks;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by the main object storage device, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o); then, under the check-partial-log strategy, an overwrite update is performed on the data block Di, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged, where ō denotes the region outside the updated portion; finally, the update offset position o and the data block update delta Δi(o) are sent to the first parity block P1, i.e. a single storage pool is set aside only on the node of parity block P1 to store the data delta logs; the data delta logs are stored using the adaptive management algorithm for a single storage pool, which dynamically adjusts and predicts the storage space size of each updated parity block in the storage pool;
S3, when parity update fusion is required, P1 sends all data update deltas Δi(o) in the storage pool to the other parity blocks P2, P3, …, Pm, where m denotes the number of parity blocks, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data;
the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
wherein the contraction operation is performed less frequently than the fusion operation, and its triggering condition is denoted Cond0; the set of check blocks on a node that hold a check difference log is denoted S; for the t-th time interval and each check block p ∈ S, r_t(p) denotes the storage space size in the t-th time interval and u_t(p) denotes the storage space utilization in the t-th time interval; at the end of the t-th time interval, u_t(p) is obtained as an exponentially weighted moving average:

u_t(p) = α · use(p)/r_t(p) + (1 − α) · u_{t−1}(p)

where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, α is a smoothing parameter, and u_{t−1}(p) is the storage space utilization in the (t−1)-th time interval;
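The exponentially weighted utilization estimate can be sketched in a few lines; the value of α is chosen for illustration only:

```python
ALPHA = 0.3  # smoothing parameter α (illustrative value, not from the patent)

def smoothed_utilization(used: int, size: int, prev_u: float) -> float:
    """u_t(p) = α * use(p)/r_t(p) + (1 - α) * u_{t-1}(p)."""
    return ALPHA * (used / size) + (1 - ALPHA) * prev_u
```

A larger α makes the estimate track the most recent interval more aggressively; a smaller α damps transient bursts when predicting the next interval's space needs.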
when the utilization has not reached 90% and condition Cond0 is satisfied, the unnecessary storage space c(p) of check block p is computed and the unused space c(p) is contracted, rounded down to a multiple of the data block size:

c(p) = ⌊r_t(p) · (1 − u_t(p)) / ChunkSize⌋ · ChunkSize

where ChunkSize denotes the size of an allocated data block, so that the storage space size of check block p in the (t+1)-th time interval is:

r_{t+1}(p) = r_t(p) − c(p)
if only one data block's worth of storage space remains before contraction, no further contraction is performed;
when the utilization of a check block's unified storage pool space reaches 90%, a fusion operation is performed to merge all check difference logs into the check blocks and reclaim all of the storage space, i.e. the check block no longer has corresponding storage space; the storage space is reallocated when a new check difference log is generated.
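The contraction and fusion decisions above can be sketched as follows, assuming c(p) is the unused fraction r_t(p)·(1 − u_t(p)) rounded down to a multiple of ChunkSize; the constant and function names are illustrative:

```python
CHUNK_SIZE = 4096  # ChunkSize, the allocation unit (illustrative value)

def shrink_amount(size: int, u: float) -> int:
    """c(p): unused space, rounded down to a multiple of the data block size."""
    return (int(size * (1.0 - u)) // CHUNK_SIZE) * CHUNK_SIZE

def next_size(size: int, u: float) -> int:
    """r_{t+1}(p) = r_t(p) - c(p); never shrink below one data block."""
    if size <= CHUNK_SIZE:  # only one block of space left: stop contracting
        return size
    return size - min(shrink_amount(size, u), size - CHUNK_SIZE)

def should_fuse(u: float) -> bool:
    """Fusion (merge all delta logs into the check blocks) triggers at 90%."""
    return u >= 0.9
```

Contraction runs on the slower Cond0 cadence and trims over-provisioned pools; fusion reclaims everything at once and lets the pool be reallocated from scratch on the next delta.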
2. The method of claim 1, wherein in step S1 the data placement method is a controlled replication method based on scalable hashing.
CN201610710868.XA 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool Active CN107766170B (en)

Publications (2)

Publication Number Publication Date
CN107766170A CN107766170A (en) 2018-03-06
CN107766170B true CN107766170B (en) 2021-04-09



