CN107766170A - Differential log type erasure code updating method for single storage pool - Google Patents


Info

Publication number
CN107766170A
CN107766170A
Authority
CN
China
Prior art keywords
data
check
block
update
storage pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610710868.XA
Other languages
Chinese (zh)
Other versions
CN107766170B (en)
Inventor
陈付 (Chen Fu)
陕振 (Shan Zhen)
张淑萍 (Zhang Shuping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications filed Critical Beijing Institute of Computer Technology and Applications
Priority to CN201610710868.XA priority Critical patent/CN107766170B/en
Publication of CN107766170A publication Critical patent/CN107766170A/en
Application granted granted Critical
Publication of CN107766170B publication Critical patent/CN107766170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a differential log type erasure code updating method for a single storage pool, belonging to the field of computer technology. The invention combines the idea of full-overwrite updates to data blocks and partial delta logging of check blocks with the PDN-P update mode, and proposes a new data update mode, PDN-PDS. Compared with PDN-P, PDN-PDS uses only a single storage pool to gather the data delta logs, which both improves the update efficiency of the check blocks and reduces disk seeks when reading multiple updates, while saving storage space on the check node.

Description

Differential log type erasure code updating method for single storage pool
Technical Field
The invention relates to the technical field of computers, in particular to a differential log type erasure code updating method for a single storage pool.
Background
Erasure coding techniques have been widely applied in data storage systems to achieve high fault tolerance, but they impose a heavy burden on data update performance, especially in distributed block storage systems where data update operations are frequent. The update principle of CRS erasure codes and four conventional update methods are introduced first: two typical erasure code update modes (DUM and PUM) and two partial update modes (PUM-P and PDN-P).
Erasure code data update complexity refers to the average number of parity blocks affected by modifying, updating, or overwriting a data block. For example, for CRS(6+3, 3), each data block is protected by m = 3 parity blocks, so the optimal update complexity is 3. The update complexity can significantly affect the update performance of erasure coding systems, especially for small-block updates. When a storage cluster employs erasure codes, the update burden is compounded because the update process involves disk I/O, transmission bandwidth, and CPU computation. Assume the storage system uses CRS(k+m, m) coding: a data segment is divided into k data blocks, and m parity blocks are then generated by CRS encoding. The manner of generating the check blocks is illustrated in Fig. 1, where each check block is a linear combination of the data blocks with coefficients C_ij taken from the Cauchy generator matrix.
When a data block D_j is updated to D_j', the data block D_j and all check blocks P_1, P_2, …, P_m need to be updated to D_j', P_1', P_2', …, P_m' respectively, while the other data blocks are not modified.
Four common data update methods are described below:
(1) The DUM (Data blocks by an Updating Manager) method.
Fig. 2 shows the data flow of the DUM method. The update process of DUM is identical to the data encoding process: the update manager node reads the original data blocks in the stripe that are not updated, and regenerates all new parity blocks together with the updated data block by means of the Cauchy generator matrix. Suppose data block D_1 will be updated to D_1'. The update manager reads all data blocks in the stripe other than D_1 over the network and re-encodes with the new data block D_1'. The m new check blocks P_1', P_2', …, P_m' generated in this way and the new data block D_1' are transmitted in parallel to the respective m check nodes and to the data node of D_1. In DUM, to update one data block, it is necessary to read k-1 data blocks from the disks of k-1 data nodes, write m check blocks to the disks of m check nodes and a new data block to one data node, and transmit m+k blocks over the network. The operations involved in DUM are: P_i' = C_i1 × D_1' + Σ_{j=2..k} C_ij × D_j, i ∈ {1, 2, …, m}.
(2) The PUM (Parity blocks by an Updating Manager) method.
For small-block updates, a natural idea is to regenerate new parity blocks from the old parity blocks and the data block to be updated, which saves disk I/O and network transmission bandwidth over DUM. The theoretical formula for PUM is as follows: when a data block D_x is modified into D_x', each new parity block is:
P_i' = C_ix × (D_x' - D_x) + P_i, i ∈ {1, 2, …, m} (7)
Fig. 3 shows the data flow of PUM. To update data block D_1, the update manager reads all parity blocks P_1, P_2, …, P_m in the stripe and the data block D_1; then the new parity blocks P_1', P_2', …, P_m' are calculated by equation (7); finally, all the updated check blocks and the new data block D_1' are sent to the respective check nodes and the data node. In contrast to DUM, PUM reads only the m parity blocks to be updated instead of k-1 data blocks. When m < k-2, both the disk I/O and the network transmission bandwidth of PUM are less than those of DUM; in addition, the calculation of each P_i' in PUM involves only one subtraction, one multiplication and one addition. To update one data block, PUM needs to read m+1 blocks from m check nodes and one data node, write m+1 blocks to local hard disks, and transmit 2m+2 blocks over the network.
(3) The PUM-P (Parity blocks by an Updating Manager and the Parity nodes) method.
In PUM, the update manager performs all computations and sends the new parity blocks P_i' to the check nodes. Fig. 4 shows the data flow of PUM-P: in PUM-P, the update manager calculates P_i* = C_i1 × (D_1' - D_1), i ∈ {1, 2, …, m}, and then transmits the P_i* to the m check nodes over the network; each check node reads its check block P_i from the local hard disk directly into memory without any network transmission load, and finally performs one addition on the check node to obtain the new check block P_i' = P_i* + P_i. To update one data block, PUM-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+2 blocks over the network. In contrast to PUM, PUM-P reads only one data block over the network, because the check nodes can read their check blocks locally and perform an addition to compute the new check blocks. Thus the overall I/O burden of PUM-P is lower than that of PUM.
(4) The PDN-P method.
Fig. 5 illustrates the data flow of PDN-P: based on PUM-P, PDN-P moves the calculation of P_i* = C_i1 × (D_1' - D_1) from the update manager to the data node of the updated block D_1, thereby saving the burden of transmitting D_1 over the network. The intermediate results P_i* are sent directly to the corresponding check nodes. To update one data block, PDN-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+1 blocks over the network. That is, the network traffic per update of PDN-P is lower than that of PUM-P.
Equation (7) expresses the linear coding property of erasure codes. The three improvements PUM, PUM-P and PDN-P build on the DUM method by exploiting this linearity: they avoid reading and re-encoding all data blocks, and instead compute the new parity blocks from the change in the updated data block. Although these improvements reduce the amount of computation, the network transmission bandwidth and disk I/O pressure are still large, and data update efficiency needs to be improved further.
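The per-update costs quoted in the descriptions above can be collected into a small comparison sketch (the counts are taken directly from the text of this section; the function name and structure are illustrative only):

```python
# Per-update costs (in blocks) of the four update methods, as stated above.
# These are closed-form counts from the text, not measurements.
def update_costs(k: int, m: int) -> dict:
    return {
        "DUM":   {"read": k - 1, "write": m + 1, "net": m + k},
        "PUM":   {"read": m + 1, "write": m + 1, "net": 2 * m + 2},
        "PUM-P": {"read": m + 1, "write": m + 1, "net": m + 2},
        "PDN-P": {"read": m + 1, "write": m + 1, "net": m + 1},
    }

costs = update_costs(k=6, m=3)  # the CRS(6+3, 3) example from above
assert costs["DUM"] == {"read": 5, "write": 4, "net": 9}
assert costs["PDN-P"]["net"] == 4
```

For the CRS(6+3, 3) example, the network cost per update drops from 9 blocks (DUM) to 4 blocks (PDN-P), which is why the later sections refine PDN-P rather than DUM.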
For CRS coding, equation (7), P_i' = C_ix × (D_x' - D_x) + P_i, i ∈ {1, 2, …, m}, can be further generalized to a partial update within a data block. Suppose one word at offset o of the old data block D_x is updated; accordingly, the word at offset o of each old parity block P_i needs to be updated. This can be expressed as:
P_i'(o) = P_i(o) + C_ix × (D_x'(o) - D_x(o)) (8)
Here, C_ix is the coefficient for generating the check data; P_i'(o) and P_i(o) denote the words at offset o of the new check block P_i' and the old check block P_i, respectively; D_x'(o) and D_x(o) denote the words at offset o of the new data block D_x' and the old data block D_x, respectively. That is, the delta of the data block update, multiplied by the coefficient C_ix, gives the delta of the parity block update.
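A minimal sketch of equation (8), under the simplifying assumption of a single parity block with coefficient C_ix = 1, where both addition and subtraction are XOR (as in RAID-5; real CRS coding multiplies in GF(2^w)):

```python
# Delta update of equation (8) with XOR parity (all coefficients = 1).
# Only the delta of the modified word touches the parity block.
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR the given equal-length blocks word by word."""
    return bytes(reduce(lambda a, b: a ^ b, words) for words in zip(*blocks))

k = 4
data = [bytes([i] * 8) for i in range(1, k + 1)]  # D_1 .. D_4
parity = xor_blocks(*data)                        # P = D_1 ^ D_2 ^ D_3 ^ D_4

# Update one word of D_2 at offset o.
o, new_word = 3, 0xAB
delta = data[1][o] ^ new_word                     # D_x(o) "minus" D_x'(o)
data[1] = data[1][:o] + bytes([new_word]) + data[1][o + 1:]
parity = parity[:o] + bytes([parity[o] ^ delta]) + parity[o + 1:]

# The delta-updated parity matches full re-encoding of the stripe.
assert parity == xor_blocks(*data)
```

The point of the sketch is that the parity node never needs the other k-1 data blocks: the word-sized delta alone is enough, which is exactly what PDN-PDS logs in its storage pool.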
Disclosure of Invention
Technical problem to be solved
The technical problem to be solved by the invention is: how to design an erasure code data updating method that improves the update efficiency of check blocks, reduces disk seeks when reading multiple updates, and saves storage space on check nodes.
(II) technical scheme
In order to solve the above technical problem, the present invention provides a delta log erasure code updating method for a single storage pool, which comprises the following steps:
S1. A primary object storage device receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to a data positioning method, and then sends the update offset position o and the updated data D_i'(o) to the node of D_i; i ∈ {1, 2, …, k}, where k is the number of data blocks;
S2. The node of D_i reads the data block D_i from the local disk and, according to the update information sent by the primary node D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset position o of the data block; then, following the parity-partial-log strategy, the data block D_i is updated by overwriting in place, i.e. the updated portion is replaced while the region outside the updated portion is left unchanged; finally, the update offset position o and the data block delta Δ_i(o) are sent to the first check block P_1, i.e. a single storage pool is set aside only on the node of check block P_1 to store the data delta logs; the data delta log storage method is implemented by an adaptive management algorithm for the single storage pool, which dynamically adjusts and predicts the storage space size of each updated check block in the pool;
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to the other check blocks P_2, P_3, …, P_m, where m is the number of check blocks, and all check blocks perform fusion updates to obtain P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}; P_j'(o) and P_j(o) denote the words at update offset position o of the new check block P_j' and the old check block P_j respectively, and C_ji is the coefficient for generating the check data.
Preferably, the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
The shrink operation is performed less frequently than the fusion operation; this is set as condition Cond0. Denote the set of check blocks with parity-partial logs on one node as S. For each time interval t and each check block p ∈ S, let r_t(p) denote the storage space size and u_t(p) the storage space utilization. At the end of each time interval t, u_t(p) is measured using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is a smoothing parameter;
depending on whether the utilization reaches 90% and whether condition Cond0 is met, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed when calculating the shrink size; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize
ChunkSize indicates the size of an allocated storage data block, so the storage space size of check block p in the next time interval t+1 is:
r_{t+1}(p) = r_t(p) - c(p)
if only one data block's worth of storage space remains before shrinking, the shrink is not performed;
when the utilization of a check block's unified storage pool reaches 90%, a fusion operation is performed to fuse all parity delta logs with the check block and reclaim all the storage space, i.e. the check block no longer has a corresponding storage space; the storage space is then reallocated when a new parity delta log is generated.
Preferably, in step S1, the data positioning method is RUSH (Replication Under Scalable Hashing).
(III) advantageous effects
The invention combines the idea of performing full-overwrite updates on data blocks and delta logging on check blocks with the PDN-P update mode, and provides a new data update mode, PDN-PDS.
Drawings
Fig. 1 is a diagram of check block generation for CRS encoding;
FIG. 2 is a diagram of a DUM update method;
FIG. 3 is a diagram of a PUM update method;
FIG. 4 is a diagram of a PUM-P update method;
figure 5 is a diagram of a PDN-P update method;
fig. 6 is a diagram of a single storage pool PDN-PDS method of an embodiment of the present invention;
fig. 7 is a comparative example between the conventional updating method and the updating method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
With the linear coding property of the erasure code (equation (8)), a delta approach can be employed for data updating, eliminating redundant network traffic by transmitting only data deltas of the same size as the modified data range. It first determines the modification range of the data block and then calculates for this data block the difference, which is the amount of change between the old and new data within the modification range of the data block, multiplied by the corresponding coefficient to obtain the check difference. The range of modified data and the calculated data delta and check delta are then sent to the corresponding data node and all other check nodes, respectively, for updating. The transmission of complete data blocks and check blocks is not required, and only the range of modified data, the data difference and the check difference are required to be transmitted, so that the network transmission bandwidth is reduced, and the method is very suitable for a cluster storage environment.
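The delta flow above can be sketched for a single word, assuming arithmetic in GF(2^8) with the reduction polynomial 0x11B (the polynomial and the coefficient value are illustrative assumptions; CRS uses a Cauchy matrix over GF(2^w)). Subtraction in a binary field is XOR, so the data delta is simply the XOR of the old and new words:

```python
# Word-level delta update over GF(2^8): the check delta is the data delta
# multiplied by the generator coefficient, per equation (8).
def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply modulo the field polynomial 0x11B (an assumption)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B        # reduce back into the field
        b >>= 1
    return p

C = 0x53                      # one generator-matrix coefficient (illustrative)
d_old, d_new = 0x2A, 0x77
p_old = gf_mul(C, d_old)      # this word's contribution to the parity
delta = d_old ^ d_new         # data delta within the modified range ("subtraction")
p_new = p_old ^ gf_mul(C, delta)   # apply the check delta to the old parity

assert p_new == gf_mul(C, d_new)   # matches full re-encoding of the word
```

The final assertion holds for any coefficient and word values because GF multiplication distributes over XOR, which is exactly the linearity that equation (8) relies on.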
TABLE 1 Log update classification

                          Data delta not logged    Data delta logged
Check delta not logged    Full overwrite           Data partial log
Check delta logged        Parity partial log       Full log
In addition, update schemes can be classified according to whether the data delta and the check delta are logged (see Table 1). The advantage of logging is improved update efficiency: the data block or check block need not be read in order to fuse it with the delta. The advantage of not logging (direct overwrite) is that reading a new data block or new check block after an update is faster, since no log needs to be fused with the original data to obtain the new data. The parity-partial-log method fits the access pattern of erasure code updates well: data blocks are read often and updated rarely, while check blocks are read rarely and updated often. Therefore, the invention improves PDN-P using the idea of parity partial logs.
Specifically, combining the idea of full-overwrite updates on data blocks and delta logging on check blocks with the PDN-P update method, a new data update method PDN-PDS (PDN-P with Delta logging and Single Storage Pool) is proposed. Fig. 6 depicts the data flow of PDN-PDS and details the update procedure.
S1. The primary OSD (Object Storage Device), which holds D_1, receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to the data positioning method RUSH (Replication Under Scalable Hashing), and then sends the update position information o and the updated data D_i'(o) to the node of D_i.
S2. The node of D_i reads data block D_i from the local disk and, according to the update information sent by D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset o. The data block is then overwritten and updated following the parity-partial-log strategy, i.e. the updated portion is replaced while the region outside it is left unchanged. Finally, the update offset position o and the data block delta Δ_i(o) are sent to the node of check block P_1; that is, a single storage pool is set aside only on the node of check block P_1 to store the data delta logs. The storage pool adopts an efficient data delta log storage scheme, namely the adaptive management algorithm for a single storage pool (see Table 2).
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to P_2, P_3, …, P_m, and all check blocks perform fusion updates: P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}, where m is the number of check blocks, P_j'(o) and P_j(o) denote the words at update offset o of the new check block P_j' and the old check block P_j, and C_ji is the coefficient for generating the check data.
PDN-PDS uses a single storage pool: on the node of the first check block P_1, one storage pool stores and manages all the data delta logs Δ_i(o) corresponding to this check block, while the other m-1 check blocks do not need to store or manage parity delta logs. (Here P_1 is generic, since the EC groups are distributed across nodes; one node may simultaneously serve as P_1 for one group and as P_j, j ∈ {2, 3, …, m}, for another.) However, using only one storage pool may reduce the reliability of parity data: if the node of P_1 and the node of a data block D_i fail at the same time, the preceding data updates are all lost. A further measure can therefore be taken, namely narrowing the time window within which the check block node performs data delta log fusion; it can be assumed that, within a suitable time window (e.g. 15 or 30 minutes), simultaneous failure of a data block node D_i and the check block node P_1 is rare. Hence the single-storage-pool PDN-PDS saves storage space and improves update efficiency with little impact on data reliability. The data delta logs generated by multiple data block updates corresponding to check block P_1 are stored in a free area on the check node. However, this approach easily causes fragmentation of the log storage space and wasted space from holes left after garbage collection; and when the check block and the parity delta logs are fusion-updated, scattered delta logs can impose a heavy disk seek burden, resulting in low read efficiency for the new check block. Therefore, an efficient data delta log storage method, namely an adaptive management algorithm for the single storage pool, needs to be designed.
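The single pool on the P_1 node can be pictured with a small hypothetical sketch (the DeltaPool class and its method names are illustrative, not from the patent; XOR again stands in for the GF(2^w) multiply-accumulate of equation (8)):

```python
# Hypothetical sketch of the single delta-log pool kept on the P_1 node.
# Updates only append (offset, delta) records; the parity block is read and
# rewritten once, at fusion time, instead of once per update.
class DeltaPool:
    def __init__(self) -> None:
        self.log = []                           # list of (offset, delta) records

    def append(self, offset: int, delta: bytes) -> None:
        self.log.append((offset, delta))        # log-structured: no parity read

    def merge(self, parity: bytearray) -> None:
        for offset, delta in self.log:          # one sequential pass at fusion time
            for i, d in enumerate(delta):
                parity[offset + i] ^= d         # XOR in place of GF multiply-add
        self.log.clear()                        # pool space reclaimed after fusion

pool = DeltaPool()
parity = bytearray(8)
pool.append(0, bytes([0x0F, 0xF0]))
pool.append(0, bytes([0x0F]))                   # second update of the same word
pool.merge(parity)
assert parity[:2] == bytes([0x00, 0xF0])        # the two deltas cancel at byte 0
assert pool.log == []
```

Keeping the deltas contiguous in one pool is what lets the merge run as a single sequential pass, which is the disk-seek saving the text claims over scattered per-log placement.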
Adaptive management algorithm for single storage pool:
In order to save storage space and improve the storage efficiency of the storage pool, the invention designs a workload-aware adaptive management scheme for the storage pool, which dynamically adjusts and predicts the storage space size of each updated check block in the pool. The scheme has three main components: (1) predicting the storage space size of each check block in the next time interval using the measured workload pattern; (2) shrinking the storage space and releasing unused space back to the storage pool; (3) fusing the parity deltas in the storage space with the corresponding check blocks. To avoid small unusable holes in the storage space reclaimed by the shrink operation, the storage space size and the shrink size are set to multiples of the block size, which ensures that a whole data block or check block can be placed in the reclaimed space.
TABLE 2 adaptive management algorithm for single storage pool
The algorithm of Table 2 describes the basic flow of an adaptive management scheme for a unified storage pool:
first, a default initial storage size (2 or 4 times the data block size) is set that is large enough to hold all of the check delta logs.
Then, the contraction and fusion operations are performed periodically on each node:
The condition Cond0 is added so that the shrink operation occurs less frequently than the fusion operation. The purpose of this condition is to prevent frequent shrink operations from making the storage space too small and thereby triggering excessive fusion operations, since fusing and updating all the parity delta logs of a check block is relatively expensive. The set of check blocks with parity-partial logs on one node is denoted S. For each time interval t and each check block p ∈ S, r_t(p) denotes the storage space size and u_t(p) the storage space utilization; intuitively, u_t(p) is the percentage of storage space used. At the end of each time interval t, getUtility measures u_t(p) using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p) (9)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is the smoothing parameter (commonly 0.3).
Then, according to whether the utilization reaches 90% and whether condition Cond0 is satisfied, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed in computeShrinkSize. The unused space c(p) is shrunk aggressively, rounded down to a multiple of the chunk size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize (10)
ChunkSize denotes the allocated storage data block size. The doShrink function in line 8 of Table 2 attempts to reclaim c(p) of space from the existing storage space r_t(p). Thus the storage space size of check block p in the next time interval t+1 is:
r t+1 (p)=r t (p)-c(p) (11)
If only one data block's worth of storage space remains before shrinking, no shrink is performed.
When the utilization of a check block's unified storage pool reaches 90%, a doMerge fusion operation is performed: all parity delta logs are fused with the check block and all the storage space is reclaimed, i.e. the check block no longer has a corresponding storage space; storage space is reallocated when a new parity delta log is generated. doMerge is not on the update path, so it affects the system's I/O access performance only to a limited extent. In addition, the doMerge operation may also be triggered when a check block needs to be read (a degraded read or failure recovery). In general doMerge is not triggered often, for two reasons: 1) the initial space of each updated check block is large and typical updates are very small, so 90% utilization is not reached quickly; 2) a degraded read or failure recovery triggers doMerge, but both kinds of failure are rare relative to normal reads and writes.
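Under the assumed EWMA form u_t = α × use/r + (1 - α) × u_{t-1} and the chunk-rounding rule above, one interval of the sizing loop can be sketched as follows (the constants follow the text: a 4-chunk initial pool and α = 0.3; the exact pseudocode of Table 2 is not reproduced in this translation, so this is an approximation):

```python
# One interval of the adaptive pool-sizing loop: smooth the utilization,
# then reclaim predicted-unused space rounded down to whole chunks.
CHUNK = 4096          # allocation granularity: one data block (assumed size)
ALPHA = 0.3           # smoothing parameter, "commonly 0.3" per the text

def smoothed_utilization(u_prev: float, used: int, r: int) -> float:
    """EWMA of pool utilization, equation (9)."""
    return ALPHA * (used / r) + (1 - ALPHA) * u_prev

def shrink_size(r: int, u: float) -> int:
    """Reclaimable space, equation (10), never shrinking below one chunk."""
    c = int(r * (1 - u)) // CHUNK * CHUNK
    return 0 if r - c < CHUNK else c

r = 4 * CHUNK                                    # default initial size: 4 chunks
u = smoothed_utilization(0.0, used=CHUNK, r=r)   # 25% of the pool used this interval
c = shrink_size(r, u)
r -= c                                           # r_{t+1} = r_t - c, equation (11)
assert abs(u - 0.075) < 1e-12 and c == 3 * CHUNK and r == CHUNK
```

With a cold history (u_prev = 0) the smoothed utilization stays far below the raw 25%, so the shrink is aggressive here; under a steady workload the EWMA converges toward the true usage fraction and the reclaimed slack shrinks accordingly.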
The single-storage-pool PDN-PDS is obtained by combining PDN-P with the parity-partial-log update idea; Fig. 7 contrasts it with scattered log placement, using an EC(4, 2) coding example.
The incoming data stream in fig. 7 describes the sequence of operations: (1) write the first data segment containing data blocks a and b, (2) update part a 'of a, (3) write another new data segment containing data blocks c and d, (4) finally update part b' of data block b. It can be seen that:
(1) PDN-P overwrites the data block for a data update, and simultaneously overwrites the check blocks on check nodes 1 and 2;
(2) PDN-PDS overwrites the data block for a data update, and uses a single storage pool behind the check block of check node 1 to store all the data delta logs (Δa + Δb).
Compared with PDN-P, the PDN-PDS of the present invention uses only one storage pool to place the data delta logs together, which improves the update efficiency of the check blocks, reduces disk seeks when reading multiple updates, and saves storage space on the check nodes; at the same time, certain measures need to be taken to further ensure data reliability.
The comparison considers indexes in five aspects:
(1) Calculation amount: updating a data block, updating a check block and managing a storage pool;
(2) Transmission bandwidth: the network transmits the data block difference quantity;
(3) Disk I/O: data block disk read/write and check block disk read/write;
(4) Storage space: difference logs of data nodes and difference logs of check nodes;
(5) Data reliability: fault tolerance of data.
Three levels, "high", "medium" and "low", are used in Table 3 below to qualitatively compare the complexity of the update schemes in each aspect. The benefit of the single storage pool in PDN-PDS is reduced computation, further reduced transmission bandwidth and disk I/O, and less disk space used; the cost is that the data-reliability entry becomes "high" (risk), meaning the data reliability of PDN-PDS is slightly reduced: if, within a certain (configurable) period of time, the node of an updated data block and the first check block node fail at the same time, the most recently updated data will be lost. However, single-node failures account for 99.75% of all failure repairs, so such loss of update data is rare, and the period for parity update fusion can be shortened appropriately to reduce this possibility further.
TABLE 3 Update complexity comparison of the improved schemes
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (3)

1. A differential log erasure code updating method for a single storage pool, comprising the steps of:
S1. A primary object storage device receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to a data positioning method, and then sends the update offset position o and the updated data D_i'(o) to the node of D_i; i ∈ {1, 2, …, k}, where k is the number of data blocks;
S2. The node of D_i reads the data block D_i from the local disk and, according to the update information sent by the primary node D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset position o of the data block; then, following the parity-partial-log strategy, the data block D_i is updated by overwriting in place, i.e. the updated portion is replaced while the region outside the updated portion is left unchanged; finally, the update offset position o and the data block delta Δ_i(o) are sent to the first check block P_1, i.e. a single storage pool is set aside only on the node of check block P_1 to store the data delta logs; the data delta log storage method is implemented by an adaptive management algorithm for the single storage pool, so as to dynamically adjust and predict the storage space size of each updated check block in the storage pool;
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to the other check blocks P_2, P_3, …, P_m, where m is the number of check blocks, and all check blocks perform fusion updates to obtain P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}; P_j'(o) and P_j(o) denote the words at update offset position o of the new check block P_j' and the old check block P_j respectively, and C_ji is the coefficient for generating the check data.
2. The method of claim 1, wherein the adaptive management algorithm for the single storage pool in step S2 comprises the steps of:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
the shrink operation is performed less frequently than the fusion operation; this is set as condition Cond0; the set of check blocks with parity-partial logs on one node is denoted S; for each time interval t and each check block p ∈ S, r_t(p) denotes the storage space size and u_t(p) the storage space utilization; at the end of each time interval t, u_t(p) is measured using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is a smoothing parameter;
depending on whether the utilization reaches 90% and whether condition Cond0 is met, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed when calculating the shrink size; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize
ChunkSize indicates the size of the allocated storage data block, so that the storage space of the check block p in the next time period t +1 is:
r t+1 (p)=r t (p)-c(p)
if only one data block's worth of storage space remains before shrinking, no further shrinking is performed;
when the unified storage-pool space utilization of a check block reaches 90%, a fusion operation is performed: all parity-delta logs are fused into the check blocks and all of the storage space is reclaimed, i.e. the check block no longer has corresponding storage space; when a new parity-delta log is subsequently generated, storage space is allocated anew.
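The shrink rule of claim 2 can be sketched as below. The 90% threshold, the rounding of c(p) down to a ChunkSize multiple, and the stop condition when one chunk remains follow the claim text; the function names, the concrete chunk size, and the c(p) expression as unused space are illustrative assumptions.

```python
CHUNK_SIZE = 4096  # assumed size of one allocated storage data block

def shrink_amount(r_t, u_t, chunk_size=CHUNK_SIZE):
    """c(p): unused pool space, rounded down to a multiple of ChunkSize."""
    free = (1.0 - u_t) * r_t
    return int(free // chunk_size) * chunk_size

def next_pool_size(r_t, u_t, chunk_size=CHUNK_SIZE):
    """r_{t+1}(p) = r_t(p) - c(p); never shrink when one chunk is left."""
    if r_t <= chunk_size:          # only one data block of space remains
        return r_t
    return r_t - shrink_amount(r_t, u_t, chunk_size)
```

In this sketch the fusion trigger (utilization reaching 90%) would reclaim the whole pool instead of calling `next_pool_size`, matching the merge step described above.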
3. The method according to claim 1 or 2, wherein in step S1 the data is located by a replica-location method based on extendible hashing.
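Claim 3 gives no implementation details for the extendible-hashing-based data location, but the directory lookup it relies on can be sketched as follows; the hash choice, directory layout, and node names are all assumptions for illustration.

```python
import hashlib

def locate(block_id, directory, global_depth):
    """Pick a node using the low `global_depth` bits of the block's hash."""
    h = int.from_bytes(hashlib.sha256(block_id.encode()).digest()[:8], "big")
    return directory[h & ((1 << global_depth) - 1)]

def double_directory(directory):
    """Grow the directory (global depth + 1) by duplicating every pointer,
    so all existing blocks still resolve to the same node."""
    return directory + directory
```

Doubling preserves placement because index bits beyond the old depth select a duplicated pointer: `locate(id, double_directory(d), depth + 1)` equals `locate(id, d, depth)`.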
CN201610710868.XA 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool Active CN107766170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610710868.XA CN107766170B (en) 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool

Publications (2)

Publication Number Publication Date
CN107766170A true CN107766170A (en) 2018-03-06
CN107766170B CN107766170B (en) 2021-04-09

Family

ID=61264215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610710868.XA Active CN107766170B (en) 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool

Country Status (1)

Country Link
CN (1) CN107766170B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110208995A1 (en) * 2010-02-22 2011-08-25 International Business Machines Corporation Read-modify-write protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
US9063910B1 (en) * 2011-11-15 2015-06-23 Emc Corporation Data recovery after triple disk failure
CN102681793A (en) * 2012-04-16 2012-09-19 华中科技大学 Local data updating method based on erasure code cluster storage system
CN105359108A (en) * 2013-08-05 2016-02-24 英特尔公司 Storage systems with adaptive erasure code generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FENGHAO ZHANG et al., "Two Efficient Partial-Updating Schemes for Erasure-Coded Storage Clusters", 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491835A (en) * 2018-10-25 2019-03-19 哈尔滨工程大学 A kind of data fault tolerance method based on Dynamic Packet code
CN109491835B (en) * 2018-10-25 2022-04-12 哈尔滨工程大学 Data fault-tolerant method based on dynamic block code
CN110262922A (en) * 2019-05-15 2019-09-20 中国科学院计算技术研究所 Correcting and eleting codes update method and system based on copy data log
CN111522825A (en) * 2020-04-09 2020-08-11 陈尚汉 Efficient information updating method and system based on check information block shared cache mechanism
WO2023082556A1 (en) * 2021-11-09 2023-05-19 华中科技大学 Memory key value erasure code-oriented hybrid data update method, and storage medium

Also Published As

Publication number Publication date
CN107766170B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
US11150986B2 (en) Efficient compaction on log-structured distributed file system using erasure coding for resource consumption reduction
US10613933B2 (en) System and method for providing thin-provisioned block storage with multiple data protection classes
CN107766170B (en) Differential log type erasure code updating method for single storage pool
US9916478B2 (en) Data protection enhancement using free space
CN110442535B (en) Method and system for improving reliability of distributed solid-state disk key value cache system
US11422703B2 (en) Data updating technology
CN114415976B (en) Distributed data storage system and method
KR20150061258A (en) Operating System and Method for Parity chunk update processing in distributed Redundant Array of Inexpensive Disks system
CN103544202A (en) Method and system used for arranging data processing
US20150012493A1 (en) Reducing latency and cost in resilient cloud file systems
CN108062419B (en) File storage method, electronic equipment, system and medium
CN112835743B (en) Distributed account book data storage optimization method and device, electronic equipment and medium
CN109445681B (en) Data storage method, device and storage system
Shen et al. Cross-rack-aware updates in erasure-coded data centers
US11886705B2 (en) System and method for using free space to improve erasure code locality
CN110018783A (en) A kind of date storage method, apparatus and system
CN109582213A (en) Data reconstruction method and device, data-storage system
JP7355616B2 (en) Distributed storage systems and how to update parity in distributed storage systems
CN113377569A (en) Method, apparatus and computer program product for recovering data
WO2023197937A1 (en) Data processing method and apparatus, storage medium, and computer program product
CN107329699A (en) One kind, which is entangled, deletes rewrite method and system
WO2023082556A1 (en) Memory key value erasure code-oriented hybrid data update method, and storage medium
CN113391945A (en) Method, electronic device and computer program product for storage management
US11561859B2 (en) Method, device and computer program product for managing data
CN108174136B (en) Cloud disk video coding storage method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant