CN107766170A - Differential log type erasure code updating method for single storage pool - Google Patents


Info

Publication number
CN107766170A
CN107766170A
Authority
CN
China
Prior art keywords
data
check
block
update
storage pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610710868.XA
Other languages
Chinese (zh)
Other versions
CN107766170B (en)
Inventor
陈付 (Chen Fu)
陕振 (Shan Zhen)
张淑萍 (Zhang Shuping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications filed Critical Beijing Institute of Computer Technology and Applications
Priority to CN201610710868.XA priority Critical patent/CN107766170B/en
Publication of CN107766170A publication Critical patent/CN107766170A/en
Application granted granted Critical
Publication of CN107766170B publication Critical patent/CN107766170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a differential log type erasure code updating method for a single storage pool, belonging to the field of computer technology. The invention combines the idea of full-overwrite updates to data blocks and partial delta logging of check blocks with the PDN-P update mode, and proposes a new data update mode, PDN-PDS. Compared with PDN-P, PDN-PDS uses only a single storage pool to gather the data delta logs, which both improves the update efficiency of the check blocks and reduces disk seeks when reading multiple updates, while saving storage space on the check node.

Description

Differential log type erasure code updating method for single storage pool
Technical Field
The invention relates to the technical field of computers, in particular to a differential log type erasure code updating method for a single storage pool.
Background
Erasure coding techniques have been widely applied in data storage systems to achieve high fault tolerance, but they impose a heavy burden on data update performance, especially in distributed block storage systems where data update operations are frequent. The update principle of CRS erasure codes and four conventional update methods are introduced first: two typical erasure code update modes (DUM and PUM) and two partial update modes (PUM-P and PDN-P).
Erasure code data update complexity refers to the average number of parity blocks affected by modifying, updating, or overwriting a data block. For example, for CRS(6+3, 3), each data block is protected by m = 3 parity blocks, so the optimal update complexity is 3. The update complexity can significantly affect the update performance of erasure coding systems, especially for small-block updates. When a storage cluster employs erasure codes, the update burden is compounded because the update process involves disk I/O, transmission bandwidth, and CPU computation. Assume the storage system uses CRS(k+m, m) coding: a data segment is divided into k data blocks, and m parity blocks are then generated by CRS encoding. The manner of generating the check blocks is illustrated in Fig. 1, where each check block is a linear combination of the data blocks with coefficients C_ij taken from the Cauchy generator matrix.
When a data block D_j is updated to D_j', the data block D_j and all check blocks P_1, P_2, …, P_m need to be updated to D_j', P_1', P_2', …, P_m' respectively, while the other data blocks are not modified.
Four common data update methods are described below:
(1) The DUM (Data blocks by an Updating Manager) method.
Fig. 2 shows the data flow of the DUM method. The update process of DUM is identical to the data encoding process: the update manager node reads the original data blocks in the stripe that are not updated, and regenerates all new parity blocks together with the updated data block by means of the Cauchy generator matrix. Suppose data block D_1 will be updated to D_1'. The update manager reads all data blocks in the stripe other than D_1 over the network and re-encodes with the new data block D_1'. The m new check blocks P_1', P_2', …, P_m' generated in this way and the new data block D_1' are transmitted in parallel to the respective m check nodes and to the data node of D_1. In DUM, to update one data block, it is necessary to read k-1 data blocks from the disks of k-1 data nodes, write m check blocks to the disks of m check nodes and a new data block to one data node, and transmit m+k blocks over the network. The operations involved in DUM are: P_i' = C_i1 × D_1' + Σ_{j=2..k} C_ij × D_j, i ∈ {1, 2, …, m}.
(2) The PUM (Parity blocks by an Updating Manager) method.
For small-block updates, a natural idea is to regenerate new parity blocks from the old parity blocks and the data block to be updated, which saves disk I/O and network transmission bandwidth over DUM. The theoretical formula for PUM is as follows: when a data block D_x is modified into D_x', each new parity block is:
P_i' = C_ix × (D_x' - D_x) + P_i, i ∈ {1, 2, …, m} (7)
Fig. 3 shows the data flow of PUM. To update data block D_1, the update manager reads all parity blocks P_1, P_2, …, P_m in the stripe and the data block D_1; then the new parity blocks P_1', P_2', …, P_m' are calculated by equation (7); finally, all the updated check blocks and the new data block D_1' are sent to the respective check nodes and the data node. In contrast to DUM, PUM reads only the m parity blocks to be updated instead of k-1 data blocks. When m < k-2, both the disk I/O and the network transmission bandwidth of PUM are less than those of DUM; in addition, the calculation of each P_i' in PUM involves only one subtraction, one multiplication and one addition. To update one data block, PUM needs to read m+1 blocks from m check nodes and one data node, write m+1 blocks to local hard disks, and transmit 2m+2 blocks over the network.
(3) The PUM-P (Parity blocks by an Updating Manager and the Parity nodes) method.
In PUM, the update manager performs all computations and sends the new parity blocks P_i' to the check nodes. Fig. 4 shows the data flow of PUM-P: in PUM-P, the update manager calculates P_i* = C_i1 × (D_1' - D_1), i ∈ {1, 2, …, m}, and then transmits the P_i* to the m check nodes over the network; each check node reads its check block P_i from the local hard disk directly into memory without any network transmission load, and finally performs one addition on the check node to obtain the new check block P_i' = P_i* + P_i. To update one data block, PUM-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+2 blocks over the network. In contrast to PUM, PUM-P reads only one data block over the network, because the check nodes can read their check blocks locally and perform an addition to compute the new check blocks. Thus the overall I/O burden of PUM-P is lower than that of PUM.
(4) The PDN-P method.
Fig. 5 illustrates the data flow of PDN-P: based on PUM-P, PDN-P moves the calculation of P_i* = C_i1 × (D_1' - D_1) from the update manager to the data node of the updated block D_1, thereby saving the burden of transmitting D_1 over the network. The intermediate results P_i* are sent directly to the corresponding check nodes. To update one data block, PDN-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+1 blocks over the network. That is, the network traffic per update of PDN-P is lower than that of PUM-P.
Equation (7) expresses the linear coding property of erasure codes. The three improvements PUM, PUM-P and PDN-P build on the DUM method by exploiting this linearity: they avoid reading and re-encoding all data blocks, and instead compute the new parity blocks from the change in the updated data block. Although these improvements reduce the amount of computation, the network transmission bandwidth and disk I/O pressure are still large, and data update efficiency needs to be improved further.
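The per-update costs quoted in the descriptions above can be collected into a small comparison sketch (the counts are taken directly from the text of this section; the function name and structure are illustrative only):

```python
# Per-update costs (in blocks) of the four update methods, as stated above.
# These are closed-form counts from the text, not measurements.
def update_costs(k: int, m: int) -> dict:
    return {
        "DUM":   {"read": k - 1, "write": m + 1, "net": m + k},
        "PUM":   {"read": m + 1, "write": m + 1, "net": 2 * m + 2},
        "PUM-P": {"read": m + 1, "write": m + 1, "net": m + 2},
        "PDN-P": {"read": m + 1, "write": m + 1, "net": m + 1},
    }

costs = update_costs(k=6, m=3)  # the CRS(6+3, 3) example from above
assert costs["DUM"] == {"read": 5, "write": 4, "net": 9}
assert costs["PDN-P"]["net"] == 4
```

For the CRS(6+3, 3) example, the network cost per update drops from 9 blocks (DUM) to 4 blocks (PDN-P), which is why the later sections refine PDN-P rather than DUM.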
For CRS coding, equation (7), P_i' = C_ix × (D_x' - D_x) + P_i, i ∈ {1, 2, …, m}, can be further generalized to a partial update within a data block. Suppose one word at offset o of the old data block D_x is updated; accordingly, the word at offset o of each old parity block P_i needs to be updated. This can be expressed as:
P_i'(o) = P_i(o) + C_ix × (D_x'(o) - D_x(o)) (8)
Here, C_ix is the coefficient for generating the check data; P_i'(o) and P_i(o) denote the words at offset o of the new check block P_i' and the old check block P_i, respectively; D_x'(o) and D_x(o) denote the words at offset o of the new data block D_x' and the old data block D_x, respectively. That is, the delta of the data block update, multiplied by the coefficient C_ix, gives the delta of the parity block update.
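A minimal sketch of equation (8), under the simplifying assumption of a single parity block with coefficient C_ix = 1, where both addition and subtraction are XOR (as in RAID-5; real CRS coding multiplies in GF(2^w)):

```python
# Delta update of equation (8) with XOR parity (all coefficients = 1).
# Only the delta of the modified word touches the parity block.
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR the given equal-length blocks word by word."""
    return bytes(reduce(lambda a, b: a ^ b, words) for words in zip(*blocks))

k = 4
data = [bytes([i] * 8) for i in range(1, k + 1)]  # D_1 .. D_4
parity = xor_blocks(*data)                        # P = D_1 ^ D_2 ^ D_3 ^ D_4

# Update one word of D_2 at offset o.
o, new_word = 3, 0xAB
delta = data[1][o] ^ new_word                     # D_x(o) "minus" D_x'(o)
data[1] = data[1][:o] + bytes([new_word]) + data[1][o + 1:]
parity = parity[:o] + bytes([parity[o] ^ delta]) + parity[o + 1:]

# The delta-updated parity matches full re-encoding of the stripe.
assert parity == xor_blocks(*data)
```

The point of the sketch is that the parity node never needs the other k-1 data blocks: the word-sized delta alone is enough, which is exactly what PDN-PDS logs in its storage pool.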
Disclosure of Invention
Technical problem to be solved
The technical problem to be solved by the invention is: how to design an erasure code data updating method that improves the update efficiency of check blocks, reduces disk seeks when reading multiple updates, and saves storage space on check nodes.
(II) technical scheme
In order to solve the above technical problem, the present invention provides a delta log erasure code updating method for a single storage pool, which comprises the following steps:
S1. A primary object storage device receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to a data positioning method, and then sends the update offset position o and the updated data D_i'(o) to the node of D_i; i ∈ {1, 2, …, k}, where k is the number of data blocks;
S2. The node of D_i reads the data block D_i from the local disk and, according to the update information sent by the primary node D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset position o of the data block; then, following the parity-partial-log strategy, the data block D_i is updated by overwriting in place, i.e. the updated portion is replaced while the region outside the updated portion is left unchanged; finally, the update offset position o and the data block delta Δ_i(o) are sent to the first check block P_1, i.e. a single storage pool is set aside only on the node of check block P_1 to store the data delta logs; the data delta log storage method is implemented by an adaptive management algorithm for the single storage pool, which dynamically adjusts and predicts the storage space size of each updated check block in the pool;
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to the other check blocks P_2, P_3, …, P_m, where m is the number of check blocks, and all check blocks perform fusion updates to obtain P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}; P_j'(o) and P_j(o) denote the words at update offset position o of the new check block P_j' and the old check block P_j respectively, and C_ji is the coefficient for generating the check data.
Preferably, the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
The shrink operation is performed less frequently than the fusion operation; this is set as condition Cond0. Denote the set of check blocks with parity-partial logs on one node as S. For each time interval t and each check block p ∈ S, let r_t(p) denote the storage space size and u_t(p) the storage space utilization. At the end of each time interval t, u_t(p) is measured using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is a smoothing parameter;
depending on whether the utilization reaches 90% and whether condition Cond0 is met, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed when calculating the shrink size; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize
ChunkSize indicates the size of an allocated storage data block, so the storage space size of check block p in the next time interval t+1 is:
r_{t+1}(p) = r_t(p) - c(p)
if only one data block's worth of storage space remains before shrinking, the shrink is not performed;
when the utilization of a check block's unified storage pool reaches 90%, a fusion operation is performed to fuse all parity delta logs with the check block and reclaim all the storage space, i.e. the check block no longer has a corresponding storage space; the storage space is then reallocated when a new parity delta log is generated.
Preferably, in step S1, the data positioning method is RUSH (Replication Under Scalable Hashing).
(III) advantageous effects
The invention combines the idea of performing full-overwrite updates on data blocks and delta logging on check blocks with the PDN-P update mode, and provides a new data update mode, PDN-PDS.
Drawings
Fig. 1 is a diagram of check block generation for CRS encoding;
FIG. 2 is a diagram of a DUM update method;
FIG. 3 is a diagram of a PUM update method;
FIG. 4 is a diagram of a PUM-P update method;
figure 5 is a diagram of a PDN-P update method;
fig. 6 is a diagram of a single storage pool PDN-PDS method of an embodiment of the present invention;
fig. 7 is a comparative example between the conventional updating method and the updating method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
With the linear coding property of the erasure code (equation (8)), a delta approach can be employed for data updating, eliminating redundant network traffic by transmitting only data deltas of the same size as the modified data range. It first determines the modification range of the data block and then calculates for this data block the difference, which is the amount of change between the old and new data within the modification range of the data block, multiplied by the corresponding coefficient to obtain the check difference. The range of modified data and the calculated data delta and check delta are then sent to the corresponding data node and all other check nodes, respectively, for updating. The transmission of complete data blocks and check blocks is not required, and only the range of modified data, the data difference and the check difference are required to be transmitted, so that the network transmission bandwidth is reduced, and the method is very suitable for a cluster storage environment.
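The delta flow above can be sketched for a single word, assuming arithmetic in GF(2^8) with the reduction polynomial 0x11B (the polynomial and the coefficient value are illustrative assumptions; CRS uses a Cauchy matrix over GF(2^w)). Subtraction in a binary field is XOR, so the data delta is simply the XOR of the old and new words:

```python
# Word-level delta update over GF(2^8): the check delta is the data delta
# multiplied by the generator coefficient, per equation (8).
def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply modulo the field polynomial 0x11B (an assumption)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B        # reduce back into the field
        b >>= 1
    return p

C = 0x53                      # one generator-matrix coefficient (illustrative)
d_old, d_new = 0x2A, 0x77
p_old = gf_mul(C, d_old)      # this word's contribution to the parity
delta = d_old ^ d_new         # data delta within the modified range ("subtraction")
p_new = p_old ^ gf_mul(C, delta)   # apply the check delta to the old parity

assert p_new == gf_mul(C, d_new)   # matches full re-encoding of the word
```

The final assertion holds for any coefficient and word values because GF multiplication distributes over XOR, which is exactly the linearity that equation (8) relies on.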
TABLE 1 Log update classification

                          Data delta not logged    Data delta logged
Check delta not logged    Full overwrite           Data partial log
Check delta logged        Parity partial log       Full log
In addition, update schemes can be classified according to whether the data delta and the check delta are logged (see Table 1). The advantage of logging is improved update efficiency: the data block or check block need not be read in order to fuse it with the delta. The advantage of not logging (direct overwrite) is that reading a new data block or new check block after an update is faster, since no log needs to be fused with the original data to obtain the new data. The parity-partial-log method fits the access pattern of erasure code updates well: data blocks are read often and updated rarely, while check blocks are read rarely and updated often. Therefore, the invention improves PDN-P using the idea of parity partial logs.
Specifically, combining the idea of full-overwrite updates on data blocks and delta logging on check blocks with the PDN-P update method, a new data update method PDN-PDS (PDN-P with Delta logging and Single Storage Pool) is proposed. Fig. 6 depicts the data flow of PDN-PDS and details the update procedure.
S1. The primary OSD (Object Storage Device), which holds D_1, receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to the data positioning method RUSH (Replication Under Scalable Hashing), and then sends the update position information o and the updated data D_i'(o) to the node of D_i.
S2. The node of D_i reads data block D_i from the local disk and, according to the update information sent by D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset o. The data block is then overwritten and updated following the parity-partial-log strategy, i.e. the updated portion is replaced while the region outside it is left unchanged. Finally, the update offset position o and the data block delta Δ_i(o) are sent to the node of check block P_1; that is, a single storage pool is set aside only on the node of check block P_1 to store the data delta logs. The storage pool adopts an efficient data delta log storage scheme, namely the adaptive management algorithm for a single storage pool (see Table 2).
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to P_2, P_3, …, P_m, and all check blocks perform fusion updates: P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}, where m is the number of check blocks, P_j'(o) and P_j(o) denote the words at update offset o of the new check block P_j' and the old check block P_j, and C_ji is the coefficient for generating the check data.
PDN-PDS uses a single storage pool: on the node of the first check block P_1, one storage pool stores and manages all the data delta logs Δ_i(o) corresponding to this check block, while the other m-1 check blocks do not need to store or manage parity delta logs. (Here P_1 is generic, since the EC groups are distributed across nodes; one node may simultaneously serve as P_1 for one group and as P_j, j ∈ {2, 3, …, m}, for another.) However, using only one storage pool may reduce the reliability of parity data: if the node of P_1 and the node of a data block D_i fail at the same time, the preceding data updates are all lost. A further measure can therefore be taken, namely narrowing the time window within which the check block node performs data delta log fusion; it can be assumed that, within a suitable time window (e.g. 15 or 30 minutes), simultaneous failure of a data block node D_i and the check block node P_1 is rare. Hence the single-storage-pool PDN-PDS saves storage space and improves update efficiency with little impact on data reliability. The data delta logs generated by multiple data block updates corresponding to check block P_1 are stored in a free area on the check node. However, this approach easily causes fragmentation of the log storage space and wasted space from holes left after garbage collection; and when the check block and the parity delta logs are fusion-updated, scattered delta logs can impose a heavy disk seek burden, resulting in low read efficiency for the new check block. Therefore, an efficient data delta log storage method, namely an adaptive management algorithm for the single storage pool, needs to be designed.
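The single pool on the P_1 node can be pictured with a small hypothetical sketch (the DeltaPool class and its method names are illustrative, not from the patent; XOR again stands in for the GF(2^w) multiply-accumulate of equation (8)):

```python
# Hypothetical sketch of the single delta-log pool kept on the P_1 node.
# Updates only append (offset, delta) records; the parity block is read and
# rewritten once, at fusion time, instead of once per update.
class DeltaPool:
    def __init__(self) -> None:
        self.log = []                           # list of (offset, delta) records

    def append(self, offset: int, delta: bytes) -> None:
        self.log.append((offset, delta))        # log-structured: no parity read

    def merge(self, parity: bytearray) -> None:
        for offset, delta in self.log:          # one sequential pass at fusion time
            for i, d in enumerate(delta):
                parity[offset + i] ^= d         # XOR in place of GF multiply-add
        self.log.clear()                        # pool space reclaimed after fusion

pool = DeltaPool()
parity = bytearray(8)
pool.append(0, bytes([0x0F, 0xF0]))
pool.append(0, bytes([0x0F]))                   # second update of the same word
pool.merge(parity)
assert parity[:2] == bytes([0x00, 0xF0])        # the two deltas cancel at byte 0
assert pool.log == []
```

Keeping the deltas contiguous in one pool is what lets the merge run as a single sequential pass, which is the disk-seek saving the text claims over scattered per-log placement.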
Adaptive management algorithm for single storage pool:
In order to save storage space and improve the storage efficiency of the storage pool, the invention designs a workload-aware adaptive management scheme for the storage pool, which dynamically adjusts and predicts the storage space size of each updated check block in the pool. The scheme has three main components: (1) predicting the storage space size of each check block in the next time interval using the measured workload pattern; (2) shrinking the storage space and releasing unused space back to the storage pool; (3) fusing the parity deltas in the storage space with the corresponding check blocks. To avoid small unusable holes in the storage space reclaimed by the shrink operation, the storage space size and the shrink size are set to multiples of the block size, which ensures that a whole data block or check block can be placed in the reclaimed space.
TABLE 2 adaptive management algorithm for single storage pool
The algorithm of Table 2 describes the basic flow of an adaptive management scheme for a unified storage pool:
first, a default initial storage size (2 or 4 times the data block size) is set that is large enough to hold all of the check delta logs.
Then, the contraction and fusion operations are performed periodically on each node:
The condition Cond0 is added so that the shrink operation occurs less frequently than the fusion operation. The purpose of this condition is to prevent frequent shrink operations from making the storage space too small and thereby triggering excessive fusion operations, since fusing and updating all the parity delta logs of a check block is relatively expensive. The set of check blocks with parity-partial logs on one node is denoted S. For each time interval t and each check block p ∈ S, r_t(p) denotes the storage space size and u_t(p) the storage space utilization; intuitively, u_t(p) is the percentage of storage space used. At the end of each time interval t, getUtility measures u_t(p) using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p) (9)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is the smoothing parameter (commonly 0.3).
Then, according to whether the utilization reaches 90% and whether condition Cond0 is satisfied, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed in computeShrinkSize. The unused space c(p) is shrunk aggressively, rounded down to a multiple of the chunk size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize (10)
ChunkSize denotes the allocated storage data block size. The doShrink function in line 8 of Table 2 attempts to reclaim c(p) of space from the existing storage space r_t(p). Thus the storage space size of check block p in the next time interval t+1 is:
r t+1 (p)=r t (p)-c(p) (11)
If only one data block's worth of storage space remains before shrinking, no shrink is performed.
When the utilization of a check block's unified storage pool reaches 90%, a doMerge fusion operation is performed: all parity delta logs are fused with the check block and all the storage space is reclaimed, i.e. the check block no longer has a corresponding storage space; storage space is reallocated when a new parity delta log is generated. doMerge is not on the update path, so it affects the system's I/O access performance only to a limited extent. In addition, the doMerge operation may also be triggered when a check block needs to be read (a degraded read or failure recovery). In general doMerge is not triggered often, for two reasons: 1) the initial space of each updated check block is large and typical updates are very small, so 90% utilization is not reached quickly; 2) a degraded read or failure recovery triggers doMerge, but both kinds of failure are rare relative to normal reads and writes.
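Under the assumed EWMA form u_t = α × use/r + (1 - α) × u_{t-1} and the chunk-rounding rule above, one interval of the sizing loop can be sketched as follows (the constants follow the text: a 4-chunk initial pool and α = 0.3; the exact pseudocode of Table 2 is not reproduced in this translation, so this is an approximation):

```python
# One interval of the adaptive pool-sizing loop: smooth the utilization,
# then reclaim predicted-unused space rounded down to whole chunks.
CHUNK = 4096          # allocation granularity: one data block (assumed size)
ALPHA = 0.3           # smoothing parameter, "commonly 0.3" per the text

def smoothed_utilization(u_prev: float, used: int, r: int) -> float:
    """EWMA of pool utilization, equation (9)."""
    return ALPHA * (used / r) + (1 - ALPHA) * u_prev

def shrink_size(r: int, u: float) -> int:
    """Reclaimable space, equation (10), never shrinking below one chunk."""
    c = int(r * (1 - u)) // CHUNK * CHUNK
    return 0 if r - c < CHUNK else c

r = 4 * CHUNK                                    # default initial size: 4 chunks
u = smoothed_utilization(0.0, used=CHUNK, r=r)   # 25% of the pool used this interval
c = shrink_size(r, u)
r -= c                                           # r_{t+1} = r_t - c, equation (11)
assert abs(u - 0.075) < 1e-12 and c == 3 * CHUNK and r == CHUNK
```

With a cold history (u_prev = 0) the smoothed utilization stays far below the raw 25%, so the shrink is aggressive here; under a steady workload the EWMA converges toward the true usage fraction and the reclaimed slack shrinks accordingly.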
The single-storage-pool PDN-PDS is obtained by combining PDN-P with the parity-partial-log update idea; Fig. 7 contrasts it with scattered log placement, using an EC(4, 2) coding example.
The incoming data stream in fig. 7 describes the sequence of operations: (1) write the first data segment containing data blocks a and b, (2) update part a 'of a, (3) write another new data segment containing data blocks c and d, (4) finally update part b' of data block b. It can be seen that:
(1) PDN-P overwrites the data block for a data update, and simultaneously overwrites the check blocks on check nodes 1 and 2;
(2) PDN-PDS overwrites the data block for a data update, and uses a single storage pool behind the check block of check node 1 to store all the data delta logs (Δa + Δb).
Compared with PDN-P, the PDN-PDS of the present invention uses only one storage pool to place the data delta logs together, which improves the update efficiency of the check blocks, reduces disk seeks when reading multiple updates, and saves storage space on the check nodes; at the same time, certain measures need to be taken to further ensure data reliability.
The comparison considers indexes in five aspects:
(1) Calculation amount: updating a data block, updating a check block and managing a storage pool;
(2) Transmission bandwidth: the network transmits the data block difference quantity;
(3) Disk I/O: data block disk read/write and check block disk read/write;
(4) Storage space: difference logs of data nodes and difference logs of check nodes;
(5) Data reliability: fault tolerance of data.
Three levels, "high", "medium" and "low", are used in Table 3 below to qualitatively compare the complexity of the update schemes in each aspect. The benefit of the single storage pool in PDN-PDS is reduced computation, further reduced transmission bandwidth and disk I/O, and less disk space used; the cost is that the data-reliability entry becomes "high" (risk), meaning the data reliability of PDN-PDS is slightly reduced: if, within a certain (configurable) period of time, the node of an updated data block and the first check block node fail at the same time, the most recently updated data will be lost. However, single-node failures account for 99.75% of all failure repairs, so such loss of update data is rare, and the period for parity update fusion can be shortened appropriately to reduce this possibility further.
TABLE 3 Update complexity comparison of the improved schemes
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (3)

1. A differential log erasure code updating method for a single storage pool, comprising the steps of:
S1. A primary object storage device receives an erasure code data update request from a client, determines the data block D_i to be updated and the corresponding update offset position o according to a data positioning method, and then sends the update offset position o and the updated data D_i'(o) to the node of D_i; i ∈ {1, 2, …, k}, where k is the number of data blocks;
S2. The node of D_i reads the data block D_i from the local disk and, according to the update information sent by the primary node D_1, computes the delta Δ_i(o) = D_i'(o) - D_i(o) between the updated data and the original data at offset position o of the data block; then, following the parity-partial-log strategy, the data block D_i is updated by overwriting in place, i.e. the updated portion is replaced while the region outside the updated portion is left unchanged; finally, the update offset position o and the data block delta Δ_i(o) are sent to the first check block P_1, i.e. a single storage pool is set aside only on the node of check block P_1 to store the data delta logs; the data delta log storage method is implemented by an adaptive management algorithm for the single storage pool, so as to dynamically adjust and predict the storage space size of each updated check block in the storage pool;
S3. When parity update fusion is needed, P_1 sends all the data deltas Δ_i(o) in the storage pool to the other check blocks P_2, P_3, …, P_m, where m is the number of check blocks, and all check blocks perform fusion updates to obtain P_j'(o) = P_j(o) + C_ji × Δ_i(o), j ∈ {1, 2, …, m}; P_j'(o) and P_j(o) denote the words at update offset position o of the new check block P_j' and the old check block P_j respectively, and C_ji is the coefficient for generating the check data.
2. The method of claim 1, wherein the adaptive management algorithm for the single storage pool in step S2 comprises the steps of:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
the shrink operation is performed less frequently than the fusion operation; this is set as condition Cond0; the set of check blocks with parity-partial logs on one node is denoted S; for each time interval t and each check block p ∈ S, r_t(p) denotes the storage space size and u_t(p) the storage space utilization; at the end of each time interval t, u_t(p) is measured using an exponentially weighted smoothed mean:
u_t(p) = α × use(p)/r_t(p) + (1 - α) × u_{t-1}(p)
where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, and α is a smoothing parameter;
depending on whether the utilization reaches 90% and whether condition Cond0 is met, an unneeded storage space size c(p) is determined, namely the reclaimable storage space of the corresponding check block p computed when calculating the shrink size; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:
c(p) = floor(r_t(p) × (1 - u_t(p)) / ChunkSize) × ChunkSize
ChunkSize indicates the size of the allocated storage data block, so that the storage space of the check block p in the next time period t +1 is:
r t+1 (p)=r t (p)-c(p)
if only one data block's worth of storage space remains before shrinking, no further shrinking is performed;
when the unified storage-pool space utilization of a check block reaches 90%, a fusion operation is performed: all parity-delta logs are fused into the check blocks and all of the storage space is reclaimed, i.e. the check block no longer has corresponding storage space; when a new parity-delta log is subsequently generated, storage space is allocated anew.
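The shrink rule of claim 2 can be sketched as below. The 90% threshold, the rounding of c(p) down to a ChunkSize multiple, and the stop condition when one chunk remains follow the claim text; the function names, the concrete chunk size, and the c(p) expression as unused space are illustrative assumptions.

```python
CHUNK_SIZE = 4096  # assumed size of one allocated storage data block

def shrink_amount(r_t, u_t, chunk_size=CHUNK_SIZE):
    """c(p): unused pool space, rounded down to a multiple of ChunkSize."""
    free = (1.0 - u_t) * r_t
    return int(free // chunk_size) * chunk_size

def next_pool_size(r_t, u_t, chunk_size=CHUNK_SIZE):
    """r_{t+1}(p) = r_t(p) - c(p); never shrink when one chunk is left."""
    if r_t <= chunk_size:          # only one data block of space remains
        return r_t
    return r_t - shrink_amount(r_t, u_t, chunk_size)
```

In this sketch the fusion trigger (utilization reaching 90%) would reclaim the whole pool instead of calling `next_pool_size`, matching the merge step described above.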
3. The method according to claim 1 or 2, wherein in step S1 the data is located by a replica-location method based on extendible hashing.
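Claim 3 gives no implementation details for the extendible-hashing-based data location, but the directory lookup it relies on can be sketched as follows; the hash choice, directory layout, and node names are all assumptions for illustration.

```python
import hashlib

def locate(block_id, directory, global_depth):
    """Pick a node using the low `global_depth` bits of the block's hash."""
    h = int.from_bytes(hashlib.sha256(block_id.encode()).digest()[:8], "big")
    return directory[h & ((1 << global_depth) - 1)]

def double_directory(directory):
    """Grow the directory (global depth + 1) by duplicating every pointer,
    so all existing blocks still resolve to the same node."""
    return directory + directory
```

Doubling preserves placement because index bits beyond the old depth select a duplicated pointer: `locate(id, double_directory(d), depth + 1)` equals `locate(id, d, depth)`.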
CN201610710868.XA 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool Active CN107766170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610710868.XA CN107766170B (en) 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool

Publications (2)

Publication Number Publication Date
CN107766170A true CN107766170A (en) 2018-03-06
CN107766170B CN107766170B (en) 2021-04-09

Family

ID=61264215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610710868.XA Active CN107766170B (en) 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool

Country Status (1)

Country Link
CN (1) CN107766170B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110208995A1 (en) * 2010-02-22 2011-08-25 International Business Machines Corporation Read-modify-write protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
US9063910B1 (en) * 2011-11-15 2015-06-23 Emc Corporation Data recovery after triple disk failure
CN102681793A (en) * 2012-04-16 2012-09-19 华中科技大学 Local data updating method based on erasure code cluster storage system
CN105359108A (en) * 2013-08-05 2016-02-24 英特尔公司 Storage systems with adaptive erasure code generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FENGHAO ZHANG et al., "Two Efficient Partial-Updating Schemes for Erasure-Coded Storage Clusters", 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491835A (en) * 2018-10-25 2019-03-19 哈尔滨工程大学 A kind of data fault tolerance method based on Dynamic Packet code
CN109491835B (en) * 2018-10-25 2022-04-12 哈尔滨工程大学 Data fault-tolerant method based on dynamic block code
CN110262922A (en) * 2019-05-15 2019-09-20 中国科学院计算技术研究所 Correcting and eleting codes update method and system based on copy data log
CN111522825A (en) * 2020-04-09 2020-08-11 陈尚汉 Efficient information updating method and system based on check information block shared cache mechanism
WO2023082556A1 (en) * 2021-11-09 2023-05-19 华中科技大学 Memory key value erasure code-oriented hybrid data update method, and storage medium

Also Published As

Publication number Publication date
CN107766170B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
US11150986B2 (en) Efficient compaction on log-structured distributed file system using erasure coding for resource consumption reduction
US10613933B2 (en) System and method for providing thin-provisioned block storage with multiple data protection classes
CN107766170B (en) Differential log type erasure code updating method for single storage pool
US9916478B2 (en) Data protection enhancement using free space
CN110442535B (en) Method and system for improving reliability of distributed solid-state disk key value cache system
US11422703B2 (en) Data updating technology
CN114415976B (en) Distributed data storage system and method
KR20150061258A (en) Operating System and Method for Parity chunk update processing in distributed Redundant Array of Inexpensive Disks system
CN103544202A (en) Method and system used for arranging data processing
US20150012493A1 (en) Reducing latency and cost in resilient cloud file systems
CN108062419B (en) File storage method, electronic equipment, system and medium
CN112835743B (en) Distributed account book data storage optimization method and device, electronic equipment and medium
CN109445681B (en) Data storage method, device and storage system
Shen et al. Cross-rack-aware updates in erasure-coded data centers
US11886705B2 (en) System and method for using free space to improve erasure code locality
CN110018783A (en) A kind of date storage method, apparatus and system
CN109582213A (en) Data reconstruction method and device, data-storage system
JP7355616B2 (en) Distributed storage systems and how to update parity in distributed storage systems
CN113377569A (en) Method, apparatus and computer program product for recovering data
WO2023197937A1 (en) Data processing method and apparatus, storage medium, and computer program product
CN107329699A (en) One kind, which is entangled, deletes rewrite method and system
WO2023082556A1 (en) Memory key value erasure code-oriented hybrid data update method, and storage medium
CN113391945A (en) Method, electronic device and computer program product for storage management
US11561859B2 (en) Method, device and computer program product for managing data
CN108174136B (en) Cloud disk video coding storage method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant