CN107766170B - Differential log type erasure code updating method for single storage pool


Publication number
CN107766170B
CN107766170B (application CN201610710868.XA)
Authority
CN
China
Prior art keywords
data
check
block
update
storage space
Prior art date
Legal status
Active
Application number
CN201610710868.XA
Other languages
Chinese (zh)
Other versions
CN107766170A (en)
Inventor
陈付
陕振
张淑萍
Current Assignee
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications filed Critical Beijing Institute of Computer Technology and Applications
Priority to CN201610710868.XA
Publication of CN107766170A
Application granted
Publication of CN107766170B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Abstract

The invention relates to a differential log type erasure code updating method for a single storage pool, and belongs to the technical field of computers. The invention combines the ideas of full-overwrite updating of data blocks and delta-log (check partial log) updating of parity blocks with the PDN-P update scheme, and proposes a new data update scheme, PDN-PDS.

Description

Differential log type erasure code updating method for single storage pool
Technical Field
The invention relates to the technical field of computers, in particular to a differential log type erasure code updating method for a single storage pool.
Background
Erasure coding techniques have been widely applied in data storage systems to achieve high fault tolerance, but they impose a heavy burden on data update performance, especially in distributed block storage systems where data update operations are frequent. This section first introduces the update principle of CRS erasure codes and four conventional update methods: two typical erasure code update modes (DUM and PUM) and two improved partial-update modes (PUM-P and PDN-P).
Erasure code data update complexity refers to the average number of parity blocks affected by modifying, updating, or overwriting a data block. For example, for CRS(6+3, 3), each data block is protected by 3 parity blocks, so the optimal update complexity is 3. Update complexity can significantly affect the update performance of erasure-coded systems, especially for small-block updates. When a storage cluster employs erasure codes, the update burden is compounded because the update process involves disk I/O, transmission bandwidth, and CPU computation. Assume the storage system uses CRS(k+m, m) coding: a data segment is divided into k data blocks, and m parity blocks are then generated by CRS encoding. The generation of the parity blocks can be described with reference to fig. 1, where each parity block is

Pi = Ci1×D1 + Ci2×D2 + … + Cik×Dk, i ∈ {1,2,…,m}

When data block Dj is updated to Dj′, the data block Dj and all parity blocks P1, P2, …, Pm need to be updated to Dj′, P1′, P2′, …, Pm′ respectively, while the other data blocks are not modified.
Four common data update methods are described below:
(1) DUM (data blocks by a specific Updating manager) method.
Fig. 2 shows the data flow of the DUM method. The DUM update process is identical to the data encoding process: the update manager node reads the original data blocks of the stripe that are not being updated, and regenerates all new parity blocks, together with the updated data block, by means of the Cauchy generator matrix. Suppose data block D1 is to be updated to D1′. The update manager reads, over the network, all data blocks in the stripe except D1, and re-encodes using the new data block D1′. The m new parity blocks P1′, P2′, …, Pm′ are then transmitted in parallel, together with the new data block D1′, to the respective m check nodes and the data node of D1. In the DUM, to update one data block, k-1 data blocks must be read from the disks of k-1 data nodes, m parity blocks written to the disks of m check nodes, one new data block written to one data node, and m+k blocks transmitted over the network. The operations involved in the DUM are:

Pi′ = Ci1×D1′ + Ci2×D2 + … + Cik×Dk, i ∈ {1,2,…,m}
(2) PUM (parity blocks by a specific Updating manager) method.
For small-block updates, a natural idea is to regenerate the new parity blocks from the existing parity blocks and the data block to be updated, which saves disk I/O and network transmission bandwidth compared with the DUM. The theoretical formula of the PUM is derived as follows. Suppose data block Dx is modified into Dx′; then each new parity block is:

Pi′ = Ci1×D1 + … + Cix×Dx′ + … + Cik×Dk
    = (Ci1×D1 + … + Cix×Dx + … + Cik×Dk) - Cix×Dx + Cix×Dx′
    = Pi + Cix×(Dx′ - Dx), i ∈ {1,2,…,m}  (7)
Fig. 3 shows the data flow of the PUM, taking the update of data block D1 as an example: the update manager reads all parity blocks P1, P2, …, Pm in the stripe and the data block D1; the new parity blocks P1′, P2′, …, Pm′ are then calculated by equation (7) above; finally, all updated parity blocks and the new data block D1′ are sent to the respective check nodes and data node. In contrast to the DUM, the PUM reads only the m parity blocks to be updated instead of k-1 data blocks. When m < k-2, the disk I/O and network transmission bandwidth of the PUM are less than those of the DUM; in addition, the calculation of Pi′ in the PUM involves only one subtraction, one multiplication, and one addition. To update one data block, the PUM needs to read m+1 blocks from m check nodes and one data node, write m+1 blocks to local hard disks, and transmit 2m+2 blocks over the network.
(3) PUM-P (Parity blocks by an Updating Manager and the Parity nodes) method.
In the PUM, the update manager performs all computations and sends the updated parity blocks Pi′ to the check nodes. FIG. 4 shows the data flow of PUM-P: in PUM-P, the update manager calculates Pi* = Ci1×(D1′ - D1), i ∈ {1,2,…,m}, and sends each Pi* over the network to the corresponding check node; each check node then reads its parity block Pi directly from the local hard disk into memory, incurring no network transmission load, and finally performs an addition at the check node to obtain the new parity block Pi′ = Pi* + Pi. To update one data block, PUM-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+2 blocks over the network. In contrast to the PUM, PUM-P reads only one data block over the network, because the check nodes can read their parity blocks locally and perform the addition to compute the new parity blocks. Thus, the overall I/O burden of PUM-P is lower than that of the PUM.
FIG. 5 illustrates the data flow of PDN-P: building on PUM-P, PDN-P moves the calculation of Pi* = Ci1×(D1′ - D1) from the update manager to the data node being updated (D1), thereby saving the burden of transmitting D1 over the network. The provisional calculation results Pi* are sent directly to the corresponding check nodes. To update one data block, PDN-P needs to read m+1 blocks from local hard disks, write m+1 blocks to local hard disks, and transmit m+1 blocks over the network. That is, the update network traffic of PDN-P is lower than that of PUM-P.
Formula (7) describes the linear coding property of erasure codes. Using this linearity, the three improvements PUM, PUM-P, and PDN-P were proposed on the basis of the DUM method: they avoid reading and summing all data blocks, and instead compute the new parity blocks from the change in the updated data block. Although these improvements reduce the amount of computation, the network transmission bandwidth and disk I/O pressure remain large, and data update efficiency needs further improvement.
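The per-update costs quoted above can be tabulated with a short helper. The function below is purely illustrative (not part of the patented method); it simply encodes, for each of the four schemes, the block read, block write, and network transfer counts stated in the text, as functions of k and m:

```python
def update_costs(k, m):
    """Blocks (read, written, sent over the network) per single-block update,
    as stated in the text for each conventional scheme."""
    return {
        #          reads    writes   network
        "DUM":   (k - 1,   m + 1,   m + k),
        "PUM":   (m + 1,   m + 1,   2 * m + 2),
        "PUM-P": (m + 1,   m + 1,   m + 2),
        "PDN-P": (m + 1,   m + 1,   m + 1),
    }

# Example for CRS(6+3, 3):
for name, (r, w, n) in update_costs(6, 3).items():
    print(f"{name:6s} read={r} write={w} net={n}")
```

For CRS(6+3, 3) this reproduces the trend described in the text: each refinement leaves reads and writes unchanged after PUM while steadily lowering network traffic.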
For CRS coding, the formula Pi′ = Cix×(Dx′ - Dx) + Pi, i ∈ {1,2,…,m}, can be further generalized to partial updates within a data block. Suppose a word at offset o of the old data block Dx is updated; accordingly, the word at offset o of the old parity block Pi needs to be updated. This can be expressed as:

Pi′(o) = Pi(o) + Cix×(Dx′(o) - Dx(o))  (8)

Here Cix is the coefficient used to generate the check data; Pi′(o) and Pi(o) denote the words at offset o of the new parity block Pi′ and the old parity block Pi, respectively; Dx′(o) and Dx(o) denote the words at offset o of the new data block Dx′ and the old data block Dx. That is, multiplying the delta of a data block update by the coefficient Cix yields the delta of the parity block update.
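The word-level delta rule of equation (8) can be checked numerically. The sketch below is an illustration, not the patented method: it implements multiplication in GF(2^8) with the polynomial x^8+x^4+x^3+x^2+1 (0x11d) commonly used in Reed-Solomon codes (addition and subtraction in GF(2^w) are both XOR), and verifies that updating a parity word via the delta Cix×(Dx′(o) - Dx(o)) matches a full re-encode. The coefficients and data words are arbitrary examples.

```python
def gf_mul(a, b, poly=0x11d):
    """Carry-less multiplication in GF(2^8), reduced modulo x^8+x^4+x^3+x^2+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

# One parity word over k = 3 data words: P(o) = C1*D1(o) + C2*D2(o) + C3*D3(o),
# where + is XOR in GF(2^8).
C = [3, 7, 9]            # illustrative generator coefficients Ci1..Ci3
D = [0x55, 0xAA, 0x0F]   # old data words at some offset o

def encode(C, D):
    p = 0
    for c, d in zip(C, D):
        p ^= gf_mul(c, d)
    return p

P_old = encode(C, D)

# Update D2 (x = 1): the word Dx(o) becomes Dx'(o).
D_new = D.copy()
D_new[1] = 0x5A

# Full re-encode vs. delta update P'(o) = P(o) + Cix*(Dx'(o) - Dx(o)).
P_full = encode(C, D_new)
delta = D[1] ^ D_new[1]                 # subtraction in GF(2^w) is XOR
P_delta = P_old ^ gf_mul(C[1], delta)

assert P_full == P_delta
```

The assertion holds by the distributivity of GF multiplication over XOR, which is exactly the linearity that equation (8) exploits.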
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is: how to design an erasure code data updating method that improves the update efficiency of the parity blocks, reduces disk seeks when reading multiple updates, and saves storage space on the check nodes.
(II) Technical scheme
In order to solve the above technical problem, the present invention provides a delta log erasure code updating method for a single storage pool, which comprises the following steps:
S1, the main object storage device receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method, and then sends the update offset position o and the updated data Di′(o) to the node of Di; i ∈ {1,2,…,k}, where i denotes the data block sequence number and k the number of data blocks;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by the main object storage device, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o); then, under the check-partial-log strategy, an overwrite update is performed on the data block Di, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged, where ō denotes the region outside the updated portion; finally, the update offset position o and the data block update delta Δi(o) are sent to the first parity block P1, i.e. a single storage pool is set aside only on the node of parity block P1 to store the data delta logs; the data delta logs are stored using the adaptive management algorithm for a single storage pool, which dynamically adjusts and predicts the storage space size of each updated parity block in the storage pool;
S3, when parity update fusion is required, P1 sends all data update deltas Δi(o) in the storage pool to the other parity blocks P2, P3, …, Pm, where m denotes the number of parity blocks, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data.
Preferably, the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
wherein the shrink operation is performed less frequently than the fusion operation, controlled by the condition Cond0; the set of parity blocks having check partial logs on a node is denoted S; for each time interval t and each parity block p ∈ S, rt(p) denotes the storage space size and ut(p) the storage space usage; at the end of each time interval t, an exponentially weighted moving average is used to measure ut(p):

ut(p) = α × use(p)/rt(p) + (1 - α) × ut-1(p)

Here use(p) denotes the amount of storage space used in this time interval, rt(p) is the current storage space size of parity block p, and α is a smoothing parameter;
according to whether the usage reaches 90% and whether condition Cond0 is satisfied, the unnecessary storage space size c(p) of parity block p is determined in the shrink-space calculation; the unused space c(p) is then shrunk, rounded down to a multiple of the data block size:

c(p) = ⌊(rt(p) - use(p)) / ChunkSize⌋ × ChunkSize

where ChunkSize denotes the size of an allocated storage data block, so that the storage space of parity block p in the next time interval t+1 is:

rt+1(p) = rt(p) - c(p)
if only storage space of one data block size remains before shrinking, no shrink is performed;
when the usage of the single storage pool of one parity block reaches 90%, a fusion operation is performed to fuse all check delta logs into the parity blocks and reclaim all the storage space, i.e. the parity block then has no corresponding storage space; storage space is reallocated when a new check delta log is generated.
Preferably, in step S1, the data positioning method is a scalable-hashing-based replication (RUSH) method.
(III) Advantageous effects
The invention combines the ideas of full-overwrite updating of data blocks and delta-log (check partial log) updating of parity blocks with the PDN-P update scheme, and proposes a new data update scheme, PDN-PDS.
Drawings
Fig. 1 is a diagram of check block generation for CRS encoding;
FIG. 2 is a diagram of a DUM update method;
FIG. 3 is a diagram of a PUM update method;
FIG. 4 is a diagram of a PUM-P update method;
figure 5 is a diagram of a PDN-P update method;
fig. 6 is a diagram of a single storage pool PDN-PDS method of an embodiment of the present invention;
fig. 7 is a comparative example between the conventional updating method and the updating method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
With the linear coding property of the erasure code (equation (8)), a delta approach can be employed for data updating, eliminating redundant network traffic by transmitting only data deltas of the same size as the modified data range. It first determines the modification range of the data block and then calculates for this data block the difference, which is the amount of change between the old and new data within the modification range of the data block, multiplied by the corresponding coefficient to obtain the check difference. The range of modified data and the calculated data delta and check delta are then sent to the corresponding data node and all other check nodes, respectively, for updating. The transmission of complete data blocks and check blocks is not required, and only the range of modified data, the data difference and the check difference are required to be transmitted, so that the network transmission bandwidth is reduced, and the method is very suitable for a cluster storage environment.
TABLE 1 Log update classification

                            Data delta not logged    Data delta logged
Check delta not logged      Full overwrite           Data partial log
Check delta logged          Check partial log        Full log
In addition, update schemes can be classified according to whether the data delta and the check delta are logged (see Table 1). The advantage of logging is higher update efficiency: the data block or parity block need not be read and fused with the delta at update time. The advantage of not logging (direct overwrite) is that reading a new data block or parity block after an update is faster, since the log need not be fused with the original data to obtain the new data. The check-partial-log approach fits the access pattern of erasure code updates very well: data blocks are read often but updated rarely, while parity blocks are read rarely but updated often. Therefore, the present invention improves PDN-P using the check-partial-log idea.
Specifically, a new data update method, PDN-PDS (PDN-P with Delta logging and Single Storage Pool), is proposed by combining full-overwrite updating of data blocks and check-partial-log (delta log) updating of parity blocks with the PDN-P update method. Fig. 6 depicts the data flow of the PDN-PDS; the update procedure is detailed below.
S1, the main OSD (Object Storage Device, the node of D1) receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method RUSH (scalable-hashing-based replication), and then sends the update position information o and the updated data Di′(o) to the node of Di;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by D1, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o). Then an overwrite update is performed on the data block under the check-partial-log strategy, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged (ō denotes the region outside the updated portion). Finally, the update offset position o and the data block update delta Δi(o) are sent to the node of parity block P1; that is, a single storage pool is set aside only on the node of parity block P1 to store the data delta logs, and this pool adopts an efficient data delta log storage scheme, namely the adaptive management algorithm for a single storage pool (see the table below).
S3, when parity update fusion is required, P1 sends all data deltas Δi(o) in the storage pool to P2, P3, …, Pm, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

where m denotes the number of parity blocks, Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data.
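The S1-S3 flow can be sketched in a few lines. The toy model below is an illustration under simplifying assumptions, not the patented implementation: the class and function names are hypothetical, and all generator coefficients are taken as 1 so that GF arithmetic reduces to XOR (as in simple XOR parity). It shows the essential mechanics: the data block is overwritten immediately, the delta is only logged in P1's single storage pool, and the parity blocks are touched only at fusion time.

```python
class ParityNode:
    def __init__(self, size):
        self.block = bytearray(size)   # parity block Pj
        self.pool = []                 # single delta-log storage pool (used on P1 only)

class DataNode:
    def __init__(self, size):
        self.block = bytearray(size)   # data block Di

def s1_s2_update(data_node, p1, offset, new_bytes):
    """S1/S2: overwrite the data block in place, log (o, delta) in P1's pool."""
    old = data_node.block[offset:offset + len(new_bytes)]
    delta = bytes(a ^ b for a, b in zip(old, new_bytes))    # Δ(o) = D'(o) - D(o), XOR
    data_node.block[offset:offset + len(new_bytes)] = new_bytes
    p1.pool.append((offset, delta))                         # deferred parity work

def s3_merge(p1, other_parities):
    """S3: fuse every logged delta into P1 and forward it to P2..Pm."""
    for offset, delta in p1.pool:
        for p in [p1] + other_parities:
            for j, d in enumerate(delta):
                p.block[offset + j] ^= d                    # Pj'(o) = Pj(o) + C*Δ, C = 1
    p1.pool.clear()
```

Between an update and the next merge, the parity blocks are stale by design; reading a parity block in that window would first trigger the fusion, as described for the merge operation below.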
PDN-PDS uses a single storage pool: the node of the first parity block P1 uses one storage pool to store and manage all data delta logs Δi(o) corresponding to this parity block, while the other m-1 parity blocks need no storage management for check delta logs (here P1 is generic, since EC groups are distributed across nodes, i.e. the same node may simultaneously act as P1 of one group and Pj, j ∈ {2,3,…,m}, of another). However, using only one storage pool may reduce the reliability of parity data: if the P1 node and a data block Di node fail simultaneously, the preceding data updates are lost entirely. A further measure can therefore be taken, namely narrowing the time window within which the parity nodes perform data delta log fusion; within a suitable time window (e.g. 15 or 30 minutes), it is rare for a data block Di node and the parity block P1 node to fail simultaneously. Thus, single-pool PDN-PDS saves storage space and improves update efficiency with little impact on data reliability. The data delta logs Δi(o) generated by multiple data block updates for parity block P1 are stored in a free area on the check node. However, this approach easily causes fragmentation of the log storage space and space wasted by holes after garbage collection; moreover, when the parity block and the check delta logs are fusion-updated, the scattered check delta logs may impose a heavy disk seek burden, resulting in low read efficiency for the new parity block. Therefore, an efficient data delta log storage scheme, the adaptive management algorithm for a single storage pool, needs to be designed.
Adaptive management algorithm for single storage pool:
in order to save the storage space and improve the storage efficiency of the storage pool, the invention designs a working load-aware storage pool adaptive management scheme, which can dynamically adjust and predict the size of the storage space of each updated check block in the storage pool. This solution has three main components: (1) predicting a storage space size of each parity block in a next time segment using the measured workload pattern; (2) shrinking the storage space and releasing unused space back to the storage pool; (3) and fusing the check difference quantity in the storage space with the corresponding check block. In order to avoid small unusable holes in the storage space recycled by the shrink operation, the size of the storage space and the shrink size need to be set to be multiples of the size of the parity chunks, which ensures that the whole data chunk or parity chunk can be placed in the recycled storage space.
TABLE 2 adaptive management algorithm for single pool
The algorithm of Table 2 describes the basic flow of an adaptive management scheme for unified storage pools:
first, a default initial storage size (2 or 4 times the data block size) is set that is large enough to hold all of the check delta logs.
Then, the contraction and fusion operations are performed periodically on each node:
the condition Cond0 is added to make the frequency of the contraction operation lower than that of the fusion operation, and the purpose of adding this condition is to avoid too many fusion operations caused by too small storage space due to frequent contraction operations, which would cause fusion update operations of all the check difference logs of the check block, and this operation is costly. The check block set with check part log on one node is represented as S, and for each time interval t and each check block p ∈ S, r is usedt(p) denotes the size of the memory space, ut(p) represents the usage of the storage space. Intuitively, ut(p) represents the percentage of used storage space. At the end of each time segment t, u is measured using an exponential weighted smoothed average in obtaining storage pool utilization getUtilityt(p):
Figure BDA0001087670260000122
Here use (p) represents the amount of memory space that has been used in this time segment, rt(p) is the current storage size of the parity block p, and α is the smoothing parameter (usually 0.3).
Depending on whether the usage reaches 90% and whether condition Cond0 is satisfied, an unnecessary storage space size c(p) is determined: it is the reclaimable storage space of the corresponding parity block p computed by computeShrinkSize. The no-longer-used space c(p) is then shrunk aggressively, rounded down to a multiple of the chunk size:

c(p) = ⌊(rt(p) - use(p)) / ChunkSize⌋ × ChunkSize

ChunkSize denotes the allocated storage data block size. The doShrink function in row 8 of Table 2 then attempts to shrink a space of size c(p) from the existing storage space rt(p). Thus, the storage space size of parity block p in the next time interval t+1 is:

rt+1(p) = rt(p) - c(p)  (11)
if only one block size of storage space is left before puncturing, no puncturing will be performed.
When the usage of a parity block's single storage pool reaches 90%, a doMerge fusion operation is performed to fuse all check delta logs into the parity block and reclaim all the storage space; the parity block then has no corresponding storage space, and storage space is reallocated when a new check delta log is generated. doMerge is not on the update path, so the I/O access performance of the system is only marginally affected. In addition, a doMerge operation may also be triggered when a parity block needs to be read (a degraded read or failure recovery). The trigger frequency of doMerge is generally low, for two reasons: 1) the initial space of each updated parity block is large and typical updates are very small, so 90% usage is not reached quickly; 2) degraded reads and failure recovery do trigger doMerge, but both failure cases are rare relative to normal reads and writes.
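The per-interval control loop described above can be sketched as follows. This is a hedged sketch under stated assumptions, not the algorithm of Table 2: the function names follow those mentioned in the text (getUtility, computeShrinkSize, doMerge), while the ChunkSize value, the Cond0 flag being passed in externally, and the PoolState fields are illustrative assumptions.

```python
ALPHA = 0.3          # smoothing parameter alpha (typical value from the text)
THRESHOLD = 0.90     # 90% usage triggers doMerge; shrink only applies below it
CHUNK = 4096         # ChunkSize: allocated data block size (assumed value)

class PoolState:
    """Per-parity-block state of the single storage pool (hypothetical layout)."""
    def __init__(self, initial_size):
        self.r = initial_size    # r_t(p): current pool size
        self.u = 0.0             # u_t(p): smoothed usage
        self.used = 0            # use(p): bytes used in the current interval

def get_utility(p):
    """u_t(p) = alpha*use(p)/r_t(p) + (1-alpha)*u_{t-1}(p)"""
    p.u = ALPHA * p.used / p.r + (1 - ALPHA) * p.u
    return p.u

def compute_shrink_size(p):
    """c(p): unused space, rounded down to a multiple of ChunkSize."""
    return ((p.r - p.used) // CHUNK) * CHUNK

def do_merge(p):
    """Fuse all delta logs into the parity block; reclaim the whole pool."""
    p.r, p.u, p.used = 0, 0.0, 0     # pool reallocated when a new log arrives

def end_of_interval(p, cond0):
    """Run once per time interval t for each parity block p in S."""
    u = get_utility(p)
    if u >= THRESHOLD:
        do_merge(p)
    elif cond0:                       # shrink runs less often than merge
        c = compute_shrink_size(p)
        if p.r - c >= CHUNK:          # keep at least one chunk of space
            p.r -= c                  # r_{t+1}(p) = r_t(p) - c(p)
```

Note the ordering choice: merging takes priority over shrinking in an interval, since a pool at 90% usage has nothing worth shrinking.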
The single-storage-pool PDN-PDS is obtained by combining PDN-P with the delta-based check-partial-log update idea; an EC(4, 2) encoding example is used.
The incoming data stream in fig. 7 describes the sequence of operations: (1) write the first data segment containing data blocks a and b; (2) update part a′ of a; (3) write another new data segment containing data blocks c and d; (4) finally update part b′ of data block b. It can be seen that:
(1) PDN-P overwrites the data block for each data update, and at the same time overwrites the parity blocks on check nodes 1 and 2 with the corresponding check updates;
(2) PDN-PDS likewise overwrites the data block for each data update, but uses a single storage pool behind the parity block of check node 1 to store all data delta logs (Δa + Δb).
Compared with PDN-P, the PDN-PDS of the present invention uses only one storage pool, in which the data delta logs are placed together; this improves the update efficiency of the parity blocks, reduces the multiple disk seeks in the read-update process, and saves storage space on the check nodes, while certain measures must be taken to further ensure data reliability.
The comparison considers the following five aspects:
(1) calculation amount: updating a data block, updating a check block and managing a storage pool;
(2) transmission bandwidth: the network transmits the data block difference quantity;
(3) disk I/O: data block disk read/write and check block disk read/write;
(4) storage space: difference logs of data nodes and difference logs of check nodes;
(5) data reliability: fault tolerance of data.
Three levels, "high", "medium", and "low", are used in Table 3 below to qualitatively compare the complexity of the update schemes. The advantage of PDN-PDS with a single storage pool is that computation is reduced, transmission bandwidth and disk I/O are further reduced, and less disk space is used; the cost is that the data-reliability risk becomes "high", meaning the data reliability of PDN-PDS is slightly reduced: the latest update data is lost only when, within a (configurable) time period, an updated data block node and the first parity block node fail at the same time. However, single-node failures account for 99.75% of all failure repairs, so such loss of update data is rare, and the parity-update fusion period can be shortened appropriately to further reduce this possibility.
TABLE 3 update complexity contrast for improved schemes
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (2)

1. A differential log erasure code updating method for a single storage pool, comprising the steps of:
S1, the main object storage device receives the erasure code data update request from the client, determines the data block Di to be updated and the corresponding update offset position o according to the data positioning method, and then sends the update offset position o and the updated data Di′(o) to the node of Di; i ∈ {1,2,…,k}, where i denotes the data block sequence number and k the number of data blocks;
S2, the node of Di reads the data block Di from the local disk and, according to the update information sent by the main object storage device, computes the delta between the updated data Di′(o) and the original data Di at data block position o, i.e. Δi(o) = Di′(o) - Di(o); then, under the check-partial-log strategy, an overwrite update is performed on the data block Di, i.e. Di(o) = Di′(o) while Di(ō) is left unchanged, where ō denotes the region outside the updated portion; finally, the update offset position o and the data block update delta Δi(o) are sent to the first parity block P1, i.e. a single storage pool is set aside only on the node of parity block P1 to store the data delta logs; the data delta logs are stored using the adaptive management algorithm for a single storage pool, which dynamically adjusts and predicts the storage space size of each updated parity block in the storage pool;
S3, when parity update fusion is required, P1 sends all data update deltas Δi(o) in the storage pool to the other parity blocks P2, P3, …, Pm, where m denotes the number of parity blocks, and all parity blocks perform the fusion update

Pj′(o) = Pj(o) + Cji×Δi(o), j ∈ {1,2,…,m}

Pj′(o) and Pj(o) denote the words at the update offset position o of the new parity block Pj′ and the old parity block Pj respectively, and Cji are the coefficients used to generate the check data;
the adaptive management algorithm for the single storage pool in step S2 includes the following steps:
first, a default initial storage space size is set, which is large enough to hold all the check difference logs;
then, the contraction and fusion operations are performed periodically on each node:
wherein the contraction operation is performed less frequently than the fusion operation, and its triggering condition is denoted Cond0; the set of check blocks on a node that hold a check difference log is denoted S; for the t-th time interval and each check block p ∈ S, r_t(p) denotes the storage space size in the t-th time interval and u_t(p) denotes the storage space utilization in the t-th time interval; at the end of the t-th time interval, u_t(p) is obtained as an exponentially weighted moving average:

u_t(p) = α · use(p)/r_t(p) + (1 − α) · u_{t−1}(p)

where use(p) denotes the amount of storage space used in this time interval, r_t(p) is the current storage space size of check block p, α is a smoothing parameter, and u_{t−1}(p) is the storage space utilization in the (t−1)-th time interval;
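The exponentially weighted utilization estimate can be sketched in a few lines; the value of α is chosen for illustration only:

```python
ALPHA = 0.3  # smoothing parameter α (illustrative value, not from the patent)

def smoothed_utilization(used: int, size: int, prev_u: float) -> float:
    """u_t(p) = α * use(p)/r_t(p) + (1 - α) * u_{t-1}(p)."""
    return ALPHA * (used / size) + (1 - ALPHA) * prev_u
```

A larger α makes the estimate track the most recent interval more aggressively; a smaller α damps transient bursts when predicting the next interval's space needs.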
when the utilization has not reached 90% and condition Cond0 is satisfied, the unnecessary storage space c(p) of check block p is computed and the unused space c(p) is contracted, rounded down to a multiple of the data block size:

c(p) = ⌊r_t(p) · (1 − u_t(p)) / ChunkSize⌋ · ChunkSize

where ChunkSize denotes the size of an allocated data block, so that the storage space size of check block p in the (t+1)-th time interval is:

r_{t+1}(p) = r_t(p) − c(p)
if only one data block's worth of storage space remains before contraction, no further contraction is performed;
when the utilization of a check block's unified storage pool space reaches 90%, a fusion operation is performed to merge all check difference logs into the check blocks and reclaim all of the storage space, i.e. the check block no longer has corresponding storage space; the storage space is reallocated when a new check difference log is generated.
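The contraction and fusion decisions above can be sketched as follows, assuming c(p) is the unused fraction r_t(p)·(1 − u_t(p)) rounded down to a multiple of ChunkSize; the constant and function names are illustrative:

```python
CHUNK_SIZE = 4096  # ChunkSize, the allocation unit (illustrative value)

def shrink_amount(size: int, u: float) -> int:
    """c(p): unused space, rounded down to a multiple of the data block size."""
    return (int(size * (1.0 - u)) // CHUNK_SIZE) * CHUNK_SIZE

def next_size(size: int, u: float) -> int:
    """r_{t+1}(p) = r_t(p) - c(p); never shrink below one data block."""
    if size <= CHUNK_SIZE:  # only one block of space left: stop contracting
        return size
    return size - min(shrink_amount(size, u), size - CHUNK_SIZE)

def should_fuse(u: float) -> bool:
    """Fusion (merge all delta logs into the check blocks) triggers at 90%."""
    return u >= 0.9
```

Contraction runs on the slower Cond0 cadence and trims over-provisioned pools; fusion reclaims everything at once and lets the pool be reallocated from scratch on the next delta.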
2. The method of claim 1, wherein in step S1 the data placement method is a controlled replication method based on scalable hashing.
CN201610710868.XA 2016-08-23 2016-08-23 Differential log type erasure code updating method for single storage pool Active CN107766170B (en)

Publications (2)

Publication Number Publication Date
CN107766170A CN107766170A (en) 2018-03-06
CN107766170B true CN107766170B (en) 2021-04-09



