CN110750382A - Minimum storage regeneration code coding method and system for improving data repair performance

Info

Publication number: CN110750382A
Application number: CN201910880818.XA
Authority: CN
Inventors: 冯丹; 叶柳青; 胡燏翀; 魏学亮
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2020-02-04
Anticipated expiration: 2039-09-18
Also published as: CN110750382B

Abstract

The invention discloses a minimum storage regeneration code encoding method and system for improving data repair performance, belonging to the field of computer storage. The method comprises: equally dividing the original data into k data blocks, equally dividing each data block into α data fragments, and equally dividing each check block into α check fragments; after determining a generator matrix for data encoding, encoding the k×α data fragments to obtain every check fragment of every check block; after encoding is completed, storing the data blocks and the check blocks on different storage nodes; periodically checking whether any block has failed; if the total number of failed blocks is greater than the number m of check blocks, the repair fails; if exactly one data block has failed, requesting data from the d least congested valid storage nodes and downloading 1/(d-k+1) of a block from each of them to repair the failed data block; otherwise, requesting a complete block from each of m valid storage nodes to repair the failed blocks.

Description

Minimum storage regeneration code coding method and system for improving data repair performance

Technical Field

The invention belongs to the field of computer storage, and particularly relates to a minimum storage regeneration code encoding method and system for improving data restoration performance.

Background

Erasure codes are a coding method for ensuring data redundancy, which first splits original data into data blocks of equal size, and then codes the data blocks into check blocks. When several data blocks or check blocks are lost, erasure coding techniques can ensure that the original data can still be recovered. The technology is widely applied to a distributed storage system and a cloud storage system to improve the reliability of the system and prevent data inaccessibility caused by disk failure or data loss and the like.

In a conventional erasure coding method, original data of size M is equally divided into k data blocks D_0, D_1, …, D_{k-1}. Encoding the data blocks generates m coded blocks P_0, P_1, …, P_{m-1}, giving n = k + m blocks in total, and the data blocks and coded blocks are stored on n different storage nodes. A storage node is a logical abstraction of a storage device and may be either a disk or a storage server. Erasure codes generally have optimal storage efficiency, i.e. the Maximum Distance Separable (MDS) property, and compared with conventional replication they provide comparable system reliability at lower storage overhead. However, during data recovery erasure codes must read and transmit data from multiple disks, which occupies a large amount of storage and network resources, so their repair performance is inferior to that of replication. Taking (n, k) Reed-Solomon coding as an example, as long as the data is repairable, k whole data blocks or check blocks are required for reconstruction, so the amount of data required is M.

A minimum storage regeneration code is a special erasure code that divides each coded block at a finer granularity and achieves the optimal repair bandwidth by selecting reusable fragments to participate in repair. In addition to the parameters n and k of conventional erasure codes, a minimum storage regeneration code has a parameter d denoting the number of valid blocks, other than the failed block, that are contacted over the network during repair. When a single node fails, most existing minimum storage regeneration codes contact all valid blocks other than the failed block (d = n-1) and download 1/m of each block according to a fixed rule to repair the lost data, which effectively reduces the repair bandwidth. In a complex network environment, however, it is impractical to require every valid node to participate in each repair. Moreover, the repair latency is determined by the last node to answer the data request; when the system is unstable and a node becomes congested, or a node is occupied by other tasks and delays sending the data needed for repair, the repair performance of the system drops sharply. In short, the repair performance of existing minimum storage regeneration codes in an unstable network environment needs to be improved.

Disclosure of Invention

In view of the shortcomings and needs in the art, the present invention provides a method and system for encoding a minimum stored regeneration code for improving data repair performance, which aims to improve data repair performance in an unstable network environment.

To achieve the above object, according to a first aspect of the present invention, there is provided a minimum stored regeneration code encoding method for improving data repair performance, comprising:

equally dividing the original data into k data blocks D_0 to D_{k-1}, and equally dividing each data block into α data fragments to obtain k×α data fragments; of the m check blocks generated by encoding, each check block is equally divided into α check fragments, giving m×α check fragments in total;

after a generator matrix G for data encoding is determined, encoding the k×α data fragments according to the generator matrix G to obtain every check fragment of the m check blocks;

regularly checking whether the blocks on each storage node fail;

if failed blocks exist and their total number is greater than m, the repair fails and the procedure ends; if exactly one data block has failed, requesting data from the d least congested valid storage nodes and downloading 1/(d-k+1) of a block from each requested node to repair the failed data block; in all other cases, requesting data from m valid storage nodes and downloading a complete block from each requested node to repair the failed blocks;

where n = k + m, k ≤ d ≤ n-1, the value of α depends on d, and k ≥ 2.

In the invention, when a single failed data block is repaired, the number d of valid nodes contacted over the network can be set flexibly within the range [k, n-1] according to network congestion. Specifically, in a stable network environment, the invention can construct a minimum storage regeneration code with d = n-1 to obtain a smaller repair bandwidth than a conventional erasure code. In an unstable network environment, the invention can construct a minimum storage regeneration code with k ≤ d < n-1 that adapts to the sensed load on the nodes and requests data only from relatively uncongested nodes; this yields a smaller repair bandwidth than a conventional erasure code and, by avoiding congested nodes, a smaller repair latency than a minimum storage regeneration code with d = n-1. In general, the invention improves data repair performance in an unstable network environment.
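The trade-off described above can be made concrete with a small calculation. The following is only an illustrative sketch: the function name `repair_traffic` is hypothetical, and it simply applies the quantities stated in this document (a block has size M/k, a conventional repair reads k whole blocks, and a single-block repair reads 1/(d-k+1) of a block from each of d nodes).

```python
from fractions import Fraction


def repair_traffic(data_size, k, m, d):
    """Return (conventional, regenerating) traffic for one failed data block.

    data_size is the size M of the original data; each block has size M / k.
    The function name and signature are illustrative assumptions.
    """
    n = k + m
    if not (2 <= k <= d <= n - 1):
        raise ValueError("parameters must satisfy 2 <= k <= d <= n - 1")
    block = Fraction(data_size, k)
    conventional = k * block               # k whole blocks, i.e. M
    regenerating = d * block / (d - k + 1)  # 1/(d-k+1) of a block from d nodes
    return conventional, regenerating


# Parameters of the application example in this document: 30 MB, k = m = 3.
for d in (3, 4, 5):
    conv, regen = repair_traffic(30, k=3, m=3, d=d)
    print(f"d={d}: conventional {float(conv):.2f} MB, regenerating {float(regen):.2f} MB")
```

A smaller d contacts fewer nodes at the cost of more traffic per repair, which is the knob the method adjusts according to congestion.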

Further, α is given by [formula not reproduced in the source text], where ⌈·⌉ denotes rounding up.

By determining the number of fragments α of the data blocks and check blocks according to this formula, the invention matches the fragmentation to the number d of valid nodes selected when a single data block fails, thereby improving the subsequent data repair.

Further, determining a generator matrix G for data encoding includes:

for any check block P_t, determining all data fragments that participate in generating each check fragment P_{t,0} to P_{t,α-1} of check block P_t, so as to obtain a coding position matrix L, and determining the coding coefficient matrix V_t corresponding to the data fragments participating in encoding; obtaining the generator matrix block G_t corresponding to check block P_t according to

G_t = V_t ∘ L;

the coding position matrices corresponding to all check blocks are the same;

combining the generator matrix blocks corresponding to the check blocks to obtain the generator matrix:

G = [G_0; G_1; …; G_{m-1}] (the blocks stacked by rows);

where ∘ denotes the Hadamard product, t denotes the number of the check block, and 0 ≤ t ≤ m-1.

In the invention, the coding position matrices corresponding to all check blocks are the same; that is, every check block uses the same data fragments to generate its check fragments. Consequently, when d < n-1, data can be requested from more than d nodes.
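As a hedged illustration of how the generator matrix is assembled, the sketch below forms each block as the entry-wise (Hadamard) product of a coefficient matrix V_t with a shared 0/1 position matrix L and stacks the blocks by rows. The function name `build_generator` and the small matrices are hypothetical and are not the matrices of the invention; because L contains only 0 and 1, the product simply keeps the entries of V_t where L is 1.

```python
def build_generator(position, coefficients):
    """Stack G_t = V_t ∘ L for t = 0..m-1 into one generator matrix G.

    position     -- the shared 0/1 coding position matrix L (alpha rows)
    coefficients -- list of m coefficient matrices V_t, same shape as L
    """
    G = []
    for V_t in coefficients:
        if len(V_t) != len(position) or any(
            len(rv) != len(rl) for rv, rl in zip(V_t, position)
        ):
            raise ValueError("every V_t must have the same shape as L")
        # Entry-wise product with a 0/1 matrix: keep V_t where L is non-zero.
        G.extend(
            [v if l else 0 for v, l in zip(rv, rl)]
            for rv, rl in zip(V_t, position)
        )
    return G


# Hypothetical toy values (k = 2, alpha = 2, m = 2), chosen only for illustration.
L = [[1, 0, 1, 0],
     [0, 1, 1, 1]]
V = [[[1, 1, 1, 1], [1, 1, 1, 1]],
     [[2, 3, 4, 5], [6, 7, 8, 9]]]
G = build_generator(L, V)
for row in G:
    print(row)
```

The result has m×α rows and k×α columns, one row per check fragment.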

Further, the method for acquiring the encoding position matrix L includes:

(S1) initialize a target matrix of α rows and k×α columns with all elements 0, where each row of the target matrix corresponds to one check fragment of check block P_t and each column corresponds to one of the data fragments;

(S2) set the element in row j_1 and column i_1×α + j_1 of the target matrix to a non-zero value, so that the j_1-th data fragment D_{i_1,j_1} of the i_1-th data block is added to the check fragment of that row;

(S3) set the variables [definitions not reproduced in the source text];

(S4) if i = k, proceed to step (S8); otherwise, update i = i + 1, set r = i mod (d-k+1) and j = -1, and locate the column number of the data fragment from r and run as S_i = r*run;

(S5) if j = step, proceed to step (S4); otherwise, update j = j + 1 and set tmp = 1;

(S6) if tmp = run, proceed to step (S5); otherwise, update tmp = tmp + 1, set ex = -1, and compute pos = j*(d-k+1)*run + tmp;

(S7) if ex = d-k, proceed to step (S6); otherwise, update ex = ex + 1, set the element in row pos + ex*dis and column S_i of the target matrix to a non-zero value so that the (pos + ex*dis)-th data fragment D_{i,pos+ex*dis} of the i-th data block is added in, and repeat step (S7);

(S8) take the target matrix as the coding position matrix L;

where 0 ≤ i_1 ≤ k-1, 0 ≤ j_1 ≤ α-1, ⌈·⌉ denotes rounding up, and mod denotes the modulo operation.

The coding position matrix L determined in this way ensures that all data fragments participate in the encoding that produces the check blocks. At the same time, part of the information in the data fragments is added to the check information as redundant information, so that during repair more data information can be obtained by reading less check information, which reduces the repair bandwidth.
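The first two steps of the construction can be sketched as follows. This is only a partial illustration under stated assumptions: it covers steps (S1) and (S2), which place each data fragment j_1 of every data block into row j_1, and it deliberately omits the additional entries of steps (S3) to (S7), whose variable definitions are not reproduced in this text. The function name `init_position_matrix` is hypothetical, and 1 stands for "non-zero".

```python
def init_position_matrix(k, alpha):
    """Steps (S1)-(S2) only: an alpha x (k*alpha) 0/1 matrix with the base entries."""
    # (S1) all-zero target matrix: one row per check fragment,
    #      one column per data fragment.
    target = [[0] * (k * alpha) for _ in range(alpha)]
    # (S2) fragment j1 of data block i1 sits in column i1*alpha + j1
    #      and contributes to row j1.
    for i1 in range(k):
        for j1 in range(alpha):
            target[j1][i1 * alpha + j1] = 1
    return target


# Parameters of the application example in this document: k = 3, alpha = 4.
base = init_position_matrix(3, 4)
for row in base:
    print(row)
```

After these two steps every data fragment already appears in exactly one row, so every fragment participates in the encoding; the later steps add the extra entries that enable reduced-bandwidth repair.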

Further, the coding coefficient matrix V_t is obtained as follows:

searching by enumeration for all coding coefficient matrices for which the system of linear equations obtained from any k selected blocks is linearly independent, so as to obtain a set of coding coefficient matrices;

selecting from this set a coding coefficient matrix such that, when a single failed data block is repaired, the linear equations obtained by requesting data from the d least congested valid storage nodes are linearly independent; the selected matrix is taken as the coding coefficient matrix V_t.

Determining the coding coefficient matrix V_t in this way ensures that the system of linear equations obtained from any k of the n blocks is linearly independent (equivalent to the corresponding matrix being invertible), so that the Maximum Distance Separable (MDS) property is achieved; it also ensures that the matrix arising under the selected single-data-block repair scheme is invertible.
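The MDS condition described above can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the enumeration procedure of the invention: it assumes GF(2^8) with the reduction polynomial 0x11d (the document does not name one), stacks an identity matrix on a candidate G, and verifies that the rows belonging to every choice of k of the n blocks have full rank. All function names are hypothetical, and the Cauchy matrix is used only as a known MDS test input with α = 1.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements (reduction polynomial is an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8), computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def gf_rank(rows):
    """Rank of a matrix over GF(2^8) by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = gf_inv(rows[rank][col])
        rows[rank] = [gf_mul(inv, x) for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [x ^ gf_mul(f, y) for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def is_mds(G, k, m, alpha):
    """True if any k of the n = k + m blocks determine all k*alpha data fragments."""
    size = k * alpha
    M = [[int(r == c) for c in range(size)] for r in range(size)] + [list(r) for r in G]
    for blocks in combinations(range(k + m), k):
        rows = [M[b * alpha + j] for b in blocks for j in range(alpha)]
        if gf_rank(rows) != size:
            return False
    return True


# Test input only: a 3x3 Cauchy matrix 1/(x_t + y_i) with x = 0..2, y = 3..5.
cauchy = [[gf_inv(t ^ (3 + i)) for i in range(3)] for t in range(3)]
print(is_mds(cauchy, k=3, m=3, alpha=1))
print(is_mds([[1, 1, 1], [1, 1, 1]], k=3, m=2, alpha=1))
```

An enumeration in the spirit of the text would generate candidate coefficient matrices and keep those for which such a check succeeds.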

Further, if exactly one data block has failed, requesting data from the d least congested valid storage nodes and downloading 1/(d-k+1) of a block from each requested node to repair the failed data block comprises:

determining, for the valid blocks located on the requested nodes, the positions of the fragments that participate in repairing the failed data block, so as to determine the sequence numbers seq1 of the data fragments and seq2 of the check fragments participating in the repair;

combining the data fragments and check fragments participating in the repair, in order, into a matrix M_1, and combining the data fragments participating in the repair and the data fragments to be repaired in the failed data block, in order, into a matrix M_2;

selecting the corresponding rows from the generator matrix G according to the check fragment sequence numbers seq2 and combining them in order to obtain a matrix M_f; determining the sequence numbers seq3 of the data fragments to be repaired according to the number of the data block to be repaired, and selecting the corresponding columns from M_f according to seq1 and seq3 and combining them in order to obtain a matrix M_f′;

constructing a matrix M_t from M_f′ such that M_t·M_2 = M_1, the matrix M_f′ being one block of M_t;

inverting M_t to obtain a repair matrix M_r, and using M_r and M_1 to repair each data fragment to be repaired, thereby completing the repair of the failed data block.

Further, determining the fragment location information participating in repairing the failed data block in the valid block located in the requested node includes:

(T1) initialize the set R as the empty set and set the variables [further definitions not reproduced in the source text], r = f mod (d-k+1), S_f = r*run, j = -1;

(T2) if j = step, proceed to step (T4); otherwise, update j = j + 1 and set tmp = 1;

(T3) if tmp = run, proceed to step (T2); otherwise, set pos = j*(d-k+1)*run + tmp, update tmp = tmp + 1, compute the fragment position Ps = S_f + pos and add it to the set R, indicating that the Ps-th fragment of the valid block participates in repairing the failed data block, and repeat step (T3);

(T4) take the fragment positions recorded in the set R as the positions of the fragments of the valid block that participate in repairing the failed data block;

where f denotes the number of the failed data block.

Further, requesting data from m valid storage nodes, and downloading complete blocks from each requested node to repair failed blocks respectively, includes:

generating an identity matrix of k×α rows and k×α columns and combining it with the generator matrix G to form a matrix M of (k+m)×α rows and k×α columns such that M·M_D = M_N, where M_D is the matrix formed by combining all data fragments in order and M_N is the matrix formed by combining all data fragments and check fragments in order;

determining, according to the numbers of the requested nodes, the sequence numbers of the fragments used to repair the failed blocks, and selecting the corresponding rows from M and combining them in order to obtain a matrix M_t such that M_t·M_D = M_V, where M_V is the matrix formed by combining, in order, all fragments used to repair the failed blocks;

inverting M_t to obtain a repair matrix M_r, and using M_r and M_V to repair the data fragments of each failed block, thereby completing the repair of the failed blocks.
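The conventional repair just described can be sketched end to end. The following is a minimal illustration under stated assumptions: GF(2^8) with reduction polynomial 0x11d (not specified in this document), fragments reduced to single symbols, and a small Cauchy matrix with α = 1 standing in for the generator matrix of the invention. Because M_t must be square to be inverted, the sketch reads k helper blocks (in the application example of this document k = m = 3). All function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements (reduction polynomial is an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8), computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def gf_invert(matrix):
    """Invert a square matrix over GF(2^8) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [int(r == c) for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, x) for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col]:
                f = aug[i][col]
                aug[i] = [x ^ gf_mul(f, y) for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def gf_matvec(matrix, vec):
    """Matrix-vector product over GF(2^8) (addition is XOR)."""
    out = []
    for row in matrix:
        acc = 0
        for a, b in zip(row, vec):
            acc ^= gf_mul(a, b)
        out.append(acc)
    return out


def fragment_rows(helpers, alpha):
    """Sequence numbers of all fragments held by the helper blocks."""
    return [b * alpha + j for b in helpers for j in range(alpha)]


def conventional_repair(G, k, alpha, helpers, helper_fragments):
    """Recover all k*alpha data fragments from k complete helper blocks."""
    if len(helpers) != k:
        raise ValueError("exactly k helper blocks are needed for a square M_t")
    size = k * alpha
    M = [[int(r == c) for c in range(size)] for r in range(size)] + [list(r) for r in G]
    M_t = [M[r] for r in fragment_rows(helpers, alpha)]
    M_r = gf_invert(M_t)                      # repair matrix
    return gf_matvec(M_r, helper_fragments)   # M_r · M_V = M_D


# Stand-in generator (assumption): 3x3 Cauchy matrix, k = m = 3, alpha = 1.
G = [[gf_inv(t ^ (3 + i)) for i in range(3)] for t in range(3)]
data = [10, 20, 30]
stored = data + gf_matvec(G, data)            # blocks 0..5
helpers = [2, 3, 4]                           # blocks 0 and 1 have failed
recovered = conventional_repair(G, 3, 1, helpers, [stored[b] for b in helpers])
print(recovered)
print(fragment_rows([2, 3, 4], 4))
```

With α = 4 and helper blocks {2, 3, 4}, `fragment_rows` yields the sequence numbers 8 to 19, which matches the row selection used in the application example.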

According to a second aspect of the present invention, there is provided a system comprising a computer readable storage medium for storing an executable program and a processor;

the processor is used for reading an executable program stored in a computer readable storage medium and executing the minimum storage regeneration code encoding method for improving the data repair performance provided by the first aspect of the invention.

Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:

(1) With the minimum storage regeneration code encoding method for improving data repair performance provided by the invention, when a single failed data block is repaired, the number d of valid nodes contacted over the network can be set flexibly within the range [k, n-1] according to network congestion. Specifically, in a stable network environment, the invention can construct a minimum storage regeneration code with d = n-1 to obtain a smaller repair bandwidth than a conventional erasure code. In an unstable network environment, the invention can construct a minimum storage regeneration code with k ≤ d < n-1 that adapts to the sensed load on the nodes and requests data only from relatively uncongested nodes; this yields a smaller repair bandwidth than a conventional erasure code and, by avoiding congested nodes, a smaller repair latency than a minimum storage regeneration code with d = n-1. In general, the invention improves data repair performance in an unstable network environment.

(2) With the minimum storage regeneration code encoding method for improving data repair performance provided by the invention, the coding position matrices corresponding to all check blocks are the same, so when d < n-1 data can be requested from more than d nodes; because every node transmits data from the same fragment positions, the repair can proceed as soon as the first d responses have returned, which avoids waiting for high-latency data.

(3) With the minimum storage regeneration code encoding method for improving data repair performance provided by the invention, the constructed coding position matrix L ensures that all data fragments participate in the encoding that produces the check blocks, while part of the information in the data fragments is added to the check information as redundant information, so that during repair more data information can be obtained by reading less check information, which reduces the repair bandwidth.

Drawings

FIG. 1 is a flowchart of a minimum stored regeneration code encoding method for improving data repair performance according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for determining a code location matrix according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for determining the positions of the fragments that participate in repairing a failed data block when a single data block is repaired, according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

In order to improve the data repair performance in an unstable network environment, the present invention provides a minimum storage regeneration code encoding method for improving the data repair performance, as shown in fig. 1, including:

equally dividing the original data into k data blocks D_0 to D_{k-1}, and equally dividing each data block into α data fragments to obtain k×α data fragments; of the m check blocks generated by encoding, each check block is equally divided into α check fragments, giving m×α check fragments in total; if the data is insufficient during division, it is padded with 0; this step completes the blocking and fragmentation of the data;

after determining a generator matrix G for data encoding, encoding the k×α data fragments according to the generator matrix G to obtain every check fragment of the m check blocks, and, after encoding is completed, storing the k data blocks and the m check blocks on n different storage nodes;

periodically checking whether the blocks on each storage node have failed; this step completes the check of the data block state;

if failed blocks exist and their total number is greater than m, the repair fails and the procedure ends; if exactly one data block has failed, requesting data from the d least congested valid storage nodes and downloading 1/(d-k+1) of a block from each requested node to repair the failed data block; in all other cases, requesting data from m valid storage nodes and downloading a complete block from each requested node to repair the failed blocks; this step selects the repair scheme according to the specific failure and thereby completes the classification and repair of failed blocks;
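The case distinction in the last step can be summarised in a few lines. This is a minimal sketch with hypothetical names; it assumes that blocks are numbered 0 to n-1 with the data blocks first (0 to k-1) and the check blocks after them (k to k+m-1).

```python
def choose_repair(failed_blocks, k, m):
    """Classify a failure according to the three cases described above.

    failed_blocks -- iterable of failed block numbers; 0..k-1 are data blocks
                     and k..k+m-1 are check blocks (numbering is an assumption).
    """
    failed = set(failed_blocks)
    if not failed:
        return "none"
    if len(failed) > m:
        return "unrecoverable"      # more than m failures: repair fails
    if len(failed) == 1 and next(iter(failed)) < k:
        return "single-block"       # 1/(d-k+1) of a block from each of d nodes
    return "conventional"           # complete blocks from the helper nodes


for case in ({0}, {0, 1}, {4}, {0, 1, 2, 3}):
    print(sorted(case), choose_repair(case, k=3, m=3))
```

Note that a single failed check block falls under the conventional scheme, since the reduced-bandwidth repair is defined for a single failed data block.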

With the above minimum storage regeneration code encoding method for improving data repair performance, when a single failed data block is repaired, the number d of valid nodes contacted over the network can be set flexibly within the range [k, n-1] according to network congestion. Specifically, in a stable network environment, a minimum storage regeneration code with d = n-1 can be constructed to obtain a smaller repair bandwidth than a conventional erasure code. In an unstable network environment, a minimum storage regeneration code with k ≤ d < n-1 can be constructed that adapts to the sensed load on the nodes and requests data only from relatively uncongested nodes; this yields a smaller repair bandwidth than a conventional erasure code and, by avoiding congested nodes, a smaller repair latency than a minimum storage regeneration code with d = n-1. In summary, the above method improves data repair performance in an unstable network environment.

Late binding does not need to know the congestion state of the nodes in the system: data is requested from more than d nodes and the repair starts as soon as the first d responses arrive, which effectively avoids waiting for high-latency nodes to answer. The dynamic data access technique described in the paper "EC-Store: Bridging the Gap between Storage and Latency in Distributed Erasure Coded Systems" (International Conference on Distributed Computing Systems (ICDCS), 2018) models the system and collects network and load information to estimate latency and judge whether nodes in the network are congested. In the present invention, either technique can be used to avoid requesting data from high-latency nodes, and it should be understood that other methods of avoiding such requests can also be applied to the present invention.
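The effect of both techniques on repair latency can be illustrated with a simple model. The sketch below is an assumption-laden illustration, not a network implementation: per-node response latencies are given as plain numbers, `least_congested` models a selection based on known load, and `repair_wait` models late binding by returning the time at which enough of the requested nodes have answered. All names and latency values are hypothetical.

```python
def least_congested(latencies, d):
    """Pick the d nodes with the smallest estimated latency."""
    return sorted(latencies, key=latencies.get)[:d]


def repair_wait(latencies, requested, needed):
    """Time until `needed` of the `requested` nodes have answered."""
    times = sorted(latencies[node] for node in requested)
    if needed > len(times):
        raise ValueError("cannot wait for more answers than requests")
    return times[needed - 1]


# Hypothetical latencies in ms for the surviving nodes 1..5 (node 0 failed, n = 6).
latencies = {1: 12, 2: 10, 3: 250, 4: 11, 5: 14}
everyone = list(latencies)

print(repair_wait(latencies, everyone, needed=5))   # d = n-1: wait for all nodes
print(repair_wait(latencies, everyone, needed=4))   # late binding with d = 4
print(least_congested(latencies, 4))                # load-aware choice of d = 4
```

In this model a repair with d = n-1 is held up by the single congested node, whereas a repair with d = 4 finishes once the four fastest answers are in.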

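Optionally, in the above minimum storage regeneration code encoding method for improving data repair performance, the number of fragments α into which each data block and each check block is divided is given by [formula not reproduced in the source text], where ⌈·⌉ denotes rounding up;

by determining α in this way, this embodiment matches the fragmentation to the number d of valid nodes selected when a single data block fails, thereby improving the subsequent data repair.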

In an optional embodiment, in the above minimum storage regeneration code encoding method for improving data repair performance, determining the generator matrix G for data encoding comprises:

for any check block P_t, determining all data fragments that participate in generating each check fragment P_{t,0} to P_{t,α-1} of check block P_t, so as to obtain a coding position matrix L, and determining the coding coefficient matrix V_t corresponding to the data fragments participating in encoding; obtaining the generator matrix block G_t corresponding to check block P_t according to G_t = V_t ∘ L;

the coding position matrices corresponding to all check blocks are the same;

where ∘ denotes the Hadamard product, t denotes the number of the check block, and 0 ≤ t ≤ m-1;

because the coding position matrices corresponding to all check blocks are the same, i.e. every check block uses the same data fragments to generate its check fragments, data can be requested from more than d nodes when d < n-1; and because every node transmits data from the same fragment positions, the repair can proceed as soon as the first d responses have returned, which avoids waiting for high-latency data;

as shown in FIG. 2, when determining the generator matrix G, the coding position matrix L is obtained by carrying out steps (S1) to (S8) set out above, of which the last two read:

(S7) if ex = d-k, proceed to step (S6); otherwise, update ex = ex + 1, set the element in row pos + ex*dis and column S_i of the target matrix to a non-zero value so that the (pos + ex*dis)-th data fragment D_{i,pos+ex*dis} of the i-th data block is added in, and repeat step (S7);

(S8) take the target matrix as the coding position matrix L;

where 0 ≤ i_1 ≤ k-1, 0 ≤ j_1 ≤ α-1, ⌈·⌉ denotes rounding up, and mod denotes the modulo operation;

the coding position matrix L determined in this way ensures that all data fragments participate in the encoding that produces the check blocks, while part of the information in the data fragments is added to the check information as redundant information, so that during repair more data information can be obtained by reading less check information, which reduces the repair bandwidth;

after the coding position matrix L is determined, the coding coefficient matrix V_t is obtained as follows:

selecting from the set of coding coefficient matrices a matrix such that, when a single failed data block is repaired, the linear equations obtained by requesting data from the d least congested valid storage nodes are linearly independent; the selected matrix is taken as the coding coefficient matrix V_t;

determining the coding coefficient matrix V_t in this way ensures that the system of linear equations obtained from any k of the n blocks is linearly independent (equivalent to the corresponding matrix being invertible), so that the Maximum Distance Separable (MDS) property is achieved; it also ensures that the matrix arising under the selected single-data-block repair scheme is invertible.

In an optional embodiment, in the above minimum storage regeneration code encoding method for improving data repair performance, if exactly one data block has failed, requesting data from the d least congested valid storage nodes and downloading 1/(d-k+1) of a block from each requested node to repair the failed data block comprises:

selecting the corresponding rows from the generator matrix G according to the check fragment sequence numbers seq2 and combining them in order to obtain a matrix M_f; determining the sequence numbers seq3 of the data fragments to be repaired according to the number of the data block to be repaired, and selecting the corresponding columns from M_f according to the data fragment sequence numbers seq1 and seq3 and combining them in order to obtain a matrix M_f′;

inverting the matrix M_t to obtain a repair matrix M_r, and using M_r and the matrix M_1 to repair each data fragment to be repaired, thereby completing the repair of the failed data block;

specifically, as shown in fig. 3, determining the fragment location information of the valid block located in the requested node, which participates in repairing the failed data block, includes:

(T1) initialize the set R as the empty set and set the variables [further definitions not reproduced in the source text], r = f mod (d-k+1), S_f = r*run, j = -1;

where f denotes the number of the failed data block.

In an optional embodiment, in the above method for encoding a minimum storage regeneration code for improving data repair performance, requesting data from m valid storage nodes, and downloading complete blocks from each requested node to repair a failed block respectively includes:

The invention also provides a system comprising a computer-readable storage medium and a processor, the computer-readable storage medium for storing an executable program;

Application example:

the minimum storage regeneration code encoding method for improving data repair performance is further explained below with reference to a specific example.

Operating over the finite field GF(2^8), the parameters are set according to network conditions to m = 3, k = 3 and d = 4; accordingly n = k + m = 6 and α = 4.

For original data with a data size of 30 MB, the following operations are performed:

(1) Data blocking:

equally divide the 30 MB of original data into 3 data blocks D_i (i = 0, 1, 2) of 10 MB each, and store the 3 data blocks on 3 data nodes N_0, N_1, N_2 respectively; according to α, divide each data block equally into 4 data fragments and assign sequence numbers to all data fragments, the j-th data fragment of data block D_i having sequence number i×4 + j;

each of the three check blocks P_0, P_1, P_2 to be generated is likewise equally divided into 4 check fragments;
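The blocking and numbering of step (1) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the input is a byte string, and zero padding is applied to the end of the data so that it divides evenly into k×α fragments, in line with the padding rule mentioned earlier in this description.

```python
def split_data(data: bytes, k: int, alpha: int):
    """Split data into k*alpha equal fragments, zero-padding the tail if needed.

    Fragment s belongs to data block s // alpha and is fragment s % alpha of it.
    """
    unit = k * alpha
    padded_len = -(-len(data) // unit) * unit       # round up to a multiple of k*alpha
    data = data.ljust(padded_len, b"\x00")
    frag_len = padded_len // unit
    return [data[s * frag_len:(s + 1) * frag_len] for s in range(unit)]


def sequence_number(i: int, j: int, alpha: int) -> int:
    """Sequence number of the j-th fragment of data block D_i."""
    return i * alpha + j


# Scaled-down stand-in for the 30 MB example: 30 bytes, k = 3, alpha = 4.
fragments = split_data(bytes(range(30)), k=3, alpha=4)
print(len(fragments), len(fragments[0]))
print(fragments[sequence_number(1, 0, 4)])
```

Here 30 bytes are padded to 36 so that each of the 12 fragments holds 3 bytes; the first fragment of D_1 has sequence number 4.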

(2) Data encoding:

according to the method flow shown in FIG. 2, the coding position matrix is determined as follows:

[matrix not reproduced in the source text]

each row corresponds to one check fragment of the check block, and each column corresponds to one of the data fragments; the coding position matrices corresponding to all check blocks are the same;

coding coefficient matrices satisfying the MDS property are then searched for by enumeration to obtain the coding coefficient matrix V_t corresponding to each check block, from which the generator matrix G is obtained as follows:

[matrix not reproduced in the source text]

encoding the data fragments according to the generator matrix G gives the encoded check blocks P_0, P_1, P_2 as follows:

[expressions not reproduced in the source text]

the three check blocks are stored on three data nodes N_3, N_4, N_5, different from N_0, N_1, N_2;

(3) Checking the data block state:

The blocks on each data node are checked periodically, node by node, for errors or loss; if any block has failed, go to step (4); otherwise, no action is taken;

(4) Classifying the repair according to the failed blocks:

The numbers of all failed data blocks and check blocks are obtained from the state check, and a lost block set is generated. If the number of failed blocks exceeds $m = 3$, the failure lies outside the repairable range and the data cannot be recovered; the repair fails and ends. If there is exactly one failed block and it is a data block, step (5.2) is executed: data is requested from the $d$ least congested valid storage nodes, and a $1/(d-k+1)$ fraction of a block is downloaded from each requested node to repair the failed data block. In all other cases within the repairable range, namely when a check block has failed or more than one data block has failed, step (5.1) is executed: data is requested from $m$ valid storage nodes, and the complete block is downloaded from each requested node to repair the failed blocks;
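The three-way decision in step (4) can be written as a small function. The sketch below is an illustration only: the name `choose_repair` and the returned labels are hypothetical, and it assumes that blocks are numbered 0 to $k-1$ for data blocks and $k$ to $k+m-1$ for check blocks, as in this example.

```python
def choose_repair(failed, k, m, d):
    """Classify a set of failed block numbers.

    Blocks 0..k-1 are data blocks and k..k+m-1 are check blocks (assumed
    numbering). Returns (strategy, number_of_nodes_to_contact).
    """
    failed = set(failed)
    if not failed:
        return ("none", 0)                 # nothing to repair
    if len(failed) > m:
        return ("unrecoverable", 0)        # outside the repairable range
    if len(failed) == 1 and next(iter(failed)) < k:
        return ("single-data-block", d)    # step (5.2): partial download from d nodes
    return ("conventional", m)             # step (5.1): complete blocks from m nodes


k, m, d = 3, 3, 4
print(choose_repair({2}, k, m, d))           # one failed data block
print(choose_repair({0, 1}, k, m, d))        # two failed data blocks
print(choose_repair({3}, k, m, d))           # one failed check block
print(choose_repair({0, 1, 2, 3}, k, m, d))  # more than m failures
```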

(5) Failure repair:

(5.1) Conventional repair:

For example, suppose the failure set after the state check step is {0, 1}, i.e., there are two failed data blocks $D_0$ and $D_1$. Three valid storage nodes are selected at random to request data from; here, blocks {2, 3, 4} are selected to complete the data repair;

First, an identity matrix $I$ of 12 rows and 12 columns is generated and then combined with the generator matrix $G$ to form a matrix $M$ of 24 rows and 12 columns, $M = \begin{bmatrix} I \\ G \end{bmatrix}$, so that $M \cdot M_D = M_N$, where $M_D$ is the matrix formed by combining all data fragments in sequence and $M_N$ is the matrix formed by combining all data fragments and check fragments in sequence;

According to the numbers of the requested nodes, the fragment sequence numbers {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19} of the blocks used for the repair are determined; the corresponding rows are selected from the matrix $M$ and combined in sequence to obtain the matrix $M_t$, so that $M_t \cdot M_D = M_V$, where $M_V$ is the matrix formed by combining, in sequence, all fragments used to repair the failed blocks;

Inverting the matrix $M_t$ gives the repair matrix $M_r = M_t^{-1}$;

The repair matrix $M_r$ and the matrix $M_V$ are then used to repair the data fragments in each failed block, thereby completing the repair of the failed blocks;

Since $M_D = M_r \cdot M_V$, the lost data blocks are recovered from the product of the repair matrix $M_r$ and the matrix $M_V$.
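The conventional repair can be illustrated end to end with a small Python sketch over $GF(2^8)$. Several things here are assumptions rather than part of the method as described: the generator matrix is replaced by a hypothetical 12 x 12 Cauchy matrix (chosen only because any 12 rows of the stacked matrix are then invertible), the field polynomial 0x11d is a common choice that the text does not specify, and each fragment is reduced to a single byte.

```python
# GF(2^8) log/antilog tables for the polynomial x^8+x^4+x^3+x^2+1 (0x11d, assumed).
EXP, LOG = [0] * 510, [0] * 256
x = 1
for e in range(255):
    EXP[e] = EXP[e + 255] = x
    LOG[x] = e
    x <<= 1
    if x & 0x100:
        x ^= 0x11d

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[255 - LOG[a]]          # a must be non-zero

def matmul(A, B):
    out = [[0] * len(B[0]) for _ in A]
    for r, row in enumerate(A):
        for c in range(len(B[0])):
            acc = 0
            for t, a in enumerate(row):
                acc ^= mul(a, B[t][c])   # addition in GF(2^8) is XOR
            out[r][c] = acc
    return out

def invert(A):
    """Gauss-Jordan inversion over GF(2^8); A must be square and invertible."""
    n = len(A)
    aug = [list(row) + [int(r == c) for c in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [mul(scale, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v ^ mul(factor, w) for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

n_frag = 12                                   # k * alpha = m * alpha = 12
# Hypothetical stand-in for G: Cauchy matrix with entries 1 / (x_r + y_c).
G = [[inv(r ^ (n_frag + c)) for c in range(n_frag)] for r in range(n_frag)]
I = [[int(r == c) for c in range(n_frag)] for r in range(n_frag)]
M = I + G                                     # 24 rows, 12 columns

M_D = [[(7 * s + 3) % 256] for s in range(n_frag)]   # one byte per data fragment
M_N = matmul(M, M_D)                          # data fragments followed by check fragments

rows = list(range(8, 20))                     # fragments held by blocks {2, 3, 4}
M_t = [M[s] for s in rows]
M_V = [M_N[s] for s in rows]
M_r = invert(M_t)                             # repair matrix
recovered = matmul(M_r, M_V)
print(recovered == M_D)
```

Rows 0 to 7 of `recovered` are the fragments of the failed blocks $D_0$ and $D_1$; rows 8 to 11 simply reproduce the surviving block $D_2$.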

(5.2) single data block repair:

For example, suppose the failure set after the state check step is {2}, i.e., there is only one failed data block $D_2$. Four nodes are arbitrarily selected from the remaining node numbers {0, 1, 3, 4, 5} to participate in the repair; here, the selected node numbers are {0, 1, 3, 4};

According to the method flow shown in Fig. 3, the positions of the fragments participating in the repair within each block are {0, 2}. The data fragments participating in the repair of the failed data block are therefore $\{D_{0,0}, D_{0,2}, D_{1,0}, D_{1,2}\}$, with data fragment sequence numbers seq1 = {0, 2, 4, 6}, and the check fragments participating in the repair are $\{P_{0,0}, P_{0,2}, P_{1,0}, P_{1,2}\}$, with check fragment sequence numbers seq2 = {0, 2, 4, 6};

The rows corresponding to the check fragment sequence numbers seq2 are selected from the generator matrix $G$ and combined in sequence to obtain the matrix $M_f$;

From the number 2 of the data block to be repaired, the sequence numbers of the data fragments to be repaired are determined as seq3 = {8, 9, 10, 11}. The columns corresponding to seq1 and seq3, i.e., columns 0, 2, 4, 6, 8, 9, 10, and 11, are selected from the matrix $M_f$ and combined in sequence to obtain the matrix $M_f'$;

The matrix $M_t$ is constructed from the matrix $M_f'$ so that $M_t \cdot M_2 = M_1$, with $M_f'$ forming one block of $M_t$; here $M_1$ is the matrix formed by combining, in sequence, the data fragments and check fragments participating in the repair, and $M_2$ is the matrix formed by combining, in sequence, the data fragments participating in the repair and the data fragments to be repaired;

Inverting the matrix $M_t$ gives the repair matrix $M_r = M_t^{-1}$;

The repair matrix $M_r$ and the matrix $M_1$ are then used to repair each data fragment to be repaired, thereby completing the repair of the failed data block;

Since $M_2 = M_r \cdot M_1$, the lost data block is recovered from the product of the repair matrix $M_r$ and the matrix $M_1$.
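The index bookkeeping of the single data block repair can be checked with a short Python sketch. It is illustrative only: the function name `single_repair_indices` is hypothetical, the fragment positions {0, 2} are taken as given rather than derived from the flow of Fig. 3, and node $N_h$ with $h \ge k$ is assumed to hold check block $P_{h-k}$.

```python
def single_repair_indices(failed, helpers, positions, k, alpha):
    """Return (seq1, seq2, seq3, columns) for repairing data block `failed`.

    helpers   -- numbers of the nodes taking part in the repair
    positions -- fragment positions read from every helper block (given input)
    Node h < k is assumed to hold data block D_h; node h >= k is assumed to
    hold check block P_(h-k).
    """
    data_nodes = [h for h in helpers if h < k]
    check_nodes = [h for h in helpers if h >= k]
    seq1 = [i * alpha + p for i in data_nodes for p in positions]          # data fragments read
    seq2 = [(h - k) * alpha + p for h in check_nodes for p in positions]   # rows of G
    seq3 = [failed * alpha + j for j in range(alpha)]                      # fragments to rebuild
    return seq1, seq2, seq3, seq1 + seq3                                   # columns kept in M_f'


k, m, alpha = 3, 3, 4
seq1, seq2, seq3, columns = single_repair_indices(2, [0, 1, 3, 4], [0, 2], k, alpha)
print(seq1, seq2, seq3, columns)

# Repair traffic, counted in fragments.
downloaded = len(seq1) + len(seq2)   # step (5.2): partial blocks from d = 4 nodes
conventional = m * alpha             # step (5.1): m complete blocks
print(downloaded, conventional)
```

In this example the single data block repair reads 8 fragments in total, compared with 12 fragments for the conventional repair, because each of the 4 requested nodes supplies only $1/(d-k+1) = 1/2$ of its block.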

Those skilled in the art will readily understand that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A minimum storage regeneration code encoding method for improving data repair performance, comprising:

regularly checking whether the blocks on each storage node fail;

2. The minimum storage regeneration code encoding method for improving data repair performance as claimed in claim 1,

wherein $\lceil \cdot \rceil$ denotes rounding up.

3. The minimum storage regeneration code encoding method for improving data repair performance according to claim 1 or 2, wherein determining a generator matrix $G$ for data encoding comprises:

obtaining the generator matrix block $G_t$ corresponding to the check block $P_t$;

the encoding position matrix is the same for all check blocks;

wherein $\circ$ denotes the Hadamard product, $t$ denotes the number of the check block, and $0 \le t \le m-1$.

4. The minimum storage regeneration code encoding method for improving data repair performance according to claim 3, wherein the method for obtaining the encoding position matrix $L$ comprises:

(S1) initializing a target matrix of $\alpha$ rows and $k \times \alpha$ columns with all elements equal to 0, wherein each row of the target matrix corresponds to one check fragment of the check block $P_t$ and each column corresponds to one of the data fragments;

Adding into the mixture;

(S3) setting the variable $i = -1$;

(S4) if $i = k$, proceeding to step (S8); otherwise, updating the variable $i$ according to $i = i + 1$ and setting the variable $r = i \bmod (d-k+1)$,

(S7) if $ex = d-k$, proceeding to step (S6); otherwise, updating the variable $ex$ according to $ex = ex + 1$, setting the element of the target matrix in row $pos + ex \cdot dis$ and column $S_i$ to a non-zero value so that the $(pos + ex \cdot dis)$-th data fragment $D_{i,\,pos+ex \cdot dis}$ of the $i$-th data block is added, and repeating step (S7);

(S8) determining the target matrix as the encoding position matrix $L$;

wherein $0 \le i_1 \le k-1$, $0 \le j_1 \le \alpha-1$, $\lceil \cdot \rceil$ denotes rounding up, and mod denotes the modulo operation.

5. The minimum storage regeneration code encoding method for improving data repair performance of claim 4, wherein the method for obtaining the encoding coefficient matrix $V_t$ comprises:

6. The minimum storage regeneration code encoding method for improving data repair performance as claimed in claim 3, wherein, if there is only one failed data block, requesting data from the $d$ least congested valid storage nodes and downloading a $1/(d-k+1)$ fraction of a block from each requested node to repair the failed data block comprises:

combining, in sequence, the data fragments and the check fragments participating in the repair into a matrix $M_1$, and combining, in sequence, the data fragments participating in the repair and the data fragments to be repaired in the failed data block into a matrix $M_2$;

7. The method of claim 6, wherein determining the fragment position information related to repairing the failed data block in the valid blocks located on the requested nodes comprises:

(T1) initializing the set $R$ as an empty set and setting the variables $r = f \bmod (d-k+1)$, $S_f = r \cdot \mathit{run}$, and $j = -1$;

wherein $f$ denotes the number of the failed data block.

8. The method of claim 3, wherein requesting data from $m$ valid storage nodes and downloading the complete block from each requested node to repair the failed blocks comprises:

inverting the matrix $M_t$ to obtain the repair matrix $M_r$; and using the repair matrix $M_r$ and the matrix $M_V$ to repair the data fragments in each failed block, thereby completing the repair of the failed blocks.

9. A system comprising a computer-readable storage medium and a processor, wherein the computer-readable storage medium is configured to store an executable program;

the processor is used for reading the executable program stored in the computer-readable storage medium and executing the minimum storage regeneration code encoding method for improving data repair performance according to any one of claims 1 to 8.