CN113391948A

CN113391948A - Folding type extensible distributed storage coding and repairing and expanding method

Info

Publication number: CN113391948A
Application number: CN202110726617.1A
Authority: CN
Inventors: 孙蓉; 杜从军; 刘景伟; 裴庆祺
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2021-09-14
Anticipated expiration: 2041-06-29
Also published as: CN113391948B

Abstract

The invention discloses a folding-type extensible distributed storage coding method and the corresponding repair and extension methods.

Coding: after the coding parameters of each stage are determined, the generator matrix sets of the other stages are constructed in turn, starting from the generator matrix of the last stage. The resulting codes are combined into a coding group, and a code selected from the group is used to encode the data.

Repair: when nodes fail, as many non-failed nodes as there are information nodes are selected, and the data symbols downloaded from them are used to recover the data symbols of the failed nodes.

Extension: the encoded data is extended by merging the two sub-stripes within each extension group.

The invention improves the fault tolerance of the extended system, has the MDS property and a low extension bandwidth, and supports repeated extension. It can be used for coding, repairing and extending distributed storage systems whose nodes have computing capability.

Description

Folding type extensible distributed storage coding and repairing and expanding method

Technical Field

The invention belongs to the field of computer technology, and more specifically to a folding-type extensible distributed storage coding, repair and extension method in the field of distributed storage. The invention can be used to encode, repair and extend distributed storage systems whose nodes have computing capability.

Background

Distributed storage systems hold large amounts of data and experience frequent node failures, so they improve reliability by storing redundant data. Erasure coding is a typical redundancy mechanism. The original data is divided into information blocks, and the information blocks are encoded to generate check blocks. Both kinds of block are then stored across the nodes of the system to provide fault tolerance.

A distributed storage system usually grows over time as more nodes are added. Ideally, its coding parameters should therefore be extensible according to application requirements. However, when an erasure-coded system changes its coding parameters, migrating and updating the data requires transferring large amounts of data between nodes. This consumes considerable network bandwidth and degrades system performance.

The patent of Huazhong University of Science and Technology, "A storage expansion method based on network coding" (application No. 201810304384.4, publication No. CN108536396B), discloses a storage expansion method based on network coding. The method works as follows:

- The stripes existing before expansion are divided into several expansion groups, and each expansion group is further divided into a PG and a DG.
- Within the DG, data blocks are taken cyclically from the original nodes to form a series of data sets.
- Each data set is encoded with network coding to generate update blocks, which are used to update the coded blocks in the PG locally or remotely.
- Coded blocks or data blocks are transferred to the newly added nodes, so that data blocks and coded blocks remain evenly placed across all nodes after expansion.
- Finally, the data blocks and coded blocks transferred to the new nodes, as well as all coded blocks within the DG, are deleted.

By using the computing resources of the storage nodes to encode data blocks and locally update some coded blocks during expansion, the method reduces the expansion bandwidth. Its drawback is that the number of coded blocks in a single stripe stays unchanged during expansion, even though the number of nodes grows and more coded blocks would be needed to maintain the required fault tolerance. As a result, the fault tolerance of the expanded system does not scale with the number of nodes.

The paper "Random binary spreading codes: a coding method suitable for distributed storage systems", published by aged et al. (computer science and report, 9.2017), proposes a coding method whose code rate and erasure-correction capability can be adjusted dynamically. The coding matrix of the method consists of an identity matrix and a random matrix. Good performance of the overall codeword is achieved with a top-down design that controls how each element of the random matrix is generated. The parameters can be adjusted dynamically: the rows and columns of the coding matrix can be extended or shrunk freely, so the storage system can adjust its code rate and erasure-correction capability as application requirements change.

The paper "Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage" (11th Innovations in Theoretical Computer Science Conference, ser. Leibniz International Proceedings in Informatics, vol. 151, pp. 66:1-66:26, 2020) proposes convertible codes, a coding method for efficient code conversion. The subsequent paper "Bandwidth Cost of Code Conversion in Distributed Storage: Fundamental Limits and Optimal Constructions" (arXiv:2008.12707) analyzes the code conversion problem by means of network information flow and proposes a convertible-code construction that reduces the conversion bandwidth. Convertible codes effectively reduce the resource consumption of converting a system from an initial code to a final code. Their drawback is that only one such low-bandwidth conversion is possible.

Disclosure of Invention

The purpose of the invention is to overcome the above defects of the prior art. It provides a folding-type extensible distributed storage coding method, together with repair and extension methods, that solves three problems:

- the fault tolerance of the extended system does not adapt to its node scale;
- the repair degree is high when a failed node is repaired;
- only a small number of extensions is possible.

The idea for achieving this purpose is as follows.

The coding method computes the coding parameters of each subsequent stage from those of the 1st stage by formula. Because the computed number of check nodes increases from stage to stage, the fault tolerance of the extended system keeps pace with its node scale. A systematic MDS code is used as the base code to construct the generator matrix of the last stage. The generator matrix sets of the other stages are then constructed in reverse order according to a set folding rule. The resulting codes are combined into a coding group, and a code selected from the group is used to encode the data.

The repair method selects, from the non-failed nodes, as many nodes as there are information nodes, and downloads data symbols from them to recover the data symbols of the failed nodes. Since the number of selected nodes equals the number of information nodes, the repair degree is low.

The extension method extends the encoded data by merging the two sub-stripes in each extension group. Because the coding group contains multiple codes, this merging can be repeated several times, which removes the limit on the number of extensions.

To achieve the above object, the folding-type extensible distributed storage coding method of the present invention comprises the following steps:

(1) Setting the coding parameters of the 1st stage:

setting the number of information nodes k₁, the number of check nodes r₁ and the number of meta-check nodes s₁ as the coding parameters of the 1st stage, where k₁ and r₁ are positive integers and s₁ is a non-negative integer not greater than r₁;

(2) Calculating the encoding parameters of the next stage:

(2a) calculating the number of information nodes and the number of check nodes of the next stage according to the following formulas:

k′ = 2k

r′ = 2r - s

where k′ and r′ denote the number of information nodes and the number of check nodes of the stage following the current stage, and k, r and s denote the number of information nodes, the number of check nodes and the number of meta-check nodes of the current stage;

(2b) from the range {s, s+1, …, 2r-s}, selecting as the number of meta-check nodes of the next stage the value equal to the maximum number of simultaneously failed information nodes that are expected to be repaired with low repair complexity;

(3) judging whether the number of sets of coding parameters obtained so far equals m. If so, executing step (4). Otherwise, taking the coding parameters just determined as those of the current stage and executing step (2). Here m denotes the total number of codes in the coding group to be constructed and is an integer greater than or equal to 2;

(4) determining the final encoding parameters:

(4a) setting the number of meta-check nodes in the coding parameters obtained by the current iteration equal to the number of check nodes in those parameters;

(4b) forming the final coding parameters from the number of information nodes and the number of check nodes obtained by the current iteration, together with the number of meta-check nodes determined in (4a);

(5) constructing a generating matrix corresponding to the last stage:

using a systematic MDS code as the base code, and constructing the generator matrix G^m of the last stage with the generator-matrix construction method of the base code:

$$G^{m}=\begin{bmatrix}E_{k_m\times k_m}\\ P_{r_m\times k_m}\end{bmatrix}$$

where:

- $G^m$ denotes the generator matrix of the last stage;
- $E_{k_m\times k_m}$ denotes the identity matrix of order $k_m$, and $k_m$ equals the number of information nodes in the final coding parameters;
- $P_{r_m\times k_m}$ denotes a matrix with $r_m$ rows and $k_m$ columns, and $r_m$ equals the number of check nodes in the final coding parameters;

(6) setting a folding rule of a generating matrix:

(6a) dividing the matrix to be folded into five matrices A, B, X, U and V. Let $\tilde{k}$, $\tilde{s}$ and $\tilde{r}$ denote the number of information nodes, meta-check nodes and check nodes in the coding parameters of the stage preceding the current stage. Then:

- A denotes the left information matrix formed by rows 1 to $l_1$ of the matrix to be folded, with $l_1=\tilde{k}$;
- B denotes the right information matrix formed by rows $l_2$ to $l_3$, with $l_2=\tilde{k}+1$ and $l_3=2\tilde{k}$;
- X denotes the matrix formed by rows $l_4$ to $l_5$, with $l_4=2\tilde{k}+1$ and $l_5=2\tilde{k}+\tilde{s}$;
- U denotes the matrix formed by rows $l_6$ to $l_7$, with $l_6=2\tilde{k}+\tilde{s}+1$ and $l_7=2\tilde{k}+\tilde{r}$;
- V denotes the right non-meta matrix formed by rows $l_8$ to the last row, with $l_8=2\tilde{k}+\tilde{r}+1$;

(6b) generating a matrix identical to X and setting all elements in its last μ non-zero columns to zero to obtain the left meta matrix X′, where μ equals the number of information nodes in the coding parameters of the stage preceding the current stage; generating another matrix identical to X and setting all elements in its first μ non-zero columns to zero to obtain the right meta matrix X″; and setting all elements in the last μ non-zero columns of U to zero to obtain the left non-meta matrix U′;

(6c) folding the matrix to be folded into two mutually correlated generator matrices of the generator matrix set of the stage preceding the current stage. This is done by combining the matrices A, X′, U′ and B, X″, V according to the following formulas:

$$G'=\begin{bmatrix}A\\X'\\U'\end{bmatrix},\qquad G''=\begin{bmatrix}B\\X''\\V\end{bmatrix}$$

where G′ denotes the left generator matrix and G″ denotes the right generator matrix;

(7) Constructing the generator matrix set of the stage preceding the last stage:

(7a) folding the generator matrix of the last stage according to the folding rule, and adding the two generator matrices obtained into the generator matrix set of the stage preceding the current stage;

(7b) judging whether m equals 2, that is, whether the set just obtained already belongs to the 1st stage. If so, executing step (10). Otherwise, taking the generator matrix set obtained in this iteration as that of the current stage and executing step (8);

(8) constructing a generating matrix set corresponding to the previous stage:

according to the folding rule, folding each generator matrix in the generator matrix set of the current stage, and adding the generator matrices obtained into the generator matrix set of the stage preceding the current stage;

(9) judging whether the number of generator matrices in the set obtained by the current iteration equals $2^{m-1}$. If so, executing step (10). Otherwise, taking the generator matrix set determined in this iteration as that of the current stage and executing step (8);

(10) determining codes of all stages:

taking the coding parameters of each stage as the coding parameters of the corresponding code; taking the generator matrix of the last stage as the generator matrix of the sub-stripe of the last-stage code; and taking each generator matrix in the generator matrix set of every stage other than the last as the generator matrix of one sub-stripe of the corresponding code;

(11) combining the codes of all the stages into a coding group;

(12) selecting a code from the coding group, wherein the sum of the number of the information nodes and the number of the check nodes in the coding parameters of the selected code is equal to the total number of the nodes expected to be adopted;

(13) encoding data to be encoded:

dividing the data to be encoded evenly into t information symbols, where t = k_m; encoding the data with the generator matrix of each sub-stripe of the selected code to obtain the data symbols of that sub-stripe, the data symbols of all sub-stripes forming the encoded data; and saving the encoded data to the corresponding nodes.
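The row bookkeeping behind steps (2) and (6) can be checked directly. Write $(\tilde k,\tilde r,\tilde s)$ for the parameters of the preceding stage and $(k',r')$ for those of the current stage. Assume that X has $\tilde s$ rows and that U and V have $\tilde r-\tilde s$ rows each. This is the reading under which each folded half has exactly the size of a preceding-stage code. Under that assumption:

$$
\begin{aligned}
k'+r' &= 2\tilde k+(2\tilde r-\tilde s)\\
\operatorname{rows}(G') &= \tilde k+\tilde s+(\tilde r-\tilde s)=\tilde k+\tilde r\\
\operatorname{rows}(G'') &= \tilde k+\tilde s+(\tilde r-\tilde s)=\tilde k+\tilde r\\
(\tilde k+\tilde r)+\underbrace{(k'-\tilde k)}_{=\,\tilde k}+\underbrace{(r'-\tilde r)}_{=\,\tilde r-\tilde s} &= k'+r'
\end{aligned}
$$

The last line counts the nodes before and after one extension. With the embodiment's parameters, (4, 3, 1) followed by (8, 5, 2), it gives 7 + 4 + 2 = 13 nodes, which is the size of the 2nd-stage code.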

The folding-type extensible distributed storage repair method of the present invention comprises the following steps:

(1) with all encoded data encoded by the same code, abandoning repair if the total number of failed nodes exceeds the number of check nodes in that code's coding parameters, and otherwise executing step (2);

(2) judging whether the total number of failed information nodes among all failed nodes is non-zero. If so, executing step (3). Otherwise, concluding that no information node has failed but some check node has, and executing step (6);

(3) judging whether $\alpha+\lambda>s$ holds. If so, executing step (4); otherwise, executing step (7). Here:

- α denotes the total number of failed information nodes among all failed nodes;
- s denotes the number of meta-check nodes in the coding parameters of the code used to encode the data;
- λ denotes the total number of failed meta-check nodes among all failed check nodes;

(4) dividing the data symbols:

downloading data symbols from three kinds of node that store the same encoded data as the failed nodes:

- all information symbols from each non-failed information node;
- all meta-check symbols from each non-failed meta-check node;
- all non-meta-check symbols from η non-failed non-meta-check nodes chosen at random.

Then placing the data symbols belonging to the same sub-stripe in the same symbol group, and the symbol groups belonging to the same encoded data in the same data set. Here $\eta=\alpha+\lambda-s$, where s is the number of meta-check nodes and k the number of information nodes in the code's parameters. The total number of nodes contacted is therefore $(k-\alpha)+(s-\lambda)+\eta=k$, the number of information nodes;

(5) Processing each data set:

(5a) numbering each symbol group in each data set according to

$$j_{h,c}=\left\lceil \frac{q_{h,c}}{k} \right\rceil$$

where:

- $j_{h,c}$ denotes the number of the c-th symbol group in the h-th data set;
- $\lceil\cdot\rceil$ denotes rounding up;
- $q_{h,c}$ denotes the index of the non-zero column in the first row of the generator matrix of the sub-stripe corresponding to the c-th symbol group in the h-th data set;
- k denotes the number of information nodes in the coding parameters of the code used to encode the data;

(5b) for each data set, performing the decode-and-eliminate operation on its symbol groups one by one, in increasing order of number starting from symbol group 1;

(5c) executing step (9) after all the data sets are processed;

(6) dividing information symbols:

downloading all information symbols stored by the information node from each information node storing the same encoded data with the failed node, and dividing the information symbols belonging to the same sub-stripe into the same symbol group; dividing symbol groups belonging to the same coded data into the same data set and then executing the step (9);

(7) recovering the information symbols of the failed information nodes:

(7a) from the nodes that store the same encoded data as the failed nodes:

- downloading all information symbols from each non-failed information node;
- randomly selecting α of the non-failed meta-check nodes and downloading all meta-check symbols stored on them.

Then placing the data symbols belonging to the same sub-stripe in the same symbol group;

(7b) decoding the data symbols in each symbol group by using a decoding method corresponding to the basic code, recovering the information symbols corresponding to the failure information nodes in the symbol group, and adding the information symbols into the symbol group;

(7c) dividing symbol groups belonging to the same coded data into the same data set;

(8) judging whether any failed check node exists among the failed nodes. If so, executing step (9); otherwise, executing step (10);

(9) encoding the information symbols:

coding all information symbols in all symbol groups in each data set by using a coding coefficient row matrix corresponding to the failure check node, recovering the check symbols corresponding to the failure check node, and adding the recovered check symbols into the symbol groups corresponding to the same sub-strip;

(10) saving the recovered data symbols:

- adding as many new information nodes as the value of α;
- adding as many new check nodes as the total number of failed check nodes among all failed nodes;
- storing the recovered information symbols belonging to the same information node on the same new information node;
- storing the recovered check symbols belonging to the same check node on the same new check node;

(11) replacing the failed nodes with the new nodes.
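The branching in steps (1) to (7) and the resulting node counts can be summarized in a short Python sketch. The sketch takes the step (3) condition to be α + λ > s, and takes η = α + λ - s. Both are inferred from the node counts in steps (4) and (7a) and from the stated property that exactly k nodes are contacted. It also takes the step (5a) numbering to be j = ⌈q/k⌉. The names `plan_repair`, `RepairPlan` and `group_number` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepairPlan:
    branch: str   # which branch of the repair method is taken
    info: int     # non-failed information nodes contacted
    meta: int     # non-failed meta-check nodes contacted
    nonmeta: int  # non-failed non-meta-check nodes contacted (eta)

    @property
    def total(self) -> int:
        return self.info + self.meta + self.nonmeta


def plan_repair(k: int, r: int, s: int, alpha: int, lam: int, nu: int) -> Optional[RepairPlan]:
    """alpha, lam, nu: failed information, meta-check and non-meta-check nodes."""
    if alpha + lam + nu > r:   # step (1): beyond the code's tolerance, repair abandoned
        return None
    if alpha == 0:             # step (2) -> (6): only check nodes failed
        return RepairPlan("re-encode from information nodes", k, 0, 0)
    if alpha + lam > s:        # step (3) -> (4): surviving meta-checks are not enough
        return RepairPlan("decode-and-eliminate", k - alpha, s - lam, alpha + lam - s)
    return RepairPlan("decode with the base code", k - alpha, alpha, 0)  # step (3) -> (7)


def group_number(q: int, k: int) -> int:
    """Step (5a) under the assumed formula j = ceil(q / k), with q 1-based."""
    return -(-q // k)


plan = plan_repair(k=4, r=3, s=1, alpha=2, lam=0, nu=0)
print(plan.total, plan)  # 4 RepairPlan(branch='decode-and-eliminate', info=2, meta=1, nonmeta=1)
```

In every branch that proceeds, the plan contacts exactly k nodes. In the decode-and-eliminate branch, η never exceeds the number of surviving non-meta-check nodes, because step (1) already guarantees α + λ + ν ≤ r.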

The folding-type extensible distributed storage extension method of the present invention comprises the following steps:

(1) adding a new node:

unless the encoded data is encoded with the code of the last stage, adding ρ information nodes Y₁, Y₂, …, Y_ρ and γ check nodes F₁, F₂, …, F_γ to the nodes storing the encoded data, where

$$\rho=k'-k,\qquad \gamma=r'-r$$

and:

- k′ denotes the number of information nodes in the coding parameters of the code of the stage following the stage of the code currently used to encode the data;
- k denotes the number of information nodes in the coding parameters of the code currently used;
- r′ denotes the number of check nodes in the coding parameters of the code of that next stage;
- r denotes the number of check nodes in the coding parameters of the code currently used;

(2) grouping the two sub-stripes of the encoded data whose generator matrices are correlated with each other into one extension group;

(3) merging two sub-stripes within each extension group:

(3a) downloading and caching, from all information nodes storing the encoded data, all information symbols of the sub-stripe corresponding to the right generator matrix in each extension group. Then migrating the information symbols at different positions of that sub-stripe to different nodes among the information nodes Y₁, Y₂, …, Y_ρ. The migrated information symbols and the information symbols that were not migrated together form the information symbols of the merged sub-stripe;

(3b) in each meta-check node, adding the two meta-check symbols stored on that node that belong to the two sub-stripes of each extension group, to obtain the updated meta-check symbol;

(3c) encoding the cached information symbols with the complementary matrix of the left generator matrix of each extension group to obtain correction symbols, and updating the non-meta-check symbols of the sub-stripe corresponding to the left generator matrix with the correction symbols;

(3d) migrating the non-meta-check symbols at different positions of the sub-stripe corresponding to the right generator matrix in each extension group to different nodes among the check nodes F₁, F₂, …, F_γ;

(3e) combining the updated meta-check symbols, the updated non-meta-check symbols and the migrated check symbols to form the check symbols of the merged sub-stripe;

(4) combining the data symbols of all merged sub-stripes into the extended encoded data.
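One extension-group merge can be checked numerically with the Python sketch below. It makes two assumptions:

- the "complementary matrix" in step (3c) is the part of U that was zeroed to form U′, restricted to the right sub-stripe's information columns;
- the new-node counts are ρ = k′ - k and γ = r′ - r.

The sketch works over GF(257) with small placeholder coefficients. The names `merge_substripes`, `new_node_counts` and `split` are hypothetical. The toy example folds a 7-row matrix, encodes both halves, merges them, and confirms that the result equals what the unfolded matrix would have produced.

```python
from typing import Dict, List, Tuple

P = 257  # assumed prime field for this sketch
Matrix = List[List[int]]


def new_node_counts(k: int, r: int, k_next: int, r_next: int) -> Tuple[int, int]:
    """Step (1): rho new information nodes and gamma new check nodes."""
    return k_next - k, r_next - r


def matvec(M: Matrix, x: List[int], p: int = P) -> List[int]:
    return [sum(a * b for a, b in zip(row, x)) % p for row in M]


def merge_substripes(left: Dict[str, List[int]], right: Dict[str, List[int]],
                     complement: Matrix, p: int = P) -> List[int]:
    """Steps (3a)-(3e) for one extension group (left from G', right from G'')."""
    info = left["info"] + right["info"]                                # (3a) right info -> Y nodes
    meta = [(a + b) % p for a, b in zip(left["meta"], right["meta"])]  # (3b)
    correction = matvec(complement, right["info"], p)                  # (3c) cached info symbols
    nonmeta = [(u + c) % p for u, c in zip(left["nonmeta"], correction)]
    return info + meta + nonmeta + right["nonmeta"]                    # (3d)-(3e) right non-meta -> F nodes


def split(symbols: List[int], k: int, s: int) -> Dict[str, List[int]]:
    return {"info": symbols[:k], "meta": symbols[k:k + s], "nonmeta": symbols[k + s:]}


# Toy case: preceding-stage parameters (k, r, s) = (2, 2, 1), so G has 2k + 2r - s = 7 rows.
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]                    # rows A, B, X, U, V
G_left = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 2, 0, 0], [5, 6, 0, 0]]      # [A; X'; U']
G_right = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 3, 4], [9, 10, 11, 12]]  # [B; X''; V]
d = [3, 1, 4, 1]
merged = merge_substripes(split(matvec(G_left, d), 2, 1),
                          split(matvec(G_right, d), 2, 1), complement=[[7, 8]])
print(new_node_counts(2, 2, 4, 3))  # (2, 1)
print(merged == matvec(G, d))       # True
```

Only the cached right-hand information symbols are combined with stored symbols. Everything else is an addition in place or a plain migration, which is consistent with the low extension bandwidth claimed for the method.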

Compared with the prior art, the invention has the following advantages:

Firstly, in the coding method of the present invention, the number of check nodes computed for the next stage is never smaller than that of the current stage, since r′ = 2r - s and s ≤ r. This overcomes the prior-art problem that the number of coded blocks in a single stripe stays unchanged during storage expansion, which prevents fault tolerance from adapting to the node scale of the expanded system. With the codes constructed by the present invention, the number of check symbols in a single sub-stripe can grow at each successive stage, so the fault tolerance of the extended system adapts to its node scale.

Secondly, when the repair method of the present invention downloads data symbols from non-failed nodes, it only needs to select as many nodes as there are information nodes. This overcomes the prior-art problem that repairing a failed node requires connecting to more nodes than the number of information nodes, that is, a high repair degree. The repair method of the present invention therefore has a low repair degree, equal to the number of information nodes.

Thirdly, the extension method of the present invention extends the encoded data by merging the two sub-stripes in each extension group. Because the coding group contains multiple codes, this extension can be performed several times. This overcomes the prior-art limitation of only one low-bandwidth extension, so the extension method of the present invention can extend the data several times with low bandwidth consumption.

Drawings

FIG. 1 is a flow chart of the foldable scalable distributed storage coding of the present invention;

FIG. 2 is a diagram illustrating encoding of data to be encoded according to an embodiment of the present invention;

FIG. 3 is a flow diagram of a folded extensible distributed storage repair of the present invention;

FIG. 4 is a flow chart of the foldable extensible distributed storage extension of the present invention;

FIG. 5 is a schematic diagram of extending encoded data in the embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings and examples.

The implementation steps of the folding scalable distributed storage coding method of the present invention are further described with reference to fig. 1.

Step 1, setting the coding parameters of the 1st stage.

The number of information nodes k₁, the number of check nodes r₁ and the number of meta-check nodes s₁ are set as the coding parameters of the 1st stage, where k₁ and r₁ are positive integers and s₁ is a non-negative integer not greater than r₁.

In the embodiment of the present invention, the number of information nodes, the number of check nodes, and the number of meta check nodes of the encoding parameter at the 1 st stage are set to 4, 3, and 1, respectively.

Step 2, calculating the coding parameters of the next stage.

The number of information nodes and the number of check nodes of the next stage are calculated according to the following formulas:

k′ = 2k

r′ = 2r - s

From the range {s, s+1, …, 2r-s}, the value equal to the maximum number of simultaneously failed information nodes that are expected to be repaired with low repair complexity is selected as the number of meta-check nodes of the next stage.

Step 3, judging whether the number of sets of coding parameters obtained so far equals m. If so, Step 4 is executed. Otherwise, the coding parameters just determined are taken as those of the current stage and Step 2 is executed. Here m denotes the total number of codes in the coding group to be constructed and is an integer greater than or equal to 2.

Step 4, determining the final coding parameters.

(4a) The number of meta-check nodes in the coding parameters obtained by the current iteration is set equal to the number of check nodes in those parameters;

(4b) the number of information nodes and the number of check nodes obtained by the current iteration, together with the number of meta-check nodes determined in (4a), form the final coding parameters.

In the embodiment of the present invention, the total number of codes in the coding group is set to 3. Calculation and selection give the following parameters:

- the 2nd-stage code has 8 information nodes, 5 check nodes and 2 meta-check nodes;
- the 3rd-stage code, which carries the final coding parameters, has 16 information nodes, 8 check nodes and 8 meta-check nodes.
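The parameter recursion of Steps 1 to 4 is easy to check mechanically. The Python sketch below is illustrative only. The function name `stage_parameters` and its `meta_choices` argument are assumptions of this sketch, not part of the method. `meta_choices` holds the application-specific meta-check counts picked in Step 2 for the intermediate stages.

```python
from typing import List, Optional, Tuple

Params = Tuple[int, int, int]  # (k, r, s): information, check and meta-check node counts


def stage_parameters(k1: int, r1: int, s1: int, meta_choices: List[int]) -> List[Params]:
    """Coding parameters of stages 1..m, with m = len(meta_choices) + 2.

    meta_choices[i] is the meta-check count chosen for stage i + 2;
    the last stage always uses s = r, as in Step 4.
    """
    if k1 <= 0 or r1 <= 0 or not 0 <= s1 <= r1:
        raise ValueError("need k1 > 0, r1 > 0 and 0 <= s1 <= r1")
    params: List[Params] = [(k1, r1, s1)]
    choices: List[Optional[int]] = list(meta_choices) + [None]  # None marks the last stage
    for s_next in choices:
        k, r, s = params[-1]
        k_next, r_next = 2 * k, 2 * r - s  # Step 2: k' = 2k, r' = 2r - s
        if s_next is None:
            s_next = r_next                # Step 4a: the final stage sets s = r
        elif not s <= s_next <= r_next:    # Step 2 range {s, ..., 2r - s}
            raise ValueError(f"meta-check count {s_next} not in [{s}, {r_next}]")
        params.append((k_next, r_next, s_next))
    return params


# Embodiment: stage 1 = (4, 3, 1), m = 3, with s = 2 chosen for stage 2.
print(stage_parameters(4, 3, 1, [2]))  # [(4, 3, 1), (8, 5, 2), (16, 8, 8)]
```

The node totals k + r of the three codes are 7, 13 and 24, which are the group sizes Step 12 chooses among.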

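Step 5, constructing the generator matrix corresponding to the last stage.

A systematic MDS code is used as the base code, and the generator matrix of the last stage is constructed as

$$G^{m}=\begin{bmatrix}E_{k_m\times k_m}\\ P_{r_m\times k_m}\end{bmatrix}$$

where:

- $G^m$ denotes the generator matrix of the last stage;
- $E_{k_m\times k_m}$ denotes the identity matrix of order $k_m$, and $k_m$ equals the number of information nodes in the final coding parameters;
- $P_{r_m\times k_m}$ denotes a matrix with $r_m$ rows and $k_m$ columns, and $r_m$ equals the number of check nodes in the final coding parameters.

In the embodiment of the invention, a systematic (24,16) RS code is used as the base code to construct the generator matrix $G^{3}=\begin{bmatrix}E_{16\times16}\\ P_{8\times16}\end{bmatrix}$.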
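The embodiment builds $G^3$ from a systematic (24,16) RS code. The Python sketch below uses a swapped-in construction of the same shape, so it is an assumption and not the RS construction itself. It builds a systematic MDS generator $[E; C]$ over the prime field GF(257), with C a Cauchy matrix. Every square submatrix of a Cauchy matrix is nonsingular, so any $k_m$ rows of $[E; C]$ are independent. The helper names `cauchy_generator`, `rank_mod_p` and `is_mds` are hypothetical. The exhaustive MDS check is only practical for small codes.

```python
from itertools import combinations
from typing import List

P = 257  # prime field GF(257): an assumed stand-in for the RS code's field
Matrix = List[List[int]]


def cauchy_generator(k: int, r: int, p: int = P) -> Matrix:
    """Systematic (k + r) x k generator [E_k; C], with C an r x k Cauchy matrix.

    C[i][j] = 1 / (x_i - y_j) with x_i = k + i and y_j = j; the x's and y's
    are distinct, so every square submatrix of C is nonsingular.
    """
    if k + r > p:
        raise ValueError("need k + r <= p distinct field elements")
    identity = [[int(i == j) for j in range(k)] for i in range(k)]
    cauchy = [[pow(k + i - j, -1, p) for j in range(k)] for i in range(r)]
    return identity + cauchy


def rank_mod_p(rows: Matrix, p: int = P) -> int:
    """Rank over GF(p) by Gauss-Jordan elimination on a copy of `rows`."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % p:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def is_mds(G: Matrix, k: int, p: int = P) -> bool:
    """True if every k rows of G are linearly independent (any k nodes suffice)."""
    return all(rank_mod_p([G[i] for i in idx], p) == k
               for idx in combinations(range(len(G)), k))


G3 = cauchy_generator(16, 8)               # same shape as the embodiment's generator
print(len(G3), len(G3[0]))                 # 24 16
print(is_mds(cauchy_generator(4, 3), 4))   # True
```

A Cauchy matrix also has no zero entries. This keeps the "non-zero columns" in the folding rule unambiguous.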

Step 6, setting the folding rule of the generator matrix.

(6a) The matrix to be folded is divided into five matrices A, B, X, U and V. Let $\tilde{k}$, $\tilde{r}$ and $\tilde{s}$ denote the number of information nodes, check nodes and meta-check nodes in the coding parameters of the stage preceding the current stage. Then:

- A is the left information matrix formed by rows 1 to $\tilde{k}$;
- B is the right information matrix formed by rows $\tilde{k}+1$ to $2\tilde{k}$;
- X is the matrix formed by rows $2\tilde{k}+1$ to $2\tilde{k}+\tilde{s}$;
- U is the matrix formed by rows $2\tilde{k}+\tilde{s}+1$ to $2\tilde{k}+\tilde{r}$;
- V is the right non-meta matrix formed by rows $2\tilde{k}+\tilde{r}+1$ to the last row.

(6b) A matrix identical to X is generated, and all elements of its last μ non-zero columns are set to zero to obtain the left meta matrix X′, where $\mu=\tilde{k}$. Another matrix identical to X is generated, and all elements of its first μ non-zero columns are set to zero to obtain the right meta matrix X″. All elements of the last μ non-zero columns of U are set to zero to obtain the left non-meta matrix U′.

(6c) The matrices A, X′, U′ and B, X″, V are combined according to the following formulas. The result is the two mutually correlated generator matrices, in the generator matrix set of the stage preceding the current stage, that come from folding the matrix to be folded:

$$G'=\begin{bmatrix}A\\X'\\U'\end{bmatrix},\qquad G''=\begin{bmatrix}B\\X''\\V\end{bmatrix}$$

where G′ denotes the left generator matrix and G″ denotes the right generator matrix.

Step 7, constructing the generator matrix set of the stage preceding the last stage.

(7a) The generator matrix of the last stage is folded according to the folding rule, and the generator matrices obtained are added to the generator matrix set of the stage preceding the current stage;

(7b) judging whether m equals 2, that is, whether the set just obtained already belongs to the 1st stage. If so, Step 10 is executed. Otherwise, the generator matrix set determined this time is taken as that of the current stage and Step 8 is executed.

Step 8, constructing the generator matrix set of the preceding stage.

According to the folding rule, each generator matrix in the generator matrix set of the current stage is folded, and the generator matrices obtained are added to the generator matrix set of the stage preceding the current stage.

Step 9, judging whether the number of generator matrix sets obtained so far equals m-1. If so, Step 10 is executed. Otherwise, the generator matrix set determined this time is taken as that of the current stage and Step 8 is executed.

In the embodiment of the present invention, for convenience of description, the coding-coefficient row matrices corresponding to rows 1 to 8 of the matrix P_{8×16} are denoted p¹, p², …, p⁸, respectively. A dedicated symbol denotes the coding-coefficient row matrix obtained by retaining the u-th to v-th elements of p^q and setting all other elements to zero, where 1 ≤ q ≤ 8 and 1 ≤ u ≤ v ≤ 16. The symbol E_{u′×u′} denotes an identity matrix with u′ rows and u′ columns, 1 ≤ u′ ≤ 16, and the symbol O_{u″×v″} denotes an all-zero matrix with u″ rows and v″ columns, 1 ≤ u″ ≤ 16, 1 ≤ v″ ≤ 16. First, the generator matrix G³ corresponding to the last stage is divided into five matrices: the left information matrix A³ = [E_{8×8} | O_{8×8}], the right information matrix B³ = [O_{8×8} | E_{8×8}], the matrix X³, the matrix U³, and the right non-meta matrix V³.

Generate two matrices identical to X³. Set all elements in the last 8 non-zero columns of one of them to zero to obtain the left meta matrix, and set all elements in the first 8 non-zero columns of the other to zero to obtain the right meta matrix. Set all elements in the last 8 non-zero columns of U³ to zero to obtain the left non-meta matrix. Then combine A³, the left meta matrix and the left non-meta matrix into one generator matrix, and combine B³, the right meta matrix and V³ into another. These are the two correlated generator matrices, obtained by folding G³, of the generator matrix set corresponding to the previous stage of the current stage; they are denoted G^{2,1} and G^{2,2} and are the 2 generator matrices in the generator matrix set of stage 2.

G^{2,1} and G^{2,2} are folded in the same way, yielding the generator matrices G^{1,1}, G^{1,2} and G^{1,3}, G^{1,4}, respectively; these are the 4 generator matrices in the generator matrix set of stage 1.
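The folding of one generator matrix into a correlated left/right pair can be sketched as below. This is a minimal illustration under stated assumptions, not the inventors' implementation: the row layout (information rows first, then X with s rows, U with r − s rows and V with r − s rows) is an assumption that is consistent with r′ = 2r − s, and "the first/last 8 non-zero columns" is read as the two halves of the columns occupied by the information rows. The function names `fold` and `_zero_columns` are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]


def _zero_columns(rows: Matrix, cols: Set[int]) -> Matrix:
    """Copy of rows with every entry in the given columns set to zero."""
    return [[0 if j in cols else v for j, v in enumerate(row)] for row in rows]


def fold(G: Matrix, k: int, s: int) -> Tuple[Matrix, Matrix]:
    """Fold one generator matrix into a correlated (left, right) pair.

    Assumed layout (hypothetical): rows 0..k-1 are systematic information
    rows whose single 1 appears in increasing column order, followed by
    the check rows X (s rows), U (r - s rows) and V (r - s rows), where s
    is the meta-check count of the previous stage (2r - s check rows).
    """
    pivots = [row.index(1) for row in G[:k]]      # columns of the information rows
    first, last = set(pivots[: k // 2]), set(pivots[k // 2:])
    checks = G[k:]
    r = (len(checks) + s) // 2                    # from len(checks) = 2r - s
    X, U, V = checks[:s], checks[s:r], checks[r:]

    # left: left information rows, X and U with the last half zeroed
    left = [row[:] for row in G[: k // 2]] + _zero_columns(X, last) + _zero_columns(U, last)
    # right: right information rows, X with the first half zeroed, V unchanged
    right = [row[:] for row in G[k // 2: k]] + _zero_columns(X, first) + [row[:] for row in V]
    return left, right
```

Under these assumptions, calling `fold` on G³ with k = 16 and s = s₂ = 2 gives two 13-row matrices in the roles of G^{2,1} and G^{2,2}, and folding each of them again with k = 8 and s = s₁ = 1 gives four 7-row matrices in the roles of G^{1,1} to G^{1,4}.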

Step 10: Determine the codes of all stages.

The coding parameters of each stage are taken as the coding parameters of the corresponding code; the generator matrix of the last stage is taken as the generator matrix of the sub-stripe of the last-stage code; and each generator matrix in the generator matrix set of every other stage is taken as the generator matrix of one sub-stripe of the corresponding code.

Step 11: Combine the codes of all stages into one coding group.

Step 12: Select from the coding group a code whose number of information nodes plus number of check nodes, as given in its coding parameters, equals the total number of nodes expected.

In the embodiment of the present invention, the total number of nodes to be used is 7, so the code of the 1st stage (4 information nodes and 3 check nodes) is selected from the coding group.

Step 13: Encode the data to be encoded.

The data symbols of a sub-stripe comprise information symbols and check symbols, and the check symbols comprise meta-check symbols and non-meta-check symbols. Each data symbol is obtained by encoding the data to be encoded with the coding-coefficient row matrix of one row of the generator matrix. An information symbol is obtained by encoding with the left information matrix of a left generator matrix or the right information matrix of a right generator matrix; a meta-check symbol is obtained by encoding with the left meta matrix of a left generator matrix or the right meta matrix of a right generator matrix; and a non-meta-check symbol is obtained by encoding with the left non-meta matrix of a left generator matrix or the right non-meta matrix of a right generator matrix.

The encoded data are stored in the corresponding nodes: data symbols at different positions within a sub-stripe are stored in different nodes, and data symbols at the same position in different sub-stripes are stored in the same node. Nodes fall into two categories, information nodes and check nodes, and check nodes are further divided into meta-check nodes and non-meta-check nodes. Information nodes store information symbols, meta-check nodes store meta-check symbols, and non-meta-check nodes store non-meta-check symbols.
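The encoding and placement rules above can be sketched as follows: each data symbol is one coefficient row multiplied by the data vector M, and position j of every sub-stripe goes to node j. The finite field is not specified in this text, so arithmetic modulo the prime 257 is an assumption, as are the helper names `encode_substripe` and `place`; nodes are indexed from 0 here rather than named D₁ or C₁.

```python
from typing import Dict, List

P = 257  # assumed prime field GF(257); the field is not fixed in the text


def encode_substripe(G: List[List[int]], M: List[int]) -> List[int]:
    """One data symbol per generator-matrix row: the row's coefficients times M."""
    return [sum(g * m for g, m in zip(row, M)) % P for row in G]


def place(substripes: List[List[int]]) -> Dict[int, List[int]]:
    """Store position j of every sub-stripe in node j (0-based node index)."""
    nodes: Dict[int, List[int]] = {}
    for symbols in substripes:
        for j, symbol in enumerate(symbols):
            nodes.setdefault(j, []).append(symbol)
    return nodes
```

With the four stage-1 generator matrices and M = [a₁, …, a₁₆]ᵀ, node 0 would then hold a₁, a₅, a₉, a₁₃, matching the layout described for D₁.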

The implementation steps for encoding the data to be encoded in the embodiment of the present invention are further described with reference to Fig. 2.

In Fig. 2, D₁, D₂, D₃, D₄ represent 4 information nodes, C₁, C₂, C₃ represent 3 check nodes, a₁, a₂, …, a₁₆ represent 16 information symbols, and M represents the data to be encoded, M = [a₁, a₂, …, a₁₆]^T, where T denotes transposition. The generator matrices G^{1,1}, G^{1,2}, G^{1,3}, G^{1,4} of the 4 sub-stripes of the stage-1 code each encode M, giving the following sub-stripe data:

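The 1st sub-stripe comprises the information symbols a₁, a₂, a₃, a₄ and its 3 check symbols;

The 2nd sub-stripe comprises the information symbols a₅, a₆, a₇, a₈ and its 3 check symbols;

The 3rd sub-stripe comprises the information symbols a₉, a₁₀, a₁₁, a₁₂ and its 3 check symbols;

The 4th sub-stripe comprises the information symbols a₁₃, a₁₄, a₁₅, a₁₆ and its 3 check symbols, the last two of which are p⁷M and p⁸M.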

The data of these 4 sub-stripes together constitute one encoded data. The 1st to 4th information symbols of the 1st sub-stripe are stored in information nodes D₁, D₂, D₃, D₄, respectively, and its 1st to 3rd check symbols are stored in check nodes C₁, C₂, C₃, respectively; the data symbols of the other sub-stripes are stored in the same way as those of the 1st sub-stripe.

The implementation steps of the folding scalable distributed storage coding repair method of the present invention are further described with reference to Fig. 3.

Step 1: If, for the encoded data encoded with the same code, the total number of failed nodes exceeds the number of check nodes in that code's coding parameters, abandon the repair; otherwise, execute Step 2.

In the embodiment of the present invention, it is assumed that 2 of the nodes in Fig. 2 fail.

Step 2: Determine whether the total number of failed information nodes among all failed nodes is non-zero. If so, execute Step 3; otherwise, no information node has failed but some check node has, so execute Step 6.

Step 3: Evaluate the judgment condition on α, λ and the number of meta-check nodes; if it holds, execute Step 4, otherwise execute Step 7. Here α denotes the total number of failed information nodes among all failed nodes,

a separate symbol denotes the number of meta-check nodes in the coding parameters of the code used to encode each encoded data, and λ denotes the total number of failed meta-check nodes among all failed check nodes.

In the embodiment of the present invention, the 2 failed nodes in Fig. 2 are assumed to be D₃ and D₄; that is, the total number of failed information nodes is 2 and the number of failed meta-check nodes is 0.

Step 4: Divide the data symbols.

From each surviving information node that stores the same encoded data as a failed node, download all information symbols stored there; from each surviving meta-check node that stores the same encoded data as a failed node, download all meta-check symbols stored there; and randomly select η of the surviving non-meta-check nodes that store the same encoded data as a failed node and download all non-meta-check symbols stored in them. Group the data symbols of the same sub-stripe into one symbol group, and group the symbol groups belonging to the same encoded data into one data set. The value of η is given by the corresponding expression of the method.

In the embodiment of the present invention, all information symbols are downloaded from nodes D₁ and D₂ shown in Fig. 2, the meta-check symbols are downloaded from the meta-check node C₁, and the non-meta-check node C₂ is selected and its non-meta-check symbols are downloaded.

The downloaded data symbols are divided into 4 symbol groups, corresponding to the 1st, 2nd, 3rd and 4th sub-stripes, respectively.
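The grouping of downloaded symbols into symbol groups (one per sub-stripe) and data sets (one per encoded data) is a plain two-level grouping, sketched below. The record layout `(encoded-data id, sub-stripe number, node name, value)` and the name `build_data_sets` are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (encoded-data id, sub-stripe number, node name, symbol value)
Download = Tuple[str, int, str, int]


def build_data_sets(downloads: Iterable[Download]) -> Dict[str, Dict[int, Dict[str, int]]]:
    """Put the symbols of one sub-stripe into one symbol group, and the
    symbol groups of one encoded data into one data set."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for data_id, sub_stripe, node, value in downloads:
        grouped[data_id][sub_stripe][node] = value
    # convert the nested defaultdicts back to plain dicts
    return {data_id: dict(groups) for data_id, groups in grouped.items()}
```

In the embodiment this would yield one data set with 4 symbol groups, each holding the symbols downloaded from D₁, D₂, C₁ and C₂ for that sub-stripe.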

Step 5: Process each data set.

Step 1: Number each symbol group in each data set according to the numbering formula,

where ⌈·⌉ denotes the ceiling (round-up) operation and q_{h,c} denotes the column index of the non-zero element in the first row of the generator matrix of the sub-stripe corresponding to the c-th symbol group in the h-th data set.

Step 2: For each data set, perform the decode-eliminate operation on each of its symbol groups in turn, starting from the symbol group numbered 1.

The decode-eliminate operation is a decoding operation followed by an elimination operation. The decoding operation decodes the data symbols in the current symbol group of the data set with the decoding method of the basic code, recovers the information symbols corresponding to the failed information nodes, and adds them to the current symbol group. The elimination operation uses the information symbols in the current symbol group to eliminate the check symbols in every symbol group of the data set whose number is larger than that of the current symbol group. Elimination means that, if the element of a check symbol's coding-coefficient row matrix corresponding to an information symbol is non-zero, the information symbol multiplied by that non-zero value is subtracted from the check symbol, giving the eliminated check symbol.
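The elimination operation just defined can be sketched as below: every recovered information symbol whose coefficient in the check symbol's row is non-zero is multiplied by that coefficient and subtracted. Arithmetic modulo the prime 257 is an assumed field, the name `eliminate` is hypothetical, and the returned row (with the eliminated coefficients cleared) is an illustrative convenience so the residual equation can be decoded later.

```python
from typing import Dict, List, Tuple

P = 257  # assumed prime field; the text does not fix the field


def eliminate(check: int, coeffs: List[int], known: Dict[int, int]) -> Tuple[int, List[int]]:
    """Remove the contribution of recovered information symbols from one check symbol.

    coeffs is the check symbol's coding-coefficient row; known maps a
    0-based column index to the value of an already recovered information
    symbol. Returns the eliminated check symbol and its remaining row.
    """
    remaining = list(coeffs)
    for col, value in known.items():
        if remaining[col] != 0:
            check = (check - remaining[col] * value) % P
            remaining[col] = 0
    return check, remaining
```

For example, a check 1·a₁ + 2·a₂ + 3·a₃ + 4·a₄ = 30 with a₁ = 1, a₂ = 2 known is reduced to 25 = 3·a₃ + 4·a₄.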

In the embodiment of the present invention, the symbol groups corresponding to the 1st to 4th sub-stripes shown in Fig. 2 are numbered 1, 2, 3 and 4, respectively. RS decoding is performed on the symbol group numbered 1 to recover the information symbols a₃ and a₄. The information symbols a₁, a₂, a₃, a₄ of the symbol group numbered 1 are then used to eliminate the check symbols in the symbol groups numbered 2, 3 and 4. In the symbol group numbered 2, the coding-coefficient row matrix of a check symbol has non-zero elements at the positions of a₁, a₂, a₃, a₄, so these information symbols, each multiplied by its coefficient, are subtracted from that check symbol to obtain the eliminated check symbol, and the symbol group numbered 2 is updated accordingly. After the check symbols in the symbol groups numbered 3 and 4 are eliminated in the same way, the symbol group numbered 3 remains unchanged and the symbol group numbered 4 is updated.

Repeating the same procedure recovers the information symbols a₇, a₈; a₁₁, a₁₂; and a₁₅, a₁₆ in turn.

Step 3: After all data sets have been processed, execute Step 9.

Step 6: Divide the information symbols.

Download all information symbols from each information node that stores the same encoded data as the failed node, group the information symbols belonging to the same sub-stripe into one symbol group, group the symbol groups belonging to the same encoded data into one data set, and then execute Step 9.

Step 7: Recover the information symbols corresponding to the failed information nodes.

Step 1: From each surviving information node that stores the same encoded data as a failed node, download all information symbols stored there; randomly select α of the surviving meta-check nodes that store the same encoded data as a failed node and download all meta-check symbols stored in them; and group the data symbols belonging to the same sub-stripe into one symbol group.

Step 2: Decode the data symbols in each symbol group with the decoding method of the basic code, recover the information symbols of the failed information nodes in that symbol group, and add them to the symbol group.

Step 3: Group the symbol groups belonging to the same encoded data into one data set.
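The decoding operation of the basic code recovers the missing information symbols from as many independent equations as there are unknowns. The patent names RS decoding; the sketch below swaps in generic Gauss-Jordan elimination over an assumed prime field GF(257), which solves the same erasure problem for an MDS code once the known symbols have been eliminated. The name `solve_mod_p` is hypothetical, and the matrix is assumed square and invertible (a singular system raises `StopIteration`).

```python
from typing import List

P = 257  # assumed prime field


def solve_mod_p(A: List[List[int]], b: List[int]) -> List[int]:
    """Solve A x = b over GF(P) by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # pick a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)        # inverse via Fermat's little theorem
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % P != 0:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]
```

For instance, after eliminating a₁ and a₂, the residual checks a₃ + a₄ = 7 and 3·a₃ + 4·a₄ = 25 give `solve_mod_p([[1, 1], [3, 4]], [7, 25])`, which recovers a₃ = 3 and a₄ = 4.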

Step 8: Determine whether any of the failed nodes is a check node. If so, execute Step 9; otherwise, execute Step 10.

Step 9: Encode the information symbols.

Encode all information symbols in all symbol groups of each data set with the coding-coefficient row matrix of the failed check node, recover the check symbols of the failed check node, and add each recovered check symbol to the symbol group of the same sub-stripe.

Step 10: Store the recovered data symbols.

Add α new information nodes and as many new check nodes as there are failed check nodes among all failed nodes; store the recovered information symbols belonging to the same information node in the same new information node, and store the recovered check symbols belonging to the same check node in the same new check node.

In the embodiment of the present invention, the recovered information symbols a₃, a₇, a₁₁, a₁₅ are saved to a new node D₃′, and the recovered information symbols a₄, a₈, a₁₂, a₁₆ are saved to another new node D₄′.

Step 11: Replace the failed nodes with the new nodes.

In the embodiment of the present invention, nodes D₃′ and D₄′ replace the failed nodes D₃ and D₄.

The implementation steps of the folding scalable distributed storage coding expansion method of the present invention are further described with reference to Fig. 4.

Step 1: Add new nodes.

In the expression for the number of nodes to add, one quantity denotes the number of information nodes in the coding parameters of the code of the stage following the stage of the code currently used to encode the data,

and another denotes the number of check nodes in the coding parameters of the code currently used to encode the data.

The implementation steps for expanding the encoded data in the embodiment of the present invention are further described with reference to Fig. 5.

In the embodiment of the present invention, the encoded data shown in Fig. 2 are expanded. Fig. 5(a) shows the encoded data storage after nodes are added to the arrangement of Fig. 2, where D₅, D₆, D₇, D₈ are the 4 newly added information nodes and C₄, C₅ are the 2 newly added check nodes. Fig. 5(b) shows the encoded data storage after the encoded data are expanded.
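The stage parameters and the node counts of this embodiment can be checked with a short sketch of the claimed rules k′ = 2k and r′ = 2r − s. The rule that yields s for later stages is not reproduced in this text, so the meta-check counts are passed in; s₁ = 1 follows from C₁ being the only meta-check node, and s₂ = 2 is inferred from P being an 8×16 matrix (8 = 2·5 − s₂). Treating the nodes to add as the difference between consecutive stages is an assumption that matches Fig. 5(a). Both function names are hypothetical.

```python
from typing import List, Tuple


def stage_parameters(k1: int, r1: int, meta_counts: List[int]) -> List[Tuple[int, int]]:
    """(k, r) of every stage from k' = 2k and r' = 2r - s.

    meta_counts[i] is the meta-check count s of stage i + 1, supplied by
    the caller because its update rule is not reproduced here.
    """
    params = [(k1, r1)]
    for s in meta_counts:
        k, r = params[-1]
        params.append((2 * k, 2 * r - s))
    return params


def nodes_to_add(current: Tuple[int, int], nxt: Tuple[int, int]) -> Tuple[int, int]:
    """Assumed: (new information nodes, new check nodes) = next stage - current stage."""
    return nxt[0] - current[0], nxt[1] - current[1]
```

With k₁ = 4, r₁ = 3 and meta counts [1, 2], this gives stages (4, 3), (8, 5) and (16, 8), and expanding from stage 1 to stage 2 adds 4 information nodes and 2 check nodes, as in Fig. 5(a).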

Step 2: Group the two sub-stripes of the encoded data whose generator matrices were obtained by folding the same matrix into one expansion group.

Step 3: Merge the two sub-stripes in each expansion group.

Step 1: From all information nodes storing the encoded data, download and cache all information symbols of the sub-stripe corresponding to the right generator matrix in each expansion group, and migrate the information symbols at different positions of that sub-stripe to different information nodes among Y₁, Y₂, …, Y_ρ; the migrated and non-migrated information symbols together form the information symbols of the merged sub-stripe.

Step 2: In each meta-check node, add the two meta-check symbols that the two sub-stripes of an expansion group store in that node to obtain the updated meta-check symbol.

Step 3: Encode the cached information symbols with the complementary matrix of the left generator matrix of the sub-stripes in each expansion group to obtain correction symbols, and use the correction symbols to update the non-meta-check symbols of the sub-stripe corresponding to the left generator matrix.

The complementary matrix of the left generator matrix of the sub-stripes in each expansion group is obtained as follows. The matrix whose folding produced that left generator matrix is denoted by a dedicated symbol,

and the left non-meta matrix of that left generator matrix is denoted W;

the elements of rows l₆ to l₇ of the folded matrix form a matrix,

and a further symbol denotes the number of meta-check nodes in the coding parameters of the code currently used to encode the data.

The matrix obtained from the corresponding expression is denoted T, and all non-zero columns of T form the complementary matrix of the left generator matrix of the sub-stripes in the expansion group.

Step 4: Migrate the non-meta-check symbols at different positions in the sub-stripe corresponding to the right generator matrix in each expansion group to different check nodes among F₁, F₂, …, F_γ.

Step 5: Combine the updated meta-check symbols, the updated non-meta-check symbols and the migrated check symbols into the check symbols of the merged sub-stripe.
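The symbol arithmetic of merge Steps 1 to 5 can be sketched for one expansion group as below. The complementary matrix is taken as an input because its full construction is not reproduced in this text; the dict layout of a sub-stripe, the name `merge_pair` and the prime field GF(257) are all assumptions. The sketch only performs the moves, additions and subtractions described above; it does not check that the result is a valid codeword.

```python
from typing import Dict, List

P = 257  # assumed prime field

SubStripe = Dict[str, List[int]]  # hypothetical: 'info', 'meta', 'nonmeta' symbol lists


def merge_pair(left: SubStripe, right: SubStripe, complement: List[List[int]]) -> SubStripe:
    """Merge the left and right sub-stripes of one expansion group.

    complement has one row per left non-meta-check symbol (hypothetical input).
    """
    cached = right["info"]                                    # step 1: download and cache
    info = left["info"] + cached                              # cached symbols migrate to Y nodes
    meta = [(a + b) % P for a, b in zip(left["meta"], right["meta"])]         # step 2
    corrections = [sum(c * x for c, x in zip(row, cached)) % P for row in complement]
    nonmeta = [(v - c) % P for v, c in zip(left["nonmeta"], corrections)]     # step 3
    migrated = right["nonmeta"]                               # step 4: move to F nodes
    return {"info": info, "meta": meta, "nonmeta": nonmeta + migrated}       # step 5
```

In the embodiment, `left` would hold the 1st sub-stripe (a₁ to a₄, the C₁ symbol and the C₂, C₃ symbols), `right` the 2nd sub-stripe, and the two migrated non-meta-check symbols would land in C₄ and C₅.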

In the embodiment of the present invention, the generation matrix G of the 1 st subband shown in fig. 5(a) is used^1,1And the generation matrix G of the 2 nd sub-stripe^1,2Are related to each other, so that the 1 st sub-band and the 2 nd sub-band are divided into an extended group, and similarly, the 3 rd sub-band and the 4 th sub-band are divided into an extended group. For the 1 st and 2 nd sub-stripesAn extension group is formed, and all information symbols a in the 2 nd sub-strip are downloaded and cached₅、a₆、a₇、a₈Sequentially migrating the information symbols to a node D₅、D₆、D₇、D₈Post-migration symbol and non-migration information symbol a₁、a₂、a₃、a₄The information symbols of the merged sub-strip are composed. At node C₁Internally, by merging

And

obtaining updated meta-check symbols

Namely, it is

Left generator matrix G^1,1Corresponding complementary matrix is

By using complementary matrices to buffer information symbols a₅、a₆、a₇、a₈Coding to obtain a corrected symbol

Using the obtained two correction symbols to respectively pair

Is eliminated, i.e. from

Minus

From

Minus

Obtaining two updated check symbols

Non-meta-check symbols in the 2 nd sub-stripe

Respectively migrate to node C₄、C₅(ii) a And combining the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol into all check symbols of the merged sub-stripe. The left sub-band in fig. 5(b) is the result obtained by combining the 1 st and 2 nd sub-bands. And combining the 3 rd sub-band and the 4 th sub-band in the same way to obtain another new sub-band. The right sub-band in fig. 5(b) is the result obtained by combining the 3 rd and 4 th sub-bands.

Step 4: Combine the data symbols of all merged sub-stripes into the expanded encoded data.

In the embodiment of the present invention, all data symbols of the two new sub-stripes shown in Fig. 5(b) constitute the expanded encoded data.

Claims

1. A folding scalable distributed storage coding method, characterized in that the coding parameters of the next stage are calculated, the generator matrix set corresponding to the previous stage is constructed according to the folding rule of the generator matrix, the code corresponding to each stage is determined, and the codes of all stages are combined into a coding group; the method comprises the following steps:

(1) setting the coding parameters of the 1st stage:

the number k₁ of information nodes, the number r₁ of check nodes and the number s₁ of meta-check nodes are set as the coding parameters of the 1st stage, where k₁ and r₁ are positive integers and s₁ is a non-negative integer less than or equal to r₁;

(2) calculating the coding parameters of the next stage:

k′＝2k

r′＝2r-s

(4) determining the final coding parameters:

(5) constructing the generator matrix corresponding to the last stage:

where G^m denotes the generator matrix corresponding to the last stage,

(6) setting the folding rule of the generator matrix:

where one symbol represents the number of meta-check nodes in the coding parameters corresponding to the previous stage of the current stage, and U denotes the matrix formed by rows l₆ to l₇ of the matrix to be folded,

(7a) folding the generator matrix corresponding to the last stage according to the folding rule of the generator matrix, and adding the generator matrices obtained by folding to the generator matrix set corresponding to the previous stage of the current stage;

(8) constructing the generator matrix set corresponding to the previous stage:

(10) determining the codes of all stages:

(11) combining the codes of all stages into a coding group;

(13) encoding the data to be encoded:

dividing the data to be encoded evenly into t information symbols, t = k_m; encoding the data to be encoded with the generator matrix of each sub-stripe of the selected code to obtain the data symbols of that sub-stripe, the data symbols of all sub-stripes forming the encoded data; and saving the encoded data to the corresponding nodes.

2. The method according to claim 1, wherein the data symbols of the sub-stripes in step (13) comprise information symbols and check symbols, and the check symbols comprise meta-check symbols and non-meta-check symbols; each data symbol is obtained by encoding the data to be encoded with the coding-coefficient row matrix of one row of a generator matrix; an information symbol is obtained by encoding with the left information matrix of a left generator matrix or the right information matrix of a right generator matrix; a meta-check symbol is obtained by encoding with the left meta matrix of a left generator matrix or the right meta matrix of a right generator matrix; and a non-meta-check symbol is obtained by encoding with the left non-meta matrix of a left generator matrix or the right non-meta matrix of a right generator matrix.

3. The method according to claim 1, wherein saving the encoded data to the corresponding nodes in step (13) means that data symbols at different positions within a sub-stripe are stored in different nodes and data symbols at the same position in different sub-stripes are stored in the same node; the nodes comprise two categories, information nodes and check nodes, and the check nodes comprise two subclasses, meta-check nodes and non-meta-check nodes; information nodes store information symbols, meta-check nodes store meta-check symbols, and non-meta-check nodes store non-meta-check symbols.

4. A folding scalable distributed storage coding repair method for the folding scalable distributed storage coding according to claim 1, characterized in that each data set is processed, the information symbols are encoded, and the recovered data symbols are saved; the method comprises the following steps:

(3) judging the condition:

(4) dividing the data symbols:

(5) processing each data set:

(5a) numbering each symbol group in each data set according to:

(5c) executing step (9) after all data sets have been processed;

(6) dividing the information symbols:

(9) encoding the information symbols:

(10) saving the recovered data symbols:

(11) replacing the failed nodes with the new nodes.

5. The method according to claim 4, wherein the decode-eliminate operation in step (5b) is a decoding operation followed by an elimination operation; the decoding operation decodes the data symbols in the current symbol group of the data set with the decoding method of the basic code, recovers the information symbols corresponding to the failed information nodes, and adds them to the current symbol group; the elimination operation uses the information symbols in the current symbol group to eliminate the check symbols in every symbol group of the data set whose number is larger than that of the current symbol group; elimination means that, if the element of a check symbol's coding-coefficient row matrix corresponding to an information symbol is non-zero, the information symbol multiplied by that non-zero value is subtracted from the check symbol to obtain the eliminated check symbol.

6. An expansion method for the folding scalable distributed storage coding according to claim 1, wherein the two sub-stripes of the encoded data whose generator matrices are obtained by folding the same matrix are grouped into one expansion group, the two sub-stripes in each expansion group are merged, and the data symbols of all new sub-stripes are combined into the expanded encoded data; the method comprises the following steps:

(1) adding new nodes:

(3) merging the two sub-stripes in each expansion group:

(3d) migrating the non-meta-check symbols at different positions in the sub-stripe corresponding to the right generator matrix in each expansion group to different check nodes among F₁, F₂, …, F_γ;

7. The method according to claim 6, wherein, for the complementary matrix of the left generator matrix of the sub-stripes in each expansion group in step (3c), the matrix whose folding produced that left generator matrix is denoted by a dedicated symbol,

and the elements of rows l₆ to l₇ of it form a matrix