CN113391948A - Folding type extensible distributed storage coding and repairing and expanding method - Google Patents

Publication number
CN113391948A
Authority
CN
China
Prior art keywords
matrix
check
nodes
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110726617.1A
Other languages
Chinese (zh)
Other versions
CN113391948B (en)
Inventor
孙蓉
杜从军
刘景伟
裴庆祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110726617.1A
Publication of CN113391948A
Application granted
Publication of CN113391948B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention discloses a folding-type extensible distributed storage encoding method, together with its repair and expansion methods. After the encoding parameters of each stage are determined, the generator matrix sets of the other stages are constructed in turn, starting from the generator matrix of the last stage; the resulting codes are combined into a code group, and one code is selected from the group to encode the data to be encoded. When nodes fail, as many non-failed nodes as there are information nodes are selected, and data symbols are downloaded from them to recover the data symbols of the failed nodes. Encoded data is expanded by merging the two sub-stripes within each expansion group. The invention improves the fault tolerance of the expanded system, has the MDS property, requires low expansion bandwidth, and supports repeated expansion. It can be used for encoding, repair, and expansion in distributed storage systems whose nodes have computing capability.

Description

Folding type extensible distributed storage coding and repairing and expanding method
Technical Field
The invention belongs to the field of computer technology and, more specifically, relates to a folding-type extensible distributed storage encoding method and its repair and expansion methods in the field of distributed storage. The invention can be used for encoding, repair, and expansion in distributed storage systems whose nodes have computing capability.
Background
Because distributed storage systems hold large volumes of data and node failures are frequent, they store redundant data to improve reliability. Erasure coding is a typical redundancy mechanism: the original data is divided into information blocks, the information blocks are encoded to generate check blocks, and the information blocks and check blocks are spread across the nodes of the distributed storage system to provide fault tolerance. The scale of a distributed storage system generally grows over its service life as more and more new nodes join. Ideally, the encoding parameters of the system should be dynamically expandable according to application requirements. However, when an erasure-coded distributed storage system expands its encoding parameters, data migration and updating require transferring large amounts of data between nodes, which consumes substantial network bandwidth and degrades the performance of the system.
The patent document "A storage expansion method based on network coding" of Huazhong University of Science and Technology (patent application No. 201810304384.4, publication No. CN 108536396B) discloses a storage expansion method based on network coding. The method divides the stripes before storage expansion into several expansion groups and further divides each expansion group into a PG and a DG; takes data blocks cyclically from the original nodes within the DG to obtain a series of data sets; encodes each data set using network coding to generate update blocks, and uses the update blocks to update the coding blocks in the PG locally or remotely; transfers coding blocks or data blocks to the newly added nodes so that the data blocks and coding blocks remain evenly placed on all nodes after expansion; and deletes all data blocks and coding blocks that were transferred to the new nodes, as well as all coding blocks within the DG. By using the computing resources of the storage nodes to encode data blocks and locally update some coding blocks during storage expansion, the method reduces the expansion bandwidth. Its drawback is that, although the number of nodes increases after storage expansion and more coding blocks are needed to meet the system's fault-tolerance requirements, the number of coding blocks in a single stripe stays unchanged during expansion, so the fault tolerance of the expanded system cannot adapt to the node scale of the system.
The paper "Random binary extension codes: a coding method suitable for distributed storage systems" (Chinese Journal of Computers, September 2017) proposes a coding method that can dynamically adjust the code rate and the erasure-correction capability. The coding matrix of the method consists of an identity matrix and a random matrix, and the whole codeword achieves high performance through a top-down design that controls how each element of the random matrix is generated. The advantage of the method is that its parameters can be adjusted dynamically: the rows and columns of the coding matrix can be stretched freely, so the storage system can adjust the code rate and the erasure-correction capability as application requirements change.
The paper "Convertible Codes: New Classes of Codes for Efficient Conversion of Coded Data in Distributed Storage" (11th Innovations in Theoretical Computer Science Conference, ser. Leibniz International Proceedings in Informatics, vol. 151, pp. 66:1-66:26, 2020) proposes a coding method for efficient code conversion. The subsequent paper "Bandwidth Cost of Code Conversions in Distributed Storage: Fundamental Limits and Optimal Constructions" (arXiv:2008.12707) analyzes the code conversion problem by means of network information flow and proposes a convertible-code scheme that reduces the conversion bandwidth. Convertible codes effectively reduce the resources the system consumes when it expands from the initial code to the final code, but the method still has the drawback that it can expand only once with low bandwidth consumption.
Disclosure of Invention
The purpose of the invention is to overcome the above shortcomings of the prior art by providing a folding-type extensible distributed storage encoding method and its repair and expansion methods. It addresses three problems: the fault tolerance of the expanded system cannot adapt to the node scale, the repair degree is high when failed nodes are repaired, and the number of possible expansions is small.
The idea for achieving this purpose is as follows. The encoding method computes the encoding parameters of the other stages in turn from those of the 1st stage according to a formula; because the computed number of check nodes increases from stage to stage, the fault tolerance of the expanded system can adapt to the node scale. A generator matrix for the last stage is constructed using a systematic MDS code as the base code, the generator matrix sets of the other stages are constructed in reverse order according to a defined folding rule, the resulting codes are combined into a code group, and one code is selected from the group to encode the data to be encoded. The repair method selects, from the non-failed nodes, as many nodes as there are information nodes and downloads data symbols from them to recover the data symbols of the failed nodes; because the number of selected nodes equals the number of information nodes, the repair degree is kept low. The expansion method merges the two sub-stripes in each expansion group when expanding encoded data; because the code group contains several codes, the merging can be repeated, which allows multiple expansions.
To achieve the above object, the folding-type extensible distributed storage encoding method of the invention comprises the following steps:
(1) setting the encoding parameters of the 1st stage:
set the number of information nodes $k_1$, the number of check nodes $r_1$, and the number of meta-check nodes $s_1$ as the encoding parameters of the 1st stage, where $k_1$ and $r_1$ are positive integers and $s_1$ is a non-negative integer less than or equal to $r_1$;
(2) Calculating the encoding parameters of the next stage:
(2a) calculating the number of information nodes and the number of check nodes in the next stage according to the following formula:
k′=2k
r′=2r-s
where k′ and r′ denote the number of information nodes and the number of check nodes of the stage following the current stage, and k, r, and s denote the number of information nodes, the number of check nodes, and the number of meta-check nodes of the current stage;
(2b) from the range {s, s+1, …, 2r−s}, selecting as the number of meta-check nodes of the next stage the value equal to the maximum number of simultaneously failed information nodes that are expected to be repaired with low repair complexity;
(3) judging whether the total number of the coding parameters obtained by the current iteration is equal to m, if so, executing the step (4); otherwise, executing the step (2) after taking the determined coding parameter as the coding parameter of the current stage; m represents the total number of codes in the set code group to be constructed, and the value of m is an integer greater than or equal to 2;
(4) determining the final encoding parameters:
(4a) setting the value of the number of check nodes in the coding parameter obtained by the current iteration as the number of meta check nodes in the coding parameter obtained by the current iteration;
(4b) composing the number of information nodes, the number of check nodes and the number of determined element check nodes obtained by current iteration into a final coding parameter;
(5) constructing a generating matrix corresponding to the last stage:
using a systematic MDS code as the base code, construct the generator matrix $G_m$ corresponding to the last stage with the generator-matrix construction method of the base code:
Figure BDA0003138919880000041
where $G_m$ denotes the generator matrix corresponding to the last stage; Figure BDA0003138919880000042 denotes the identity matrix of order $k_m$, and $k_m$ equals the number of information nodes in the final encoding parameters; Figure BDA0003138919880000043 denotes a matrix with $r_m$ rows and $k_m$ columns, and $r_m$ equals the number of check nodes in the final encoding parameters;
(6) setting a folding rule of a generating matrix:
(6a) the matrix to be folded is divided into five matrices A, B, X, U, and V, where:
A denotes the left information matrix formed by rows 1 to $l_1$ of the matrix to be folded (Figure BDA0003138919880000044), where Figure BDA0003138919880000045 denotes the number of information nodes in the encoding parameters of the stage preceding the current stage;
B denotes the right information matrix formed by rows $l_2$ to $l_3$ of the matrix to be folded (Figure BDA0003138919880000046);
X denotes the matrix formed by rows $l_4$ to $l_5$ of the matrix to be folded (Figure BDA0003138919880000047), where Figure BDA0003138919880000048 denotes the number of meta-check nodes in the encoding parameters of the stage preceding the current stage;
U denotes the matrix formed by rows $l_6$ to $l_7$ of the matrix to be folded (Figure BDA0003138919880000049), where Figure BDA00031389198800000410 denotes the number of check nodes in the encoding parameters of the stage preceding the current stage;
V denotes the right non-meta matrix formed by row $l_8$ through the last row of the matrix to be folded (Figure BDA00031389198800000411);
(6b) generate a matrix whose row and column elements equal those of the matrix X, and set all elements of its last μ non-zero columns to zero to obtain the left meta matrix X′ (Figure BDA00031389198800000412); generate another matrix whose row and column elements equal those of the matrix X, and set all elements of its first μ non-zero columns to zero to obtain the right meta matrix X″; set all elements of the last μ non-zero columns of the matrix U to zero to obtain the left non-meta matrix U′;
(6c) according to the following formulas, combine the matrices A, X′, and U′, and the matrices B, X″, and V, respectively, to construct the two mutually associated generator matrices that folding the matrix to be folded contributes to the generator matrix set of the stage preceding the current stage:
Figure BDA00031389198800000413
Figure BDA0003138919880000051
where G′ denotes the left generator matrix and G″ denotes the right generator matrix;
(7) constructing the generator matrix set corresponding to the stage preceding the last stage:
(7a) fold the generator matrix corresponding to the last stage according to the folding rule of the generator matrix, and add the generator matrices obtained by folding to the generator matrix set corresponding to the stage preceding the current stage;
(7b) judging whether the value of $2^{m-1}$ is equal to 2 (that is, whether m = 2); if so, executing step (10); otherwise, taking the generator matrix set determined in this iteration as the generator matrix set corresponding to the current stage and executing step (8);
(8) constructing a generating matrix set corresponding to the previous stage:
according to the folding rule of the generated matrix, folding each generated matrix in the generated matrix set corresponding to the current stage, and adding the generated matrix obtained by folding into the generated matrix set corresponding to the previous stage of the current stage;
(9) judging whether the number of generator matrices in the generator matrix set obtained in the current iteration is equal to $2^{m-1}$; if so, executing step (10); otherwise, taking the generator matrix set determined in this iteration as the generator matrix set corresponding to the current stage and executing step (8);
(10) determining codes of all stages:
taking the coding parameter corresponding to each stage as the coding parameter of the corresponding code; taking the generating matrix corresponding to the last stage as the generating matrix of the sub-strip of the code of the last stage; taking each generation matrix in the generation matrix set corresponding to each other stage except the last stage as the generation matrix of each sub-strip of the corresponding code;
(11) combining the codes of all the stages into a coding group;
(12) selecting a code from the coding group, wherein the sum of the number of the information nodes and the number of the check nodes in the coding parameters of the selected code is equal to the total number of the nodes expected to be adopted;
(13) encoding data to be encoded:
evenly divide the data to be encoded into t information symbols, where t = $k_m$; encode the data to be encoded with the generator matrix of each sub-stripe of the selected code to obtain the data symbols of that sub-stripe; the data symbols of all sub-stripes form the encoded data; save the encoded data to the corresponding nodes.
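The encoding of step (13) can be sketched as a matrix-vector product per sub-stripe. This is a minimal illustration under stated assumptions, not the patent's implementation: arithmetic is done over the prime field GF(257) instead of the field of the base code, each sub-stripe generator matrix is assumed to have one row per node and t columns, and the names `encode_substripe` and `encode` are hypothetical.

```python
from typing import List

P = 257  # assumed prime field size (illustrative; not taken from the patent)


def encode_substripe(generator: List[List[int]], info: List[int], p: int = P) -> List[int]:
    """Multiply one sub-stripe generator matrix by the information symbols.

    Each row of `generator` yields the data symbol stored on one node.
    """
    if any(len(row) != len(info) for row in generator):
        raise ValueError("every generator row needs one coefficient per information symbol")
    return [sum(g * d for g, d in zip(row, info)) % p for row in generator]


def encode(generators: List[List[List[int]]], info: List[int], p: int = P) -> List[List[int]]:
    """Encode the same information symbols with every sub-stripe generator matrix."""
    return [encode_substripe(g, info, p) for g in generators]


# Toy example: one sub-stripe with 2 information rows and 2 check rows.
toy_generator = [[1, 0], [0, 1], [1, 1], [1, 2]]
toy_symbols = encode([toy_generator], [5, 7])
```

In a systematic code the first rows of the generator matrix are identity rows, so the information symbols appear unchanged among the stored symbols.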
The repair method for the folding-type extensible distributed storage code of the invention comprises the following steps:
(1) if, for any encoded data encoded with the same code, the total number of failed nodes is larger than the number of check nodes in the encoding parameters of that code, abandoning the repair; otherwise, executing step (2);
(2) judging whether the total number of failed information nodes among all failed nodes is non-zero; if so, executing step (3); otherwise, no information node has failed but check nodes have, and step (6) is executed;
(3) judging whether the condition
Figure BDA0003138919880000065
holds; if so, executing step (4); otherwise, executing step (7); where α denotes the total number of failed information nodes among all failed nodes, Figure BDA0003138919880000066 denotes the number of meta-check nodes in the parameters of the code with which every encoded data is encoded, and λ denotes the total number of failed meta-check nodes among all failed check nodes;
(4) dividing the data symbols:
download all stored information symbols from each non-failed information node that stores the same encoded data as the failed nodes; download all stored meta-check symbols from each non-failed meta-check node that stores the same encoded data as the failed nodes; and randomly select η nodes from all non-failed non-meta-check nodes that store the same encoded data as the failed nodes, downloading all non-meta-check symbols from them; divide the data symbols belonging to the same sub-stripe into the same symbol group, and divide the symbol groups belonging to the same encoded data into the same data set; where η is equal to
Figure BDA0003138919880000061
(5) Processing each data set:
(5a) number each symbol group in each data set according to:
Figure BDA0003138919880000062
where $j_{h,c}$ denotes the number of the c-th symbol group in the h-th data set; Figure BDA0003138919880000063 denotes the round-up operation; $q_{h,c}$ denotes the column index of the non-zero element in the first row of the generator matrix of the sub-stripe corresponding to the c-th symbol group in the h-th data set; and Figure BDA0003138919880000064 denotes the number of information nodes in the parameters of the code with which every encoded data is encoded;
(5b) for each data set, sequentially carrying out decoding-eliminating operation on each symbol group in the data set from the symbol group with the number of 1 in the data set;
(5c) executing step (9) after all the data sets are processed;
(6) dividing information symbols:
downloading all information symbols stored by the information node from each information node storing the same encoded data with the failed node, and dividing the information symbols belonging to the same sub-stripe into the same symbol group; dividing symbol groups belonging to the same coded data into the same data set and then executing the step (9);
(7) recovering the information symbols corresponding to the failed information nodes:
(7a) download all stored information symbols from each non-failed information node that stores the same encoded data as the failed nodes, and randomly select α meta-check nodes from all non-failed meta-check nodes that store the same encoded data as the failed nodes, downloading all meta-check symbols stored by them; divide the data symbols belonging to the same sub-stripe into the same symbol group;
(7b) decoding the data symbols in each symbol group by using a decoding method corresponding to the basic code, recovering the information symbols corresponding to the failure information nodes in the symbol group, and adding the information symbols into the symbol group;
(7c) dividing symbol groups belonging to the same coded data into the same data set;
(8) judging whether all failure nodes have failure check nodes, if so, executing the step (9), otherwise, executing the step (10);
(9) encoding the information symbols:
coding all information symbols in all symbol groups in each data set by using a coding coefficient row matrix corresponding to the failure check node, recovering the check symbols corresponding to the failure check node, and adding the recovered check symbols into the symbol groups corresponding to the same sub-strip;
(10) saving the recovered data symbols:
add new information nodes equal in number to α and new check nodes equal in number to the total number of failed check nodes among all failed nodes; store the recovered information symbols belonging to the same information node in the same new information node, and store the recovered check symbols belonging to the same check node in the same new check node;
(11) replacing the failed nodes with the new nodes.
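The idea behind steps (7b) and (9), recovering failed symbols from as many surviving symbols as there are information nodes, can be illustrated for a single symbol group with generic MDS erasure decoding. The following is a minimal sketch and not the patent's decoding-elimination procedure: it assumes arithmetic over the prime field GF(257) rather than the field of the base code, and the names `solve_mod` and `repair` and the toy generator matrix are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

P = 257  # assumed prime field size (illustrative; not taken from the patent)


def solve_mod(a: List[List[int]], b: List[int], p: int = P) -> List[int]:
    """Solve a * x = b over GF(p) by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p != 0), None)
        if pivot is None:
            raise ValueError("helper rows are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)  # modular inverse (Python 3.8+)
        m[col] = [(v * inv) % p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] % p != 0:
                f = m[r][col]
                m[r] = [(v - f * w) % p for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def repair(generator: List[List[int]], stored: Sequence[Optional[int]],
           failed: Sequence[int], p: int = P) -> Dict[int, int]:
    """Recover the symbols of the failed rows from k surviving rows."""
    k = len(generator[0])
    helpers = [i for i in range(len(generator)) if i not in failed][:k]
    if len(helpers) < k:
        raise ValueError("more failures than the code can tolerate")
    # Decode the information symbols from exactly k helper symbols ...
    info = solve_mod([generator[i] for i in helpers], [stored[i] for i in helpers], p)
    # ... then re-encode the rows that belong to the failed nodes.
    return {i: sum(g * d for g, d in zip(generator[i], info)) % p for i in failed}


# Toy (4, 2) systematic MDS code; nodes 0 and 2 have failed.
toy_generator = [[1, 0], [0, 1], [1, 1], [1, 2]]
recovered = repair(toy_generator, [None, 7, None, 19], [0, 2])  # {0: 5, 2: 12}
```

Only k = 2 helper nodes are contacted in the toy example, which mirrors the property that the number of connected nodes equals the number of information nodes.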
The expansion method for the folding-type extensible distributed storage code of the invention comprises the following steps:
(1) adding new nodes:
unless the encoded data is encoded with the code of the last stage, add ρ information nodes $Y_1, Y_2, …, Y_ρ$ and γ check nodes $F_1, F_2, …, F_γ$ to the nodes storing the encoded data, where
Figure BDA0003138919880000081
Figure BDA0003138919880000082 denotes the number of information nodes in the encoding parameters of the code of the stage following the stage of the code currently used for the encoded data, Figure BDA0003138919880000083 denotes the number of information nodes in the encoding parameters of the code currently used for the encoded data,
Figure BDA0003138919880000084
Figure BDA0003138919880000085 denotes the number of check nodes in the encoding parameters of the code of the stage following the stage of the code currently used for the encoded data, and Figure BDA0003138919880000086 denotes the number of check nodes in the encoding parameters of the code currently used for the encoded data;
(2) grouping every two sub-stripes of the encoded data whose generator matrices are associated with each other into one expansion group;
(3) merging the two sub-stripes within each expansion group:
(3a) download and cache, from all information nodes storing the encoded data, all information symbols of the sub-stripe corresponding to the right generator matrix in each expansion group, and migrate the information symbols at different positions of that sub-stripe to different information nodes among $Y_1, Y_2, …, Y_ρ$; the migrated and non-migrated information symbols together form the information symbols of the merged sub-stripe;
(3b) within each meta-check node, add the two meta-check symbols that the two sub-stripes of each expansion group store in that node to obtain the updated meta-check symbol;
(3c) encode the cached information symbols with the complementary matrix of the left generator matrix corresponding to the sub-stripes of each expansion group to obtain correction symbols, and use the correction symbols to update the non-meta-check symbols of the sub-stripe corresponding to the left generator matrix;
(3d) migrate the non-meta-check symbols at different positions of the sub-stripe corresponding to the right generator matrix in each expansion group to different check nodes among $F_1, F_2, …, F_γ$;
(3e) combine the updated meta-check symbols, the updated non-meta-check symbols, and the migrated check symbols into the check symbols of the merged sub-stripe;
(4) combining the data symbols of all merged sub-stripes into the expanded encoded data.
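Two pieces of the expansion procedure lend themselves to a short sketch: the number of nodes added in step (1) and the meta-check update of step (3b). The formulas for ρ and γ are only available as images in this text, so the sketch assumes ρ and γ are the differences between the next-stage and current-stage node counts; it also assumes symbols over the prime field GF(257). The function names are hypothetical.

```python
from typing import List, Tuple

Params = Tuple[int, int, int]  # (k, r, s): information, check, meta-check node counts
P = 257  # assumed prime field size (illustrative; not taken from the patent)


def new_node_counts(current: Params, following: Params) -> Tuple[int, int]:
    """Return (rho, gamma) under the assumption rho = k' - k and gamma = r' - r."""
    k, r, _ = current
    k_next, r_next, _ = following
    return k_next - k, r_next - r


def merge_meta_checks(left: List[int], right: List[int], p: int = P) -> List[int]:
    """Step (3b): add the two meta-check symbols held by each meta-check node."""
    if len(left) != len(right):
        raise ValueError("both sub-stripes must hold one symbol per meta-check node")
    return [(a + b) % p for a, b in zip(left, right)]


# Expanding from the stage-1 code (4, 3, 1) to the stage-2 code (8, 5, 2) of the embodiment.
rho, gamma = new_node_counts((4, 3, 1), (8, 5, 2))  # rho = 4, gamma = 2
```

Under this assumption, γ = r′ − r = r − s, which matches the r − s non-meta-check symbols of the right sub-stripe that step (3d) migrates to new check nodes, and ρ = k matches the information symbols migrated in step (3a).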
Compared with the prior art, the invention has the following advantages:
First, when the encoding method of the invention computes the encoding parameters of the next stage, the number of check nodes of the next stage is not less than that of the current stage. This overcomes the prior-art problem that the number of coding blocks in a single stripe stays unchanged during storage expansion, which leaves the fault tolerance of the expanded system unable to adapt to the node scale of the system. With codes constructed by the encoding method of the invention, the number of check symbols in a single sub-stripe increases when moving to the code of the next stage, so the fault tolerance of the expanded system can adapt to the node scale of the system.
Second, in the repair method of the invention, data symbols need to be downloaded only from as many non-failed nodes as there are information nodes. This overcomes the prior-art problem that the number of nodes to be connected exceeds the number of information nodes, which makes the repair degree high when failed nodes are repaired. In the repair method of the invention, the number of nodes to be connected equals the number of information nodes, so the repair degree is low.
Third, the expansion method of the invention expands encoded data by merging the two sub-stripes in each expansion group. Because the code group contains several codes, this expansion can be performed repeatedly, which overcomes the prior-art limitation that only one expansion can be performed with low bandwidth consumption.
Drawings
FIG. 1 is a flow chart of the foldable scalable distributed storage coding of the present invention;
FIG. 2 is a diagram illustrating encoding of data to be encoded according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a folded extensible distributed storage repair of the present invention;
FIG. 4 is a flow chart of the foldable extensible distributed storage extension of the present invention;
FIG. 5 is a schematic diagram of expanding encoded data in the embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples.
The implementation steps of the folding scalable distributed storage coding method of the present invention are further described with reference to fig. 1.
Step 1. Set the encoding parameters of the 1st stage.
Set the number of information nodes $k_1$, the number of check nodes $r_1$, and the number of meta-check nodes $s_1$ as the encoding parameters of the 1st stage, where $k_1$ and $r_1$ are positive integers and $s_1$ is a non-negative integer less than or equal to $r_1$.
In the embodiment of the present invention, the number of information nodes, the number of check nodes, and the number of meta check nodes of the encoding parameter at the 1 st stage are set to 4, 3, and 1, respectively.
Step 2. Calculate the encoding parameters of the next stage.
Calculating the number of information nodes and the number of check nodes in the next stage according to the following formula:
k′=2k
r′=2r-s
where k′ and r′ denote the number of information nodes and the number of check nodes of the stage following the current stage, and k, r, and s denote the number of information nodes, the number of check nodes, and the number of meta-check nodes of the current stage;
from the range {s, s+1, …, 2r−s}, select as the number of meta-check nodes of the next stage the value equal to the maximum number of simultaneously failed information nodes that are expected to be repaired with low repair complexity.
Step 3, judging whether the total number of the coding parameters obtained by the current iteration is equal to m, if so, executing step 4; otherwise, the step 2 is executed after the determined coding parameter is taken as the coding parameter of the current stage; m represents the total number of codes in the set code group to be constructed, and the value of m is an integer greater than or equal to 2.
Step 4. Determine the final encoding parameters.
Sub-step 1: set the number of meta-check nodes in the encoding parameters obtained in the current iteration equal to the number of check nodes in those parameters;
Sub-step 2: combine the number of information nodes, the number of check nodes, and the number of meta-check nodes so determined into the final encoding parameters.
In the embodiment of the present invention, the total number of codes in the coding group is set to be 3, the number of information nodes, the number of check nodes, and the number of meta check nodes in the coding parameters of the 2 nd stage code can be obtained by calculation and selection to be 8, 5, and 2, respectively, and the number of information nodes, the number of check nodes, and the number of meta check nodes in the coding parameters of the 3 rd stage code, that is, the number of information nodes, the number of check nodes, and the number of meta check nodes in the final coding parameters, are 16, 8, and 8, respectively.
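Steps 1 to 4 can be sketched as a short iteration. The function name `stage_parameters` and the dictionary `chosen_s` (which supplies the number of meta-check nodes selected for each intermediate stage) are hypothetical names introduced only for this illustration.

```python
def stage_parameters(k1, r1, s1, m, chosen_s):
    """Return the list [(k, r, s), ...] of coding parameters for stages 1..m.

    chosen_s maps each intermediate stage number (2..m-1) to the number of
    meta-check nodes selected for it; this input format is an assumption.
    """
    if not (k1 > 0 and r1 > 0 and 0 <= s1 <= r1 and m >= 2):
        raise ValueError("invalid stage-1 parameters or m")
    params = [(k1, r1, s1)]
    while len(params) < m:
        k, r, s = params[-1]
        stage = len(params) + 1
        k_next, r_next = 2 * k, 2 * r - s      # k' = 2k, r' = 2r - s
        if stage == m:
            s_next = r_next                    # step 4: final stage uses s = r
        else:
            s_next = chosen_s[stage]
            if not s <= s_next <= 2 * r - s:   # value range {s, ..., 2r - s}
                raise ValueError("s for stage %d is out of range" % stage)
        params.append((k_next, r_next, s_next))
    return params


print(stage_parameters(4, 3, 1, 3, {2: 2}))
# [(4, 3, 1), (8, 5, 2), (16, 8, 8)]
```

With the embodiment's inputs the sketch reproduces the parameters (8, 5, 2) for the 2nd stage and (16, 8, 8) for the 3rd stage.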
Step 5, constructing the generator matrix corresponding to the last stage.
A systematic MDS code is used as the basic code, and the generator matrix $G_m$ corresponding to the last stage is constructed using the generator matrix construction method of the basic code:
$$G_m = \begin{bmatrix} E_{k_m \times k_m} \\ P_{r_m \times k_m} \end{bmatrix}$$
where $G_m$ denotes the generator matrix corresponding to the last stage, $E_{k_m \times k_m}$ denotes an identity matrix of order $k_m$, $k_m$ is equal to the number of information nodes in the final coding parameters, $P_{r_m \times k_m}$ denotes a matrix of $r_m$ rows and $k_m$ columns, and $r_m$ is equal to the number of check nodes in the final coding parameters.
In the embodiment of the present invention, a systematic (24,16) RS code is used as the basic code to construct the generator matrix
$$G_3 = \begin{bmatrix} E_{16 \times 16} \\ P_{8 \times 16} \end{bmatrix}.$$
Step 6, setting the folding rule of the generator matrix.
Sub-step 1, the matrix to be folded is divided by rows into five matrices A, B, X, U, and V, where $\tilde{k}$, $\tilde{s}$, and $\tilde{r}$ denote the number of information nodes, the number of meta-check nodes, and the number of check nodes in the coding parameters corresponding to the stage preceding the current stage:
A denotes the left information matrix formed by rows 1 to $l_1$ of the matrix to be folded, $l_1=\tilde{k}$;
B denotes the right information matrix formed by rows $l_2$ to $l_3$ of the matrix to be folded, $l_2=\tilde{k}+1$, $l_3=2\tilde{k}$;
X denotes the matrix formed by rows $l_4$ to $l_5$ of the matrix to be folded, $l_4=2\tilde{k}+1$, $l_5=2\tilde{k}+\tilde{s}$;
U denotes the matrix formed by rows $l_6$ to $l_7$ of the matrix to be folded, $l_6=2\tilde{k}+\tilde{s}+1$, $l_7=2\tilde{k}+\tilde{r}$;
V denotes the right non-meta matrix formed by rows $l_8$ to the last row of the matrix to be folded, $l_8=2\tilde{k}+\tilde{r}+1$.
Sub-step 2, a matrix identical to the matrix X is generated and all elements of its last μ non-zero columns are set to zero to obtain the left meta matrix X′, $\mu=\tilde{k}$; another matrix identical to the matrix X is generated and all elements of its first μ non-zero columns are set to zero to obtain the right meta matrix X″; all elements of the last μ non-zero columns of the matrix U are set to zero to obtain the left non-meta matrix U′.
Sub-step 3, the matrices A, X′, U′ and the matrices B, X″, V are combined according to the following formulas to construct the two mutually associated generator matrices, belonging to the generator matrix set corresponding to the stage preceding the current stage, that result from folding the matrix to be folded:
$$G' = \begin{bmatrix} A \\ X' \\ U' \end{bmatrix}, \qquad G'' = \begin{bmatrix} B \\ X'' \\ V \end{bmatrix}$$
where G′ denotes the left generator matrix and G″ denotes the right generator matrix.
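The folding rule can be illustrated with a minimal sketch on plain lists of rows. It assumes that μ equals the number of information nodes of the preceding stage (the value that matches the embodiment, where 8 and then 4 columns are zeroed) and that a "non-zero column" is a column containing at least one non-zero element. The helper names and the placeholder coefficient matrix `P` are hypothetical; `P` is not an RS parity matrix, only a stand-in whose entries are all non-zero.

```python
def nonzero_columns(rows):
    """Indices of the columns that contain at least one non-zero element."""
    if not rows:
        return []
    return [j for j in range(len(rows[0])) if any(row[j] != 0 for row in rows)]


def zero_columns(rows, cols):
    """Copy of rows in which the listed columns are set to zero."""
    cols = set(cols)
    return [[0 if j in cols else v for j, v in enumerate(row)] for row in rows]


def fold(G, k_prev, r_prev, s_prev):
    """Fold G into the (left, right) generator matrices of the preceding stage."""
    mu = k_prev  # assumption inferred from the embodiment
    A = G[:k_prev]
    B = G[k_prev:2 * k_prev]
    X = G[2 * k_prev:2 * k_prev + s_prev]
    U = G[2 * k_prev + s_prev:2 * k_prev + r_prev]
    V = G[2 * k_prev + r_prev:]
    x_cols = nonzero_columns(X)
    X_left = zero_columns(X, x_cols[-mu:])   # last mu non-zero columns -> 0
    X_right = zero_columns(X, x_cols[:mu])   # first mu non-zero columns -> 0
    U_left = zero_columns(U, nonzero_columns(U)[-mu:])
    return A + X_left + U_left, B + X_right + V


# Stage-3 matrix of the embodiment's shape (24 x 16) with placeholder coefficients.
identity = [[1 if i == j else 0 for j in range(16)] for i in range(16)]
P = [[16 * i + j + 1 for j in range(16)] for i in range(8)]
G3 = identity + P
G21, G22 = fold(G3, 8, 5, 2)    # stage 2: k = 8, r = 5, s = 2
G13, G14 = fold(G22, 4, 3, 1)   # stage 1: k = 4, r = 3, s = 1
print(len(G21), len(G22), len(G13), len(G14))  # 13 13 7 7
```

Each folded matrix has k + r rows of the preceding stage, and the last two rows of `G14` are the unmodified rows 7 and 8 of `P`, in line with the check symbols p7M and p8M listed for the 4th sub-stripe.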
Step 7, constructing the generator matrix set corresponding to the stage preceding the last stage.
Sub-step 1, folding the generator matrix corresponding to the last stage according to the folding rule of the generator matrix, and adding the generator matrices obtained by folding to the generator matrix set corresponding to the stage preceding the current stage;
Sub-step 2, judging whether the value of m-1 is equal to 2; if so, executing step 10; otherwise, taking the generator matrix set just determined as the generator matrix set corresponding to the current stage and executing step 8.
Step 8, constructing the generator matrix set corresponding to the preceding stage.
According to the folding rule of the generator matrix, each generator matrix in the generator matrix set corresponding to the current stage is folded, and the generator matrices obtained by folding are added to the generator matrix set corresponding to the stage preceding the current stage.
Step 9, judging whether the total number of generator matrix sets obtained so far is equal to m-1; if so, executing step 10; otherwise, taking the generator matrix set just determined as the generator matrix set corresponding to the current stage and executing step 8 again.
In the embodiment of the present invention, for convenience of description, the coding coefficient row matrices corresponding to rows 1 to 8 of the matrix $P_{8\times16}$ are denoted $p_1, p_2, \ldots, p_8$; the symbol $p_q^{u\sim v}$ denotes the coding coefficient row matrix obtained by retaining the $u$-th to $v$-th elements of $p_q$ and setting all other elements to zero, $1\le q\le 8$, $1\le u\le v\le 16$; the symbol $E_{u'\times u'}$ denotes an identity matrix of $u'$ rows and $u'$ columns, $1\le u'\le 16$; and the symbol $O_{u''\times v''}$ denotes an all-zero matrix of $u''$ rows and $v''$ columns, $1\le u''\le 16$, $1\le v''\le 16$.
First, the generator matrix $G_3$ corresponding to the last stage is divided into five matrices: the left information matrix $A_3=[E_{8\times8}\mid O_{8\times8}]$, the right information matrix $B_3=[O_{8\times8}\mid E_{8\times8}]$, the matrix $X_3=\begin{bmatrix}p_1\\p_2\end{bmatrix}$, the matrix $U_3=\begin{bmatrix}p_3\\p_4\\p_5\end{bmatrix}$, and the right non-meta matrix $V_3=\begin{bmatrix}p_6\\p_7\\p_8\end{bmatrix}$.
Two matrices identical to $X_3$ are generated. All elements of the last 8 non-zero columns of one of them are set to zero to obtain the left meta matrix $X_3'=\begin{bmatrix}p_1^{1\sim8}\\p_2^{1\sim8}\end{bmatrix}$, and all elements of the first 8 non-zero columns of the other are set to zero to obtain the right meta matrix $X_3''=\begin{bmatrix}p_1^{9\sim16}\\p_2^{9\sim16}\end{bmatrix}$. All elements of the last 8 non-zero columns of the matrix $U_3$ are set to zero to obtain the left non-meta matrix $U_3'=\begin{bmatrix}p_3^{1\sim8}\\p_4^{1\sim8}\\p_5^{1\sim8}\end{bmatrix}$.
The matrices $A_3$, $X_3'$, $U_3'$ and the matrices $B_3$, $X_3''$, $V_3$ are combined to construct the two mutually associated generator matrices $G_{2,1}$ and $G_{2,2}$, in the generator matrix set corresponding to the stage preceding the current stage, obtained by folding the matrix $G_3$, namely:
$$G_{2,1}=\begin{bmatrix}E_{8\times8}\mid O_{8\times8}\\ p_1^{1\sim8}\\ p_2^{1\sim8}\\ p_3^{1\sim8}\\ p_4^{1\sim8}\\ p_5^{1\sim8}\end{bmatrix},\qquad G_{2,2}=\begin{bmatrix}O_{8\times8}\mid E_{8\times8}\\ p_1^{9\sim16}\\ p_2^{9\sim16}\\ p_6\\ p_7\\ p_8\end{bmatrix}$$
$G_{2,1}$ and $G_{2,2}$ are the 2 generator matrices in the generator matrix set of the 2nd stage.
The same method is used to fold $G_{2,1}$ and $G_{2,2}$, giving the four generator matrices $G_{1,1}$, $G_{1,2}$, $G_{1,3}$, $G_{1,4}$:
$$G_{1,1}=\begin{bmatrix}E_{4\times4}\mid O_{4\times12}\\ p_1^{1\sim4}\\ p_2^{1\sim4}\\ p_3^{1\sim4}\end{bmatrix},\qquad G_{1,2}=\begin{bmatrix}O_{4\times4}\mid E_{4\times4}\mid O_{4\times8}\\ p_1^{5\sim8}\\ p_4^{1\sim8}\\ p_5^{1\sim8}\end{bmatrix}$$
$$G_{1,3}=\begin{bmatrix}O_{4\times8}\mid E_{4\times4}\mid O_{4\times4}\\ p_1^{9\sim12}\\ p_2^{9\sim12}\\ p_6^{1\sim12}\end{bmatrix},\qquad G_{1,4}=\begin{bmatrix}O_{4\times12}\mid E_{4\times4}\\ p_1^{13\sim16}\\ p_7\\ p_8\end{bmatrix}$$
$G_{1,1}$, $G_{1,2}$, $G_{1,3}$, $G_{1,4}$ are the 4 generator matrices in the generator matrix set of the 1st stage.
Step 10, determining the codes of all stages.
The coding parameters corresponding to each stage are taken as the coding parameters of the corresponding code; the generator matrix corresponding to the last stage is taken as the generator matrix of the sub-stripe of the code of the last stage; and each generator matrix in the generator matrix set corresponding to each stage other than the last stage is taken as the generator matrix of one sub-stripe of the corresponding code.
Step 11, combining the codes of all stages into one code group.
Step 12, selecting from the code group a code for which the sum of the number of information nodes and the number of check nodes in its coding parameters is equal to the total number of nodes expected to be used.
In the embodiment of the present invention, the total number of nodes to be used is 7, so the code of the 1st stage is selected from the code group.
Step 13, encoding the data to be encoded.
The data to be encoded is divided evenly into t information symbols, where $t=k_m$; the data to be encoded is encoded with the generator matrix corresponding to each sub-stripe of the selected code to obtain the data symbols of that sub-stripe, and the data symbols of all sub-stripes form the encoded data; the encoded data is saved to the corresponding nodes.
The data symbols of a sub-stripe comprise information symbols and check symbols, and the check symbols comprise meta-check symbols and non-meta-check symbols. Each data symbol is obtained by encoding the data to be encoded with the coding coefficient row matrix corresponding to one row of the generator matrix: an information symbol is obtained with the left information matrix in a left generator matrix or the right information matrix in a right generator matrix; a meta-check symbol is obtained with the left meta matrix in a left generator matrix or the right meta matrix in a right generator matrix; a non-meta-check symbol is obtained with the left non-meta matrix in a left generator matrix or the right non-meta matrix in a right generator matrix.
Saving the encoded data to the corresponding nodes means that the data symbols at different positions in each sub-stripe are stored in different nodes, and the data symbols at the same position in different sub-stripes are stored in the same node. The nodes comprise information nodes and check nodes, and the check nodes comprise meta-check nodes and non-meta-check nodes; an information node stores information symbols, a meta-check node stores meta-check symbols, and a non-meta-check node stores non-meta-check symbols.
The implementation steps for encoding the data to be encoded in the embodiment of the present invention are further described with reference to fig. 2.
In FIG. 2, $D_1$, $D_2$, $D_3$, $D_4$ denote 4 information nodes, $C_1$, $C_2$, $C_3$ denote 3 check nodes, $a_1, a_2, \ldots, a_{16}$ denote 16 information symbols, M denotes the data to be encoded, $M=[a_1,a_2,\ldots,a_{16}]^T$, and T denotes the transposition operation. The generator matrices $G_{1,1}$, $G_{1,2}$, $G_{1,3}$, $G_{1,4}$ corresponding to the 4 sub-stripes of the 1st-stage code are used to encode the data M, giving the data of each sub-stripe as follows:
the 1st sub-stripe comprises the data symbols $a_1$, $a_2$, $a_3$, $a_4$, $p_1^{1\sim4}M$, $p_2^{1\sim4}M$, $p_3^{1\sim4}M$;
the 2nd sub-stripe comprises the data symbols $a_5$, $a_6$, $a_7$, $a_8$, $p_1^{5\sim8}M$, $p_4^{1\sim8}M$, $p_5^{1\sim8}M$;
the 3rd sub-stripe comprises the data symbols $a_9$, $a_{10}$, $a_{11}$, $a_{12}$, $p_1^{9\sim12}M$, $p_2^{9\sim12}M$, $p_6^{1\sim12}M$;
the 4th sub-stripe comprises the data symbols $a_{13}$, $a_{14}$, $a_{15}$, $a_{16}$, $p_1^{13\sim16}M$, $p_7M$, $p_8M$.
The data of the above 4 sub-stripes together constitute one piece of encoded data. The 1st to 4th information symbols of the 1st sub-stripe are stored in sequence in the information nodes $D_1$, $D_2$, $D_3$, $D_4$, and its 1st to 3rd check symbols are stored in sequence in the check nodes $C_1$, $C_2$, $C_3$; the data symbols of the other sub-stripes are stored in the same way as those of the 1st sub-stripe.
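As a rough illustration of the encoding and placement in Step 13, the sketch below multiplies each row of a sub-stripe's generator matrix by the data vector and then stores position i of every sub-stripe on node i. Arithmetic is done modulo the prime 257 purely as a stand-in for the finite field of the RS code, and the toy matrices (2 information nodes, 1 meta-check node, coefficients 2, 3, 4, 5) and the helper names `encode` and `place` are assumptions made for this example, not values from the embodiment.

```python
Q = 257  # hypothetical prime modulus standing in for the code's finite field


def encode(G, M):
    """Multiply every coding-coefficient row of G by the data vector M."""
    return [sum(c * a for c, a in zip(row, M)) % Q for row in G]


def place(sub_stripes):
    """Node i receives the i-th data symbol of every sub-stripe."""
    return [list(column) for column in zip(*sub_stripes)]


M = [5, 7, 11, 13]                                     # toy information symbols
G_left = [[1, 0, 0, 0], [0, 1, 0, 0], [2, 3, 0, 0]]    # left generator matrix
G_right = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 4, 5]]   # right generator matrix
nodes = place([encode(G_left, M), encode(G_right, M)])
print(nodes)  # [[5, 11], [7, 13], [31, 109]]
```

The first two inner lists are the contents of the two information nodes and the last is the content of the check node, one symbol per sub-stripe.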
The implementation steps of the folding scalable distributed storage coding repair method of the present invention are further described with reference to fig. 3.
Step 1, if, for any piece of encoded data encoded with the same code, the total number of failed nodes is greater than the number of check nodes in the coding parameters of that code, repair is abandoned; in all other cases step 2 is executed.
In the embodiment of the present invention, it is assumed that 2 nodes out of all the nodes in fig. 2 fail.
Step 2, judging whether the total number of failed information nodes among all failed nodes is non-zero; if so, executing step 3; otherwise, it is determined that no failed information node exists and only failed check nodes exist, and step 6 is executed.
Step 3, judging whether $\alpha > s-\lambda$ holds; if so, executing step 4; otherwise, executing step 7. Here α denotes the total number of failed information nodes among all failed nodes, $s$ denotes the number of meta-check nodes in the coding parameters of the code with which each piece of encoded data is encoded, and λ denotes the total number of failed meta-check nodes among all failed check nodes.
In the embodiment of the present invention, it is assumed that the 2 failed nodes in fig. 2 are $D_3$ and $D_4$; that is, the total number of failed information nodes is 2 and the number of failed meta-check nodes is 0.
Step 4, dividing the data symbols.
All information symbols stored by each non-failed information node that stores the same encoded data as a failed node are downloaded; all meta-check symbols stored by each non-failed meta-check node that stores the same encoded data as a failed node are downloaded; and η non-meta-check nodes are randomly selected from all non-failed non-meta-check nodes that store the same encoded data as a failed node, and all non-meta-check symbols in them are downloaded. The data symbols belonging to the same sub-stripe are placed in the same symbol group, and the symbol groups belonging to the same encoded data are placed in the same data set. With α, $s$, and λ as in step 3, the value of η is $\eta=\alpha-(s-\lambda)$.
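The branching of repair steps 1 to 4 can be summarized in a small sketch. The condition α > s − λ and the value η = α − (s − λ) used here are inferred from the embodiment (2 failed information nodes, 1 meta-check node, 1 non-meta-check node downloaded) and should be read as assumptions; the function name `repair_plan` and its return format are hypothetical.

```python
def repair_plan(alpha, failed_checks, lam, r, s):
    """Decide which repair branch applies.

    alpha: failed information nodes; failed_checks: failed check nodes
    (meta and non-meta); lam: failed meta-check nodes; r, s: numbers of
    check and meta-check nodes of the code in use.
    Returns (branch, meta_nodes_to_read, non_meta_nodes_to_read).
    """
    if lam > failed_checks or lam > s:
        raise ValueError("inconsistent failure counts")
    if alpha + failed_checks > r:
        return ("abandon", 0, 0)            # step 1: too many failures
    if alpha == 0:
        return ("step 6", 0, 0)             # only check nodes failed
    surviving_meta = s - lam
    if alpha > surviving_meta:
        eta = alpha - surviving_meta        # extra non-meta-check nodes
        return ("step 4", surviving_meta, eta)
    return ("step 7", alpha, 0)             # meta-check nodes suffice


print(repair_plan(2, 0, 0, 3, 1))  # ('step 4', 1, 1)
```

For the embodiment's failure pattern the sketch selects step 4 with one meta-check node and one non-meta-check node, which matches the download from C1 and C2.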
In the embodiment of the present invention, all information symbols are downloaded from the nodes $D_1$ and $D_2$ shown in FIG. 2, the meta-check symbols $p_1^{1\sim4}M$, $p_1^{5\sim8}M$, $p_1^{9\sim12}M$, $p_1^{13\sim16}M$ are downloaded from the meta-check node $C_1$, and one non-meta-check node $C_2$ is selected, from which the check symbols $p_2^{1\sim4}M$, $p_4^{1\sim8}M$, $p_2^{9\sim12}M$, $p_7M$ are downloaded. The downloaded data symbols are divided into 4 symbol groups:
the symbol group corresponding to the 1st sub-stripe is $\{a_1, a_2, p_1^{1\sim4}M, p_2^{1\sim4}M\}$;
the symbol group corresponding to the 2nd sub-stripe is $\{a_5, a_6, p_1^{5\sim8}M, p_4^{1\sim8}M\}$;
the symbol group corresponding to the 3rd sub-stripe is $\{a_9, a_{10}, p_1^{9\sim12}M, p_2^{9\sim12}M\}$;
the symbol group corresponding to the 4th sub-stripe is $\{a_{13}, a_{14}, p_1^{13\sim16}M, p_7M\}$.
Step 5, processing each data set.
Sub-step 1, numbering each symbol group in each data set according to the following formula:
$$j_{h,c}=\left\lceil \frac{q_{h,c}}{k} \right\rceil$$
where $j_{h,c}$ denotes the number of the c-th symbol group in the h-th data set, $\lceil\cdot\rceil$ denotes the round-up operation, $q_{h,c}$ denotes the index of the non-zero column in the first row of the generator matrix of the sub-stripe corresponding to the c-th symbol group in the h-th data set, and $k$ denotes the number of information nodes in the coding parameters of the code with which each piece of encoded data is encoded;
Sub-step 2, for each data set, the decode-eliminate operation is performed on each symbol group in the data set in turn, starting from the symbol group numbered 1.
The decoding-eliminating operation is to perform decoding operation first and then eliminate operation, the decoding operation refers to decoding the data symbols in the current symbol group in the data set by using a decoding method corresponding to the basic code, recovering the information symbols corresponding to the failure information nodes, and adding the information symbols into the current symbol group in the data set; the elimination operation refers to utilizing the information symbols in the current symbol group to eliminate the check symbols in each symbol group with the serial number larger than that of the current symbol group in the data set; the elimination means that if the value of an element corresponding to an information symbol in a coding coefficient row matrix of the check symbol is a non-zero value, the information symbol is multiplied by the corresponding non-zero value and then subtracted from the check symbol to obtain the eliminated check symbol.
In the embodiment of the present invention, the symbol groups corresponding to the 1st to 4th sub-stripes shown in fig. 2 are numbered 1, 2, 3, and 4 in sequence. RS decoding is performed on the symbol group numbered 1, $\{a_1, a_2, p_1^{1\sim4}M, p_2^{1\sim4}M\}$, to recover the information symbols $a_3$, $a_4$. The information symbols $a_1$, $a_2$, $a_3$, $a_4$ in the symbol group numbered 1 are then used to eliminate the check symbols in the symbol groups numbered 2, 3, and 4. In the symbol group numbered 2, the coding coefficient row matrix $p_4^{1\sim8}$ of the check symbol $p_4^{1\sim8}M$ shows that the elements of $p_4^{1\sim8}$ corresponding to the information symbols $a_1$, $a_2$, $a_3$, $a_4$ are non-zero, so $p_4^{1\sim4}M$ is subtracted from $p_4^{1\sim8}M$ to obtain the eliminated check symbol $p_4^{5\sim8}M$, and the symbol group numbered 2 is updated to $\{a_5, a_6, p_1^{5\sim8}M, p_4^{5\sim8}M\}$. After the check symbols in the symbol groups numbered 3 and 4 are eliminated in the same way, the symbol group numbered 3 remains unchanged and the symbol group numbered 4 is updated to $\{a_{13}, a_{14}, p_1^{13\sim16}M, p_7^{5\sim16}M\}$.
By repeating the above process, the information symbols $a_7$, $a_8$, $a_{11}$, $a_{12}$, $a_{15}$, $a_{16}$ are recovered in sequence.
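The numbering rule and the elimination part of the decode-eliminate operation can be sketched as follows. The decoding part (RS decoding of a symbol group) is omitted. Arithmetic modulo the prime 257 stands in for the finite field of the basic code, and the coefficient values, the helper names, and the dictionary format for recovered symbols are assumptions made for this illustration.

```python
Q = 257  # hypothetical prime modulus standing in for the code's finite field


def group_number(q, k):
    """Ceiling of q / k: the number assigned to a symbol group."""
    return -(-q // k)


def eliminate(check, coeff_row, recovered):
    """Remove the contribution of already known information symbols.

    recovered maps a 0-based information symbol index to its value.
    """
    for index, value in recovered.items():
        if coeff_row[index] % Q != 0:       # only non-zero coefficients matter
            check = (check - coeff_row[index] * value) % Q
    return check


M = list(range(1, 17))                           # toy information symbols a1..a16
p4_1_8 = [3, 1, 4, 1, 5, 9, 2, 6] + [0] * 8      # placeholder for p4 restricted to 1~8
check = sum(c * a for c, a in zip(p4_1_8, M)) % Q
recovered = {j: M[j] for j in range(4)}          # a1..a4 known after group 1
reduced = eliminate(check, p4_1_8, recovered)
expected = sum(p4_1_8[j] * M[j] for j in range(4, 8)) % Q
print(reduced == expected)                           # True
print([group_number(q, 4) for q in (1, 5, 9, 13)])   # [1, 2, 3, 4]
```

After elimination the check symbol depends only on a5 to a8, so the second symbol group can be decoded on its own, as in the embodiment.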
Sub-step 3, after all data sets have been processed, executing step 9.
Step 6, dividing the information symbols.
All information symbols stored by each information node that stores the same encoded data as a failed node are downloaded, and the information symbols belonging to the same sub-stripe are placed in the same symbol group; the symbol groups belonging to the same encoded data are placed in the same data set, and step 9 is then executed.
Step 7, recovering the information symbols corresponding to the failed information nodes.
Sub-step 1, all information symbols stored by each non-failed information node that stores the same encoded data as a failed node are downloaded, and α meta-check nodes are randomly selected from all non-failed meta-check nodes that store the same encoded data as a failed node, and all meta-check symbols stored by them are downloaded; the data symbols belonging to the same sub-stripe are placed in the same symbol group;
Sub-step 2, the data symbols in each symbol group are decoded with the decoding method corresponding to the basic code, the information symbols corresponding to the failed information nodes in the symbol group are recovered, and the recovered information symbols are added to the symbol group;
Sub-step 3, the symbol groups belonging to the same encoded data are placed in the same data set.
Step 8, judging whether any failed check node exists among the failed nodes; if so, executing step 9; otherwise, executing step 10.
Step 9, encoding the information symbols.
All information symbols in all symbol groups of each data set are encoded with the coding coefficient row matrices corresponding to the failed check nodes, the check symbols corresponding to the failed check nodes are recovered, and each recovered check symbol is added to the symbol group corresponding to the same sub-stripe.
Step 10, storing the recovered data symbols.
New information nodes equal in number to α and new check nodes equal in number to the total number of failed check nodes are added; the recovered information symbols belonging to the same information node are stored in the same new information node, and the recovered check symbols belonging to the same check node are stored in the same new check node.
In the embodiment of the present invention, the recovered information symbols $a_3$, $a_7$, $a_{11}$, $a_{15}$ are saved to a new node $D_3'$, and the recovered information symbols $a_4$, $a_8$, $a_{12}$, $a_{16}$ are saved to another new node $D_4'$.
Step 11, replacing the failed nodes with the new nodes.
In the embodiment of the present invention, the nodes $D_3'$ and $D_4'$ replace the failed nodes $D_3$ and $D_4$.
The implementation steps of the folding scalable distributed storage coding expansion method of the present invention are further described with reference to fig. 4.
Step 1, adding new nodes.
Unless the encoded data was encoded with the code of the last stage, ρ information nodes $Y_1, Y_2, \ldots, Y_\rho$ and γ check nodes $F_1, F_2, \ldots, F_\gamma$ are added to the nodes storing the encoded data, where $\rho=k^{+}-k$, $k^{+}$ denotes the number of information nodes in the coding parameters of the code of the stage following the stage of the code currently used to encode the data, $k$ denotes the number of information nodes in the coding parameters of the code currently used to encode the data, $\gamma=r^{+}-r$, $r^{+}$ denotes the number of check nodes in the coding parameters of the code of the stage following the stage of the code currently used to encode the data, and $r$ denotes the number of check nodes in the coding parameters of the code currently used to encode the data.
The implementation steps of the encoded data extension in the embodiment of the present invention are further described with reference to fig. 5.
In the embodiment of the present invention, the encoded data shown in fig. 2 is expanded. Fig. 5(a) shows the storage of the encoded data after nodes are added on the basis of fig. 2, where $D_5$, $D_6$, $D_7$, $D_8$ denote the 4 information nodes newly added on the basis of fig. 2, and $C_4$, $C_5$ denote the 2 check nodes newly added on the basis of fig. 2. Fig. 5(b) shows the storage of the encoded data after the encoded data is expanded.
Step 2, placing in one extension group the two sub-stripes of the encoded data whose generator matrices were obtained by folding the same matrix.
Step 3, merging the two sub-stripes in each extension group.
Sub-step 1, all information symbols of the sub-stripe corresponding to the right generator matrix in each extension group are downloaded from the information nodes storing the encoded data and cached, and the information symbols at different positions of that sub-stripe are migrated to different information nodes among $Y_1, Y_2, \ldots, Y_\rho$; the migrated information symbols and the non-migrated information symbols form the information symbols of the merged sub-stripe;
Sub-step 2, within each meta-check node, the two meta-check symbols of the two sub-stripes of each extension group that are located in that meta-check node are added to obtain the updated meta-check symbol;
Sub-step 3, the cached information symbols are encoded with the complementary matrix of the left generator matrix corresponding to the sub-stripes of each extension group to obtain correction symbols, and the non-meta-check symbols of the sub-stripe corresponding to the left generator matrix are updated with the correction symbols.
The complementary matrix of the left generator matrix corresponding to the sub-stripes in each extension group is defined as follows. The matrix that was folded to produce the left generator matrix corresponding to the sub-stripes in the extension group is denoted $\hat{G}$, and the left non-meta matrix in that left generator matrix is denoted W. Rows $l_6'$ to $l_7'$ of $\hat{G}$ form the matrix $\hat{U}$, where $l_6'=2k+s+1$, $l_7'=2k+r$, and $k$, $r$, and $s$ denote the number of information nodes, the number of check nodes, and the number of meta-check nodes in the coding parameters of the code currently used to encode the data. The matrix obtained as $\hat{U}-W$ is denoted T, and all non-zero columns of T form the complementary matrix of the left generator matrix corresponding to the sub-stripes in the extension group.
Sub-step 4, the non-meta-check symbols at different positions of the sub-stripe corresponding to the right generator matrix in each extension group are migrated to different check nodes among $F_1, F_2, \ldots, F_\gamma$;
Sub-step 5, the updated meta-check symbols, the updated non-meta-check symbols, and the migrated check symbols form the check symbols of the merged sub-stripe.
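The check-symbol updates during merging rest on linearity, which the following sketch verifies numerically for one meta-check row and one non-meta-check row. Arithmetic modulo the prime 257 stands in for the finite field of the basic code; the coefficient rows `p1` and `p2`, the helper names, and the convention that the correction symbol is added to the old non-meta-check symbol are assumptions for this illustration (in a field of characteristic 2, adding and subtracting coincide).

```python
Q = 257  # hypothetical prime modulus standing in for the code's finite field


def dot(row, data):
    """Inner product of a coefficient row and a data vector, modulo Q."""
    return sum(c * a for c, a in zip(row, data)) % Q


def restrict(row, lo, hi):
    """Keep elements lo..hi (1-based, inclusive) and zero all others."""
    return [c if lo <= j + 1 <= hi else 0 for j, c in enumerate(row)]


M = list(range(1, 17))                   # toy information symbols a1..a16
p1 = [j + 1 for j in range(16)]          # placeholder meta-check coefficients
p2 = [2 * j + 3 for j in range(16)]      # placeholder non-meta-check coefficients

# Symbols held before merging the 1st (left) and 2nd (right) sub-stripes.
meta_left = dot(restrict(p1, 1, 4), M)
meta_right = dot(restrict(p1, 5, 8), M)
non_meta_left = dot(restrict(p2, 1, 4), M)

# Sub-step 2: add the two meta-check symbols inside the meta-check node.
meta_merged = (meta_left + meta_right) % Q

# Sub-step 3: T = U_hat - W, its non-zero columns encode the cached symbols.
T_row = [u - w for u, w in zip(restrict(p2, 1, 8), restrict(p2, 1, 4))]
complementary_row = [c for c in T_row if c != 0]
cached = M[4:8]                          # a5..a8 of the right sub-stripe
correction = dot(complementary_row, cached)
non_meta_merged = (non_meta_left + correction) % Q

print(meta_merged == dot(restrict(p1, 1, 8), M))      # True
print(non_meta_merged == dot(restrict(p2, 1, 8), M))  # True
```

Both merged symbols equal the symbols that the parent generator matrix would have produced directly, so no other data needs to be read to update them.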
In the embodiment of the present invention, the generator matrix $G_{1,1}$ of the 1st sub-stripe and the generator matrix $G_{1,2}$ of the 2nd sub-stripe shown in fig. 5(a) are associated with each other, so the 1st sub-stripe and the 2nd sub-stripe are placed in one extension group; similarly, the 3rd sub-stripe and the 4th sub-stripe are placed in one extension group. For the extension group formed by the 1st and 2nd sub-stripes, all information symbols $a_5$, $a_6$, $a_7$, $a_8$ of the 2nd sub-stripe are downloaded and cached and are migrated in sequence to the nodes $D_5$, $D_6$, $D_7$, $D_8$; the migrated information symbols and the non-migrated information symbols $a_1$, $a_2$, $a_3$, $a_4$ form the information symbols of the merged sub-stripe. Within the node $C_1$, the meta-check symbols $p_1^{1\sim4}M$ and $p_1^{5\sim8}M$ are added to obtain the updated meta-check symbol $p_1^{1\sim8}M$, namely
$$p_1^{1\sim8}M = p_1^{1\sim4}M + p_1^{5\sim8}M.$$
Left generator matrix G1,1Corresponding complementary matrix is
Figure BDA0003138919880000202
By using complementary matrices to buffer information symbols a5、a6、a7、a8Coding to obtain a corrected symbol
Figure BDA0003138919880000203
Figure BDA0003138919880000204
Using the obtained two correction symbols to respectively pair
Figure BDA0003138919880000205
Is eliminated, i.e. from
Figure BDA0003138919880000206
Minus
Figure BDA0003138919880000207
From
Figure BDA0003138919880000208
Minus
Figure BDA0003138919880000209
Obtaining two updated check symbols
Figure BDA00031389198800002010
The non-meta-check symbols $p_4^{1\sim8}M$ and $p_5^{1\sim8}M$ of the 2nd sub-stripe are migrated to the nodes $C_4$ and $C_5$, respectively; the updated meta-check symbol, the updated non-meta-check symbols, and the migrated check symbols form all check symbols of the merged sub-stripe. The left sub-stripe in fig. 5(b) is the result of merging the 1st and 2nd sub-stripes. The 3rd and 4th sub-stripes are merged in the same way to obtain another new sub-stripe; the right sub-stripe in fig. 5(b) is the result of merging the 3rd and 4th sub-stripes.
Step 4, combining the data symbols of all the merged sub-stripes into the expanded encoded data.
In the embodiment of the present invention, all the data symbols of the two new sub-stripes shown in fig. 5(b) constitute the extended encoded data.

Claims (7)

1. A folding expandable distributed storage coding method is characterized in that coding parameters of the next stage are calculated, a generation matrix set corresponding to the previous stage is constructed according to a folding rule of the generation matrix, codes corresponding to each stage are determined, and codes corresponding to all stages are combined into a coding group; the method comprises the following steps:
(1) setting the coding parameters of the 1 st stage:
the number of information nodes $k_1$, the number of check nodes $r_1$, and the number of meta-check nodes $s_1$ are set as the coding parameters of the 1st stage, where $k_1$ and $r_1$ are positive integers and $s_1$ is a non-negative integer less than or equal to $r_1$;
(2) Calculating the encoding parameters of the next stage:
(2a) calculating the number of information nodes and the number of check nodes in the next stage according to the following formula:
k′=2k
r′=2r-s
wherein k 'and r' respectively represent the number of information nodes and the number of check nodes of the next stage of the current stage, and k, r and s respectively represent the number of information nodes, the number of check nodes and the number of element check nodes of the current stage;
(2b) selecting a value equal to the maximum value of the number of simultaneously failed information nodes which are expected to be repaired with low repair complexity from the value range { s, s +1, …,2r-s } as the number of meta check nodes of the next stage;
(3) judging whether the total number of the coding parameters obtained by the current iteration is equal to m, if so, executing the step (4); otherwise, executing the step (2) after taking the determined coding parameter as the coding parameter of the current stage; m represents the total number of codes in the set code group to be constructed, and the value of m is an integer greater than or equal to 2;
(4) determining the final encoding parameters:
(4a) setting the value of the number of check nodes in the coding parameter obtained by the current iteration as the number of meta check nodes in the coding parameter obtained by the current iteration;
(4b) composing the number of information nodes, the number of check nodes and the number of determined element check nodes obtained by current iteration into a final coding parameter;
(5) constructing a generating matrix corresponding to the last stage:
a systematic MDS code is used as the basic code, and the generator matrix $G_m$ corresponding to the last stage is constructed using the generator matrix construction method of the basic code:
$$G_m = \begin{bmatrix} E_{k_m \times k_m} \\ P_{r_m \times k_m} \end{bmatrix}$$
where $G_m$ denotes the generator matrix corresponding to the last stage, $E_{k_m \times k_m}$ denotes an identity matrix of order $k_m$, $k_m$ is equal to the number of information nodes in the final coding parameters, $P_{r_m \times k_m}$ denotes a matrix of $r_m$ rows and $k_m$ columns, and $r_m$ is equal to the number of check nodes in the final coding parameters;
(6) setting a folding rule of a generating matrix:
(6a) the matrix to be folded is divided into A, B, X, U, V five matrices: wherein A denotes the 1 st to l-th rows of the matrix to be folded1A left information matrix of rows is formed,
Figure FDA0003138919870000024
Figure FDA0003138919870000025
representing the number of information nodes in the coding parameter corresponding to the previous stage of the current stage; b denotes the l-th of the matrix to be folded2Go to3A right information matrix of rows is formed,
Figure FDA0003138919870000026
x denotes the l-th of the matrix to be folded4Go to first5A matrix of rows is formed of a plurality of columns,
Figure FDA0003138919870000027
Figure FDA0003138919870000028
representing the number of meta-check nodes in the coding parameter corresponding to the previous stage of the current stage; u denotes the l-th of the matrix to be folded6Go to7Line groupThe matrix is formed by the following steps of,
Figure FDA0003138919870000029
Figure FDA00031389198700000210
Figure FDA00031389198700000211
representing the number of check nodes in the coding parameter corresponding to the previous stage of the current stage; v denotes the l-th of the matrix to be folded8A right non-element matrix composed of rows to the last 1,
Figure FDA00031389198700000212
(6b) generating a matrix which is equal to the elements of the X row and the X column, setting the elements of the last mu non-zero columns of the matrix to zero to obtain a left element matrix X',
Figure FDA00031389198700000213
generating a matrix which is equal to the row elements and the column elements of the matrix X, and obtaining a right element matrix X' after all the first mu non-zero column elements of the matrix are set to zero; setting all the elements of the last mu non-zero columns of the matrix U to zero to obtain a left non-element matrix U';
(6c) combining matrices A, X', and U', and matrices B, X'', and V, respectively, according to the following formulas, to construct the two mutually associated generator matrices that the folded matrix contributes to the generator matrix set corresponding to the stage preceding the current stage:
Figure FDA00031389198700000214
Figure FDA00031389198700000215
wherein G' denotes the left generator matrix and G'' denotes the right generator matrix;
(7) constructing the generator matrix set corresponding to the stage preceding the last stage:
(7a) folding the generator matrix corresponding to the last stage according to the folding rule of the generator matrix, and adding the generator matrices obtained by folding to the generator matrix set corresponding to the stage preceding the current stage;
(7b) judging whether the value of m-1 is equal to 2, if so, executing the step (10), otherwise, executing the step (8) after taking the generated matrix set determined by the iteration as the generated matrix set corresponding to the current stage;
(8) constructing a generating matrix set corresponding to the previous stage:
according to the folding rule of the generated matrix, folding each generated matrix in the generated matrix set corresponding to the current stage, and adding the generated matrix obtained by folding into the generated matrix set corresponding to the previous stage of the current stage;
(9) judging whether the number of generator matrices in the generator matrix set obtained by the current iteration is equal to 2^(m-1); if so, executing step (10); otherwise, taking the generator matrix set determined in this iteration as the generator matrix set corresponding to the current stage and then executing step (8);
(10) determining codes of all stages:
taking the coding parameter corresponding to each stage as the coding parameter of the corresponding code; taking the generator matrix corresponding to the last stage as the generator matrix of the sub-stripe of the code of the last stage; and, for every stage other than the last, taking each generator matrix in the generator matrix set corresponding to that stage as the generator matrix of one sub-stripe of the corresponding code;
(11) combining the codes of all the stages into a coding group;
(12) selecting a code from the coding group, wherein the sum of the number of the information nodes and the number of the check nodes in the coding parameters of the selected code is equal to the total number of the nodes expected to be adopted;
(13) encoding data to be encoded:
evenly dividing the data to be encoded into t information symbols, where t = k_m; encoding the data to be encoded with the generator matrix corresponding to each sub-stripe of the selected code to obtain the data symbols of that sub-stripe; forming the encoded data from the data symbols of all sub-stripes; and saving the encoded data to the corresponding nodes.
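The folding rule of step (6) can be illustrated with a minimal Python sketch. It is not the patented implementation: the row boundaries l_1 through l_8 and the value of μ are defined by formula images that are not reproduced in this text, so the sketch takes them as explicit arguments (the names `a_rows`, `b_rows`, `x_rows`, `u_rows`, `v_rows`, and `mu` are hypothetical), and it assumes that the formulas of step (6c) stack A, X', U' and B, X'', V row-wise.

```python
def nonzero_columns(matrix):
    """Indices of the columns that hold at least one non-zero element."""
    if not matrix:
        return []
    return [c for c in range(len(matrix[0])) if any(row[c] != 0 for row in matrix)]


def zero_columns(matrix, cols):
    """Copy of matrix with every element of the given columns set to zero."""
    cols = set(cols)
    return [[0 if c in cols else v for c, v in enumerate(row)] for row in matrix]


def fold(matrix, a_rows, b_rows, x_rows, u_rows, v_rows, mu):
    """Fold one generator matrix into a (left, right) pair.

    Each *_rows argument is a 0-based half-open (start, stop) row range.
    These ranges and mu are assumptions standing in for l_1..l_8 and mu,
    whose defining formulas are not available in this text.
    """
    if mu < 1:
        raise ValueError("mu must be at least 1")

    def cut(rng):
        return [row[:] for row in matrix[rng[0]:rng[1]]]

    A, B, X, U, V = (cut(r) for r in (a_rows, b_rows, x_rows, u_rows, v_rows))
    x_nz = nonzero_columns(X)
    x_left = zero_columns(X, x_nz[-mu:])                 # X': last mu non-zero columns cleared
    x_right = zero_columns(X, x_nz[:mu])                 # X'': first mu non-zero columns cleared
    u_left = zero_columns(U, nonzero_columns(U)[-mu:])   # U'
    # Assumed row-wise stacking for step (6c).
    return A + x_left + u_left, B + x_right + V


# Toy matrix: 4 information rows, 1 row X, 1 row U, 1 row V.
parent = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 2, 3, 4],
    [1, 4, 9, 16],
]
left, right = fold(parent, (0, 2), (2, 4), (4, 5), (5, 6), (6, 7), mu=2)
```

In this toy case both folded matrices keep the four columns of the parent, which matches step (13), where every sub-stripe is encoded from the same t information symbols.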
2. The method according to claim 1, wherein the data symbols of the sub-stripes in step (13) include information symbols and check symbols, the check symbols include meta-check symbols and non-meta-check symbols, and each data symbol is obtained by encoding the data to be encoded with the row matrix of coding coefficients corresponding to one row of a generator matrix; an information symbol is a data symbol obtained by encoding the data to be encoded with the left information matrix in the left generator matrix or the right information matrix in the right generator matrix; a meta-check symbol is a data symbol obtained by encoding the data to be encoded with the left element matrix in the left generator matrix or the right element matrix in the right generator matrix; and a non-meta-check symbol is a data symbol obtained by encoding the data to be encoded with the left non-element matrix in the left generator matrix or the right non-element matrix in the right generator matrix.
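The row-by-row encoding described in this claim can be sketched as follows. The sketch assumes arithmetic modulo a small prime (`P = 257`, a hypothetical choice made only to keep the example readable; the claims do not state the finite field), and the two generator matrices are toy values rather than matrices taken from the patent.

```python
P = 257  # hypothetical prime modulus; the field actually used is not stated here


def encode_substripe(generator, info_symbols):
    """Apply each coding-coefficient row to the information symbols.

    Row i of the generator matrix yields data symbol i of the sub-stripe.
    """
    return [
        sum(coeff * sym for coeff, sym in zip(row, info_symbols)) % P
        for row in generator
    ]


info = [5, 7, 11, 13]  # t = 4 information symbols
g_left = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [1, 2, 0, 0]]
g_right = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [1, 4, 9, 16]]

stripe_left = encode_substripe(g_left, info)
stripe_right = encode_substripe(g_right, info)
```

Rows taken from an information matrix reproduce an information symbol unchanged, while rows taken from an element or non-element matrix produce check symbols.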
3. The method according to claim 1, wherein the step (13) of storing the encoded data in the corresponding node means that the data symbols at different positions in each sub-stripe are stored in different nodes, and the data symbols at the same positions in different sub-stripes are stored in the same node; the nodes comprise two categories of information nodes and check nodes, and the check nodes comprise two subclasses of meta check nodes and non-meta check nodes; the information node is used for storing information symbols; the meta-check node is used for storing a meta-check symbol; the non-meta check node is used for storing a non-meta check symbol.
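The placement rule of this claim amounts to a transposition: position j of every sub-stripe lands on node j. A minimal sketch, with the hypothetical helper name `place_on_nodes` and toy symbol values:

```python
def place_on_nodes(substripes):
    """Group data symbols by position: node j receives symbol j of each sub-stripe."""
    length = len(substripes[0])
    if any(len(stripe) != length for stripe in substripes):
        raise ValueError("all sub-stripes must have the same number of data symbols")
    return [[stripe[pos] for stripe in substripes] for pos in range(length)]


# Two toy sub-stripes of four data symbols each.
nodes = place_on_nodes([[5, 7, 12, 19], [11, 13, 24, 83]])
```

Here `nodes[0]` and `nodes[1]` play the role of information nodes and the remaining entries play the role of check nodes; which check positions are meta-check nodes depends on the coding parameters and is not modelled.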
4. A foldable scalable distributed storage coding repair method for foldable scalable distributed storage coding according to claim 1, characterized in that each data set is processed, information symbols are coded, and recovered data symbols are saved; the method comprises the following steps:
(1) abandoning the repair when the total number of failed nodes of the encoded data encoded with the same code is greater than the number of check nodes in the coding parameter of that code, and executing step (2) in all other cases;
(2) judging whether the total number of failed information nodes among all failed nodes is non-zero; if so, executing step (3); otherwise, determining that no information node has failed but a check node has failed, and executing step (6);
(3) judging whether the following condition holds:
Figure FDA0003138919870000041
if so, executing step (4); otherwise, executing step (7); wherein α denotes the total number of failed information nodes among all failed nodes,
Figure FDA0003138919870000042
denotes the number of meta-check nodes in the coding parameter of the code with which the encoded data is encoded, and λ denotes the total number of failed meta-check nodes among all failed check nodes;
(4) dividing the data symbols:
downloading all information symbols stored by the information node from each non-failed information node which stores the same encoded data with the failed node, downloading all meta-check symbols stored by the meta-check node from each non-failed meta-check node which stores the same encoded data with the failed node, and randomly selecting eta non-meta-check nodes from all non-failed non-meta-check nodes which store the same encoded data with the failed node to download all non-meta-check symbols in the non-meta-check nodes; dividing the data symbols belonging to the same sub-stripe into the same symbol group, and dividing the symbol groups belonging to the same coded data into the same data set; wherein η has a value equal to
Figure FDA0003138919870000051
(5) Processing each data set:
(5a) numbering each symbol group in each data set according to:
Figure FDA0003138919870000052
wherein j_{h,c} denotes the number of the c-th symbol group in the h-th data set,
Figure FDA0003138919870000053
denotes the rounding-up operation, q_{h,c} denotes the index of the column holding the non-zero element in the first row of the generator matrix of the sub-stripe corresponding to the c-th symbol group in the h-th data set,
Figure FDA0003138919870000054
denotes the number of information nodes in the coding parameter of the code with which the encoded data is encoded;
(5b) for each data set, sequentially carrying out decoding-eliminating operation on each symbol group in the data set from the symbol group with the number of 1 in the data set;
(5c) executing step (9) after all the data sets are processed;
(6) dividing information symbols:
downloading all information symbols stored by the information node from each information node storing the same encoded data with the failed node, and dividing the information symbols belonging to the same sub-stripe into the same symbol group; dividing symbol groups belonging to the same coded data into the same data set and then executing the step (9);
(7) recovering the information symbols corresponding to the failed information nodes:
(7a) downloading all information symbols stored by the information node from each non-failed information node which stores the same encoded data with the failed node, and randomly selecting the number of meta-check nodes equal to the value of alpha from all non-failed meta-check nodes which store the same encoded data with the failed node to download all meta-check symbols stored by the meta-check node; dividing the data symbols belonging to the same sub-stripe into the same symbol group;
(7b) decoding the data symbols in each symbol group by using a decoding method corresponding to the basic code, recovering the information symbols corresponding to the failure information nodes in the symbol group, and adding the information symbols into the symbol group;
(7c) dividing symbol groups belonging to the same coded data into the same data set;
(8) judging whether any failed check node exists among the failed nodes; if so, executing step (9); otherwise, executing step (10);
(9) encoding the information symbols:
coding all information symbols in all symbol groups in each data set by using a coding coefficient row matrix corresponding to the failure check node, recovering the check symbols corresponding to the failure check node, and adding the recovered check symbols into the symbol groups corresponding to the same sub-strip;
(10) saving the recovered data symbols:
adding new information nodes whose number is equal to the value of α, adding new check nodes whose number is equal to the total number of failed check nodes among all failed nodes, storing the recovered information symbols that belong to the same information node in the same new information node, and storing the recovered check symbols that belong to the same check node in the same new check node;
(11) replacing the failed nodes with the new nodes.
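The first two decisions of this repair method, steps (1) and (2), can be summarised in a short sketch. The function name `triage` and its return labels are hypothetical; the condition tested in step (3) is given by a formula image that is not reproduced here, so the sketch stops before it.

```python
def triage(failed_info, failed_check, num_check_nodes):
    """Return which branch of the repair procedure applies.

    failed_info     : number of failed information nodes (alpha)
    failed_check    : number of failed check nodes
    num_check_nodes : number of check nodes in the coding parameter of the code
    """
    total_failed = failed_info + failed_check
    if total_failed == 0:
        raise ValueError("nothing to repair")
    if total_failed > num_check_nodes:
        return "abandon"    # step (1): more failures than the code tolerates
    if failed_info > 0:
        return "step (3)"   # step (2): at least one information node failed
    return "step (6)"       # step (2): only check nodes failed


decision = triage(failed_info=1, failed_check=1, num_check_nodes=3)
```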
5. The method according to claim 4, wherein the decoding-removing operation in step (5b) is a decoding operation followed by a removing operation, and the decoding operation refers to decoding the data symbols in the current symbol group in the data set by using a decoding method corresponding to the basic code, recovering the information symbols corresponding to the failed information nodes, and adding the information symbols into the current symbol group in the data set; the elimination operation refers to utilizing the information symbols in the current symbol group to eliminate the check symbols in each symbol group with the serial number larger than that of the current symbol group in the data set; the elimination means that if the value of an element corresponding to an information symbol in a coding coefficient row matrix of the check symbol is a non-zero value, the information symbol is multiplied by the corresponding non-zero value and then subtracted from the check symbol to obtain the eliminated check symbol.
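The elimination operation defined in this claim can be sketched as below. Arithmetic modulo the prime `P = 257` is an assumption made for readability (the claims do not state the field), and `known_info` is a hypothetical mapping from column index to an already recovered information symbol.

```python
P = 257  # hypothetical prime modulus; the field actually used is not stated here


def eliminate(check_symbol, coeff_row, known_info):
    """Remove the contribution of known information symbols from a check symbol.

    For every known information symbol whose coefficient in the coding
    coefficient row is non-zero, multiply the symbol by that coefficient
    and subtract the product from the check symbol.
    """
    for column, symbol in known_info.items():
        coeff = coeff_row[column]
        if coeff != 0:
            check_symbol = (check_symbol - coeff * symbol) % P
    return check_symbol


coeff_row = [1, 4, 9, 16]
# Check symbol of the information symbols [5, 7, 11, 13] under coeff_row.
check = (1 * 5 + 4 * 7 + 9 * 11 + 16 * 13) % P
reduced = eliminate(check, coeff_row, {0: 5, 1: 7})
```

After elimination the check symbol depends only on the information symbols that are still unknown, which is what lets the next symbol group be decoded with the basic code.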
6. An expansion method for the foldable scalable distributed storage coding according to claim 1, characterized in that the two sub-stripes of the encoded data whose generator matrices were obtained by folding the same matrix are placed into one extension group, the two sub-stripes in the same extension group are merged, and the data symbols of all new sub-stripes are combined into new expanded encoded data, the method comprising the following steps:
(1) adding a new node:
unless the encoded data is encoded with the code of the last stage, adding ρ information nodes Y_1, Y_2, …, Y_ρ and γ check nodes F_1, F_2, …, F_γ to the nodes storing the encoded data, wherein
Figure FDA0003138919870000061
Figure FDA0003138919870000062
indicating the number of information nodes in the coding parameters of the code of the next stage of the corresponding stage of the code currently used for coding data,
Figure FDA0003138919870000063
representing the number of information nodes in the coding parameters of the code currently used for coding the data,
Figure FDA0003138919870000064
Figure FDA0003138919870000065
indicating the number of check nodes in the encoding parameters of the code of the next stage corresponding to the stage currently adopted for encoding data,
Figure FDA0003138919870000066
the number of check nodes in the coding parameters of the code currently adopted by the coded data is represented;
(2) placing every two sub-stripes of the encoded data whose generator matrices are mutually associated into one extension group;
(3) merging two sub-stripes within each extension group:
(3a) downloading and caching, from all information nodes storing the encoded data, all information symbols of the sub-stripe corresponding to the right generator matrix in each extension group; migrating the information symbols at different positions of that sub-stripe to different information nodes among Y_1, Y_2, …, Y_ρ; and combining the migrated information symbols and the non-migrated information symbols into the information symbols of the merged sub-stripe;
(3b) within each meta-check node, adding the two meta-check symbols that belong to the two sub-stripes of each extension group and are located in that meta-check node to obtain the updated meta-check symbol;
(3c) encoding the cached information symbols with the complementary matrix of the left generator matrix corresponding to the sub-stripe of each extension group to obtain correction symbols, and updating the non-meta-check symbols of the sub-stripe corresponding to the left generator matrix with the correction symbols;
(3d) migrating the non-meta-check symbols at different positions of the sub-stripe corresponding to the right generator matrix in each extension group to different check nodes among F_1, F_2, …, F_γ;
(3e) combining the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol into a check symbol of the merged sub-stripe;
(4) combining the data symbols of all the merged sub-stripes into the expanded encoded data.
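Steps (3b) and (3c) can be illustrated with a small sketch. It assumes arithmetic modulo the prime `P = 257` (the claims do not state the field) and assumes that "updating" a non-meta-check symbol means adding the correction symbol to it; both assumptions, the function names, and the toy values are illustrative only.

```python
P = 257  # hypothetical prime modulus; the field actually used is not stated here


def merge_meta_checks(left_meta, right_meta):
    """Step (3b): add the two meta-check symbols stored on the same node."""
    return [(a + b) % P for a, b in zip(left_meta, right_meta)]


def update_non_meta_checks(left_non_meta, complement, cached_info):
    """Step (3c): encode the cached information symbols with the complementary
    matrix and add the resulting correction symbols (addition is an assumption)."""
    corrections = [
        sum(c * s for c, s in zip(row, cached_info)) % P for row in complement
    ]
    return [(old + corr) % P for old, corr in zip(left_non_meta, corrections)]


# Toy data: information symbols [5, 7] stay in place, [11, 13] are cached from
# the right sub-stripe and migrated to the new information nodes.
merged_meta = merge_meta_checks([12], [24])                    # (5+7) and (11+13)
merged_non_meta = update_non_meta_checks([19], [[3, 4]], [11, 13])
```

In this toy case the merged symbols equal what the unfolded rows [1, 1, 1, 1] and [1, 2, 3, 4] would produce on [5, 7, 11, 13], so only the cached information symbols had to be read to update the checks.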
7. The method according to claim 6, wherein the complementary matrix of the left generator matrix corresponding to the sub-stripe in each extension group in step (3c) is obtained as follows: the matrix that was folded to obtain the left generator matrix corresponding to the sub-stripe in the extension group is denoted as matrix
Figure FDA0003138919870000071
the left non-element matrix in the left generator matrix corresponding to the sub-stripe in the extension group is denoted as matrix W, and the elements of matrix
Figure FDA0003138919870000072
from row l_6' to row l_7 form matrix
Figure FDA0003138919870000073
Figure FDA0003138919870000074
denotes the number of meta-check nodes in the coding parameters of the code currently used for the encoded data; the matrix obtained from
Figure FDA0003138919870000075
is denoted as matrix T, and all non-zero columns of matrix T form the complementary matrix of the left generator matrix corresponding to the sub-stripe in the extension group.
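A sketch of how such a complementary matrix could be computed follows. The operation that produces matrix T is given by a formula image that is not reproduced here; the sketch assumes it is the element-wise difference between the corresponding rows of the matrix that was folded and the left non-element matrix W. That assumption, the modulus `P = 257`, and the name `complementary_matrix` are all hypothetical.

```python
P = 257  # hypothetical prime modulus; the field actually used is not stated here


def complementary_matrix(parent_rows, left_non_element):
    """Keep the non-zero columns of T, where T is ASSUMED to be the element-wise
    difference between the selected rows of the matrix that was folded and W."""
    if not parent_rows:
        return []
    t = [
        [(p - w) % P for p, w in zip(parent_row, w_row)]
        for parent_row, w_row in zip(parent_rows, left_non_element)
    ]
    keep = [c for c in range(len(t[0])) if any(row[c] != 0 for row in t)]
    return [[row[c] for c in keep] for row in t]


# Toy rows: the folded-away coefficients 3 and 4 are what remains.
complement = complementary_matrix([[1, 2, 3, 4]], [[1, 2, 0, 0]])
```

Under this assumption the surviving columns are exactly those that folding set to zero, so the complementary matrix acts only on the cached information symbols of the right sub-stripe.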
CN202110726617.1A 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method Active CN113391948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110726617.1A CN113391948B (en) 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method

Publications (2)

Publication Number Publication Date
CN113391948A true CN113391948A (en) 2021-09-14
CN113391948B CN113391948B (en) 2022-10-21

Family

ID=77624387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110726617.1A Active CN113391948B (en) 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method

Country Status (1)

Country Link
CN (1) CN113391948B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170077950A1 (en) * 2008-09-16 2017-03-16 File System Labs Llc Matrix-Based Error Correction and Erasure Code Methods and System and Applications Thereof
CN102667761A (en) * 2009-06-19 2012-09-12 布雷克公司 Scalable cluster database
CN102571104A (en) * 2012-01-15 2012-07-11 西安电子科技大学 Distributed encoding and decoding method for RA (Repeat Accumulate) code
CN103688514A (en) * 2013-02-26 2014-03-26 北京大学深圳研究生院 Coding method for minimum storage regeneration codes and method for restoring of storage nodes
CN103688515A (en) * 2013-03-26 2014-03-26 北京大学深圳研究生院 Method for encoding minimum bandwidth regeneration codes and repairing storage nodes
CN104503706A (en) * 2014-12-23 2015-04-08 中国科学院计算技术研究所 Data storing method and data reading method based on disk array
CN106790408A (en) * 2016-11-29 2017-05-31 中国空间技术研究院 A kind of coding method repaired for distributed memory system node

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
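MÁRTON SIPOS et al.: "Distributed cloud storage using network coding", 2014 IEEE 11th Consumer Communications and Networking Conference (CCNC) *
刘冰星 et al.: "A data update strategy for network-coded distributed storage systems", 《小型微型计算机系统》 (Journal of Chinese Computer Systems) *
王意洁 et al.: "Research on erasure-code fault-tolerance technology in distributed storage", 《计算机学报》 (Chinese Journal of Computers) *
陈亮 et al.: "Random binary extension codes: a coding scheme suited to distributed storage systems", 《计算机学报》 (Chinese Journal of Computers) *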

Also Published As

Publication number Publication date
CN113391948B (en) 2022-10-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant