CN108512553B - Truncated regeneration code construction method for reducing bandwidth consumption - Google Patents

Publication number
CN108512553B
CN108512553B CN201810194923.3A CN201810194923A CN108512553B CN 108512553 B CN108512553 B CN 108512553B CN 201810194923 A CN201810194923 A CN 201810194923A CN 108512553 B CN108512553 B CN 108512553B
Authority
CN
China
Prior art keywords
matrix
node
data
coefficient
truncated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810194923.3A
Other languages
Chinese (zh)
Other versions
CN108512553A (en)
Inventor
何荣祥
顾术实
李月
李娟�
张钦宇
王野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201810194923.3A
Publication of CN108512553A
Application granted
Publication of CN108512553B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3059 Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6011 Encoder aspects
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6047 Power optimization with respect to the encoder, decoder, storage or transmission

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a method for constructing a truncated regeneration code that reduces bandwidth consumption. Based on the parameters of an (n, k) mother code, t information bits are deleted to obtain an (n-t, k-t) truncated subcode, redundancy is added, and the values of the redundancy quantities are solved so that the data stored by the t nodes after encoding are all 0. During decoding and repair, the coefficient vectors of the t truncated nodes are appended to the matrix formed by the coefficient vectors of the connected nodes to form a new k × d coefficient matrix, the coding matrix formed by the data downloaded from the nodes is supplemented with t rows of zero vectors to form a new coding matrix, and the supplemented coefficient matrix and the received data matrix are decoded or repaired in the same way as the MSR mother code. The invention reduces the computational complexity of the truncated regeneration code, addresses the limited parameter choices and poor adaptability of regeneration codes when network nodes and bandwidth resources are limited, and realizes a regeneration code construction with low complexity and low bandwidth overhead.

Description

Truncated regeneration code construction method for reducing bandwidth consumption
Technical Field
The invention relates to the technical field of distributed storage, in particular to a method for constructing a truncated regeneration code capable of reducing bandwidth consumption.
Background
With the rapid development of the internet, the arrival of the big data era, the rapid rollout of 5G mobile communication and the explosive growth of global data traffic, higher demands are placed on the data storage capacity of systems. Compared with traditional centralized storage systems, distributed storage systems offer low cost, large storage capacity, strong scalability and high parallel processing speed, and have gradually attracted wide attention from academia and industry. To ensure overall reliability and stability, distributed storage systems generally adopt a redundancy strategy: systems such as GFS (Google File System) use replication, while systems such as OceanStore use erasure coding. However, these schemes cannot avoid wasted storage resources and excessive bandwidth overhead, which seriously affect the working efficiency of the system.
The network coding endows the intermediate node with computing power, carries out coding operation on data, and improves the bandwidth utilization rate and the throughput rate of the whole system. Dimakis et al propose regeneration Codes (Regenerating Codes) based on network coding, achieving the optimal solution of storage-bandwidth overhead tradeoff. However, under complex network conditions, especially when bandwidth resources are limited, the adaptability of the regenerated code is poor, and further reducing the requirement of the regenerated code on the bandwidth resources is an urgent problem to be solved.
Rashmi et al. proposed a method for constructing MBR and MSR codes based on the Product-Matrix framework; although MBR codes with arbitrary parameters can be constructed, some limitations remain for MSR codes. Goparaju et al. proposed a new MSR construction applicable to any n, k, d parameters based on the idea of interference alignment. Li et al. proposed a generic MSR framework based on invariant subspaces with two check structures, capable of achieving minimal I/O reads. Kamath et al. proposed a construction of local regeneration codes, which achieves local repair through a transformation of conventional MSR and MBR codes. Papailiopoulos et al. proposed simple regeneration codes, which achieve local repair by combining MDS codes with XOR operations. Shah et al. studied the security of regeneration codes and realized regeneration codes with secrecy capability in the scenario where an eavesdropper can access the data of storage nodes and download the repair data.
The above analysis shows that the computational cost of regeneration codes is still high, which affects the time cost of data download and repair and therefore the working efficiency in practical systems.
Disclosure of Invention
In view of the defects and shortcomings of the prior art, the technical problem to be solved by the invention is to provide a method for constructing a truncated regeneration code that reduces bandwidth consumption.
The technical scheme is as follows:
A method for constructing a truncated regeneration code that reduces bandwidth consumption deletes t information bits on the basis of the (n, k) mother code parameters to obtain an (n-t, k-t) truncated subcode, adds redundancy, and solves the values of the redundancy quantities so that the data stored by the t nodes after encoding are all 0.
During decoding and repair, the coefficient vectors of the t truncated nodes are appended to the matrix formed by the coefficient vectors of the connected nodes to form a new k × d coefficient matrix, t rows of zero vectors are appended to the coding matrix formed by the data downloaded from the nodes to form a new coding matrix, and the supplemented coefficient matrix and the received data matrix are decoded or repaired in the same way as the MSR mother code.
As a further improvement of the invention, the method comprises the following steps:
(1) Taking an MSR code with parameters $(n, k, d)[\alpha, \gamma, B]$ as the mother code, t nodes are truncated to obtain a truncated subcode with parameters $(n_s, k_s, d_s)[\alpha_s, \gamma_s, B_s]$. The parameters of the MSR mother code and the truncated subcode satisfy:
$n_s = n - t,\quad k_s = k - t,\quad d_s = d - t,\quad \alpha_s = \alpha,\quad \gamma_s = \gamma - t,\quad B_s = B - t\cdot\alpha$
(2) Under the product-matrix framework, the information bits of t nodes are deleted and redundancy is added to obtain
Figure BDA0001592808120000032
in which the first α data are known redundancy quantities. The d × α message matrix M, constructed from two α × α symmetric matrices, is as follows:
Figure BDA0001592808120000033
(3) The coding matrix $C = \Psi \cdot M$ is constructed, all elements in the first t rows of C are set equal to 0, and the values of the redundancy quantities are solved, where Ψ is a Vandermonde matrix used as the coefficient matrix.
As a further improvement of the invention, during decoding any $k_s$ nodes are connected, and the coefficient vectors of these nodes form a $k_s \times d$ coefficient matrix denoted $\Psi_{s\_DC}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_DC}$ to form a new $k \times d$ coefficient matrix $\Psi_{DC}$. The data downloaded from the $k_s$ nodes form the $k_s \times d$ coding matrix $\Psi_{s\_DC}M$, which is supplemented with t rows of zero vectors to form a new $k \times d$ coding matrix $\Psi_{DC}M$. The supplemented coefficient matrix and the supplemented coding matrix can then be decoded in the same way as an MSR code constructed from a product matrix.
The specific steps of decoding include:
(1) The data collector connects any $k_s$ nodes, whose coefficient vectors form a $k_s \times d$ coefficient matrix $\Psi_{s\_DC}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_DC}$ to form a new $k \times d$ coefficient matrix $\Psi_{DC}$, giving: $\Psi_{DC} = [\Phi_{DC} \;\; \Lambda_{DC}\Phi_{DC}]$;
The data downloaded from the $k_s$ nodes form the $k_s \times d$ coding matrix $\Psi_{s\_DC}M$, which is supplemented with t rows of zero vectors to form a new $k \times d$ coding matrix $\Psi_{DC}M$, giving:
Figure BDA0001592808120000041
(2) Right-multiply the collected data $\Psi_{DC}M$ by
Figure BDA0001592808120000042
Expressed as:
Figure BDA0001592808120000043
P and Q are intermediate variables and are both symmetric matrices, where
Figure BDA0001592808120000044
(3) Consider the (i, j)-th matrix element. When $i \neq j$, since the (i, j)-th and (j, i)-th elements of a symmetric matrix are equal, $P_{ij}$ and $Q_{ij}$ for $i \neq j$ can be solved from $P_{ij} + \lambda_i Q_{ij}$ and $P_{ji} + \lambda_j Q_{ji}$.
When $i = j$, the i-th element is unknown, while the remaining elements
Figure BDA0001592808120000045
are known, so $S_1$ can be solved; $S_2$ can be solved in the same way, which completes the decoding. Here each $\phi$ denotes a column of $\Phi_{DC}$; that is, $\phi_i$ is the i-th column of the matrix $\Phi_{DC}$, and likewise for the others.
As a further improvement of the invention, during repair $d_s$ helper nodes are connected, denoted as
Figure BDA0001592808120000051
The coefficient vectors of the helper nodes form a $d_s \times d$ coefficient matrix denoted $\Psi_{s\_repair}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_repair}$ to form a new $d \times d$ coefficient matrix $\Psi_{repair}$. β data are downloaded from each helper node, forming a $d_s \times \beta$ data matrix, which is supplemented with t rows of zero vectors to form a new $d \times \beta$ data matrix. The supplemented coefficient matrix and the received data matrix are then repaired in the same way as an MSR code constructed from a product matrix.
The specific steps of repairing include:
(1) The coefficient vector of the failed node is denoted f,
Figure BDA0001592808120000052
and the lost data are calculated as
Figure BDA0001592808120000053
(2) $d_s$ helper nodes are connected, denoted as
Figure BDA0001592808120000054
forming a $d_s \times d$ coefficient matrix $\Psi_{s\_repair}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_repair}$ to form a new $d \times d$ coefficient matrix $\Psi_{repair}$, giving:
Figure BDA0001592808120000055
(3) β data are downloaded from each helper node, forming a $d_s \times \beta$ data matrix, which is supplemented with t rows of zero vectors to form a new $d \times \beta$ data matrix. Each helper node right-multiplies its data by $\phi_f$ to obtain
Figure BDA0001592808120000056
and transmits the result to the new node. The new node receives the data $\Psi_{repair} M \phi_f$ from the $d_s$ helper nodes and left-multiplies it by
Figure BDA0001592808120000057
which is expressed as:
Figure BDA0001592808120000058
The lost data can then be repaired by transposition:
Figure BDA0001592808120000059
As a further improvement of the invention, the complexity overhead of the method is analyzed with the following steps:
(1) Define the XOR counts of the basic operations in the finite field $GF(2^w)$ with primitive polynomial g(z): addition is a bitwise XOR and requires w XOR operations; multiplication is a polynomial multiplication followed by reduction with the primitive polynomial g(z) and requires μw XORs, where μ ═ 1) + | g (z) | m 0
(2) Define the equation to be solved as $Ax = B$, where A is an n × n matrix, the unknown x is an n × 1 matrix, and B is an n × 1 matrix. Computing $A^{-1}$ requires XOR operations, and the unknown matrix x is obtained from the product of the two matrices, $A^{-1}B$.
As a further improvement of the invention, the encoding process of the method comprises the following steps:
(1) The $t\alpha$ complemented redundancy quantities are solved from $t\alpha$ equations, which are converted into the mathematical model $Ax = B$, where A is a $t\alpha \times t\alpha$ matrix, the unknown x is a $t\alpha \times 1$ matrix, and B is a $t\alpha \times 1$ matrix; solving for the $t\alpha$ complemented redundancy quantities therefore requires
Figure BDA0001592808120000061
XOR operations;
(2) Multiplying the coefficient matrix of size $(n+t) \times (2k+2t-2)$ by the message matrix of size $(2k+2t-2) \times \alpha$ requires $\alpha(n+t)(2k+2t-2)$ multiplications and $\alpha(n+t)(2k+2t-3)$ additions. Each multiplication requires $\mu w$ XORs and each addition requires $w$ XORs, for a total of $\alpha(n+t)w\cdot[(2k+2t-2)\mu + (2k+2t-3)]$ XORs. Averaged over each bit of original data, the number of XORs for encoding is
Figure BDA0001592808120000062
As a further improvement of the invention, the decoding process of the method comprises the following steps:
(1) the completion operation of the coefficient matrix and the collected data does not involve an exclusive or operation;
(2) Right-multiplying the data of size $(k+t) \times \alpha$ by the $\alpha \times (k+t)$ matrix
Figure BDA0001592808120000063
requires $\alpha(k+t)^2$ additions and $(\alpha-1)(k+t)^2$ multiplications;
(3) From
Figure BDA0001592808120000071
equations of the form $Ax = B$, where A is a 2 × 2 matrix, can be solved for the elements of P and Q at positions $i \neq j$, requiring $4(k+t)^2\mu w$ XORs;
(4) Since
Figure BDA0001592808120000074
the original data $S_1$ can be decoded by solving α sets of equations, and $S_2$ is solved from Q in the same way.
As a further improvement of the invention, the repair process of the method comprises the following steps:
(1) In each helper node, the stored 1 × α matrix is multiplied by the α × β coefficient matrix of the failed node to generate 1 × β new data;
(2) The new node multiplies the $(2k+t+2) \times \beta$ matrix received from the helper nodes by the inverse of the $(2k+t+2) \times (2k+t+2)$ coefficient matrix
Figure BDA0001592808120000072
which requires $\beta(2k+t+2)[(2k+t+2)\mu w + (2k+t+1)w]$ XORs; the subsequent transposition and addition require $(2k+t+2)\beta\mu$ XORs. Averaged over each bit of original data,
Figure BDA0001592808120000073
XORs are performed to complete the repair.
The invention has the beneficial effects that:
To reduce the bandwidth consumption of regeneration codes in a system and improve their applicability in networks with limited bandwidth resources, the invention constructs the regeneration code with a product matrix and deletes part of the information bits of the mother code, thereby reducing the number of storage nodes and the bandwidth overhead. It also introduces Binary Addition and Shift arithmetic (BASIC) to reduce the computational complexity of the truncated regeneration code. This addresses the limited parameter choices and poor adaptability of regeneration codes when network nodes and bandwidth resources are limited, and yields a regeneration code construction method with low complexity and low bandwidth overhead.
Drawings
Fig. 1a is a performance analysis of the unit bandwidth overhead of the MSR code and the truncated MSR code when t = 1 according to the present invention;
Fig. 1b is a performance analysis of the unit bandwidth overhead of the MSR code and the truncated MSR code when t = 2 according to the present invention;
Fig. 2a is a bandwidth overhead comparison of the RS, MBR, MSR, truncated MSR (t = 1) and truncated MSR (t = 2) schemes when n = 8 and k = 3 according to the present invention;
Fig. 2b is a bandwidth overhead comparison of the RS, MBR, MSR, truncated MSR (t = 1) and truncated MSR (t = 2) schemes when n = 9 and k = 3 according to the present invention;
Fig. 2c is a bandwidth overhead comparison of the RS, MBR, MSR, truncated MSR (t = 1) and truncated MSR (t = 2) schemes when n = 16 and k = 6 according to the present invention;
FIG. 3a is a graph comparing the encoding complexity of the present invention using MSRs, truncated MSRs, and BASIC truncated MSRs;
FIG. 3b is a comparison of decoding complexity using MSRs, truncated MSRs, and BASIC truncated MSRs according to the present invention;
FIG. 3c is a graph comparing the repair complexity of the present invention using MSRs, truncated MSRs, and BASIC truncated MSRs.
Detailed Description
The invention is further described with reference to the following description and embodiments in conjunction with the accompanying drawings.
Description of the principles of the invention:
To address the limited parameter choices and poor adaptability of regeneration codes when network nodes and bandwidth resources are limited, a method for constructing a truncated regeneration code that reduces bandwidth consumption is provided; its implementation principle and performance analysis are detailed below.
Consider a regeneration code with parameters (n, k, d)[α, β, B], where n is the total number of nodes, k is the number of nodes that must be connected for decoding, d is the number of nodes that must be connected for repair, α is the storage capacity of each node, β is the amount of data downloaded from one node during repair, and B is the number of original symbols that can be encoded at one time. The existing indexes of regeneration codes, the storage overhead α and the bandwidth overhead γ, describe the performance of a single node and cannot reflect the performance in an actual system. For example, in Table 1 the link bandwidth is limited to β = 1 and the total data amount is M = 6. Judged only by α and γ, the (4,2,2)[1,2] MSR code has the minimum overhead and would be considered the best performer. In an actual system, however, because of the bandwidth limitation and the differing numbers of stripes, the actual memory occupation and bandwidth consumption of the (4,2,2)[1,2] MSR code are not the best, whereas the (6,3,4)[2,4] MSR code, with larger α and γ, performs better.
Table 1 parameter comparison for three MSR code schemes (M ═ 6, β ═ 1)
Figure BDA0001592808120000091
To compare the performance of regeneration codes with different parameters in an actual system fairly and intuitively, two new indexes are defined: the unit storage cost (USC), that is, the hard disk space occupied by each unit data block during storage; and the unit bandwidth overhead (URB), that is, the transmission bandwidth consumed to repair each unit data block. The USC and URB of the MSR code and the MBR code are respectively expressed as
Figure BDA0001592808120000092
And
Figure BDA0001592808120000093
then:
Figure BDA0001592808120000101
Figure BDA0001592808120000102
The invention applies the idea of truncation to regeneration codes and uses the two indexes, unit storage cost (USC) and unit bandwidth overhead (URB), to judge their performance. Truncation deletes t information bits from the (n, k) mother code to obtain an (n-t, k-t) truncated subcode, thereby shortening the code. The core technique of the invention is that, while some information bits are deleted, redundancy is added so that the data stored by t nodes after encoding are all 0; these nodes therefore need not be stored, and the cost in an actual system is lower.
Since a truncated MBR code cannot improve performance, the invention mainly considers the truncated MSR code.
The steps during encoding are as follows:
(1) Assume an MSR mother code with parameters $(n, k, d)[\alpha, \gamma, B]$. Truncating t nodes yields a truncated subcode with parameters $(n_s, k_s, d_s)[\alpha_s, \gamma_s, B_s]$. The parameters of the mother code and the subcode are related by:
$n_s = n - t$
$k_s = k - t$
$d_s = d - t$
$\alpha_s = \alpha$
$\gamma_s = \gamma - t$
$B_s = B - t\cdot\alpha$
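The parameter relations above can be sketched in a few lines of Python. This is only an illustration: the names `CodeParams`, `product_matrix_msr` and `truncate` are hypothetical, and the mother-code constructor assumes the usual product-matrix MSR setting with β = 1 (so d = 2k - 2, α = k - 1, γ = d and B = kα), which goes beyond what this passage states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeParams:
    n: int      # total number of nodes
    k: int      # nodes contacted for decoding
    d: int      # nodes contacted for repair
    alpha: int  # symbols stored per node
    gamma: int  # symbols downloaded per repair
    B: int      # symbols encoded at one time


def product_matrix_msr(n: int, k: int) -> CodeParams:
    """Mother code; assumes beta = 1, d = 2k - 2, alpha = k - 1, B = k * alpha."""
    d = 2 * k - 2
    alpha = k - 1
    return CodeParams(n=n, k=k, d=d, alpha=alpha, gamma=d, B=k * alpha)


def truncate(mother: CodeParams, t: int) -> CodeParams:
    """Apply n_s = n - t, k_s = k - t, d_s = d - t, alpha_s = alpha,
    gamma_s = gamma - t, B_s = B - t * alpha."""
    if not 0 <= t < mother.k:
        raise ValueError("t must satisfy 0 <= t < k")
    return CodeParams(
        n=mother.n - t,
        k=mother.k - t,
        d=mother.d - t,
        alpha=mother.alpha,
        gamma=mother.gamma - t,
        B=mother.B - t * mother.alpha,
    )


mother = product_matrix_msr(n=9, k=4)
sub = truncate(mother, t=1)
```

With these inputs the subcode has n = 8, k = 3 and d = 5, and B_s = 9 equals k_s · α_s, as expected for an MSR code.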
(2) Because the MSR mother code is constructed from a product matrix and its parameters are constrained by the product-matrix framework, the truncated MSR construction of the invention is an improvement within the product-matrix framework. In the encoding process, deleting the information bits and adding redundancy yields
Figure BDA0001592808120000111
where the first α data are known redundancy quantities, each a linear combination of the subsequent elements. The coefficient matrix Ψ is a Vandermonde matrix of the form:
Figure BDA0001592808120000112
and the message matrix M is a d × α matrix formed from two α × α symmetric matrices, of the form:
Figure BDA0001592808120000113
(3) The coding matrix is $C = \Psi \cdot M$. Setting all elements in the first t rows of C equal to 0 determines the linear relation between the α redundancy quantities and the original data, from which the values of the redundancy quantities can be solved.
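The encoding steps above can be illustrated with a small sketch for t = 1. Several things here are assumptions made for brevity rather than details from the text: arithmetic is done in the prime field GF(257) instead of GF(2^w), the message matrix is taken as M = [S1; S2] with symmetric S1 and S2, the redundancy symbols are placed in the first row (and column) of S1, and all function names are hypothetical.

```python
P = 257  # assumption: a small prime field stands in for GF(2^w)


def vandermonde(points, d):
    """Coefficient matrix Psi: row i is [1, x_i, x_i^2, ..., x_i^(d-1)] mod P."""
    return [[pow(x, j, P) for j in range(d)] for x in points]


def matmul(A, B):
    """Matrix product over GF(P)."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)]
            for row in A]


def build_message(data, alpha, x0):
    """Build M = [S1; S2] (d x alpha, d = 2 * alpha) from alpha^2 data symbols.

    The first row/column of S1 holds the alpha redundancy symbols, solved so
    that the node with evaluation point x0 stores an all-zero row.
    """
    assert len(data) == alpha * alpha
    it = iter(data)
    S1 = [[0] * alpha for _ in range(alpha)]
    S2 = [[0] * alpha for _ in range(alpha)]
    for i in range(alpha):              # S2 is filled entirely with data
        for j in range(i, alpha):
            S2[i][j] = S2[j][i] = next(it)
    for i in range(1, alpha):           # S1 keeps its first row for redundancy
        for j in range(i, alpha):
            S1[i][j] = S1[j][i] = next(it)
    phi = [pow(x0, i, P) for i in range(alpha)]   # phi[0] == 1
    lam = pow(x0, alpha, P)
    # Column j of (phi^T S1 + lam * phi^T S2) must be 0. Columns 1..alpha-1 each
    # contain one unknown; column 0 is solved last because it reuses them.
    for j in list(range(1, alpha)) + [0]:
        known = sum(phi[i] * S1[i][j] for i in range(1, alpha))
        known += lam * sum(phi[i] * S2[i][j] for i in range(alpha))
        S1[0][j] = S1[j][0] = (-known) % P
    return S1 + S2


alpha = 2                    # mother code: k = alpha + 1 = 3, d = 2 * alpha = 4
points = [1, 2, 3, 4, 5]     # n = 5 mother nodes; the node with x = 1 is truncated
M = build_message([5, 6, 7, 8], alpha, x0=points[0])
C = matmul(vandermonde(points, 2 * alpha), M)
```

The first row of `C` is all zeros, so that node does not need to be stored, and the remaining four rows are the contents of the truncated code's nodes. The evaluation points are chosen so that the values x^α are distinct, which the product-matrix construction requires.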
During decoding, any $k_s$ nodes are connected, and the coefficient vectors of these nodes form a $k_s \times d$ coefficient matrix denoted $\Psi_{s\_DC}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_DC}$ to form a new $k \times d$ coefficient matrix $\Psi_{DC}$. The data downloaded from the $k_s$ nodes form the $k_s \times d$ coding matrix $\Psi_{s\_DC}M$, which is supplemented with t rows of zero vectors to form a new $k \times d$ coding matrix $\Psi_{DC}M$. The supplemented coefficient matrix and the supplemented coding matrix are then decoded in the same way as an MSR code constructed from a product matrix.
The specific decoding process comprises the following steps:
(1) The data collector connects any $k_s$ nodes, whose coefficient vectors form a $k_s \times d$ coefficient matrix denoted $\Psi_{s\_DC}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_DC}$ to form a new $k \times d$ coefficient matrix $\Psi_{DC}$, given by:
$\Psi_{DC} = [\Phi_{DC} \;\; \Lambda_{DC}\Phi_{DC}]$
(2) The data downloaded from the $k_s$ nodes form the $k_s \times d$ coding matrix $\Psi_{s\_DC}M$, which is supplemented with t rows of zero vectors to form a new $k \times d$ coding matrix $\Psi_{DC}M$, formulated as:
Figure BDA0001592808120000121
First, right-multiply the collected data $\Psi_{DC}M$ by
Figure BDA0001592808120000122
which is formulated as:
Figure BDA0001592808120000123
Let P and Q be intermediate variables; then:
Figure BDA0001592808120000124
Figure BDA0001592808120000125
Since $S_1$ and $S_2$ are both symmetric matrices, P and Q are both symmetric matrices. Therefore,
Figure BDA0001592808120000126
simplifies, when represented by P and Q, to
Figure BDA0001592808120000127
In addition, since P, Q and $\Lambda_{DC}$ are all symmetric matrices,
Figure BDA0001592808120000128
is also a symmetric matrix.
Second, consider the (i, j)-th matrix element. When $i \neq j$, since the (i, j)-th and (j, i)-th elements of a symmetric matrix are equal, $P_{ij}$ and $Q_{ij}$ for $i \neq j$ can be solved from $P_{ij} + \lambda_i Q_{ij}$ and $P_{ji} + \lambda_j Q_{ji}$. When $i = j$, the i-th element is unknown, while the remaining elements
Figure BDA00015928081200001210
are all known, so $S_1$ can be solved. $S_2$ is solved in the same way, which completes the decoding. Here each $\phi$ denotes a column of $\Phi_{DC}$; that is, $\phi_i$ is the i-th column of the matrix $\Phi_{DC}$, and likewise for the others.
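The off-diagonal step of the decoding can be sketched as a 2 × 2 solve. The sketch assumes that the decoder knows the two values a_ij = P_ij + λ_i·Q_ij and a_ji = P_ji + λ_j·Q_ji, that P and Q are symmetric, and that λ_i ≠ λ_j. It works in the prime field GF(257) rather than GF(2^w) for brevity, and the function name `solve_offdiag` is hypothetical.

```python
P_MOD = 257  # assumption: a small prime field stands in for GF(2^w)


def solve_offdiag(a_ij, a_ji, lam_i, lam_j, p=P_MOD):
    """Recover (P_ij, Q_ij) for i != j.

    Uses a_ij = P_ij + lam_i * Q_ij and a_ji = P_ij + lam_j * Q_ij (mod p),
    where the second equation relies on P_ji = P_ij and Q_ji = Q_ij.
    """
    diff = (lam_i - lam_j) % p
    if diff == 0:
        raise ValueError("lam_i and lam_j must be distinct")
    q_ij = (a_ij - a_ji) * pow(diff, -1, p) % p   # modular inverse, Python 3.8+
    p_ij = (a_ij - lam_i * q_ij) % p
    return p_ij, q_ij


# Forward computation with known values, then recovery.
true_p, true_q, lam_i, lam_j = 11, 29, 4, 9
a_ij = (true_p + lam_i * true_q) % P_MOD
a_ji = (true_p + lam_j * true_q) % P_MOD
recovered = solve_offdiag(a_ij, a_ji, lam_i, lam_j)
```

Here `recovered` equals `(11, 29)`. The requirement that the λ values be distinct is why the diagonal entries of Λ must all differ in the product-matrix construction.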
During repair, $d_s$ helper nodes are connected, denoted as
Figure BDA0001592808120000129
The coefficient vectors of the helper nodes form a $d_s \times d$ coefficient matrix denoted $\Psi_{s\_repair}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_repair}$ to form a new $d \times d$ coefficient matrix $\Psi_{repair}$. β data are downloaded from each helper node, forming a $d_s \times \beta$ data matrix, which is supplemented with t rows of zero vectors to form a new $d \times \beta$ data matrix. The supplemented coefficient matrix and the received data matrix are then repaired in the same way as an MSR code constructed from a product matrix.
The specific repairing process comprises the following steps:
(1) The coefficient vector of the failed node is denoted f,
Figure BDA0001592808120000131
The lost data are calculated as:
Figure BDA0001592808120000132
(2) $d_s$ helper nodes are connected, denoted as
Figure BDA0001592808120000133
The coefficient vectors of these helper nodes form a $d_s \times d$ coefficient matrix denoted $\Psi_{s\_repair}$. The coefficient vectors of the t truncated nodes are appended to $\Psi_{s\_repair}$ to form a new $d \times d$ coefficient matrix $\Psi_{repair}$, giving:
Figure BDA0001592808120000134
(3) β data are downloaded from each helper node, forming a $d_s \times \beta$ data matrix, which is supplemented with t rows of zero vectors to form a new $d \times \beta$ data matrix. Each helper node right-multiplies its data by $\phi_f$ to obtain
Figure BDA0001592808120000135
and transmits the result to the new node. The new node receives the data $\Psi_{repair} M \phi_f$ from the $d_s$ helper nodes.
First, the new node left-multiplies the received data by
Figure BDA0001592808120000136
which is formulated as:
Figure BDA0001592808120000137
Second, the lost data can be repaired by transposition:
Figure BDA0001592808120000138
This completes the repair.
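As a reading aid, the repair procedure can be summarized with the standard product-matrix MSR identities. This is a sketch under assumptions, not a transcription of the equation images: it takes $M = [S_1; S_2]$ with symmetric $S_1, S_2$, the encoding vector of node i as $\psi_i^{T} = [\phi_i^{T},\ \lambda_i\phi_i^{T}]$, and β = 1, and its notation may differ from the original figures.

```latex
% Sketch: repair of failed node f; h is a helper node, u is a truncated node.
\begin{align*}
\text{lost data:}\quad
  & \psi_f^{T} M = \phi_f^{T} S_1 + \lambda_f\,\phi_f^{T} S_2 \\
\text{helper } h \text{ sends:}\quad
  & \psi_h^{T} M \phi_f \\
\text{truncated node } u \text{ contributes:}\quad
  & \psi_u^{T} M = 0 \;\Rightarrow\; \psi_u^{T} M \phi_f = 0 \\
\text{new node assembles:}\quad
  & \Psi_{repair}\, M \phi_f \\
\text{left-multiply by } \Psi_{repair}^{-1}:\quad
  & M \phi_f = \begin{bmatrix} S_1 \phi_f \\ S_2 \phi_f \end{bmatrix} \\
\text{transpose (symmetry of } S_1, S_2\text{):}\quad
  & (S_1 \phi_f)^{T} = \phi_f^{T} S_1, \qquad (S_2 \phi_f)^{T} = \phi_f^{T} S_2
\end{align*}
```

The third line is why the t zero rows can be appended without contacting any node: the truncated nodes store all-zero data by construction, so their contribution to the repair is known in advance.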
In the present invention, the truncated MSR code is constructed from the product matrix, so the constraint $d = 2k - 2$ applies. A truncated MSR code with parameters $(n, k, 2k+t-2)$ is obtained by truncating t bits from the $(n+t, k+t, 2k+2t-2)$ MSR mother code; the USC and URB of the truncated MSR code are then respectively expressed as:
Figure BDA0001592808120000141
Figure BDA0001592808120000142
Compared with an MSR code with the same n and k parameters, the truncated MSR code reduces the unit bandwidth overhead while keeping the same unit storage overhead, as shown by the simulation results in Fig. 1a and Fig. 1b.
Fig. 1a and Fig. 1b show that, for the same number of truncated bits, the difference in unit bandwidth overhead between the truncated MSR code and the MSR code gradually decreases as k increases, because the truncated bits make up a gradually smaller proportion of the total number of nodes. The larger the number of truncated bits t, the larger the proportion of truncated nodes, and the more pronounced the reduction in the unit bandwidth overhead of the truncated MSR code. The truncated MSR code reduces the unit bandwidth overhead by
Figure BDA0001592808120000143
When k = 5, truncating one bit reduces the unit bandwidth overhead by 10%. Once k exceeds 10, the reduction from truncating one bit is less than 5%. When k is less than or equal to 9, the reduction from truncating two bits reaches 10%. Once k exceeds 19, the reduction from truncating two bits is less than 5%.
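The percentages quoted above can be checked with a short calculation. The sketch rests on an assumption, since the URB formula itself is only available as an image: URB is taken to be the repair bandwidth dβ divided by the number of stored data symbols B, with β = 1, α = d - k + 1 and B = kα. Under that reading the relative reduction simplifies to t / (2(k + t - 1)), which reproduces the quoted figures. The function names are hypothetical.

```python
from fractions import Fraction


def urb_msr(k: int) -> Fraction:
    """Assumed URB of a product-matrix MSR code: d / B with d = 2k - 2, B = k(k - 1)."""
    return Fraction(2 * k - 2, k * (k - 1))


def urb_truncated(k: int, t: int) -> Fraction:
    """Assumed URB of the (n, k, 2k + t - 2) truncated code: alpha = k + t - 1, B = k * alpha."""
    alpha = k + t - 1
    return Fraction(2 * k + t - 2, k * alpha)


def reduction(k: int, t: int) -> Fraction:
    """Relative URB reduction of the truncated code versus the MSR code with the same k."""
    return 1 - urb_truncated(k, t) / urb_msr(k)


one_bit_k5 = reduction(5, 1)     # 10%
one_bit_k10 = reduction(10, 1)   # 5%, so k > 10 gives less than 5%
two_bit_k9 = reduction(9, 2)     # 10%
two_bit_k19 = reduction(19, 2)   # 5%, so k > 19 gives less than 5%
```

Using exact fractions avoids rounding noise at the 5% and 10% thresholds.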
In practical experiments, original files with sizes of 4 KB, 8 KB, 12 KB, 16 KB and 20 KB were selected. With the parameters n = 8, k = 3 (Fig. 2a), n = 9, k = 3 (Fig. 2b) and n = 16, k = 6 (Fig. 2c), the bandwidth overheads of the RS code, MBR, MSR and truncated MSR coding schemes in an actual system were compared, giving the results shown in Fig. 2a, Fig. 2b and Fig. 2c.
First, Fig. 2a shows that the truncated MSR code reduces bandwidth overhead compared with the MSR code, and that the bandwidth overhead decreases as the number of truncated bits increases. Second, comparing the bandwidth overhead for the same k value under different redundancies, as in Fig. 2a and Fig. 2b, shows that although a larger n increases the redundancy, the bandwidth overhead of each scheme remains unchanged, and the reduction achieved by the truncated MSR code is also unchanged; this verifies that the performance of the truncated MSR code is independent of the redundancy. With the same redundancy but different k values, as in Fig. 2a and Fig. 2c, the RS code still has the largest bandwidth overhead, the MBR code the smallest, and the MSR code and the two truncated MSR codes lie in between. When k = 3, truncating one bit reduces the bandwidth overhead by 16.7% compared with the MSR code, and truncating two bits reduces it by 24.6%; when k = 6, truncating one bit reduces it by only 4.4% and truncating two bits by 12.1%. Truncation still reduces the bandwidth overhead, but the reduction becomes less pronounced as k increases, and the reduction in bandwidth overhead achieved by truncation in the actual system is basically consistent with
Figure BDA0001592808120000151
When the overhead construction in the aspect of complexity constructs the truncated regeneration code, the method of the invention comprises the following steps:
(1) First, a finite field GF(2^w) with primitive polynomial g(z) is defined. Addition is bitwise XOR and requires w XOR operations; multiplication is polynomial multiplication followed by reduction modulo the primitive polynomial g(z) and requires μw XORs, where μ = (…1) + |g(z)|_0.
(2) Second, the number of XORs needed to solve an equation is defined and used as the mathematical model for the subsequent complexity analysis of the truncated MSR code. Consider solving Ax = B, where A is an n × n matrix, the unknown x is an n × 1 matrix, and B is an n × 1 matrix. Computing A^{-1} requires … XORs, and the unknown x is obtained as the product A^{-1}B of two matrices, whose multiplication takes n^2 μw + n(n-1)w XORs. Because the number of XORs for the matrix inversion is of far greater magnitude than that of the matrix product, the total number of XORs for solving the equation is, for simplicity, set to
Figure BDA0001592808120000152
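As an illustration only, the XOR cost model of steps (1) and (2) can be sketched in Python. The function names are hypothetical, and μ is passed in as a parameter rather than derived from g(z). The inversion cost is left out because its expression is given only in the figure.

```python
def add_xors(w):
    """XORs for one addition in GF(2^w): one XOR per bit."""
    return w


def mul_xors(w, mu):
    """XORs for one multiplication in GF(2^w), using the factor mu from the text."""
    return mu * w


def matvec_xors(n, w, mu):
    """XORs to multiply an n x n matrix by an n x 1 vector.

    The product needs n^2 multiplications and n(n-1) additions,
    i.e. n^2 * mu * w + n(n-1) * w XORs.
    """
    return n * n * mul_xors(w, mu) + n * (n - 1) * add_xors(w)
```

For example, with the assumed values n = 3, w = 8 and μ = 4, `matvec_xors(3, 8, 4)` evaluates to 9·32 + 6·8 = 336.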
Assume that the original data consists of B symbols of m bits each in total. The original data is divided into B original code blocks, each of which has m symbols; if the finite field is GF(2^w), then w = m.
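The finite-field operations used throughout this analysis (addition as bitwise XOR, multiplication as polynomial multiplication reduced by g(z)) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes w = 8 and the commonly used primitive polynomial g(z) = z^8 + z^4 + z^3 + z^2 + 1 (0x11D), neither of which is specified in the text, and the names `gf_add` and `gf_mul` are hypothetical.

```python
def gf_add(a, b):
    """Addition in GF(2^w) is a bitwise XOR of the two symbols."""
    return a ^ b


def gf_mul(a, b, w=8, g=0x11D):
    """Multiply two GF(2^w) symbols (both assumed to be below 2^w).

    w and g are assumed example values, not taken from the patent.
    """
    # Carry-less (polynomial) multiplication over GF(2).
    prod = 0
    for i in range(w):
        if (b >> i) & 1:
            prod ^= a << i
    # Reduce modulo g(z): cancel every term of degree >= w, highest first.
    for deg in range(2 * w - 2, w - 1, -1):
        if (prod >> deg) & 1:
            prod ^= g << (deg - w)
    return prod
```

With these assumed parameters, `gf_mul(3, 7)` is 9, since (z + 1)(z^2 + z + 1) = z^3 + 1 needs no reduction, and `gf_mul(2, 0x80)` is 0x1D, since z^8 reduces to z^4 + z^3 + z^2 + 1.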
The encoding process is divided into two steps, specifically:
(1) First, tα equations are solved to obtain the tα added redundancy symbols. This is cast as the mathematical model Ax = B, where A is a tα × tα matrix, the unknown x is a tα × 1 matrix, and B is a tα × 1 matrix, so solving it requires
Figure BDA0001592808120000161
XOR operations.
(2) Second, multiplying the coefficient matrix of size (n+t) × (2k+2t-2) by the message matrix of size (2k+2t-2) × α requires α(n+t)(2k+2t-2) multiplications and α(n+t)(2k+2t-3) additions. Each multiplication requires μw XORs and each addition requires w XORs, for a total of α(n+t)w·[(2k+2t-2)μ + (2k+2t-3)] XORs. Averaged over each bit of original data, the number of XORs for encoding is
Figure BDA0001592808120000162
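A minimal sketch of the step (2) count follows, assuming the formula exactly as stated in the text. The name `encode_product_xors` and its argument names are illustrative and not part of the patent. The per-bit average is not computed because its expression is given only in the figure.

```python
def encode_product_xors(n, k, t, alpha, w, mu):
    """XORs for the (n+t) x (2k+2t-2) by (2k+2t-2) x alpha matrix product."""
    inner = 2 * k + 2 * t - 2
    mults = alpha * (n + t) * inner        # one multiplication per term
    adds = alpha * (n + t) * (inner - 1)   # inner - 1 additions per entry
    return mults * mu * w + adds * w
```

For instance, with the assumed values n = 8, k = 3, t = 1, α = 3, w = 8 and μ = 4, the result is 3·9·8·(6·4 + 5) = 6264, which matches the closed form α(n+t)w·[(2k+2t-2)μ + (2k+2t-3)].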
The decoding process is divided into four steps:
(1) First, padding the coefficient matrix and the collected data involves no XOR operations. Then the data of size (k+t) × α is right-multiplied by the α × (k+t) matrix
Figure BDA0001592808120000163
which requires α(k+t)^2 multiplications and (α-1)(k+t)^2 additions, for a total of (k+t)^2 w·[αμ + α - 1] XORs in this step;
(2) secondly, by
Figure BDA0001592808120000164
equations of the form Ax = B, where A is a 2 × 2 matrix, are solved for the elements of the P and Q matrices at positions i ≠ j, which takes 4(k+t)^2 μw XORs;
(3) Finally, since it is known that
Figure BDA0001592808120000165
the original data S_1 can be decoded by solving α sets of equations. Each equation has the form AxB = C, where A is a 1 × α matrix, x is an α × α matrix, and C is a 1 × α matrix, and can be rewritten as B^T·(x^T A^T) = C^T. Solving this equation requires
Figure BDA0001592808120000166
XORs and yields x^T A^T = D; with α sets of equations in total, the number of XORs is
Figure BDA0001592808120000167
The equations of the form x^T A^T = D are then regrouped into E S_1 = F, where E is an α × α matrix, S_1 is an α × 1 matrix, and F is an α × 1 matrix; solving for S_1 requires
Figure BDA0001592808120000171
XORs. In the same way, S_2 can be solved from Q. Averaged over each bit of original data, decoding requires
Figure BDA0001592808120000172
XORs.
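The total (k+t)^2 w·[αμ + α - 1] stated for the matrix product in decoding step (1) can be checked by counting operations in a naive matrix multiplication, where each of the (k+t)^2 output entries is a sum of α products. The sketch below is a hedged illustration with hypothetical function names. It covers only the two counts written out in the text; the equation-solving costs of step (3) appear only in figures and are omitted.

```python
def count_matmul_ops(rows, inner, cols):
    """Count multiplications and additions in a naive matrix product."""
    mults = adds = 0
    for _ in range(rows):
        for _ in range(cols):
            mults += inner       # one product per term of the inner sum
            adds += inner - 1    # summing `inner` terms takes inner - 1 additions
    return mults, adds


def decode_step_xors(k, t, alpha, w, mu):
    """Return (XORs of the matrix product, XORs of the 2 x 2 solves)."""
    mults, adds = count_matmul_ops(k + t, alpha, k + t)
    product = mults * mu * w + adds * w
    pairwise = 4 * (k + t) ** 2 * mu * w   # count as stated in the text
    return product, pairwise
```

With the assumed values k = 3, t = 1, α = 3, w = 8 and μ = 4, `decode_step_xors(3, 1, 3, 8, 4)` gives (1792, 2048), and 1792 equals 16·8·(3·4 + 3 - 1).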
The repair process is divided into two steps:
(1) First, in each helper node, the 1 × α matrix stored locally is multiplied by the α × β coefficient matrix of the failed node, generating new 1 × β data. This takes βα multiplications and β(α-1) additions, i.e. β[αμw + (α-1)w] XORs. Since there are 2k+t+2 helper nodes in total, the total number of XORs is β(2k+t+2)[αμw + (α-1)w].
(2) Second, the new node receives the (2k+t+2) × β matrix from the helper nodes and multiplies it by the inverse of the (2k+t+2) × (2k+t+2) coefficient matrix
Figure BDA0001592808120000173
which takes β(2k+t+2)[(2k+t+2)μw + (2k+t+1)w] XORs. The subsequent transposition and addition require (2k+t+2)βμ XORs. Averaged over each bit of original data, repair requires
Figure BDA0001592808120000174
XORs.
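The repair counts written out in steps (1) and (2) can be evaluated with a short sketch. The function names are hypothetical, the helper count 2k+t+2 and the transposition term (2k+t+2)βμ are taken as stated in the text, and the per-bit average is omitted because it is given only in the figure.

```python
def helper_xors(alpha, beta, w, mu):
    """XORs in one helper node: beta*alpha multiplications, beta*(alpha-1) additions."""
    return beta * (alpha * mu * w + (alpha - 1) * w)


def repair_xors(k, t, alpha, beta, w, mu):
    """Return (all helper nodes, new-node matrix product, transposition and addition)."""
    h = 2 * k + t + 2                      # number of helper nodes, as stated
    helpers_total = h * helper_xors(alpha, beta, w, mu)
    new_node = beta * h * (h * mu * w + (h - 1) * w)
    transpose_add = h * beta * mu          # term as stated in the text
    return helpers_total, new_node, transpose_add
```

With the assumed values k = 3, t = 1, α = 3, β = 1, w = 8 and μ = 4, `repair_xors(3, 1, 3, 1, 8, 4)` returns (1008, 3168, 36).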
The complexity of the invention is analyzed as follows:
The complexity of the truncated MSR code of the invention is compared with that of conventional erasure codes, MSR codes and partial-repair regeneration codes in table 2. In the encoding stage, the truncated MSR code must compute the redundancy added in front of the original data, and zero padding is needed before decoding and repair can follow the rules of the MSR mother code. The complexity of the truncated MSR code is therefore slightly higher than that of the MSR code: this coding scheme reduces bandwidth overhead at the cost of computational overhead.
TABLE 2 complexity comparison
Figure BDA0001592808120000181
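The two extra computations of the truncated code (solving for the added redundancy at encoding time, and zero padding before decoding) can be illustrated on a toy code. The sketch below is not the product-matrix MSR construction of the invention: it assumes a plain Vandermonde code with a single-column message over the prime field GF(257) instead of GF(2^w), and all names are hypothetical.

```python
P = 257  # assumed prime modulus; the patent itself works in GF(2^w)


def solve_mod(A, b, p=P):
    """Solve A x = b over GF(p) by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)      # modular inverse, Python 3.8+
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def vandermonde(points, d, p=P):
    """Rows (1, x, x^2, ..., x^(d-1)) for each evaluation point x."""
    return [[pow(x, j, p) for j in range(d)] for x in points]


def encode_truncated(data, t, points, p=P):
    """Prepend t redundancy symbols chosen so that the first t coded symbols are 0."""
    d = t + len(data)
    psi = vandermonde(points, d, p)
    A = [row[:t] for row in psi[:t]]
    b = [(-sum(c * x for c, x in zip(row[t:], data))) % p for row in psi[:t]]
    msg = solve_mod(A, b, p) + list(data)
    coded = [sum(c * x for c, x in zip(row, msg)) % p for row in psi]
    return msg, coded


def decode_truncated(node_ids, symbols, t, points, d, p=P):
    """Pad with the t truncated nodes (known zero symbols), then solve for the data."""
    psi = vandermonde(points, d, p)
    rows = [psi[i] for i in range(t)] + [psi[i] for i in node_ids]
    rhs = [0] * t + list(symbols)
    return solve_mod(rows, rhs, p)[t:]
```

In this toy setting the t truncated nodes store only zeros and need not be contacted, so the decoder downloads from d - t real nodes and supplies the remaining t rows locally, at the cost of the extra linear solve during encoding.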
Analysis of the complexity of the various schemes shows that all of them involve the factor μ introduced by finite-field operations. Hou et al. proposed the BASIC operation, which replaces conventional finite-field computation and can reduce the computational complexity. BASIC operations can be applied to the truncated MSR code, giving what is called the BASIC_ssmsr code, and reduce the computational overhead in each of the encoding, decoding and repair stages, as shown in fig. 3a, 3b and 3c.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (7)

1. A construction method of a truncated regeneration code for reducing bandwidth consumption, characterized in that t information bits are deleted from a mother code with parameters (n, k) to obtain an (n-t, k-t) truncated subcode, and redundancy is added such that the data stored in the t nodes after encoding are all 0, the value of the redundancy being solved accordingly;
when decoding and repairing, the matrix formed by the coefficient vectors of the connected nodes is supplemented with the coefficient vectors of the t truncated nodes to form a new k × d coefficient matrix, the coding matrix formed by the data downloaded from the nodes is supplemented with t rows of zero vectors to form a new coding matrix, and the supplemented coefficient matrix and the received data matrix are decoded or repaired according to the decoding or repair mode of the MSR mother code;
the method comprises the following steps:
(1) an MSR mother code with parameters (n, k, d)[α, γ, B] is taken, wherein n is the total number of nodes, k is the number of nodes that must be connected for decoding, d is the number of nodes that must be connected for repair, α is the storage capacity of each node, β is the amount of data downloaded from one node during repair, B is the number of original symbols that can be encoded at one time, and γ is the total bandwidth needed to repair a single file coding block of a failed node; t nodes are truncated to obtain a truncated subcode with parameters (n_s, k_s, d_s)[α_s, γ_s, B_s], and the following relationship exists between the parameters of the MSR mother code and the truncated subcode:
Figure FDA0003781487200000011
(2) under the frame of a product matrix, deleting t nodes of information bits and adding redundancy to obtain
Figure FDA0003781487200000012
wherein the first α data are the known redundancy, and the d × α message matrix M, constructed from two α × α diagonal matrices, is:
Figure FDA0003781487200000021
(3) a coding matrix C = Ψ·M is constructed, all elements in the first t rows of C are set equal to 0, and the value of the redundancy is solved, wherein Ψ is the coefficient matrix, which is a Vandermonde matrix;
in decoding, any k_s nodes are connected; the corresponding node coefficient vectors form a k_s × d coefficient matrix, denoted Ψ_{s_DC}; Ψ_{s_DC} is supplemented with the coefficient vectors of the t truncated nodes to form a new k × d coefficient matrix Ψ_DC; the data downloaded from the k_s nodes form the k_s × d coding matrix Ψ_{s_DC} M, which is supplemented with t rows of zero vectors to form a new k × d coding matrix Ψ_DC M; and the supplemented coefficient matrix and coding matrix are decoded according to the decoding mode of the MSR code constructed by the product matrix;
the steps in decoding include:
(1) the data collector connects to any k_s nodes; the corresponding node coefficient vectors form a k_s × d coefficient matrix Ψ_{s_DC}, and Ψ_{s_DC} is supplemented with the coefficient vectors of the t truncated nodes to form a new k × d coefficient matrix Ψ_DC, expressed as: Ψ_DC = [Φ_DC  Λ_DC Φ_DC];
the data downloaded from the k_s nodes form the k_s × d coding matrix Ψ_{s_DC} M, which is supplemented with t rows of zero vectors to form a new k × d coding matrix Ψ_DC M, obtaining:
Figure FDA0003781487200000022
(2) the collected data Ψ_DC M is right-multiplied by
Figure FDA0003781487200000023
To obtain
Figure FDA0003781487200000024
wherein
Figure FDA0003781487200000031
Ψ_DC is the k × d coefficient sub-matrix of the nodes connected for decoding, and S_1 and S_2 are the symmetric sub-matrices of the message matrix;
(3) considering the (i, j) elements of the matrices: when i ≠ j, since the (i, j) element of a symmetric matrix is identical to its (j, i) element, the pair P_ij + λ_i Q_ij and P_ji + λ_j Q_ji is solved for P_{i,j} and Q_{i,j}, where λ_i is the i-th diagonal element of Λ_DC;
when i = j, only the i-th element is unknown and the remaining
Figure FDA0003781487200000036
are all known, so S_1 can be solved; S_2 is solved in the same way, which completes the decoding, wherein each φ denotes a column vector of Φ_DC, and φ_i is the i-th column vector of the matrix Φ_DC.
2. The method of claim 1, wherein during repair d_s helper nodes are connected, denoted as
Figure FDA0003781487200000034
the helper node coefficient vectors form a d_s × d coefficient matrix, denoted Ψ_{s_repair}; Ψ_{s_repair} is supplemented with the coefficient vectors of the t truncated nodes to form a new d × d coefficient matrix Ψ_repair; β data are downloaded from each helper node to form a d_s × β data matrix, which is supplemented with t rows of zero vectors to form a new d × β data matrix; and the supplemented coefficient matrix and the received data matrix are repaired according to the repair mode of the MSR code constructed by the product matrix.
3. The method of claim 2, wherein the repairing step comprises:
(1) the coefficient vector of the failure node is represented as f, f
Figure FDA0003781487200000032
Calculating the missing data as
Figure FDA0003781487200000033
(2) d_s helper nodes are connected, denoted as
Figure FDA0003781487200000035
forming a d_s × d coefficient matrix Ψ_{s_repair}; Ψ_{s_repair} is supplemented with the coefficient vectors of the t truncated nodes to form a new d × d coefficient matrix Ψ_repair, obtaining:
Figure FDA0003781487200000041
(3) β data are downloaded from each helper node to form a d_s × β data matrix, which is supplemented with t rows of zero vectors to form a new d × β data matrix; the helper node data are right-multiplied by φ_f to obtain
Figure FDA0003781487200000042
which is sent to the new node; the new node receives the data Ψ_repair M φ_f from the d_s helper nodes and left-multiplies it by
Figure FDA0003781487200000043
obtaining:
Figure FDA0003781487200000044
the lost data can then be repaired by transposition
Figure FDA0003781487200000045
4. The method of claim 1, wherein the method applied to the complexity overhead comprises the steps of:
(1) a finite field GF(2^w) with primitive polynomial g(z) is defined, in which addition is bitwise XOR and requires w XOR operations, and multiplication is polynomial multiplication followed by reduction modulo the primitive polynomial g(z) and requires μw XORs, where μ = (…1) + |g(z)|_0;
(2) the equation Ax = B to be solved is defined, where A is an n × n matrix, the unknown x is an n × 1 matrix, and B is an n × 1 matrix; computing A^{-1} requires … XORs, and the unknown matrix x is obtained as the product A^{-1}B of two matrices.
5. The method of claim 4, wherein the encoding process comprises the steps of:
(1) the tα added redundancy symbols are solved from tα equations, cast as the mathematical model Ax = B, wherein A is a tα × tα matrix, the unknown x is a tα × 1 matrix, and B is a tα × 1 matrix, so that solving requires
Figure FDA0003781487200000046
XOR operations, wherein α is the storage capacity of each node, t is the number of rows of supplemented zero vectors, and μw is the number of XOR operations of one polynomial multiplication;
(2) multiplying the coefficient matrix of size (n+t) × (2k+2t-2) by the message matrix of size (2k+2t-2) × α requires α(n+t)(2k+2t-2) multiplications and α(n+t)(2k+2t-3) additions, each multiplication requiring μw XORs and each addition requiring w XORs, so that α(n+t)w·[(2k+2t-2)μ + (2k+2t-3)] XORs are performed; averaging over each bit of original data gives the number of XORs for encoding
Figure FDA0003781487200000051
6. The method as claimed in claim 4, wherein the decoding process comprises the following steps:
(1) the completion operation of the coefficient matrix and the collected data does not involve an exclusive or operation;
(2) the data of size (k+t) × α is right-multiplied by the α × (k+t) matrix
Figure FDA0003781487200000052
which requires α(k+t)^2 multiplications and (α-1)(k+t)^2 additions;
(3) by passing
Figure FDA0003781487200000053
equations of the form Ax = B, where A is a 2 × 2 matrix, are solved for the elements of the P matrix and the Q matrix at positions i ≠ j, which takes 4(k+t)^2 μw XORs;
(4) Due to the fact that
Figure FDA0003781487200000055
the original data S_1 can be decoded by solving α equations, and S_2 is in turn solved from Q.
7. The method of claim 4, wherein the repair process comprises the steps of:
(1) in each helper node, the 1 × α matrix stored by the node is multiplied by the α × β coefficient matrix of the failed node to generate new 1 × β data;
(2) the new node receives the (2k+t+2) × β matrix from the helper nodes and multiplies it by the inverse of the (2k+t+2) × (2k+t+2) coefficient matrix
Figure FDA0003781487200000054
wherein Ψ_repair denotes the d × d coefficient sub-matrix of the helper nodes for repair, and the number of XORs is β(2k+t+2)[(2k+t+2)μw + (2k+t+1)w]; the subsequent transposition and addition require (2k+t+2)βμ XORs; averaged over each bit of original data,
Figure FDA0003781487200000061
XORs are performed to complete the repair.
CN201810194923.3A 2018-03-09 2018-03-09 Truncated regeneration code construction method for reducing bandwidth consumption Active CN108512553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810194923.3A CN108512553B (en) 2018-03-09 2018-03-09 Truncated regeneration code construction method for reducing bandwidth consumption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810194923.3A CN108512553B (en) 2018-03-09 2018-03-09 Truncated regeneration code construction method for reducing bandwidth consumption

Publications (2)

Publication Number Publication Date
CN108512553A CN108512553A (en) 2018-09-07
CN108512553B true CN108512553B (en) 2022-09-27

Family

ID=63377374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810194923.3A Active CN108512553B (en) 2018-03-09 2018-03-09 Truncated regeneration code construction method for reducing bandwidth consumption

Country Status (1)

Country Link
CN (1) CN108512553B (en)

Also Published As

Publication number Publication date
CN108512553A (en) 2018-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant