Background
With the rapid development of the internet, applications have become increasingly abundant, the number of users keeps growing, and the volume of data is increasing geometrically. Storing such massive data places enormous pressure on local storage, to the point where a single storage system is overwhelmed and on the verge of collapse. Distributed storage is therefore adopted to reduce the overhead and pressure on storage.
Early distributed storage systems used a three-copy technique to store and recover data: when data is stored, three copies of the same original data are placed on different storage nodes, and when data is lost, it is recovered directly from a surviving copy. The traditional three-copy strategy, however, has obvious defects: (1) its tolerance of hard disk faults is too low, because as soon as a hard disk fault occurs, data recovery at the cluster level is triggered immediately; (2) hard disk utilization is low: with three copies the utilization can reach at most about 33%, and once other factors are taken into account the overall hard disk utilization of the cluster is probably below 30%, which undoubtedly increases the cost of storage hard disks; (3) the write performance of the three-copy technique is very low; (4) when a single disk fails and remains damaged for a period of time, cluster-level data recovery must start immediately and keep running, and its timing cannot be controlled manually.
The erasure code technology to which the present method relates solves these problems well. Erasure codes were first applied in the communication field, mainly to deal with the loss of part of the data during transmission: the transmitted signal data is segmented and then encoded to generate check bits, the check bits are transmitted together with the original data, and during recovery the lost data is decoded and restored with the help of the check bits. Erasure coding has since also been applied to storage systems. The core principle of an erasure-code-based distributed storage system is that the original data is divided into several data blocks, redundant blocks are computed from them according to the chosen erasure code algorithm, and the data blocks and redundant blocks are then stored on different nodes. When a node fails, the lost data is recovered from the remaining data blocks and redundant blocks. In this way the reliability and security of the data are guaranteed. Erasure codes have significant advantages in distributed storage: (1) for data of the same size, an erasure code occupies far less storage space, saving approximately half of the storage space compared with the traditional three-copy technique; (2) space utilization is high, more than twice that of the traditional three-copy technique; (3) the traditional three-copy technique tolerates the failure of at most two nodes, whereas an erasure code can tolerate the simultaneous failure of multiple nodes; and (4) the cluster construction cost is low.
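The utilization figures above can be checked with a few lines of arithmetic. The following is a minimal sketch; the 4 + 2 layout (four data blocks, two check blocks) and all function names are assumptions chosen for illustration and do not come from the text.

```python
def replication_utilization(copies):
    """Fraction of raw capacity that holds unique data under k-way replication."""
    return 1.0 / copies


def erasure_utilization(n_data, m_check):
    """Fraction of raw capacity that holds unique data when n data blocks
    are protected by m check blocks."""
    return n_data / (n_data + m_check)


def raw_capacity_needed(user_bytes, utilization):
    """Raw capacity required to store user_bytes at the given utilization."""
    return user_bytes / utilization


three_copy = replication_utilization(3)   # three-copy technique
ec_4_2 = erasure_utilization(4, 2)        # assumed 4 + 2 erasure code layout

print(f"three-copy utilization: {three_copy:.1%}")
print(f"4 + 2 erasure code utilization: {ec_4_2:.1%}")
print("raw capacity ratio (erasure / three-copy):",
      raw_capacity_needed(1.0, ec_4_2) / raw_capacity_needed(1.0, three_copy))
```

Under this assumed layout the erasure code needs half the raw capacity of three copies while still tolerating the loss of two blocks; other layouts give different ratios.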
At present, the erasure codes applied to distributed storage systems mainly include RS codes, X-codes, EVENODD codes, and others. The common RS code builds on an existing coding algorithm: in the encoding stage, n original data blocks are given, m check data blocks are generated from them, and the n original data blocks and m check data blocks are stored together. In the decoding stage, the original data can be recovered from any n of the n + m blocks, that is, as long as the number of lost blocks is less than or equal to m. Encoding and decoding with an RS code frequently involve matrix inversion together with multiplication in a finite field, so the implementation is complex. The amount of data that can be processed is limited, matrix operations take a long time during data recovery, which greatly limits encoding and decoding efficiency, and the amount of redundant data grows as the data grows, which reduces encoding and decoding throughput.
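The "any n of n + m blocks" property, and the matrix inversion it requires, can be illustrated with a small sketch. It is not the RS implementation used in production systems: as simplifying assumptions it works in the prime field GF(257) instead of GF(2^8), builds the check rows from a Cauchy matrix, and treats each block as a single symbol. All names are hypothetical.

```python
P = 257  # prime modulus (assumption: real RS codes usually use GF(2^8))


def inv(a):
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a, P - 2, P)


def cauchy_rows(n, m):
    """m x n Cauchy matrix 1 / (x_i - y_j) with x_i = i and y_j = m + j."""
    return [[inv((i - (m + j)) % P) for j in range(n)] for i in range(m)]


def encode(data, m):
    """Append m check symbols to the n data symbols (systematic encoding)."""
    rows = cauchy_rows(len(data), m)
    checks = [sum(c * d for c, d in zip(row, data)) % P for row in rows]
    return list(data) + checks


def solve_mod(a, b):
    """Solve a * x = b over GF(P) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def decode(survivors, n, m):
    """Recover the n data symbols from any n surviving {index: value} blocks."""
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    generator = identity + cauchy_rows(n, m)
    picked = sorted(survivors)[:n]
    return solve_mod([generator[i] for i in picked],
                     [survivors[i] for i in picked])


data = [10, 200, 33, 7]
stripe = encode(data, m=2)
# Lose one data block (index 1) and one check block (index 4).
survivors = {i: v for i, v in enumerate(stripe) if i not in (1, 4)}
recovered = decode(survivors, n=4, m=2)
print(recovered == data)
```

The decoding step solves an n × n linear system for every loss pattern, which is the matrix cost that the paragraph above identifies as the bottleneck.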
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention provides an erasure-code-based data recovery method, apparatus, device and storage medium that further improve the efficiency of storage coding.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The scheme provides a data recovery method based on erasure codes, which comprises the following steps:
S1. Encoding the original data based on erasure codes by using a generator matrix to generate redundant bits, and storing the original data and the generated redundant data;
S2. Recovering the lost data by using the check matrix according to the original data and the generated redundant data, thereby completing the data recovery.
The invention has the beneficial effects that: the invention reduces the time spent on data recovery, particularly the time spent on matrix operation during data recovery, and reduces the steps and complexity of matrix operation by optimizing the check matrix, thereby improving the efficiency of data recovery.
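The division of labour between the generator matrix (step S1) and the check matrix (step S2) can be sketched with a small binary code. The example below uses a systematic (7,4) Hamming code over GF(2) and a brute-force search over erased bits; both choices are assumptions made to keep the sketch short, and it does not include the matrix optimization that the invention describes.

```python
from itertools import product

# Parity part of a systematic (7,4) Hamming code (assumed for illustration).
# Generator matrix G = [I | PARITY], check matrix H = [PARITY^T | I].
PARITY = [[1, 1, 0],
          [1, 0, 1],
          [0, 1, 1],
          [1, 1, 1]]
K, R = 4, 3  # number of data bits and redundant bits


def encode(bits):
    """S1: multiply the data bits by G over GF(2), giving data + redundancy."""
    checks = [sum(bits[i] * PARITY[i][j] for i in range(K)) % 2
              for j in range(R)]
    return list(bits) + checks


def syndrome(word):
    """H * word over GF(2); all zeros for a valid codeword."""
    return [(sum(PARITY[i][j] * word[i] for i in range(K)) + word[K + j]) % 2
            for j in range(R)]


def recover(word):
    """S2: fill erased positions (None) so that every check equation holds."""
    erased = [i for i, b in enumerate(word) if b is None]
    matches = []
    for guess in product((0, 1), repeat=len(erased)):
        trial = list(word)
        for pos, bit in zip(erased, guess):
            trial[pos] = bit
        if not any(syndrome(trial)):
            matches.append(trial)
    if len(matches) != 1:
        raise ValueError("erasure pattern is not uniquely recoverable")
    return matches[0]


codeword = encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[0] = None   # lost data bit
damaged[5] = None   # lost redundant bit
print(recover(damaged) == codeword)
```

Because this code has minimum distance 3, any two erased positions can be restored; a practical system would solve the check equations directly instead of enumerating candidates.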
Further, the encoding process in step S1 includes the following steps:
A1. Representing the erasure-code-based coding matrix A to be optimized in vector form, $A = [a_1, a_2, a_3, \ldots, a_N]^{T} \in R^{M \times N}$, where $N \le M$, $a_n$ denotes a row of the coding matrix A, $n = 1, 2, \ldots, N$, N denotes the total number of rows of the coding matrix A, M denotes the total number of columns of the coding matrix A, and $R^{M \times N}$ denotes an M × N matrix of positive real numbers;
A2. Denoting the N-1 rows of the coding matrix A other than $a_n$ as $A_n$, $A_n = [a_1, a_2, a_3, \ldots, a_{n-1}, a_{n+1}, \ldots, a_N]^{T} \in R^{M \times (N-1)}$, where $a_N$ denotes the N-th column of $A_n$ and $R^{M \times (N-1)}$ denotes an M × (N-1) matrix of positive real numbers;
A3. Introducing a coding auxiliary permutation matrix $T_{n,N}$, multiplying the coding matrix A by it to obtain check bits, and storing the check bits;
A4. Exchanging the n-th row and the N-th row of the coding matrix A according to the coding auxiliary permutation matrix $T_{n,N}$;
A5. Introducing a new auxiliary check matrix X, and calculating the augmented matrix of the new auxiliary check matrix X from X and the coding matrix A, where $R^{M \times M}$ denotes an M × M matrix of positive real numbers;
A6. Calculating a full-rank matrix from the augmented matrix of the new auxiliary check matrix X, the calculation involving the conjugate transpose of $a_n$;
A7. From the full-rank matrix, using linear algebra to obtain an operation equation relating the coding matrix A and the permutation matrix of the coding matrix A;
A8. Obtaining, from the operation equation, the orthogonal complement projection matrix of the augmented matrix of the new auxiliary check matrix X onto the row space spanned by the augmented matrix of the coding matrix A;
A9. Setting the number of rows N = M, and decomposing the invertible coding matrix A to be optimized;
A10. Introducing a new auxiliary check row vector $x_n$ according to the decomposed invertible coding matrix A to be optimized, where the condition satisfied by $x_n$ is stated over $R^{(M-1) \times 1}$, and $R^{(M-1) \times 1}$ denotes an (M-1) × 1 matrix of positive real numbers;
A11. Calculating an auxiliary check row from the auxiliary check row vector $x_n$ by using the orthogonal complement projection matrix, for the case $\|x_n\| = 1$;
A12. Calculating, from the auxiliary check row, the value of the determinant of the coding matrix A and its corresponding transpose matrix, where $A^{T}$ denotes the transpose of the coding matrix A;
A13. Calculating the corresponding log value from the value of the determinant of the coding matrix A and its corresponding transpose matrix, and optimizing the corresponding rows or columns of the check matrix according to the log value, thereby completing the encoding of the original data based on erasure codes.
The beneficial effects of the further scheme are as follows: the invention improves the operations of matrix inversion, multiplication and the like in the process of storing and coding by optimizing the matrix, thereby reducing the time spent on the matrix operation and improving the data recovery efficiency.
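The row exchange performed by the permutation matrix $T_{n,N}$ in steps A3 and A4 can be sketched as follows. The example also checks a standard fact that makes such an exchange harmless for the later determinant steps: swapping rows of A leaves $\det(A A^{T})$ unchanged. The 3 × 4 matrix, the 0-based indices, and the helper names are assumptions for illustration; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def swap_matrix(size, n, last):
    """Permutation matrix that exchanges rows n and last (0-based indices)."""
    t = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    t[n], t[last] = t[last], t[n]
    return t


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def det(a):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [row[:] for row in a]
    sign, result = 1, Fraction(1)
    for c in range(len(a)):
        pivot = next((r for r in range(c, len(a)) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, len(a)):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return sign * result


# Hypothetical coding matrix with N = 3 rows and M = 4 columns.
A = [[Fraction(v) for v in row]
     for row in ([1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 1])]
T = swap_matrix(3, 0, 2)          # exchange the first and the last row
swapped = matmul(T, A)

gram = det(matmul(A, transpose(A)))
gram_swapped = det(matmul(swapped, transpose(swapped)))
print(gram == gram_swapped)
```

Moving the row of interest to the last position in this way lets it be treated separately from the remaining N-1 rows without changing the determinant that is later evaluated.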
Still further, the expression of the augmented matrix of the new auxiliary check matrix X in step A5 is defined in terms of the identity matrix I, the augmented matrix of the coding matrix A, and the conjugate transpose of that augmented matrix.
The beneficial effects of the above further scheme are as follows: calculating the augmented matrix of the check matrix prepares for the subsequent determination of which specific data block was lost.
Still further, in the expression of the operation equation in step A7, $\det[A \times A^{T}]$ denotes the operation equation, A denotes the coding matrix, $A^{T}$ denotes the conjugate transpose of the coding matrix A, $T_{n,N}$ denotes the coding auxiliary permutation matrix, and P denotes the specific value of the primitive variable obtained in the optimization process. The expression further involves the conjugate transpose of $T_{n,N}$; a row $a_n$ of the coding matrix A, with $n = 1, 2, \ldots, N$ and N the total number of rows of the coding matrix A; the conjugate transpose of $a_n$; the augmented matrix of the coding matrix A; and the conjugate transpose of that augmented matrix.
The beneficial effect of the above further scheme is that: by calculating the value of the determinant of the encoding matrix a and the corresponding transpose matrix, the value of the square of the determinant is further obtained.
Still further, the expression in step A12 for the value of the determinant of the coding matrix A and its corresponding transpose matrix involves the augmented matrix of the coding matrix A, the conjugate transpose of that augmented matrix, the transpose of a row $a_n$ of the coding matrix A, and the auxiliary check row vector $x_n$.
The beneficial effects of the above further scheme are as follows: from the value of the determinant of the coding matrix A and its corresponding transpose matrix, the log value of the determinant is further obtained.
Still further, the expression in step A13 for the log value of the determinant of the coding matrix A and its corresponding transpose matrix involves the conjugate transpose of a row $a_n$ of the coding matrix A and the auxiliary check row vector $x_n$, and $\Delta$ denotes the operator.
The beneficial effects of the above further scheme are: the condition of the scanning cycle at the time of data recovery is set according to the log value of the corresponding determinant.
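For orientation, one standard linear-algebra identity connects the quantities named in steps A8 to A13: the determinant of the coding matrix with its transpose, the matrix with one row removed, and an orthogonal complement projection. It is given here only as an assumption about the kind of relation involved, not as the expression used by the invention. The symbol $P_n^{\perp}$ and the row-vector convention are introduced solely for this illustration.

```latex
% Assumed convention for this illustration only:
% A \in R^{N \times M}, N \le M, with rows a_1, \dots, a_N (row vectors);
% A_n \in R^{(N-1) \times M} is A with row a_n removed and has full row rank.
\[
P_n^{\perp} = I - A_n^{T}\,\bigl(A_n A_n^{T}\bigr)^{-1} A_n ,
\qquad
\det\bigl(A A^{T}\bigr) = \det\bigl(A_n A_n^{T}\bigr)\; a_n P_n^{\perp} a_n^{T} ,
\]
\[
\log\det\bigl(A A^{T}\bigr) = \log\det\bigl(A_n A_n^{T}\bigr)
  + \log\bigl(a_n P_n^{\perp} a_n^{T}\bigr).
\]
```

Under this identity, the contribution of a single row to the log-determinant is isolated in one scalar term, which is consistent with the text's statement that a single row of the matrix can be processed independently.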
Still further, step S2 includes the following steps:
B1. According to the check bits, setting $F[y_n]$ as the source entropy of the n-th erasure-code-based check matrix to be optimized, with $F[y_n] = C_1$, where $C_1$ denotes a constant;
B2. According to $F[y_n]$, scaling the erasure-code-based check matrix by using the primitive variable $P_1$;
B3. Using the source entropy of the check matrix, with the log value of the determinant of the coding matrix A and its corresponding transpose matrix as a regularization term, calculating from the generated redundant data the specific block number $F[s]$ corresponding to the lost data;
B4. Introducing a full-rank diagonal matrix D to assist the scaling of the erasure-code-based check matrix, obtaining a coding auxiliary matrix x;
B5. Calculating the auxiliary check matrix X from the identity matrix I and the coding auxiliary matrix x, where $I = Xx$;
B6. Letting the constant $C_2$ be the log value of the determinant of the coding matrix A;
B7. Calculating the primitive number $P_2$ from the constant $C_1$ and the constant $C_2$;
In the expression of the primitive number $P_2$, $F[y_n]$ denotes the source entropy of the n-th erasure-code-based check matrix to be optimized, $C_1$ and $C_2$ each denote a constant, M denotes the total number of columns of the coding matrix A, and m denotes a column index of the coding matrix A, with $m = 1, 2, \ldots, M$;
B8. Scaling the check matrix according to the primitive number $P_2$, and recovering the lost data according to the specific block number $F[s]$ corresponding to the lost data.
The beneficial effects of the further scheme are as follows: the invention simplifies the multiplication operation of the matrix by optimizing the row or the column corresponding to the check matrix, thereby improving the data recovery efficiency.
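Steps B2, B4 and B8 scale the check matrix, with a full-rank diagonal matrix D assisting the scaling. The text does not give the values used, so the sketch below only illustrates a general property that makes such scaling safe: multiplying the rows of a check matrix by non-zero factors does not change the value recovered for a lost symbol. The check matrix, the stored word, the diagonal entries and the function names are all assumptions.

```python
from fractions import Fraction


def scale_rows(check, diag):
    """Return D * H for a diagonal matrix D given by its diagonal entries."""
    if any(d == 0 for d in diag):
        raise ValueError("D must be full rank: every diagonal entry non-zero")
    return [[Fraction(d) * h for h in row] for d, row in zip(diag, check)]


def recover_one(check_row, word, lost):
    """Solve one check equation sum_j h_j * c_j = 0 for the erased symbol."""
    known = sum(h * c
                for j, (h, c) in enumerate(zip(check_row, word))
                if j != lost)
    return -known / check_row[lost]


# Hypothetical check matrix: the two check symbols are d1 + d2 + d3 and
# d1 + 2*d2 + 3*d3, written as equations whose right-hand side is zero.
H = [[Fraction(v) for v in row]
     for row in ([1, 1, 1, -1, 0], [1, 2, 3, 0, -1])]
word = [4, 5, 6, 15, 32]           # data 4, 5, 6 followed by both checks

D = [Fraction(3), Fraction(1, 2)]  # assumed full-rank diagonal scaling
scaled = scale_rows(H, D)

erased = list(word)
erased[1] = None                   # the second data symbol is lost
plain = recover_one(H[0], erased, lost=1)
rescaled = recover_one(scaled[0], erased, lost=1)
print(plain == rescaled == word[1])
```

Since the recovered value is unaffected, the scaling factors are free parameters that can be chosen to simplify the arithmetic; how the invention chooses them is determined by the primitive numbers described in steps B1 to B8.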
The invention also discloses a data recovery apparatus based on erasure codes, which comprises:
the generating matrix module is used for coding original data based on erasure codes, generating redundant bits and storing the original data and the generated redundant data;
and the check matrix module is used for recovering the lost data according to the original data and the generated redundant data.
The invention also discloses a data recovery device based on the erasure code, which comprises:
one or more processors; and
storage means for storing at least one program;
the at least one program is executed by the one or more processors to implement the data recovery method.
The invention also discloses a computer readable storage medium, wherein at least one computer execution instruction or at least one program is stored in the computer readable storage medium, and the at least one computer execution instruction or the at least one program is executed by one or more processors to realize the data recovery method.
The beneficial effects of the invention are: the invention reduces the time spent on data recovery, particularly the time spent on matrix operation during data recovery, and reduces the steps and complexity of matrix operation by optimizing the check matrix, thereby improving the efficiency of data recovery.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the present invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything created using the inventive concept is protected.
Example 1
As shown in fig. 1, the present invention provides an erasure code-based data recovery method, which is implemented as follows:
S1. Encoding the original data based on erasure codes by using a generator matrix to generate redundant bits, and storing the original data and the generated redundant data, wherein the encoding in step S1 comprises the following steps:
A1. Representing the erasure-code-based coding matrix A to be optimized in vector form, $A = [a_1, a_2, a_3, \ldots, a_N]^{T} \in R^{M \times N}$, where $N \le M$, $a_n$ denotes a row of the coding matrix A, $n = 1, 2, \ldots, N$, N denotes the total number of rows of the coding matrix A, M denotes the total number of columns of the coding matrix A, and $R^{M \times N}$ denotes an M × N matrix of positive real numbers;
A2. Denoting the N-1 rows of the coding matrix A other than $a_n$ as $A_n$, $A_n = [a_1, a_2, a_3, \ldots, a_{n-1}, a_{n+1}, \ldots, a_N]^{T} \in R^{M \times (N-1)}$, where $a_N$ denotes the N-th column of $A_n$ and $R^{M \times (N-1)}$ denotes an M × (N-1) matrix of positive real numbers;
A3. Introducing a coding auxiliary permutation matrix $T_{n,N}$, multiplying the coding matrix A by it to obtain check bits, and storing the check bits;
A4. Exchanging the n-th row and the N-th row of the coding matrix A according to the coding auxiliary permutation matrix $T_{n,N}$, where n denotes the n-th row of the coding matrix A;
A5. Introducing a new auxiliary check matrix X, and calculating the augmented matrix of the new auxiliary check matrix X from X and the coding matrix A, where $R^{M \times M}$ denotes an M × M matrix of positive real numbers;
The expression of the augmented matrix of the new auxiliary check matrix X is defined in terms of the identity matrix I, the augmented matrix of the coding matrix A, and the conjugate transpose of that augmented matrix;
A6. Calculating a full-rank matrix from the augmented matrix of the new auxiliary check matrix X, the calculation involving the conjugate transpose of $a_n$;
A7. From the full-rank matrix, using linear algebra to obtain an operation equation relating the coding matrix A and the permutation matrix of the coding matrix A. In the expression of the operation equation, $\det[A \times A^{T}]$ denotes the operation equation, A denotes the coding matrix, $A^{T}$ denotes the conjugate transpose of the coding matrix A, $T_{n,N}$ denotes the coding auxiliary permutation matrix, and P denotes the specific value of the primitive variable obtained in the optimization process. The expression further involves the conjugate transpose of $T_{n,N}$; a row $a_n$ of the coding matrix A, with $n = 1, 2, \ldots, N$ and N the total number of rows of the coding matrix A; the conjugate transpose of $a_n$; the augmented matrix of the coding matrix A; and the conjugate transpose of that augmented matrix;
A8. Obtaining, from the operation equation, the orthogonal complement projection matrix of the augmented matrix of the new auxiliary check matrix X onto the row space spanned by the augmented matrix of the coding matrix A;
A9. Setting the number of rows N = M, and decomposing the invertible coding matrix A to be optimized;
A10. Introducing a new auxiliary check row vector $x_n$ according to the decomposed invertible coding matrix A to be optimized, where the condition satisfied by $x_n$ is stated over $R^{(M-1) \times 1}$, and $R^{(M-1) \times 1}$ denotes an (M-1) × 1 matrix of positive real numbers;
A11. Calculating an auxiliary check row from the auxiliary check row vector $x_n$ by using the orthogonal complement projection matrix, for the case $\|x_n\| = 1$;
A12. Calculating, from the auxiliary check row, the value of the determinant of the coding matrix A and its corresponding transpose matrix. The expression for this value involves the augmented matrix of the coding matrix A, the conjugate transpose of that augmented matrix, the transpose of a row $a_n$ of the coding matrix A, and the auxiliary check row vector $x_n$;
A13. Calculating the corresponding log value from the value of the determinant of the coding matrix A and its corresponding transpose matrix, and optimizing the corresponding rows or columns of the check matrix according to the log value, thereby completing the encoding of the original data based on erasure codes;
The expression for the log value of the determinant of the coding matrix A and its corresponding transpose matrix involves the conjugate transpose of a row $a_n$ of the coding matrix A and the auxiliary check row vector $x_n$;
S2. Recovering the lost data by using the check matrix according to the original data and the generated redundant data, thereby completing the data recovery, which is implemented as follows:
B1. According to the check bits, setting $F[y_n]$ as the source entropy of the n-th erasure-code-based check matrix to be optimized, with $F[y_n] = C_1$, where $C_1$ denotes a constant;
B2. According to $F[y_n]$, scaling the erasure-code-based check matrix by using the primitive variable $P_1$;
B3. Using the source entropy of the check matrix, with the log value of the determinant of the coding matrix A and its corresponding transpose matrix as a regularization term, calculating from the generated redundant data the specific block number $F[s]$ corresponding to the lost data;
B4. Introducing a full-rank diagonal matrix D to assist the scaling of the erasure-code-based check matrix, obtaining a coding auxiliary matrix x;
B5. Calculating the auxiliary check matrix X from the identity matrix I and the coding auxiliary matrix x, where $I = Xx$;
B6. Letting the constant $C_2$ be the log value of the determinant of the coding matrix A;
B7. Calculating the primitive number $P_2$ from the constant $C_1$ and the constant $C_2$;
In the expression of the primitive number $P_2$, $F[y_n]$ denotes the source entropy of the n-th erasure-code-based check matrix to be optimized, $C_1$ and $C_2$ each denote a constant, M denotes the total number of columns of the coding matrix A, and m denotes a column index of the coding matrix A, with $m = 1, 2, \ldots, M$;
B8. Scaling the check matrix according to the primitive number $P_2$, and recovering the lost data according to the specific block number $F[s]$ corresponding to the lost data.
In this embodiment, the source of the check matrix to be optimized is scaled repeatedly according to steps B1 to B8, which simplifies the matrix operations in encoding and decoding and thereby improves the efficiency of the matrix algorithm.
In this embodiment, the data recovery method is applicable to RS codes, X-codes, EVENODD codes, and the like. In addition, a single row of the matrix is processed independently during encoding and decoding optimization, which avoids some complex operations in optimizing the encoding and decoding matrices, and the method can also be applied to a wider class of non-orthogonal matrices.
Example 2
The invention also provides a data recovery apparatus based on erasure codes, which comprises:
the generating matrix module is used for coding original data based on erasure codes, generating redundant bits and storing the original data and the generated redundant data;
and the check matrix module is used for recovering the lost data according to the original data and the generated redundant data.
In the embodiment, the original data based on the erasure codes are encoded by using the generating matrix to generate redundant bits, and the original data and the generated redundant data are stored; and recovering the lost data by using the check matrix according to the original data and the generated redundant data to finish the data recovery method. The invention reduces the time spent on data recovery, particularly the time spent on matrix operation during data recovery, and reduces the steps and complexity of matrix operation by optimizing the check matrix, thereby improving the efficiency of data recovery.
Example 3
The invention also provides a data recovery device based on erasure codes, which comprises:
one or more processors; and
storage means for storing at least one program;
the at least one program is executed by the one or more processors to implement the data recovery method of embodiment 1.
In this embodiment, the one or more processors may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In this embodiment, the memory is configured to store at least one program, and the processor runs or executes the program stored in the memory to implement the data recovery method described in embodiment 1. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
Example 4
The present invention also provides a computer-readable storage medium, in which at least one computer-executable instruction or at least one program is stored, and the at least one computer-executable instruction or the at least one program is executed by one or more processors to implement the data recovery method described in embodiment 1.
In this embodiment, the computer-readable storage medium includes, but is not limited to, various media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Through the design, the time spent on data recovery is reduced, particularly the time spent on matrix operation during data recovery is reduced, and the steps and the complexity of the matrix operation are reduced by optimizing the check matrix, so that the efficiency of data recovery is improved.