Summary of the invention
In view of the problems in the prior art, the main object of the present invention is to provide a low-bandwidth data reconstruction method for binary-coded redundant storage systems that reduces the network bandwidth pressure placed on the storage system when lost data blocks are recovered.
To achieve the above object, the invention provides a low-bandwidth data reconstruction method for a binary-coded redundant storage system. The storage system comprises an encoding matrix and a data check matrix, and the data check matrix comprises row vectors and column vectors. When a storage node of the storage system is damaged and data blocks are lost, the lost data blocks are recovered. The method comprises the following steps (1) to (4):
(1) establishing the correspondence between the lost data blocks and the row vectors of the data check matrix, and determining the low-bandwidth check matrices from the submatrices formed by the column vectors of the data check matrix that correspond to the data blocks that are not lost;
(2) judging whether there is more than one low-bandwidth check matrix;
(3) if there is more than one low-bandwidth check matrix, judging whether each of them requires the same number of non-lost data blocks to recover the lost data blocks, that is, whether recovering with different low-bandwidth check matrices places the same I/O pressure on each storage node of the storage system;
(4) if the I/O pressure differs, selecting the low-bandwidth check matrix that requires the fewest reconstruction data blocks (non-lost data blocks) and has the least impact on storage node I/O pressure, and using it to reconstruct the lost data blocks.
Further, when step (2) finds only one low-bandwidth check matrix, that low-bandwidth check matrix is used to reconstruct the lost data blocks.
Further, when step (3) finds that every low-bandwidth check matrix requires the same amount of reconstruction data blocks (non-lost data blocks) to recover the lost data blocks, so that the impact on the I/O pressure of each storage node is the same, any one low-bandwidth check matrix is selected to reconstruct the lost data blocks.
Further, the lost data blocks are reconstructed using the low-bandwidth check matrix and part of the non-lost data blocks.
Further, the data check matrix is $H_{(k+r)m \times rm}$, comprising $(k+r)m$ row vectors and $rm$ column vectors. The number of damaged nodes is $r'$ ($1 \le r' \le r$), that is, the system loses $r'$ data blocks, which contain $r'm$ micro data blocks. Step (1) comprises the following steps (11) to (12):
(11) selecting $r'm$ column vectors from the data check matrix $H_{(k+r)m \times rm}$ such that the matrix formed by these $r'm$ column vectors and the row vectors corresponding to the lost micro data blocks is an $(r'm) \times (r'm)$ full-rank matrix;
(12) the $(r'm) \times (r'm)$ full-rank matrix $H_{(r'm) \times (r'm)}$ is the determined low-bandwidth check matrix.
Further, step (11) comprises the following steps (111) to (115):
(111) counting the number of "1" elements in each column vector of the data check matrix $H_{(k+r)m \times rm}$;
(112) extracting from $H_{(k+r)m \times rm}$ the row vectors corresponding to the lost data blocks to form the binary matrix $H_{(r'm) \times (rm)}$; the remaining row vectors of $H_{(k+r)m \times rm}$ form the binary matrix $H_{(k+r-r')m \times rm}$, whose bottom $rm$ vectors form an identity matrix and whose top $k-r'$ vectors form the binary matrix $H_{(k-r')m \times rm}$;
(113) counting in turn the number of "0" elements in each row vector of the binary matrix $H_{(k-r')m \times rm}$ formed by the top $k-r'$ vectors, and, when a row vector contains at least $r'm$ "0" elements, recording the column vectors in which those "0" elements lie;
(114) searching, within the recorded column vectors, for further row vectors that contain at least $r'm$ "0" elements; if there are none, keeping the column vectors determined in step (113); if there are, recording the new column vectors;
(115) from the number of "1" elements in each group of column vectors recorded in step (114), determining the $r'm$ column vectors with the smallest total number of "1" elements, and confirming that the corresponding matrix $H_{(r'm) \times (r'm)}$ has full rank, thereby forming the $(r'm) \times (r'm)$ full-rank matrix $H_{(r'm) \times (r'm)}$.
Further, the step of reconstructing the lost data blocks using the low-bandwidth check matrix and part of the non-lost data blocks comprises: using the low-bandwidth check matrix $H_{(r'm) \times (r'm)}$ to determine the micro data blocks that need to take part in the reconstruction; forming $r'm$ equations that contain the $r'm$ lost micro data blocks; and solving the system formed by these $r'm$ equations to obtain the $r'm$ lost data blocks.
Further, if no row vector of the binary matrix $H_{(k-r')m \times rm}$ contains at least $r'm$ "0" elements, no low-bandwidth check matrix can be obtained; in that case the system cannot obtain a low-bandwidth reconstruction method that reduces storage node I/O bandwidth when recovering the lost data blocks.
Compared with the prior art, first, the invention recovers lost data blocks by determining the low-bandwidth check matrix that requires the least reconstruction data. This reduces the network bandwidth consumed by reading data from storage nodes during data block reconstruction, relieves the pressure on the maintenance bandwidth of the storage system's internal network, reduces the volume of data transmitted across that network, and reduces the number of reads the system issues to storage devices. Second, the invention can determine the optimal data reconstruction strategy according to the operating condition of the data storage nodes, the network bandwidth and the I/O situation, so that the system calls the fewest data blocks and reconstructs lost data blocks with the minimum maintenance bandwidth. The invention has good application value in areas such as mass data storage systems and sensor system networks.
Embodiment
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Figure 1 is a schematic diagram of encoding and storing a graphic file with an existing binary-coded redundant storage system. The graphic file to be stored is divided into micro data blocks $d_{1,1}, d_{1,2}, d_{1,3}$, and so on, corresponding to data blocks $D_1, D_2$, and so on, which may also be called macro data blocks. Each macro data block corresponds to one storage node. A macro data block is a set of micro data blocks, and the micro data block is the smallest partition unit in the storage system; because stored files differ in size, the size of the micro data block differs from file to file. Each macro data block consists of m micro data blocks and is stored as a whole on a different storage node. Encoding and decoding are carried out in units of micro data blocks; storage is carried out in units of macro data blocks.
If storage node 1 is damaged, data block $D_1$ is lost, that is, the corresponding set of micro data blocks $d_{1,1}, d_{1,2}, d_{1,3}$ is lost, and the lost data blocks must then be reconstructed by decoding. The leftmost matrix in Figure 1 is the encoding matrix. Using existing techniques the encoding matrix can be converted into the data check matrix, so the data check matrix used for data recovery is fixed once the file has been encoded and stored. The encoding matrix is used to apply redundancy coding to the source file and produce check data blocks (redundant data blocks); when a storage node is damaged and data blocks are lost, the data check matrix is used to reconstruct the lost data blocks.
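The relation between the encoding matrix and the data check matrix can be sketched as follows. This is a minimal illustration that assumes a systematic binary code with generator G = [I | P]; the parity part `P` below is a hypothetical toy matrix, not the matrix of Figure 1, and the function names are invented for the example. Stacking P on top of an identity matrix gives a check matrix H for which every codeword β satisfies β·H = 0 over GF(2).

```python
def check_matrix_from_parity(P):
    """Build H = [P ; I] (P stacked on an identity) for a systematic code G = [I | P]."""
    r = len(P[0])
    identity = [[1 if i == j else 0 for j in range(r)] for i in range(r)]
    return [row[:] for row in P] + identity


def encode(data, P):
    """Return the codeword [data | data * P], with arithmetic over GF(2)."""
    r = len(P[0])
    parity = [sum(data[i] & P[i][j] for i in range(len(data))) % 2 for j in range(r)]
    return list(data) + parity


def syndrome(beta, H):
    """Compute beta * H over GF(2); an all-zero result means every check holds."""
    cols = len(H[0])
    return [sum(beta[i] & H[i][j] for i in range(len(beta))) % 2 for j in range(cols)]


# Hypothetical 3x3 parity part, chosen only for illustration.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
H = check_matrix_from_parity(P)
beta = encode([1, 0, 1], P)
print(beta, syndrome(beta, H))
```

Each column of H is one parity-check equation, which is why the later steps select columns of H when choosing which surviving blocks to read.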
Figure 2 is a flowchart of the low-bandwidth data reconstruction method of the invention for a binary-coded redundant storage system. The method comprises the following steps:
S1, a storage node of the binary-coded redundant storage system is damaged, causing data blocks to be lost;
S2, determine the data check matrix of the storage system; according to the coding principle of the data storage, the data check matrix used for data recovery is fixed once the file has been encoded and stored, and it comprises row vectors and column vectors;
S3, judge whether low-bandwidth check matrices can be determined: using the data check matrix corresponding to the encoding matrix originally used to store the file, and its relation to the data blocks, determine all low-bandwidth check matrices that can reconstruct the lost data blocks. The low-bandwidth check matrix is determined as follows: establish the correspondence between the lost data blocks and the row vectors of the data check matrix, and determine the low-bandwidth check matrix from the submatrix formed by the column vectors of the data check matrix that correspond to the data blocks that are not lost. If none can be determined, go to step S4; if one can be determined, go to step S5;
S4, carry out data reconstruction by the conventional method;
S5, judge whether there is more than one low-bandwidth check matrix; if not, go to step S6; if so, go to step S7;
S6, use this low-bandwidth check matrix, reading the corresponding data blocks from the intact storage nodes that correspond to it, to reconstruct the lost data blocks;
S7, judge whether every low-bandwidth check matrix requires the same number of non-lost data blocks to recover the lost data blocks, that is, whether recovering with different low-bandwidth check matrices places the same I/O pressure on each intact storage node of the storage system; if so, go to step S8; if not, go to step S9;
S8, select any one low-bandwidth check matrix and, reading the corresponding data blocks from the intact storage nodes that correspond to it, reconstruct the lost data blocks;
S9, calculate, for each low-bandwidth check matrix, the sum of the I/O read pressure of its corresponding storage nodes;
S10, select the low-bandwidth check matrix whose group of storage nodes has the smallest total I/O read pressure as the final reconstruction matrix, and use it, reading the corresponding data blocks from the intact storage nodes that correspond to it, to reconstruct the lost data blocks.
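The branching in steps S3 to S10 can be sketched as a small selection function. This is an illustrative sketch only: the candidate representation (a name plus a map from node id to the number of micro data blocks read), the `node_io_load` metric and the tie-break on block count are assumptions made for the example, since the text does not specify how I/O read pressure is measured.

```python
def choose_check_matrix(candidates, node_io_load):
    """Pick a low-bandwidth check matrix following the S3-S10 decision flow.

    candidates: list of dicts {"name": str, "blocks": {node_id: micro blocks read}}
                (hypothetical representation of each low-bandwidth check matrix).
    node_io_load: dict node_id -> current I/O read pressure (hypothetical metric).
    Returns the chosen candidate, or None to signal the conventional method (S4).
    """
    if not candidates:                      # S3: none found -> S4
        return None
    if len(candidates) == 1:                # S5 -> S6: only one candidate
        return candidates[0]

    totals = {sum(c["blocks"].values()) for c in candidates}
    if len(totals) == 1:                    # S7 -> S8: all equal, any will do
        return candidates[0]

    def io_pressure_sum(c):                 # S9: sum over the nodes that are read
        return sum(node_io_load[node] for node in c["blocks"])

    # S10: smallest total I/O read pressure; ties broken by fewer blocks read.
    return min(candidates, key=lambda c: (io_pressure_sum(c), sum(c["blocks"].values())))


load = {2: 1.0, 3: 1.0, 4: 0.5, 5: 2.0}
a = {"name": "A", "blocks": {2: 3, 3: 3, 4: 2}}   # reads 8 micro data blocks
b = {"name": "B", "blocks": {2: 3, 3: 3, 5: 3}}   # reads 9 micro data blocks
print(choose_check_matrix([a, b], load)["name"])
```

Returning `None` rather than raising keeps the fallback to conventional reconstruction an ordinary outcome, which matches the flowchart's treatment of step S4.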
The principle of the above method is as follows. A low-bandwidth check matrix is selected from the check matrix according to which data blocks are lost. The low-bandwidth check matrix determines how many non-lost data blocks are needed to reconstruct the lost data blocks. Because the low-bandwidth check matrix needs fewer data blocks than the original method, the I/O pressure on each storage node of the system during data recovery is reduced.
Suppose the data check matrix is $H_{(k+r)m \times rm}$, comprising $(k+r)m$ row vectors and $rm$ column vectors, and the number of damaged nodes is $r'$ ($1 \le r' \le r$), that is, the system loses $r'$ data blocks, which contain $r'm$ micro data blocks. In general the number of damaged nodes does not exceed $r$, so the number of lost data blocks does not exceed $r$ and the number of lost micro data blocks does not exceed $r \times m$. When $1 \le r' < r$, there are multiple ways to select $r'm$ column vectors from $H_{(k+r)m \times rm}$ to form the low-bandwidth check matrix for recovering the lost data blocks. Studying whether a method with low maintenance bandwidth exists, and how to select the low-bandwidth check matrix that needs the least reconstruction bandwidth, is therefore one of the innovations of the present invention. The method of determining the low-bandwidth check matrix in step S3 comprises the following steps S31 to S32:
S31, select $r'm$ column vectors from the data check matrix $H_{(k+r)m \times rm}$ such that the matrix formed by these $r'm$ column vectors and the row vectors corresponding to the lost micro data blocks is an $(r'm) \times (r'm)$ full-rank matrix;
S32, the $(r'm) \times (r'm)$ full-rank matrix $H_{(r'm) \times (r'm)}$ is the determined low-bandwidth check matrix.
Step S31 comprises the following steps S311 to S315:
S311, count the number of "1" elements in each column vector of the data check matrix $H_{(k+r)m \times rm}$;
S312, extract from $H_{(k+r)m \times rm}$ the row vectors corresponding to the lost data blocks to form the binary matrix $H_{(r'm) \times (rm)}$; the remaining row vectors of $H_{(k+r)m \times rm}$ form the binary matrix $H_{(k+r-r')m \times rm}$, whose bottom $rm$ vectors form an identity matrix and whose top $k-r'$ vectors form the binary matrix $H_{(k-r')m \times rm}$. Because each column vector of this identity matrix corresponds to only one data block, the identity matrix does not affect the number of data blocks that take part in the reconstruction. The search can therefore be limited to the top part of $H_{(k+r-r')m \times rm}$, that is, the binary matrix $H_{(k-r')m \times rm}$ built from the $k-r'$ vectors.
S313, count in turn the number of "0" elements in each row vector of the binary matrix $H_{(k-r')m \times rm}$ formed by the top $k-r'$ vectors, and, when a row vector contains at least $r'm$ "0" elements, record the column vectors in which those "0" elements lie;
S314, search, within the recorded column vectors, for further row vectors that contain at least $r'm$ "0" elements; if there are none, keep the column vectors determined in step S313; if there are, record the new column vectors. Repeat in this way, recording the column vectors determined in each pass.
S315, from the number of "1" elements in each group of column vectors recorded in step S314, determine the $r'm$ column vectors with the smallest total number of "1" elements, and confirm that the corresponding matrix $H_{(r'm) \times (r'm)}$ has full rank, thereby forming the $(r'm) \times (r'm)$ full-rank matrix $H_{(r'm) \times (r'm)}$.
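A simplified sketch of the column search in steps S311 to S315 is given below. It is an assumption-laden illustration rather than the patented procedure: it considers one skipped row at a time (the repeated refinement of S314 is omitted), it enumerates column subsets exhaustively within each recorded set of zero columns, and it assumes the lost rows are data rows. The toy matrix `H` and all function names are hypothetical.

```python
from itertools import combinations


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2), with each row packed into an integer."""
    rows = [sum(bit << i for i, bit in enumerate(row)) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot            # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def find_low_bandwidth_columns(H, lost_rows, n_parity_rows):
    """Return (total ones, chosen columns, skipped row), or None if nothing qualifies."""
    n_rows, n_cols = len(H), len(H[0])
    need = len(lost_rows)                   # r'm columns are required
    lost = set(lost_rows)
    top_rows = [i for i in range(n_rows - n_parity_rows) if i not in lost]
    weight = [sum(H[i][j] for i in range(n_rows)) for j in range(n_cols)]  # S311

    best = None
    for row in top_rows:                    # S313: rows with enough "0" elements
        zero_cols = [j for j in range(n_cols) if H[row][j] == 0]
        if len(zero_cols) < need:
            continue
        for cols in combinations(zero_cols, need):     # S315: fewest "1" elements
            sub = [[H[i][j] for j in cols] for i in lost_rows]
            if gf2_rank(sub) != need:       # the submatrix must have full rank
                continue
            score = sum(weight[j] for j in cols)
            if best is None or score < best[0]:
                best = (score, cols, row)
    return best


# Toy check matrix: rows 0-2 are data blocks, rows 3-4 are parity (identity part).
H = [[1, 1],
     [1, 0],
     [1, 1],
     [1, 0],
     [0, 1]]
print(find_low_bandwidth_columns(H, lost_rows=[0], n_parity_rows=2))
```

In the toy case the function selects column 1, whose check equation does not involve the data block in row 1, so that block need not be read. A `None` result corresponds to the case in which no low-bandwidth check matrix exists and the conventional method is used.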
After the low-bandwidth check matrix has been determined, the lost data blocks are reconstructed using the low-bandwidth check matrix and the non-lost data blocks: the low-bandwidth check matrix $H_{(r'm) \times (r'm)}$ is used to determine the micro data blocks that need to take part in the reconstruction; $r'm$ equations containing the $r'm$ lost micro data blocks are formed; and the system formed by these $r'm$ equations is solved to obtain the $r'm$ lost data blocks.
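Solving the r'm equations can be sketched with Gauss-Jordan elimination over GF(2), where addition is XOR. The sketch below assumes each micro data block is represented as a Python integer and that the selected columns give a full-rank system; the toy check matrix and block values are hypothetical and chosen only so the result can be checked by hand.

```python
def reconstruct(H, beta, lost_rows, cols):
    """Recover lost micro data blocks from the check equations in `cols`.

    H: 0/1 check matrix (rows = micro data blocks, columns = check equations).
    beta: micro data blocks as integers; entries at lost_rows are ignored.
    cols: the selected columns; must give a full-rank system on lost_rows.
    """
    n = len(lost_rows)
    lost = set(lost_rows)
    # One equation per column: XOR of lost blocks = XOR of surviving blocks.
    A = [[H[i][j] for i in lost_rows] for j in cols]
    b = []
    for j in cols:
        acc = 0
        for i in range(len(H)):
            if i not in lost and H[i][j]:
                acc ^= beta[i]
        b.append(acc)
    # Gauss-Jordan elimination over GF(2).
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c])   # fails if not full rank
        A[c], A[pivot] = A[pivot], A[c]
        b[c], b[pivot] = b[pivot], b[c]
        for r in range(n):
            if r != c and A[r][c]:
                A[r] = [x ^ y for x, y in zip(A[r], A[c])]
                b[r] ^= b[c]
    return dict(zip(lost_rows, b))


# Toy code: three data blocks (rows 0-2) and three parity blocks (rows 3-5).
H = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
# Data 5, 9, 12; parity 5^12 = 9, 5^9 = 12, 9^12 = 5. Rows 0 and 1 are lost.
beta = [0, 0, 12, 9, 12, 5]
print(reconstruct(H, beta, lost_rows=[0, 1], cols=(0, 1)))
```

Only the blocks that appear with a "1" in the selected columns are read, which is where the bandwidth saving comes from.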
In addition, if no row vector of the binary matrix $H_{(k-r')m \times rm}$ contains at least $r'm$ "0" elements, no low-bandwidth check matrix can be obtained, and data reconstruction can only be carried out by the conventional method.
Embodiment 1
When node damage occurs inside a storage system built from a conventional (n, k) MDS erasure code, whenever no more than n-k nodes are damaged the system must read the data on k nodes to recover them, and the erasure codes constructed usually satisfy n-k ≤ k.
Suppose the check matrix of a (6,3) Vandermonde systematic erasure code constructed over the binary field is as shown in Figure 3. The code satisfies $\beta \cdot H_{(k+r)r} = 0$, where $\beta$ denotes the source file blocks and check data blocks stored in the storage system, written $[D_1, D_2, D_3, D_4, \ldots, D_{10}, \ldots, D_{(k+r)r}]$. Suppose the first storage node of the storage system is damaged, that is, the stored file blocks $[D_1, D_2, D_3]$ are lost. Then $\beta = [X_1, X_2, X_3, D_4, \ldots, D_{10}, \ldots, D_{18}]$, where $[X_1, X_2, X_3]$ are the lost data blocks and $[D_{10}, \ldots, D_{18}]$ are the check data blocks. Clearly, whichever three column vectors of the data check matrix H are chosen as the basis for recovering the lost data blocks, three check data blocks take part in the reconstruction. The low-bandwidth reconstruction method can therefore only be determined by examining the distribution of "0" and "1" elements in the submatrix $[l_4, l_5, \ldots, l_9]^T$ of H.
From the H matrix, reconstructing the three lost data blocks requires selecting three of the 9 column vectors of H such that the matrix they form has rank 3. Further, to obtain the reconstruction method with the minimum reconstruction bandwidth while consuming the least computation during decoding, the procedure is as follows (as shown in Figure 3):
When the reconstruction does not need data block $D_4$, the column vectors of H whose 4th element is "0" are $C_2$, $C_3$, $C_5$, $C_9$. Column vector $C_2$ has "1" elements at 6 positions, written $C_2(6)$; $C_3$ has "1" elements at 4 positions, written $C_3(4)$; $C_5$ has "1" elements at 7 positions, written $C_5(7)$; $C_9$ has "1" elements at 7 positions, written $C_9(7)$.
When the reconstruction does not need data block $D_5$, the column vectors of H whose 5th element is "0" are $C_1$, $C_3$, $C_7$, written $C_1(5)$, $C_3(4)$, $C_7(6)$.
Likewise, when the reconstruction does not need data block $D_6$, the column vectors of H whose 6th element is "0" are $C_1$, $C_2$, $C_6$, $C_8$, $C_9$, written $C_1(5)$, $C_2(6)$, $C_6(8)$, $C_8(5)$, $C_9(7)$.
When the reconstruction does not need data block $D_7$, the column vectors of H whose 7th element is "0" are $C_3$, $C_4$, $C_5$, $C_6$, $C_8$, written $C_3(4)$, $C_4(8)$, $C_5(7)$, $C_7(6)$, $C_8(5)$.
When the reconstruction does not need data block $D_8$, the column vectors of H whose 8th element is "0" are $C_1$, $C_5$, $C_8$, written $C_1(5)$, $C_5(7)$, $C_8(5)$.
When the reconstruction does not need data block $D_9$, the column vectors of H whose 9th element is "0" are $C_2$, $C_3$, written $C_2(6)$, $C_3(4)$. Because only two elements of this row vector are zero, fewer than the three column vectors required, data block $D_9$ must take part in the reconstruction of the macro data block.
For the (6,3) Vandermonde systematic code over the binary field, a low-bandwidth reconstruction algorithm that saves one micro data block is therefore easy to determine. No reconstruction algorithm saves two micro data blocks, however, because the check matrix contains no three column vectors that are "0" at the positions of two different data blocks simultaneously. Any one recovery vector group can be chosen from the search results and used to reconstruct the lost data blocks.
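The claim that one micro data block can be skipped but not two can be checked directly from the column lists given above. The sketch below only counts shared "0" columns; it does not check the full-rank condition, because the matrix of Figure 3 is not reproduced in the text. Where the two listings for $D_7$ differ ($C_6$ versus $C_7$), the first listing is used, and the outcome is the same with either. The function name is hypothetical.

```python
from itertools import combinations

# Columns of H that are "0" at the position of each surviving data block,
# transcribed from the lists above (column indices only).
zero_columns = {
    "D4": {2, 3, 5, 9},
    "D5": {1, 3, 7},
    "D6": {1, 2, 6, 8, 9},
    "D7": {3, 4, 5, 6, 8},
    "D8": {1, 5, 8},
    "D9": {2, 3},
}
NEEDED = 3  # r'm = 3 column vectors are required to recover three lost blocks


def skippable_sets(zero_columns, n_blocks, needed=NEEDED):
    """List the groups of n_blocks data blocks sharing at least `needed` zero columns."""
    result = []
    for blocks in combinations(sorted(zero_columns), n_blocks):
        common = set.intersection(*(zero_columns[b] for b in blocks))
        if len(common) >= needed:
            result.append((blocks, sorted(common)))
    return result


one = skippable_sets(zero_columns, 1)
two = skippable_sets(zero_columns, 2)
print([blocks for blocks, _ in one])
print(two)
```

Every block except $D_9$ has at least three zero columns, while no pair of blocks shares three, which agrees with the statement that at most one micro data block can be saved.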
In this low-bandwidth reconstruction, the original method reads the macro data blocks on 3 storage nodes, that is, 9 micro data blocks, to reconstruct one macro data block, whereas the invention needs only 8 micro data blocks. For the whole system this saves 11.1% of the reconstruction bandwidth, which is of practical significance for storage systems whose internal network is limited.
Embodiment 2
To further demonstrate the effectiveness of the invention, the decoding process of the STAR code is optimized for low bandwidth using the method proposed by the invention. In this embodiment the information scale of the STAR code is m=5 and the number of check columns is 3. From the construction of the STAR code, when the information scale is m=5 its generator matrix can be expressed as:
P can be expressed as:
From the properties of linear block codes, the data check matrix H of this code can be obtained; H is expressed as:
Marking Q gives:
Using the low-bandwidth data reconstruction method proposed by the invention, a low-bandwidth reconstruction method can be obtained for each damage scenario. For example, when the first storage node of the system is damaged, that is, when micro data blocks $c_{0,0}, c_{1,0}, c_{2,0}, c_{3,0}$ are lost, the submatrix formed by column vectors $C_3, C_4, C_5, C_6$ of Q can be used to reconstruct the lost data blocks. In this reconstruction $c_{1,1}, c_{0,2}, c_{1,2}, c_{0,3}$ do not take part, so compared with the original reconstruction method, reconstructing the data blocks of the first storage node with this method saves 20% of the network data transmission bandwidth. Table 1 shows the effect achieved by the low-bandwidth data reconstruction method of the invention when different storage nodes are damaged:
Table 1: Performance of the low-bandwidth reconstruction algorithm for the STAR code
When two nodes of the system are damaged simultaneously, each file has 8 lost data blocks. To reconstruct the 8 lost data blocks of each file, 8 column vectors must be selected from the data check matrix such that the matrix they form has rank 8; the lost data blocks can then be reconstructed. For any two damaged nodes, the data check matrix allows a reconstruction method that saves 1 micro data block to be selected. Recovering the data blocks of two storage nodes requires at least 19 data blocks, which saves one data block.
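The savings quoted in the two embodiments follow from a simple ratio, sketched below. The baseline of 20 micro data blocks for the STAR code is an inference, not a figure stated in the text: it is the value consistent with both "4 blocks skipped saves 20%" and "19 blocks needed, saving one". The function name is hypothetical.

```python
def bandwidth_saving(conventional_blocks, low_bandwidth_blocks):
    """Fraction of reconstruction reads saved relative to the conventional method."""
    return (conventional_blocks - low_bandwidth_blocks) / conventional_blocks


# (conventional reads, low-bandwidth reads) in micro data blocks.
cases = {
    "(6,3) Vandermonde code, one node lost": (9, 8),
    "STAR code (m=5), first node lost": (20, 16),   # baseline 20 is inferred
    "STAR code (m=5), two nodes lost": (20, 19),    # baseline 20 is inferred
}
for name, (conventional, reduced) in cases.items():
    print(f"{name}: {bandwidth_saving(conventional, reduced):.1%}")
```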
The above describes a low-bandwidth data reconstruction method for binary-coded redundant storage systems. Based on the distribution of "0" and "1" elements in the data check matrix, the invention uses an optimizing search, together with the I/O read pressure of each storage node, to find the optimal low-bandwidth check matrix and use it to reconstruct the lost data blocks. The method reduces the reconstruction bandwidth the system needs when reconstructing data blocks and relieves the bandwidth pressure on the storage system's internal network. The invention is general and can be applied to any coded storage system constructed with binary matrices. The invention is not limited to the above embodiments; any improvement or change that does not depart from the technical solution of the invention and that is known to those of ordinary skill in the art falls within the scope of protection of the invention.