WO2018072294A1

WO2018072294A1 - Method for constructing check matrix and method for constructing horizontal array erasure code

Info

Publication number: WO2018072294A1
Application number: PCT/CN2016/110613
Authority: WO
Inventors: 唐聃; 舒红平; 王亚强
Original assignee: 成都信息工程大学
Priority date: 2016-10-17
Filing date: 2016-12-18
Publication date: 2018-04-26
Also published as: CN106484559B; CN106484559A

Abstract

A method for constructing a check matrix and a method for constructing a horizontal array erasure code. The check matrix H of a horizontal array erasure code can be expressed in the standard form $H = [P | I_r]$. Each row of H represents a check equation: the binary XOR sum of the code elements of the horizontal array erasure code that correspond to a "1" in that row is 0. The method constructs the check matrix of a horizontal array erasure code from a preset fault tolerance and storage efficiency, and then constructs the corresponding horizontal array erasure code from that matrix. The construction is simple to implement and can produce array codes with theoretically unrestricted fault tolerance; no strict constraint conditions need to be satisfied during construction, and the operational efficiency is very high. Once the array code is determined, the update cost and the repair cost are fixed constants that do not increase as the system scales up or as the fault tolerance improves.

Description

Method for constructing check matrix and method for constructing horizontal array erasure code

Technical field

The invention belongs to the technical field of computer information storage, in particular to a method for constructing a check matrix and a method for constructing a horizontal array erasure code.

Background technique

With the rapid growth of networks and servers, data volumes keep increasing, and the importance and security of data are taken more and more seriously. To cope with the storage reliability problems caused by this growth, mass storage systems must provide secure storage services, continuous online operation, and efficient, reliable fault tolerance mechanisms. At the same time, to improve the concurrency and efficiency of data access and to reduce cost, a storage system is usually built from multiple storage nodes. Such a system is typically a network-based distributed storage system, and its prototype can be traced back to the centralized RAID (Redundant Array of Inexpensive Disks) system.

RAID disk groups are now built from disks with high storage density. When a disk failure has to be corrected for terabytes of data, reconstruction takes a long time, often a day or more, and even longer in production systems. During such a long rebuild, the probability that a second or third disk in the RAID group fails increases greatly: a single disk failure significantly raises the access load on the remaining disks, which sharply increases the probability of further disk failures and reduces the availability of the storage system.

Because traditional data redundancy protection is insufficient for large-capacity disk storage systems, a more efficient redundancy technology, the erasure code, has appeared in mass storage systems that are distributed, large-scale, and high-capacity. Erasure codes originated in the field of communication transmission and are now increasingly used in large-scale storage systems, especially distributed storage environments, for data redundancy protection. The basic idea is to divide a piece of data into k blocks of original data and to compute m blocks of redundant data from those k blocks. If any m of the k+m blocks fail, the storage system can recover the original k blocks through a reconstruction algorithm. Erasure-code redundancy protection thus solves the problem that traditional redundancy protection is not suitable for distributed production storage systems.
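To make the k+m idea concrete, the following minimal sketch uses the simplest possible case: k = 3 data blocks and m = 1 redundant block computed by bytewise XOR. It is a generic single-parity illustration, not the construction of the present invention; the helper name `xor_blocks` and the sample data are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# k = 3 original data blocks (hypothetical sample data), m = 1 redundant block
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)

# Suppose data[1] is lost: XOR of the surviving blocks and the parity rebuilds it.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])
```

With a single XOR parity block only one lost block can be rebuilt; tolerating more failures requires more redundant blocks and a suitable set of check equations.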

Compared with traditional mirroring and replication, erasure-code-based methods offer low redundancy and high disk utilization. They are better suited to the diverse mass storage systems and large capacity requirements of cloud computing and big data services, and they adapt well to distributed storage environments. Therefore, once the storage block size and the desired fault tolerance of a storage system are fixed, the construction of a suitable erasure code is important.

Summary of the invention

It is an object of the present invention to provide a method for constructing the check matrix of a horizontal array erasure code suitable for a predetermined storage block size and a desired fault tolerance, and to obtain from it a method for constructing the corresponding horizontal array erasure code.

To achieve the above objective, the specific technical solution of the method for constructing the check matrix of a horizontal array erasure code is as follows. The storage array corresponding to the check matrix has m rows and the fault tolerance is f, where m and f are positive integers. Let $I, I_1, I_2, \ldots, I_{n-1}$ be the basic columns of the cyclic matrix of the n-th order basic circulant matrix D, where $I = I_n$ is the identity matrix. The specific steps of the method are:

S1: Arrange f copies of the basic column I of the t-th order cyclic matrix in one column; the resulting matrix is denoted $T_1$;

where t is a positive integer, $t = f \cdot (m-1) + 1$;

S2: Combine the other $f \cdot (m-1)$ distinct basic columns $I_1, I_2, \ldots, I_{f(m-1)}$ of the t-th order cyclic matrix, other than I, into one matrix in row-major order, denoted $T_2$; $T_2$ has (m-1) sub-matrices in each row and f sub-matrices in each column;

S3: Concatenate the two matrices $T_1$ and $T_2$ side by side, denoted $P = [T_1 | T_2]$;

S4: Concatenate the matrix P and the identity matrix $I_r$ of order $r = f \cdot t$ side by side; the resulting matrix $H = [P | I_r]$ is the check matrix of the horizontal array erasure code.
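Steps S1 to S4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the inventors' implementation: it assumes that the basic column $I_k$ is the k-th power of the basic circulant matrix D, realized here as the t×t cyclic permutation matrix whose row i has its single 1 in column (i + k) mod t. The function names `basic_column` and `build_check_matrix` are hypothetical.

```python
def basic_column(t, k):
    """t x t cyclic permutation matrix standing in for I_k (assumed to be D**k).

    Assumption: row i has its single 1 in column (i + k) % t, so k = 0
    (or k = t) gives the identity matrix I.
    """
    return [[1 if (i + k) % t == j else 0 for j in range(t)] for i in range(t)]


def build_check_matrix(m, f):
    """Return H = [T1 | T2 | I_r] as a list of 0/1 rows, following S1 to S4."""
    t = f * (m - 1) + 1   # order of the cyclic matrix
    r = f * t             # number of check elements
    H = []
    for a in range(f):    # block row a of P
        # S1 contributes the identity block; S2 contributes the next
        # (m - 1) basic columns, taken in row-major order.
        shifts = [0] + [a * (m - 1) + b for b in range(1, m)]
        blocks = [basic_column(t, k) for k in shifts]
        for i in range(t):
            p_row = [bit for blk in blocks for bit in blk[i]]         # S3: [T1 | T2]
            id_row = [1 if c == a * t + i else 0 for c in range(r)]   # S4: I_r
            H.append(p_row + id_row)
    return H


for row in build_check_matrix(m=2, f=1):
    print(row)
```

The opposite shift direction for $I_k$ would be an equally plausible reading of the text; it changes which data elements appear in each check equation but not the block structure of H.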

In the method for constructing a horizontal array erasure code from the check matrix obtained above, all data elements of the horizontal array erasure code are numbered in row-major order in the data array portion, and all check elements are numbered in column-major order in the check array portion. Each column vector of the sub-matrix P of the check matrix H corresponds to one data element of the horizontal array erasure code, and each column vector of the identity matrix $I_r$ corresponds to one check element. Each row of H represents a check equation: the binary XOR sum of the code elements of the horizontal array erasure code that correspond to a "1" in that row is 0.

The check matrix H according to the present invention can be expressed in the standard form $H = [P | I_r]$, where $r = f \cdot t$, $I_r$ is the identity matrix of order r, and r is the number of check elements in the code. Each row of H represents a check equation, which indicates that the binary XOR sum of the code elements of the horizontal array erasure code corresponding to a "1" in that row is 0. The standard form of H therefore also shows which information elements determine each check element. The r rows of H represent r check equations, which means that every codeword of the code determined by H contains r check elements.
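The role of the rows of H as check equations can be illustrated with a short sketch. It assumes a small example matrix in standard form $H = [P | I_r]$ (here with r = 2 and four data elements); the function names `encode` and `syndrome` and the sample data are hypothetical.

```python
def encode(H, data):
    """Compute the r check elements c = P * d over GF(2) for H = [P | I_r]."""
    r = len(H)
    P = [row[:-r] for row in H]
    return [sum(p * d for p, d in zip(row, data)) % 2 for row in P]


def syndrome(H, word):
    """Evaluate every check equation; an all-zero result means all of them hold."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


# Small example matrix in standard form: 4 data columns followed by I_2.
H = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
]
data = [1, 0, 1, 1]
checks = encode(H, data)
codeword = data + checks

print(checks)
print(syndrome(H, codeword))
```

Because the right-hand part of H is an identity matrix, each row contains exactly one check element, so each check element is simply the XOR of the data elements selected by the corresponding row of P.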

For a horizontal array erasure code, once the check matrix is determined, the code itself is determined. An array code is also a linear block code, so most of the properties of the array code can be analyzed through its check matrix. The method described above first constructs a check matrix from a basic cyclic matrix and then determines the corresponding horizontal array erasure code from that check matrix.

The invention constructs a check matrix according to the preset fault tolerance and storage efficiency, and then constructs a horizontal array erasure code from that check matrix. The beneficial effects are as follows: (1) The construction method is simple to implement, can construct array erasure codes with theoretically unrestricted fault tolerance, and does not need to satisfy strong constraints during construction. (2) The horizontal array erasure code obtained from the check matrix uses only binary operations for encoding and decoding, so it has very high computational efficiency and is easy to implement. (3) Once the horizontal array erasure code is determined, the update cost and the repair cost are fixed constants that do not increase as the system scales up or as the fault tolerance increases. In summary, the method of the present invention can improve the reliability of a storage system and is suitable for companies or institutions with large data volumes and high data stability requirements.
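One way to make the constant-cost statement concrete is to look at the weights of the sub-matrix P. The sketch below is an interpretation, not a definition taken from the text: it assumes that the update cost of a data element can be read as the number of check equations it appears in (the column weight of P), and that the data part of a single check equation has size equal to the row weight of P. It also assumes that $I_k$ places the 1 of row i in column (i + k) mod t. The names `build_p` and `weights` are hypothetical.

```python
def build_p(m, f):
    """Sub-matrix P = [T1 | T2], assuming I_k has the 1 of row i in column (i + k) % t."""
    t = f * (m - 1) + 1
    P = []
    for a in range(f):
        shifts = [0] + [a * (m - 1) + b for b in range(1, m)]
        for i in range(t):
            row = [0] * (m * t)
            for b, k in enumerate(shifts):
                row[b * t + (i + k) % t] = 1
            P.append(row)
    return P


def weights(m, f):
    """Return the sets of distinct row weights and column weights of P."""
    P = build_p(m, f)
    row_weights = {sum(row) for row in P}
    col_weights = {sum(col) for col in zip(*P)}
    return row_weights, col_weights


for m, f in [(2, 1), (2, 2), (3, 3)]:
    print((m, f), weights(m, f))
```

Under these assumptions every data element takes part in exactly f check equations and every check equation covers exactly m data elements, regardless of the element's position in the array.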

Brief description of the drawings

The drawings are intended to provide a further understanding of the invention and are not intended to limit the invention.

FIG. 1 is a schematic diagram of the elements of a horizontal array erasure code according to the present invention.

FIG. 2 is a schematic diagram of the elements of a 3*4 horizontal array erasure code according to the present invention.

FIG. 3 is a diagram of the correspondence between the elements of the 3*4 horizontal array erasure code of the present invention and its check matrix.

Detailed description

The implementation of the present invention will be further described below in conjunction with the embodiments.

In a horizontal array erasure code, any given column stores either only data elements or only check elements; no column contains both data elements and check elements. In the storage array, all portions that store data elements are referred to as the data array portion, and all portions that store check elements are referred to as the check array portion. As shown in FIG. 1, the symbol $d_i$ denotes the i-th data element, and all data elements are numbered from 1 in row-major order in the data array portion; the symbol $c_j$ denotes the j-th check element, and all check elements are numbered in column-major order in the check array portion. Here i and j are positive integers with 1 ≤ i ≤ m·n and 1 ≤ j ≤ m·k, where n and k are the numbers of columns of the data array portion and the check array portion of the m-row array.

The correspondence between a horizontal array erasure code and its check matrix is as follows. The check matrix H of the present invention can be expressed in the standard form $H = [P | I_r]$, where $I_r$ is the identity matrix of order r and r is the number of check elements in the code. Each column vector of the matrix P corresponds to one data element of the horizontal array erasure code, and each column vector of the identity matrix $I_r$ corresponds to one check element. Generally, in an array storage system, one column corresponds to one storage node. When a node fails, all elements in the corresponding column are lost or become unknown.
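The numbering rules and the column-to-node correspondence can be sketched as follows. This is an illustrative sketch only: it assumes an array with m rows, n data columns, and k check columns, and the function names `array_layout` and `node_columns_in_h` are hypothetical.

```python
def array_layout(m, n, k):
    """Return the m x (n + k) array of element labels.

    Data elements d1, d2, ... are numbered in row-major order; check
    elements c1, c2, ... are numbered in column-major order.
    """
    layout = []
    for row in range(m):
        data = [f"d{row * n + col + 1}" for col in range(n)]
        check = [f"c{col * m + row + 1}" for col in range(k)]
        layout.append(data + check)
    return layout


def node_columns_in_h(m, n, k):
    """For each array column (storage node), the 0-based columns of H it holds."""
    nodes = [[row * n + col for row in range(m)] for col in range(n)]            # columns of P
    nodes += [[m * n + col * m + row for row in range(m)] for col in range(k)]   # columns of I_r
    return nodes


for line in array_layout(m=2, n=2, k=1):
    print(line)
print(node_columns_in_h(m=2, n=2, k=1))
```

When a storage node fails, the columns of H listed for that node are exactly the unknowns that have to be solved for from the check equations.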

For example, consider a storage array of size 3*4 in which the first 3 columns store the original data and the last column stores the computed check data, as shown in FIG. 2. Let the check relationship between the three check elements and the data elements be the following equation group (1):

Then the check matrix H of the horizontal array erasure code is as follows:

In this example, the check matrix H can clearly be divided into two sub-matrices, P and I, where I is the third-order identity matrix $I_3$ and P is the sub-matrix that remains after $I_3$ is removed from H. FIG. 3 shows the correspondence between the columns of the check matrix and the elements of the horizontal array erasure code.

Embodiment 1

It is required to construct a horizontal array erasure code with a fault tolerance of f=1, and the block size m of the horizontal array erasure code is 2. The steps to construct the check matrix under this condition are as follows:

S1: Arrange one copy of the basic column I of the t-th order cyclic matrix in a column; the resulting matrix is denoted $T_1$, where $t = 1 \cdot (2-1) + 1 = 2$. The second-order basic cyclic matrix has two distinct basic columns, $I$ and $I_1$, where I is the 2nd-order identity matrix. Therefore a single 2nd-order identity matrix forms the matrix $T_1$: $T_1 = (I)$

S2: Combine the one remaining basic column $I_1$ of the second-order cyclic matrix, other than I, into a matrix in row-major order, denoted $T_2$; $T_2$ has (2-1) sub-matrix in each row and 1 sub-matrix in each column: $T_2 = I_1$

S3: Concatenate the two matrices $T_1$ and $T_2$ side by side to obtain the matrix P:

$P = (T_1 | T_2) = (I | I_1)$

S4: Concatenate the matrix P and the identity matrix $I_2$ of order 1*2 side by side to form the new matrix H, which is the check matrix of the array erasure code: $H = (P | I_2) = (I | I_1 | I_2)$

At this point the check matrix H is determined, and the corresponding array erasure code is determined as well: each column vector of the sub-matrix P of H corresponds to one data element of the horizontal array erasure code, and each column vector of the identity matrix $I_r$ corresponds to one check element. Each row of H represents a check equation: the binary XOR sum of the code elements of the horizontal array erasure code that correspond to a "1" in that row is 0.

This check matrix H is suitable for an array storage system with 2 rows and 3 columns. From the relationship between the data elements, the check elements, and the columns of the check matrix, the linear relationship between the check elements and the data elements in the array code system is given by the linear equation group (2):

The element array structure of the horizontal array erasure code is as follows:
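The matrix and equation figures of this embodiment are not reproduced in the text. The following sketch re-derives them from steps S1 to S4 as an illustration. For order 2 the only basic column other than I is the swap matrix, so no choice of shift direction is involved; the function name `check_equations` and the use of `^` for XOR are assumptions of the example.

```python
def check_equations(H):
    """Turn each row of H = [P | I_r] into a string such as 'c1 = d1 ^ d4'."""
    r = len(H)
    n_data = len(H[0]) - r
    equations = []
    for row in H:
        data_terms = [f"d{j + 1}" for j in range(n_data) if row[j]]
        check_term = f"c{row[n_data:].index(1) + 1}"
        equations.append(f"{check_term} = " + " ^ ".join(data_terms))
    return equations


I = [[1, 0], [0, 1]]      # basic column I (2nd-order identity)
I1 = [[0, 1], [1, 0]]     # basic column I_1 (cyclic shift by one position)
I_r = [[1, 0], [0, 1]]    # identity matrix of order r = f * t = 2

# H = (I | I_1 | I_r), assembled row by row
H = [I[i] + I1[i] + I_r[i] for i in range(2)]

for equation in check_equations(H):
    print(equation)

# Element array: data numbered row-major, check elements column-major
layout = [["d1", "d2", "c1"],
          ["d3", "d4", "c2"]]
```

In this layout each of the three array columns can be rebuilt from the other two, which is consistent with the fault tolerance f = 1 of this embodiment.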

Embodiment 2

It is required to construct an array erasure code with a fault tolerance of f=2, and the block size m is also 2. The steps to construct the check matrix under this condition are as follows:

S1: Arrange two copies of the basic column I of the t-th order cyclic matrix in a column; the resulting matrix is denoted $T_1$, where $t = 2 \cdot (2-1) + 1 = 3$. The third-order basic cyclic matrix has three distinct basic columns, $I$, $I_1$, $I_2$, where I is the 3rd-order identity matrix. Therefore two 3rd-order identity matrices are arranged in a column to form the matrix $T_1$ as follows:

$$T_1 = \begin{pmatrix} I \\ I \end{pmatrix}$$

S2: Combine the other two distinct basic columns $I_1$, $I_2$ of the third-order cyclic matrix, other than I, into a matrix in row-major order, denoted $T_2$; $T_2$ has (2-1) sub-matrix in each row and 2 sub-matrices in each column, as shown below:

$$T_2 = \begin{pmatrix} I_1 \\ I_2 \end{pmatrix}$$

S3: Concatenate the two matrices $T_1$ and $T_2$ side by side to obtain the matrix P:

$$P = (T_1 \mid T_2) = \begin{pmatrix} I & I_1 \\ I & I_2 \end{pmatrix}$$

S4: Concatenate the matrix P and the identity matrix $I_r$ of order 2*3 (r = 6) side by side to form the new matrix H, which is the check matrix of the array erasure code:

$$H = (P \mid I_6)$$

Clearly, the horizontal array erasure code determined by the check matrix H is suitable for an array storage system with 2 rows and 6 columns. In such a storage array, 3 columns store data elements and the other 3 columns store check elements. From the relationship between the data elements, the check elements, and the columns of the check matrix, the linear relationship between the check elements and the data elements in the array code system is given by the linear equation group (3):

The element array structure of the horizontal array erasure code is as follows:
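The fault tolerance of this embodiment can be checked by brute force. The sketch below is an illustration under stated assumptions rather than part of the described method: it assumes that $I_k$ places the 1 of row i in column (i + k) mod t, that data elements are laid out row-major and check elements column-major, and it treats a set of lost columns as recoverable when the corresponding columns of H are linearly independent over GF(2). All function names are hypothetical.

```python
from itertools import combinations


def build_check_matrix(m, f):
    """H = [T1 | T2 | I_r]; assumes I_k has the 1 of row i in column (i + k) % t."""
    t = f * (m - 1) + 1
    r = f * t
    H = []
    for a in range(f):
        shifts = [0] + [a * (m - 1) + b for b in range(1, m)]
        for i in range(t):
            row = [0] * (m * t + r)
            for b, k in enumerate(shifts):
                row[b * t + (i + k) % t] = 1   # block b of P
            row[m * t + a * t + i] = 1         # identity part I_r
            H.append(row)
    return H


def gf2_rank(vectors):
    """Rank over GF(2), using XOR elimination on integer bitmasks."""
    basis = []
    for v in vectors:
        x = int("".join(str(bit) for bit in v), 2)
        for b in basis:
            x = min(x, x ^ b)   # clears the leading bit of b if it is set in x
        if x:
            basis.append(x)
    return len(basis)


def tolerates_f_lost_columns(m, f):
    """True if every choice of f array columns can be rebuilt from H."""
    t = f * (m - 1) + 1
    r = f * t
    h_cols = list(zip(*build_check_matrix(m, f)))
    # Data column j of the array holds the elements of columns b*t + j of P.
    nodes = [[b * t + j for b in range(m)] for j in range(t)]
    # Check elements are numbered column-major, m per array column.
    nodes += [list(range(m * t + s, m * t + min(s + m, r))) for s in range(0, r, m)]
    for lost in combinations(nodes, f):
        erased = [h_cols[c] for node in lost for c in node]
        if gf2_rank(erased) < len(erased):
            return False
    return True


print(tolerates_f_lost_columns(m=2, f=2))
```

The check enumerates all 15 pairs of the 6 array columns, so it covers the loss of two data columns, two check columns, or one of each.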

Embodiment 3

It is required to construct an array erasure code with a fault tolerance of f=3, and the block size m is also 3. The steps to construct the check matrix under this condition are as follows:

S1: Arrange three copies of the basic column I of the t-th order cyclic matrix in a column; the resulting matrix is denoted $T_1$, where $t = 3 \cdot (3-1) + 1 = 7$. In this example t = 7, so the 7th-order basic cyclic matrix is used. It has seven distinct basic columns, $I, I_1, \ldots, I_6$, where I is the 7th-order identity matrix. Therefore three 7th-order identity matrices are arranged in a column to form the matrix $T_1$ as follows:

$$T_1 = \begin{pmatrix} I \\ I \\ I \end{pmatrix}$$

S2: Combine the other six distinct basic columns $I_1, I_2, \ldots, I_6$ of the 7th-order cyclic matrix, other than I, into a matrix in row-major order, denoted $T_2$; $T_2$ has (3-1) sub-matrices in each row and 3 sub-matrices in each column, as shown below:

$$T_2 = \begin{pmatrix} I_1 & I_2 \\ I_3 & I_4 \\ I_5 & I_6 \end{pmatrix}$$

S3: Concatenate the two matrices $T_1$ and $T_2$ side by side to obtain the matrix P:

$$P = (T_1 \mid T_2) = \begin{pmatrix} I & I_1 & I_2 \\ I & I_3 & I_4 \\ I & I_5 & I_6 \end{pmatrix}$$

S4: Concatenate the matrix P and the identity matrix $I_{21}$ of order 3*7 side by side to form the new matrix H, which is the check matrix of the array erasure code:

$$H = (P \mid I_{21})$$

The check matrix can be analyzed as follows. The matrix P has 3 columns of sub-matrices, so the storage array has 3 rows, that is, the block size of the storage array is 3. The cyclic matrix is of order 7, so the data array portion of the storage array has 7 columns. The identity matrix in H is of order 21 and corresponds to 21 check elements, so the check array portion of the storage array also has 7 columns. In addition, once the check matrix is determined, the linear relationship between the elements of the horizontal array erasure code is also determined, as shown by the linear equation group (4):

The element array structure of the horizontal array erasure code is as follows:
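The dimension analysis above follows directly from m and f and can be summarized in a small sketch. The function name `code_dimensions` and the dictionary keys are hypothetical, and the number of check columns is computed as r / m on the assumption that r is a multiple of m, as it is in the three embodiments.

```python
def code_dimensions(m, f):
    """Sizes implied by steps S1 to S4 for block size m and fault tolerance f."""
    t = f * (m - 1) + 1   # order of the cyclic matrix
    r = f * t             # number of check elements (order of I_r)
    return {
        "t": t,
        "H_shape": (r, m * t + r),
        "data_elements": m * t,
        "check_elements": r,
        "data_columns": t,
        "check_columns": r // m,          # assumes r is a multiple of m
        "storage_efficiency": m / (m + f),
    }


for m, f in [(2, 1), (2, 2), (3, 3)]:
    print((m, f), code_dimensions(m, f))
```

For m = 3 and f = 3 this gives t = 7, a 21 by 42 check matrix, 7 data columns, and 7 check columns, which agrees with the analysis of Embodiment 3.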

The present invention has been described above in connection with its embodiments. The invention is clearly not limited to the manner described: any insubstantial improvement that adopts the method concept and technical solution of the present invention, and any direct application of that concept and technical solution to other occasions without improvement, falls within the scope of the invention.

Claims

A method for constructing a check matrix, wherein the storage array corresponding to the check matrix has m rows and the fault tolerance is f, m and f being positive integers; let $I, I_1, I_2, \ldots, I_{n-1}$ be the basic columns of the cyclic matrix of the n-th order basic circulant matrix D, where $I = I_n$ is the identity matrix; characterized in that the specific steps of the method are:

S1: Arrange f copies of the basic column I of the t-th order cyclic matrix in one column; the resulting matrix is denoted $T_1$; where t is a positive integer, $t = f \cdot (m-1) + 1$;

S2: Combine the other $f \cdot (m-1)$ distinct basic columns $I_1, I_2, \ldots, I_{f(m-1)}$ of the t-th order cyclic matrix, other than I, into one matrix in row-major order, denoted $T_2$; $T_2$ has (m-1) sub-matrices in each row and f sub-matrices in each column;

S3: Concatenate the two matrices $T_1$ and $T_2$ side by side, denoted $P = [T_1 | T_2]$;

S4: Concatenate the matrix P and the identity matrix $I_r$ of order $f \cdot t$ side by side; the resulting matrix $H = [P | I_r]$ is the check matrix of the horizontal array erasure code.

A method for constructing a horizontal array erasure code using the method according to claim 1, wherein all data elements of the horizontal array erasure code are numbered in row-major order in the data array portion, and all check elements are numbered in column-major order in the check array portion; each column vector of the sub-matrix P of the check matrix H of the horizontal array erasure code corresponds to one data element of the horizontal array erasure code, and each column vector of the identity matrix $I_r$ corresponds to one check element; each row of the check matrix H represents a check equation, that is, the binary XOR sum of the code elements of the horizontal array erasure code corresponding to a "1" in that row is 0.