WO2017041232A1

WO2017041232A1 - Encoding and decoding framework for binary cyclic code

Info

Publication number: WO2017041232A1
Application number: PCT/CN2015/089179
Authority: WO
Inventors: 李挥; 侯韩旭; 李硕彦; 沈颖祺; 陈俊
Original assignee: 广东超算数据安全技术有限公司; 李挥
Priority date: 2015-09-08
Filing date: 2015-09-08
Publication date: 2017-03-16

Abstract

The present invention relates to the field of distributed storage systems. Disclosed is an encoding and decoding framework for a binary cyclic code, the framework consisting of a linear code and an alphabet. The linear code is a binary cyclic code, that is, a linear code over $R_m$, and a binary parity-check code $C_m$ is used as the alphabet, wherein $R_m$ is a polynomial ring, multiplication by the variable $z$ represents a cyclic right shift in $R_m$, and $C_m$ consists of the polynomials with an even number of nonzero coefficients. Over the binary finite field $\mathbb{F}_2$, the parity-check code $C_m$ has dimension $m-1$ and check polynomial $h(z) := 1+z+\dots+z^{m-1}$. The present invention has the following beneficial effects: because the encoding and decoding processes of the binary cyclic code involve only exclusive-or operations, they have low computational complexity and small computational overhead, so the invention can greatly reduce the system computation delay, save time and resources, and reduce costs, and it is suitable for practical storage systems. The present invention also provides necessary and sufficient conditions under which the binary cyclic code satisfies the MDS property, which is an important theoretical basis for designing MDS codes with low computational complexity.

Description

A codec framework for binary cyclic codes

[Technical Field]

The present invention relates to the field of distributed storage systems, and in particular, to a codec framework of a binary cyclic code.

[Background Art]

With the rapid development of computer network applications, the volume of network data keeps growing, and mass information storage has become increasingly important. Ever-increasing storage pressure has driven the rapid development of the entire storage market. Distributed storage, with its cost-effectiveness, low initial investment and pay-as-you-go model, has become the mainstream technology for today's big data storage.

Storage node failures are the norm in distributed storage systems. When the storage nodes deployed by a system are unreliable, redundancy must be introduced to keep the system reliable in the presence of node failures. The simplest way to introduce redundancy is to replicate the data directly. Direct replication is simple, but its storage efficiency and system reliability are not high, whereas introducing redundancy through coding improves storage efficiency. High availability, reliability and security are therefore key technical issues for distributed storage systems.

In current storage systems, the encoding method generally adopted is an MDS code, which achieves the best storage space efficiency. An (n, k) MDS erasure code divides an original file into k equal-sized data blocks and generates n mutually uncorrelated coded blocks by linear coding. The coded blocks are stored on n different nodes and satisfy the MDS property: any k of the n coded blocks suffice to reconstruct the original file. This coding technique plays an important role in providing effective redundancy for network storage, and is especially suitable for storing large files and for archival backup applications.

In a distributed storage system, data of size B is stored on n storage nodes, and the size of the data stored on each storage node is $B/k$.

A data receiver only needs to connect to any k of the n storage nodes and download their data to recover the original data of size B. This process is called data reconstruction or decoding. The RS (Reed-Solomon) code is one kind of code that satisfies the MDS property.

The Cauchy Reed-Solomon code (CRS code), described in the paper [James S. Plank, "Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Network Storage Applications," Network Computing and Applications, 2006], is currently one of the most commonly used RS codes and has been widely adopted in distributed storage systems; for example, HDFS provides a distributed storage system based on CRS coding. In a traditional RS code, addition is relatively simple, but multiplication and division are very complicated and may even have to be implemented with discrete logarithms and table lookups. The CRS code overcomes the multiplication problem of the traditional RS code by using a binary matrix, composed only of 0s and 1s, as the generator matrix, which greatly improves encoding and decoding efficiency. Building on this, continued optimization has made CRS an efficient and widely used storage code. However, CRS still has shortcomings. First, although the 0-1 generator matrix greatly reduces the encoding and decoding complexity, its decoding complexity is not optimal. Second, the binary matrix used for encoding and decoding is still relatively complicated, and its scattered 0s and 1s make it difficult to optimize the codec further.

The RDP code (Row-Diagonal Parity code) is a simple erasure code (see P. Corbett et al., "Row diagonal parity for double disk failure correction," 4th Usenix Conf. on File and Storage Tech., San Francisco, 2004). It uses neither finite fields nor generator matrices. Instead, it XORs the data by rows and by diagonals to generate two parity blocks, forming an erasure code with two parity blocks. When decoding, it only needs to invert the parity computations directly and can solve for all the original data blocks iteratively. These simple encoding and decoding rules give RDP the best encoding and decoding complexity among erasure codes with two parity blocks. However, the RDP code also has a defect: it cannot be extended. RDP has only two parity blocks, so at most two blocks may be lost, the same tolerance as a three-replica backup strategy. If more than two blocks are lost, the data cannot be repaired.

[Summary of the Invention]

In order to solve the problems in the prior art, the present invention provides a codec framework of a binary cyclic code, which solves the problem of high computational complexity and large computational overhead in repairing decoding in the prior art.

The invention provides a codec framework for a binary cyclic code, which consists of a linear code and an alphabet. The linear code is a binary cyclic code, that is, a linear code over $R_m$, and the binary parity-check code $C_m$ serves as the alphabet. Here $R_m$ denotes the polynomial ring $R_m := \mathbb{F}_2[z]/(1+z^m)$. A vector $(a_0, a_1, \dots, a_{m-1}) \in \mathbb{F}_2^m$ corresponds to the polynomial $a_0 + a_1 z + \dots + a_{m-1} z^{m-1}$ in the ring $R_m$, and multiplication by the variable $z$ represents a cyclic right shift in $R_m$. $C_m$ consists of the polynomials in $R_m$ with an even number of nonzero coefficients, $C_m = \{a(z)(1+z) : a(z) \in R_m\}$. Over the binary finite field $\mathbb{F}_2$, the parity-check code $C_m$ has dimension $m-1$, and the check polynomial of $C_m$ is $h(z) := 1+z+\dots+z^{m-1}$.

As a further improvement of the present invention: in the codec framework of the binary cyclic code, given an odd number $m$ and positive integers $k$ and $v$, the binary cyclic code is a mapping from $\mathbb{F}_2^{(m-1)k}$ to $C_m^v$, represented by a $k \times v$ generator matrix $G$ over the ring $R_m$. The encoding process is as follows: (A) the $(m-1)k$ bits are divided equally into $k$ groups of $m-1$ bits each; a parity bit is appended to each group to produce a polynomial in $C_m$, and the resulting polynomials are combined into a $k$-tuple $w = (s_1(z), \dots, s_k(z))$. The binary cyclic codeword corresponding to the $(m-1)k$ input bits is $wG$. (B) A polynomial in $C_m$ obtained by appending a parity bit is called an original data packet or data packet, and a polynomial in $wG$ is called a coded packet. A coded packet is an $R_m$-linear combination of the $k$ data packets, with coding coefficients that are polynomials in $R_m$.

As a further improvement of the present invention, the codec algorithms of the codec framework of the binary cyclic code all involve only binary exclusive OR operations.

As a further improvement of the present invention: in the codec framework of the binary cyclic code, decoding recovers the $k$ original data packets from $k$ coded packets. Specifically, let $s_1(z), \dots, s_k(z)$ be the $k$ data packets and $p_1(z), \dots, p_k(z)$ the $k$ coded packets indexed by $I$. They are related by $(p_1(z), \dots, p_k(z)) = (s_1(z), \dots, s_k(z)) \cdot G_I$, where $G_I$ is the $k \times k$ submatrix of $G$ obtained by retaining the columns indexed by $I$, and decoding solves this relation for the data packets.

As a further improvement of the invention: the binary cyclic code is closed under addition and multiplication.

The beneficial effects of the invention are as follows. The encoding and decoding processes of the binary cyclic code involve only exclusive-OR operations, so the computational complexity is low and the computational overhead is small. This greatly reduces the system computation delay, saves time, resources and energy, and reduces cost, making the code suitable for practical storage systems. This patent also gives the necessary and sufficient condition under which the binary cyclic code satisfies the MDS property, which is an important theoretical basis for designing MDS codes with low computational complexity.

[Detailed Description]

The invention is further described below in conjunction with specific embodiments.

A codec framework for a binary cyclic code consists of a linear code and an alphabet. The linear code is a binary cyclic code, that is, a linear code over $R_m$, and the binary parity-check code $C_m$ is used as the alphabet. Here $R_m$ denotes the polynomial ring $R_m := \mathbb{F}_2[z]/(1+z^m)$, and a vector $(a_0, a_1, \dots, a_{m-1}) \in \mathbb{F}_2^m$ corresponds to the polynomial $a_0 + a_1 z + \dots + a_{m-1} z^{m-1}$ in the ring $R_m$.

In the codec framework of the binary cyclic code, given an odd number $m$ and positive integers $k$ and $v$, the binary cyclic code is a mapping from $\mathbb{F}_2^{(m-1)k}$ to $C_m^v$.

The codec algorithm of the codec framework of the binary cyclic code only involves binary XOR operations.

In the codec framework of the binary cyclic code, decoding recovers the $k$ original data packets from $k$ coded packets as follows. Let $s_1(z), \dots, s_k(z)$ be the $k$ data packets and $p_1(z), \dots, p_k(z)$ the $k$ coded packets indexed by $I$. They are related by $(p_1(z), \dots, p_k(z)) = (s_1(z), \dots, s_k(z)) \cdot G_I$, where $G_I$ is the $k \times k$ submatrix of $G$ obtained by retaining the columns indexed by $I$, and decoding solves this relation for the data packets.

The binary cyclic code is closed under addition and multiplication.

In an embodiment, a new coding framework is proposed, called the binary cyclic coding framework, whose encoding and decoding algorithms involve only binary XOR operations. It provides theoretical support for designing storage codes with low computational complexity. The prior-art RDP code can be seen as a specific instance of the binary cyclic coding framework proposed in this patent.

The theory of binary cyclic codes is as follows. Let $m$ be a positive odd number and let $R_m$ be the polynomial ring

$$R_m := \mathbb{F}_2[z]/(1+z^m). \qquad (2)$$

An element of the ring $R_m$ is called a polynomial. A vector $(a_0, a_1, \dots, a_{m-1}) \in \mathbb{F}_2^m$ corresponds to the polynomial $a_0 + a_1 z + \dots + a_{m-1} z^{m-1}$ in the ring $R_m$.

In formula (2), multiplication by the variable $z$ represents a cyclic right shift in $R_m$. A binary cyclic code of length $m$ is defined as a subset of $R_m$ that is closed under addition and under multiplication by elements of $R_m$.

In this patent, only the simple parity-check code $C_m$ is considered, which consists of the polynomials in $R_m$ with an even number of nonzero coefficients:

$$C_m = \{a(z)(1+z) : a(z) \in R_m\}. \qquad (3)$$

Over the binary finite field $\mathbb{F}_2$, the parity-check code $C_m$ has dimension $m-1$, and the check polynomial of $C_m$ is $h(z) := 1+z+\dots+z^{m-1}$.
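The definitions of $R_m$, $C_m$ and $h(z)$ can be checked numerically for a small $m$. The following is a minimal sketch, assuming an element of $R_m$ is held in a Python integer whose bit $j$ is the coefficient of $z^j$; this representation and the helper names `rm_mul` and `parity_code` are illustrative choices and are not taken from the patent.

```python
def rm_mul(a: int, b: int, m: int) -> int:
    """Multiply a(z) and b(z) in R_m = F_2[z]/(1+z^m).

    Assumed encoding: bit j of an integer is the coefficient of z^j.
    """
    mask = (1 << m) - 1
    out = 0
    for j in range(m):
        if (b >> j) & 1:
            # z^j * a(z) is a cyclic shift of the m coefficients by j places
            out ^= ((a << j) | (a >> (m - j))) & mask
    return out


def parity_code(m: int) -> set:
    """Enumerate C_m = {a(z)(1+z) : a(z) in R_m} by brute force."""
    return {rm_mul(a, 0b11, m) for a in range(1 << m)}


m = 7
h = (1 << m) - 1                      # h(z) = 1 + z + ... + z^(m-1)
C = parity_code(m)
print(len(C) == 2 ** (m - 1))         # dimension m-1 over F_2
print(all(bin(c).count("1") % 2 == 0 for c in C))   # even number of nonzero coefficients
print(all(rm_mul(c, h, m) == 0 for c in C))         # c(z) h(z) = 0 for every codeword
```

The brute-force enumeration is only practical for small $m$; it is meant to illustrate the three stated properties, not to be used in an implementation.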

The encoding framework of the binary cyclic code:

The binary cyclic code is defined as a linear code over $R_m$ that uses the binary parity-check code $C_m$ as its alphabet. Specifically, given an odd number $m$ and positive integers $k$ and $v$, binary cyclic encoding is a mapping from $\mathbb{F}_2^{(m-1)k}$ to $C_m^v$, which can be represented by a $k \times v$ generator matrix $G$ over the ring $R_m$. The encoding process has two steps. First, the $(m-1)k$ bits are divided equally into $k$ groups of $m-1$ bits each. A parity bit is appended to each group to produce a polynomial in $C_m$, and the resulting polynomials are combined into a $k$-tuple $w = (s_1(z), \dots, s_k(z))$. Second, the binary cyclic codeword corresponding to the $(m-1)k$ input bits is obtained as $wG$.

Hereafter, a polynomial in $C_m$ obtained by appending one parity bit is referred to as an original data packet or data packet, and a polynomial in $wG$ is referred to as a coded packet. A coded packet is an $R_m$-linear combination of the $k$ data packets, and the coding coefficients are polynomials in $R_m$.
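The two encoding steps can be sketched in a few lines. This is an illustration under stated assumptions: polynomials are Python integers with bit $j$ holding the coefficient of $z^j$, the helper names are hypothetical, and the $2 \times 4$ matrix `G` below is an arbitrary example chosen for the demonstration rather than a matrix given in the patent.

```python
def rm_mul(a: int, b: int, m: int) -> int:
    """Multiply in R_m = F_2[z]/(1+z^m); bit j of an int is the coefficient of z^j."""
    mask = (1 << m) - 1
    out = 0
    for j in range(m):
        if (b >> j) & 1:
            out ^= ((a << j) | (a >> (m - j))) & mask
    return out


def to_packet(group):
    """Append one parity bit to m-1 bits and pack the m coefficients into an int."""
    coeffs = list(group) + [sum(group) % 2]
    return sum(c << j for j, c in enumerate(coeffs))


def encode(info_bits, m, G):
    """Map (m-1)k information bits to the v coded packets w*G over R_m."""
    k, v = len(G), len(G[0])
    assert len(info_bits) == (m - 1) * k
    # Step 1: k groups of m-1 bits, each extended to a polynomial in C_m
    w = [to_packet(info_bits[i * (m - 1):(i + 1) * (m - 1)]) for i in range(k)]
    # Step 2: coded packet j is the R_m-linear combination sum_i w_i(z) * G[i][j](z)
    coded = []
    for j in range(v):
        acc = 0
        for i in range(k):
            acc ^= rm_mul(w[i], G[i][j], m)   # addition in R_m is XOR
        coded.append(acc)
    return w, coded


m = 7
G = [[1, 0, 1, 1],        # illustrative generator matrix (assumption):
     [0, 1, 1, 0b10]]     # entries are 0, 1 and z
bits = [1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
w, coded = encode(bits, m, G)
print([format(p, "07b") for p in coded])
```

Every coded packet again has an even number of nonzero coefficients, because $C_m$ is closed under multiplication by elements of $R_m$.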

The encoding process of the binary cyclic code is illustrated by an example below. Suppose $2(m-1)$ information bits are to be stored on 4 storage nodes, where $m$ is an odd positive integer. Each node stores $m$ bits. Node 1 and node 2 are called information nodes; each stores $m-1$ information bits and 1 parity bit. Node 3 and node 4 are called coding nodes; each stores $m$ coded bits.

The $2(m-1)$ information bits are divided equally into two parts. The bits of the first part are denoted $s_{1,0}, s_{1,1}, \dots, s_{1,m-2}$, and the bits of the second part are denoted $s_{2,0}, s_{2,1}, \dots, s_{2,m-2}$. For $i = 1, 2$, node $i$ stores the bits $s_{i,0}, s_{i,1}, \dots, s_{i,m-2}$ together with the parity bit

$$s_{i,m-1} := s_{i,0} + s_{i,1} + \dots + s_{i,m-2}.$$

Node 3 stores the bits

$$s_{3,j} := s_{1,j} + s_{2,j}, \quad j = 0, 1, \dots, m-1,$$

and node 4 stores the bits

$$s_{4,j} := s_{1,j} + s_{2,j \oplus 1}, \quad j = 0, 1, \dots, m-1,$$

where the symbol $\oplus$ denotes addition modulo $m$. An example with $m = 7$ is given in Table I. The coded bits in node 3 are obtained by adding the bits in nodes 1 and 2, while the coded bits in node 4 are obtained by adding the bits in node 1 to a cyclic shift of the bits in node 2.

Table I: An example of a 4-node binary cyclic code.
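The four-node layout for $m = 7$ can be enumerated with a short sketch. It assumes that node 4 stores $s_{1,j} + s_{2,(j+1) \bmod m}$, which is the reading consistent with the decoding steps described in the text; the function name `four_node_layout` and the sample bits are hypothetical.

```python
def four_node_layout(info1, info2):
    """Return the m bits stored on nodes 1 to 4 for two groups of m-1 information bits.

    Assumption: node 4 holds s1[j] + s2[(j+1) mod m].
    """
    m = len(info1) + 1
    assert len(info2) == m - 1 and m % 2 == 1
    s1 = list(info1) + [sum(info1) % 2]     # node 1: information bits + parity bit
    s2 = list(info2) + [sum(info2) % 2]     # node 2: information bits + parity bit
    n3 = [s1[j] ^ s2[j] for j in range(m)]              # node 3: bitwise sum
    n4 = [s1[j] ^ s2[(j + 1) % m] for j in range(m)]    # node 4: sum with a cyclic shift
    return s1, s2, n3, n4


nodes = four_node_layout([1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0])
for i, bits in enumerate(nodes, start=1):
    print(f"node {i}: {bits}")
```

Only XOR and index arithmetic are used, which mirrors the claim that encoding needs no finite-field multiplication.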

It is now shown that the data in any two nodes suffice to recover the original information bits. From nodes 1 and 2, the information bits are obtained directly. To decode from node 1 and node 3, subtract $s_{1,j}$ from $s_{3,j} = s_{1,j} + s_{2,j}$ to obtain the value of $s_{2,j}$. Similarly, the information bits can be recovered from any one information node and any one coding node. Finally, consider decoding the information bits from node 3 and node 4. First, compute

$$s_{2,j} + s_{2,j+1} = s_{3,j} + s_{4,j}$$

for $j = 1, 3, 5, \dots, m-2$. Next, compute $s_{2,0}$ as the sum of these values: because the $m$ bits of node 2 have even parity,

$$s_{2,0} = s_{2,1} + s_{2,2} + \dots + s_{2,m-1} = \sum_{j = 1, 3, \dots, m-2} (s_{3,j} + s_{4,j}).$$

Once the value of $s_{2,0}$ is known, $s_{1,0}$ is obtained from $s_{3,0} + s_{2,0} = s_{1,0}$, and $s_{2,1}$ from $s_{4,0} + s_{1,0} = s_{2,1}$. The remaining information bits are decoded iteratively in the same way.
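The iterative recovery from the two coding nodes can be written out as follows. This is a sketch under the same assumption as the decoding steps in the text, namely that node 4 stores $s_{1,j} + s_{2,(j+1) \bmod m}$; the function names are hypothetical.

```python
def encode_nodes_3_and_4(info1, info2):
    """Bits of coding nodes 3 and 4 (assumed layout: node 4 uses s2 shifted by one)."""
    m = len(info1) + 1
    s1 = list(info1) + [sum(info1) % 2]
    s2 = list(info2) + [sum(info2) % 2]
    n3 = [s1[j] ^ s2[j] for j in range(m)]
    n4 = [s1[j] ^ s2[(j + 1) % m] for j in range(m)]
    return n3, n4


def decode_from_3_and_4(n3, n4):
    """Recover both groups of m-1 information bits from nodes 3 and 4 using XOR only."""
    m = len(n3)
    s1, s2 = [0] * m, [0] * m
    # s2[j] + s2[j+1] = n3[j] + n4[j]; summing over odd j gives s2[1] + ... + s2[m-1],
    # which equals s2[0] because the m bits of node 2 have even parity.
    for j in range(1, m - 1, 2):
        s2[0] ^= n3[j] ^ n4[j]
    # Alternate between node 3 (yields s1[j]) and node 4 (yields s2[j+1]).
    for j in range(m):
        s1[j] = n3[j] ^ s2[j]
        if j + 1 < m:
            s2[j + 1] = n4[j] ^ s1[j]
    return s1[:-1], s2[:-1]


info1, info2 = [1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0]
n3, n4 = encode_nodes_3_and_4(info1, info2)
recovered = decode_from_3_and_4(n3, n4)
print(recovered == (info1, info2))
```

The decoder performs a number of XORs linear in $m$, with no matrix inversion.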

The example above is a binary cyclic code with parameters $m = 7$, $k = 2$ and $v = 4$. The two data packets are

$$s_i(z) := s_{i,0} + s_{i,1} z + \dots + s_{i,m-1} z^{m-1}, \quad i = 1, 2,$$

and the code is described by a $2 \times 4$ generator matrix $G$ over $R_m$.

Since $c(z)h(z) = 0$ for all $c(z) \in C_m$, a coded packet can be written as a linear combination of the data packets in more than one way: adding the check polynomial $h(z)$ to any entry of the generator matrix $G$ does not change the coded packets. The generator matrix of this example is therefore not unique.

Binary cyclic code decoding method

A set of $k$ coded packets is said to be decodable if the $k$ original data packets can be recovered from it. This section gives the necessary and sufficient condition for the decodability of the binary cyclic code. First, some notation is defined. For a polynomial $f(z) \in R_m$, if there exists another polynomial $\bar{f}(z) \in R_m$ such that $f(z)\bar{f}(z)$ equals $1$ or $1+h(z)$, then $f(z)$ is said to be $C_m$-invertible. For a subset $I \subseteq \{1, 2, \dots, v\}$ with $|I| = k$, let $G_I$ be the $k \times k$ submatrix of $G$ consisting of the columns indexed by $I$.

First, a sufficient condition is given: if $\det(G_I)$ is $C_m$-invertible, the $k$ coded packets indexed by $I$ are decodable.

Let $s_1(z), \dots, s_k(z)$ be the $k$ data packets and $p_1(z), \dots, p_k(z)$ the $k$ coded packets indexed by $I$. The encoding process is

$$(p_1(z), \dots, p_k(z)) = (s_1(z), \dots, s_k(z)) \cdot G_I.$$

Assume that the determinant of $G_I$ is $C_m$-invertible. By definition there is a polynomial $\delta(z) \in R_m$ such that $\delta(z)\det(G_I)$ equals $1$ or $1+h(z)$. The original data packets can therefore be recovered from the $k$ coded packets as follows:

$$\begin{aligned}
(p_1(z), \dots, p_k(z)) \cdot \operatorname{adj}(G_I) \cdot \delta(z)
&= (s_1(z), \dots, s_k(z)) \cdot G_I \cdot \operatorname{adj}(G_I) \cdot \delta(z) \\
&= (s_1(z), \dots, s_k(z)) \cdot \det(G_I) \cdot \delta(z) \\
&= (s_1(z), \dots, s_k(z)).
\end{aligned}$$

In the above, $\operatorname{adj}(G_I)$ is the adjugate matrix of $G_I$. The last step uses the following property of $C_m$: if $s_i(z) \in C_m$, then $s_i(z)(1+h(z)) = s_i(z)$, because $s_i(z)h(z) = 0$.
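The adjugate-based recovery can be traced for $k = 2$ with a small sketch. The assumptions are: polynomials are Python integers with bit $j$ holding the coefficient of $z^j$; the matrix `G_I` is an illustrative $2 \times 2$ example with determinant $1+z$, not a matrix quoted from the patent; and `cm_inverse` finds $\delta(z)$ by exhaustive search, which is only sensible for small $m$.

```python
def rm_mul(a: int, b: int, m: int) -> int:
    """Multiply in R_m = F_2[z]/(1+z^m); bit j of an int is the coefficient of z^j."""
    mask = (1 << m) - 1
    out = 0
    for j in range(m):
        if (b >> j) & 1:
            out ^= ((a << j) | (a >> (m - j))) & mask
    return out


def cm_inverse(f: int, m: int):
    """Search for delta with f*delta in {1, 1+h}; return None if f is not C_m-invertible."""
    h = (1 << m) - 1
    for delta in range(1 << m):
        if rm_mul(f, delta, m) in (1, 1 ^ h):
            return delta
    return None


def decode_2x2(p, G_I, m):
    """Recover (s1, s2) from p = (s1, s2) * G_I as p * adj(G_I) * delta."""
    (a, b), (c, d) = G_I
    det = rm_mul(a, d, m) ^ rm_mul(b, c, m)
    delta = cm_inverse(det, m)
    if delta is None:
        raise ValueError("det(G_I) is not C_m-invertible")
    adj = [[d, b], [c, a]]   # adjugate of a 2x2 matrix; signs vanish in characteristic 2
    q = [rm_mul(p[0], adj[0][j], m) ^ rm_mul(p[1], adj[1][j], m) for j in range(2)]
    return [rm_mul(x, delta, m) for x in q]


m = 7
G_I = [[1, 1], [1, 0b10]]            # illustrative matrix, det = 1 + z
s = [0b1001101, 0b1010110]           # two data packets with even weight (elements of C_7)
p = [rm_mul(s[0], G_I[0][j], m) ^ rm_mul(s[1], G_I[1][j], m) for j in range(2)]
print(decode_2x2(p, G_I, m) == s)
```

For larger $k$ the same identity holds with the general adjugate; the $2 \times 2$ case is used here only to keep the sketch short.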

Next, a criterion is given for deciding whether a polynomial in $R_m$ is $C_m$-invertible. Let $f_1(z) f_2(z) \cdots f_L(z)$ be the factorization of the check polynomial $h(z)$ into irreducible polynomials over the binary finite field $\mathbb{F}_2$. Because $m$ is odd, $1+z^m$ has no repeated factors, so the irreducible polynomials $f_1(z), \dots, f_L(z)$ are pairwise distinct and none of them equals $1+z$. In a general commutative ring $R$ with identity, an element $u \in R$ is called a unit if there exists an element $\bar{u} \in R$ such that $u\bar{u}$ equals the identity of $R$.

Let $f_1(z), f_2(z), \dots, f_L(z)$ be the irreducible factors of the check polynomial $h(z)$ of the parity-check code $C_m$, and let $a(z)$ be a polynomial in $R_m$. Then the following conditions are equivalent:

1) $a(z)$ is $C_m$-invertible.

2) $a(z) \bmod h(z)$ is a unit in $\mathbb{F}_2[z]/(h(z))$.

3) $a(z) \bmod f_l(z)$ is a unit in $\mathbb{F}_2[z]/(f_l(z))$ for all $l = 1, 2, \dots, L$.

Define $f_0(z)$ as the polynomial $1+z$. By the Chinese remainder theorem, the ring $R_m$ is isomorphic to the direct sum

$$R'_m := \mathbb{F}_2[z]/(f_0(z)) \oplus \mathbb{F}_2[z]/(h(z)).$$

Indeed, define the mapping $\varphi: R_m \to R'_m$ by

$$a(z) \mapsto (a(z) \bmod 1+z,\; a(z) \bmod h(z)),$$

and the inverse mapping $\varphi': R'_m \to R_m$ by

$$(a_0(z), a_1(z)) \mapsto h(z)a_0(z) + (1+h(z))a_1(z) \bmod 1+z^m.$$

The two mappings are inverse to each other. Assume that $a(z) \bmod h(z)$ is a unit in $\mathbb{F}_2[z]/(h(z))$. Then there is a polynomial $d(z)$ such that $\varphi(a(z)d(z)) = (a, 1)$, where $a$ equals 0 or 1. Thus $a(z)d(z)$ equals $\varphi'((0,1)) = 1+h(z)$ or $\varphi'((1,1)) = 1$. Therefore $a(z)$ is $C_m$-invertible.
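The pair of maps can be exercised numerically for $m = 7$. The sketch below assumes polynomials are Python integers with bit $j$ holding the coefficient of $z^j$; the names `phi` and `phi_inv` stand for the forward map and its inverse and are otherwise hypothetical.

```python
def rm_mul(a: int, b: int, m: int) -> int:
    """Multiply in R_m = F_2[z]/(1+z^m); bit j of an int is the coefficient of z^j."""
    mask = (1 << m) - 1
    out = 0
    for j in range(m):
        if (b >> j) & 1:
            out ^= ((a << j) | (a >> (m - j))) & mask
    return out


def poly_mod(a: int, b: int) -> int:
    """Remainder of a(z) divided by a nonzero b(z) over F_2."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def phi(a: int, m: int):
    """a(z) -> (a(z) mod 1+z, a(z) mod h(z))."""
    h = (1 << m) - 1
    return poly_mod(a, 0b11), poly_mod(a, h)


def phi_inv(a0: int, a1: int, m: int) -> int:
    """(a0, a1) -> h(z) a0 + (1 + h(z)) a1 mod 1+z^m."""
    h = (1 << m) - 1
    return rm_mul(h, a0, m) ^ rm_mul(1 ^ h, a1, m)


m = 7
h = (1 << m) - 1
print(all(phi_inv(*phi(a, m), m) == a for a in range(1 << m)))   # round trip on all of R_7
print(phi_inv(0, 1, m) == 1 ^ h, phi_inv(1, 1, m) == 1)          # the two values used in the proof
```

The first component of `phi` is simply the parity of the coefficient vector, since reducing modulo $1+z$ evaluates the polynomial at $z = 1$.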

Conversely, assume that $a(z)$ is $C_m$-invertible, that is, there exists a polynomial $\bar{a}(z)$ such that $a(z)\bar{a}(z)$ equals $1$ or $1+h(z)$. Applying the map $\varphi$ to $a(z)\bar{a}(z)$ gives, for some $a \in \mathbb{F}_2$,

$$\varphi(a(z)\bar{a}(z)) = (a, 1).$$

Therefore $a(z) \bmod h(z)$ is a unit.

$h(z)$ factors as $f_1(z)f_2(z)\cdots f_L(z)$. By the Chinese remainder theorem, conditions 2) and 3) of the theorem are equivalent.
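Condition 2) suggests a direct computational test: $a(z) \bmod h(z)$ is a unit exactly when $\gcd(a(z), h(z)) = 1$ over $\mathbb{F}_2$. The sketch below implements that test with the Euclidean algorithm and compares it with an exhaustive search based on the definition. Polynomials are assumed to be Python integers with bit $j$ holding the coefficient of $z^j$, and all function names are hypothetical.

```python
def rm_mul(a: int, b: int, m: int) -> int:
    """Multiply in R_m = F_2[z]/(1+z^m); bit j of an int is the coefficient of z^j."""
    mask = (1 << m) - 1
    out = 0
    for j in range(m):
        if (b >> j) & 1:
            out ^= ((a << j) | (a >> (m - j))) & mask
    return out


def poly_mod(a: int, b: int) -> int:
    """Remainder of a(z) divided by a nonzero b(z) over F_2."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def poly_gcd(a: int, b: int) -> int:
    """Euclidean algorithm in F_2[z]."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def is_cm_invertible(a: int, m: int) -> bool:
    """Test condition 2): gcd(a(z) mod h(z), h(z)) = 1."""
    h = (1 << m) - 1
    return poly_gcd(poly_mod(a, h), h) == 1


def is_cm_invertible_by_search(a: int, m: int) -> bool:
    """Definition: some d(z) in R_m gives a(z) d(z) equal to 1 or 1 + h(z)."""
    h = (1 << m) - 1
    return any(rm_mul(a, d, m) in (1, 1 ^ h) for d in range(1 << m))


m = 7
agree = all(is_cm_invertible(a, m) == is_cm_invertible_by_search(a, m)
            for a in range(1 << m))
print(agree)
```

The gcd test costs a few polynomial divisions, whereas the search grows as $2^m$, so only the former is usable for realistic packet lengths.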

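Let $I \subseteq \{1, 2, \dots, v\}$ be an index set of cardinality $k$. The coded packets indexed by $I$ are decodable if and only if $\det(G_I)$ is $C_m$-invertible.

The "if" part has been proven above. For the "only if" part, assume that $\det(G_I)$ is not $C_m$-invertible. Then for some $l_0 \in \{1, 2, \dots, L\}$,

$$\det(G_I) \equiv 0 \pmod{f_{l_0}(z)}.$$

If the entries of the matrix $G_I$ are reduced modulo $f_{l_0}(z)$, the resulting matrix is singular over the finite field $\mathbb{F}_2[z]/(f_{l_0}(z))$. So there is a nonzero vector $(b_1(z), \dots, b_k(z))$, each of whose entries belongs to $\mathbb{F}_2[z]/(f_{l_0}(z))$, such that $(b_1(z), \dots, b_k(z)) \cdot G_I \equiv 0 \pmod{f_{l_0}(z)}$. For $j = 1, 2, \dots, k$, choose $a_j(z) \in C_m$ satisfying

$$a_j(z) \equiv b_j(z) \pmod{f_{l_0}(z)}, \qquad a_j(z) \equiv 0 \pmod{f_l(z)} \ \text{for all } l \ne l_0,\ 0 \le l \le L,$$

where $f_0(z) = 1+z$. Such a choice is possible by the Chinese remainder theorem, and $(a_1(z), \dots, a_k(z))$ is then a nonzero vector. If the $a_j(z)$ are taken as the original data packets, the $k$-tuple $(a_1(z), a_2(z), \dots, a_k(z)) \cdot G_I$ is the zero $k$-tuple. The encoding map is therefore not injective, and the coded packets indexed by $I$ are not decodable.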

Continue with the previous example with $m = 7$. The polynomial $1+z^7$ factors into the product of $f_0(z) = 1+z$, $f_1(z) = 1+z+z^3$ and $f_2(z) = 1+z^2+z^3$. It can be checked that any two of the coded packets are decodable. For example, if the index set is $I = \{3, 4\}$, the determinant $\det(G_I) = 1+z$ is divisible by neither $f_1(z)$ nor $f_2(z)$. Indeed, $1+z$ is $C_m$-invertible because

$$(1+z)(z+z^3+z^5) = z+z^2+\dots+z^6 = 1+h(z).$$

Therefore, the two data packets can be computed from node 3 and node 4.
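Both identities used in the $m = 7$ example can be confirmed with carry-less arithmetic: the factorization $1+z^7 = (1+z)(1+z+z^3)(1+z^2+z^3)$ and the product $(1+z)(z+z^3+z^5) = 1+h(z)$ in $R_7$. The sketch assumes polynomials are Python integers with bit $j$ holding the coefficient of $z^j$; the helper names are hypothetical.

```python
def poly_mul(a: int, b: int) -> int:
    """Carry-less product of a(z) and b(z) in F_2[z]."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def reduce_cyclic(a: int, m: int) -> int:
    """Reduce a(z) modulo 1 + z^m by folding exponents modulo m."""
    mask = (1 << m) - 1
    out = 0
    while a:
        out ^= a & mask
        a >>= m
    return out


m = 7
h = (1 << m) - 1                       # h(z) = 1 + z + ... + z^6
f0, f1, f2 = 0b11, 0b1011, 0b1101      # 1+z, 1+z+z^3, 1+z^2+z^3
product = poly_mul(poly_mul(f0, f1), f2)
delta = 0b101010                       # z + z^3 + z^5
check = reduce_cyclic(poly_mul(f0, delta), m)
print(product == ((1 << m) | 1))       # the three factors multiply to 1 + z^7
print(check == (1 ^ h))                # (1+z)(z+z^3+z^5) = 1 + h(z)
```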

In software implementations, cyclic shifts can be realized with pointers. The $m$ bits are stored consecutively in memory, and a pointer holds the starting address of the packet. A cyclic shift is then performed by modifying only the pointer, without modifying the packet itself. It is also possible to use byte-wise cyclic shifts instead of bit-wise cyclic shifts, which is easier to handle in software.
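The pointer idea can be sketched with an offset into a shared buffer. Python has no raw pointers, so the class below stores an integer offset instead; the class name `ShiftedPacket` and its methods are hypothetical, and the symbols may equally be bits or bytes.

```python
class ShiftedPacket:
    """View of an m-symbol packet multiplied by z^shift, without moving any data."""

    def __init__(self, data, shift=0):
        self.data = data                    # shared buffer, never copied or rotated
        self.shift = shift % len(data)      # plays the role of the start pointer

    def times_z(self, r=1):
        """Cyclic right shift by r places: only the offset changes."""
        return ShiftedPacket(self.data, self.shift + r)

    def __getitem__(self, j):
        # Coefficient j of z^shift * a(z) is a_{(j - shift) mod m}
        return self.data[(j - self.shift) % len(self.data)]

    def to_list(self):
        return [self[j] for j in range(len(self.data))]


def xor_packets(p, q):
    """Symbol-wise XOR of two (possibly shifted) views of equal length."""
    return [p[j] ^ q[j] for j in range(len(p.data))]


a = ShiftedPacket([1, 0, 1, 1, 0, 0, 1])
b = ShiftedPacket([0, 1, 1, 0, 1, 0, 1])
shifted = b.times_z()
print(shifted.to_list())          # b rotated right by one position
print(xor_packets(a, shifted))    # a(z) + z * b(z), computed without rotating b
```

Because the buffer is shared, applying several different shifts to the same packet costs no additional memory traffic beyond the XOR itself.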

In addition, it can be verified that the prior-art RDP code is an instance of the binary cyclic code proposed in this patent for a suitable choice of generator matrix.

Compared with previous coding schemes such as RDP codes, binary cyclic codes are more general, and RDP can be seen as a special case of binary cyclic codes.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific embodiments of the present invention are not limited to this description. It will be apparent to those skilled in the art that modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A codec framework for a binary cyclic code, characterized in that it consists of a linear code and an alphabet; the linear code is a binary cyclic code, the binary cyclic code is a linear code over $R_m$, and the binary parity-check code $C_m$ serves as the alphabet; wherein $R_m$ denotes the polynomial ring $R_m := \mathbb{F}_2[z]/(1+z^m)$; a vector $(a_0, a_1, \dots, a_{m-1}) \in \mathbb{F}_2^m$ corresponds to the polynomial $a_0 + a_1 z + \dots + a_{m-1} z^{m-1}$ in the ring $R_m$; multiplication by the variable $z$ represents a cyclic right shift in $R_m$; $C_m$ consists of the polynomials in $R_m$ with an even number of nonzero coefficients, $C_m = \{a(z)(1+z) : a(z) \in R_m\}$; over the binary finite field $\mathbb{F}_2$, the parity-check code $C_m$ has dimension $m-1$, and the check polynomial of $C_m$ is $h(z) := 1+z+\dots+z^{m-1}$.

2. The codec framework of the binary cyclic code according to claim 1, characterized in that, given an odd number $m$ and positive integers $k$ and $v$, the binary cyclic code is a mapping from $\mathbb{F}_2^{(m-1)k}$ to $C_m^v$, represented by a $k \times v$ generator matrix over the ring $R_m$; the encoding process is as follows: (A) the $(m-1)k$ bits are divided equally into $k$ groups of $m-1$ bits each; a parity bit is appended to each group to produce a polynomial in $C_m$, and the resulting polynomials are combined into a $k$-tuple $w = (s_1(z), \dots, s_k(z))$; the binary cyclic codeword corresponding to the $(m-1)k$ input bits is $wG$; (B) a polynomial in $C_m$ obtained by appending a parity bit is called an original data packet or data packet, and a polynomial in $wG$ is called a coded packet; a coded packet is an $R_m$-linear combination of the $k$ data packets, and the coding coefficients are polynomials in $R_m$, where $G$ is the generator matrix of the corresponding coded packets.

3. The codec framework of the binary cyclic code according to claim 1, characterized in that the encoding and decoding algorithms of the codec framework of the binary cyclic code involve only binary exclusive-OR operations.

4. The codec framework of the binary cyclic code according to claim 1, characterized in that the decoding process of the binary cyclic code recovers the $k$ original data packets from $k$ coded packets as follows: let $s_1(z), \dots, s_k(z)$ be the $k$ data packets and $p_1(z), \dots, p_k(z)$ the $k$ coded packets indexed by $I$; the decoding process is based on $(p_1(z), \dots, p_k(z)) = (s_1(z), \dots, s_k(z)) \cdot G_I$, where $G_I$ is the $k \times k$ submatrix of the matrix $G$ obtained by retaining the columns indexed by $I$.

5. The codec framework of the binary cyclic code according to claim 1, characterized in that the binary cyclic code is closed under addition and multiplication.