CN106788454B - Construction method of local unequal codes

Info

Publication number: CN106788454B
Application number: CN201611071571.XA
Authority: CN
Inventors: 朱丹; 周晓波; 朱洁
Original assignee: Shaanxi Shangpin Information Technology Co ltd
Current assignee: Shaanxi Shangpin Information Technology Co ltd
Priority date: 2016-11-29
Filing date: 2016-11-29
Publication date: 2020-04-24
Anticipated expiration: 2036-11-29
Also published as: CN106788454A

Abstract

The invention discloses a construction method of local unequal codes, implemented as follows. When the information symbols have unequal locality, the parity check bit matrix of a maximum distance separable (MDS) code is first split, and each subset obtained by the split is then split further according to a given rule, yielding an information symbol locality unequal code. When all symbols have unequal locality, a vector over the field F_{q^m} is first Gabidulin encoded, the encoding result is then divided in two rounds, and the result of the second division is encoded with an MDS code to obtain the full symbol locality unequal code. The invention effectively addresses the problem that the minimum distance of existing local repair codes cannot reach the upper bound, correspondingly reduces the repair locality and the I/O operations in the repair process, and keeps the repair bandwidth low.

Description

Construction method of local unequal codes

Technical Field

The invention belongs to the field of data coding and storage, and particularly relates to a construction method of local unequal codes.

Background

In recent years, with the rapid development of portable internet devices and the mobile internet, the demand for storage has grown steadily. Distributed storage, which stores data in a distributed manner on multiple independent devices, is emerging as the primary solution for mass storage. To save cost, the nodes usually use commodity switching equipment and inexpensive servers, so nodes in the storage system fail easily during node replacement, hardware failure, and software upgrades, causing loss of the data held in the storage nodes. To ensure data reliability, the system performs node repair frequently, so how to repair nodes efficiently has become an urgent problem. Existing repair schemes, such as replication, regenerating codes, and erasure codes, may waste storage space, consume a large amount of repair bandwidth, and require a large repair locality (i.e., a large number of other nodes must be contacted to repair a data node).

In a local repair code (LRC), the information symbols of a code are divided into several groups, and each group generates one parity check symbol. When an information symbol in a group is lost or damaged, it can be recovered using only the other information symbols in the group and the group's parity check symbol, without using all of the coded symbols. Local repair codes therefore reduce the repair locality, which reduces the I/O operations in the repair process and keeps the repair bandwidth low.
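The grouping idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: symbols are small integers, the group parity is a bitwise XOR, and the helper names `encode_groups` and `repair` are hypothetical and do not come from the patent.

```python
from functools import reduce
from operator import xor

def encode_groups(info, group_size):
    """Split the information symbols into groups and append one XOR parity per group."""
    groups = [info[i:i + group_size] for i in range(0, len(info), group_size)]
    return [g + [reduce(xor, g)] for g in groups]

def repair(group, lost_index):
    """Recover one lost symbol using only the surviving symbols of its own group."""
    survivors = [s for i, s in enumerate(group) if i != lost_index]
    return reduce(xor, survivors)

# Six information symbols, two local groups of three symbols plus one parity each.
coded = encode_groups([5, 9, 12, 7, 3, 8], group_size=3)

# The symbol at index 1 of the first group is lost; only 3 other symbols are read.
recovered = repair(coded[0], lost_index=1)
```

Repairing a symbol touches only its own group, which is the sense in which the repair locality (here 3) is smaller than the number of information symbols (here 6).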

For a local repair code, the larger its minimum distance d, the fewer nodes need to be contacted in the data node repair process and the smaller the repair bandwidth. With existing construction methods, however, the minimum distance of the local repair code rarely reaches its upper bound.

Disclosure of Invention

To address the large repair locality of existing construction methods for local repair codes (LRCs), the invention provides a new construction method of local unequal codes. New local unequal codes are constructed for two cases: unequal locality of the information symbols, and unequal locality of all symbols.

The technical scheme of the invention, which minimizes the repair locality of the local unequal codes, is as follows:

A construction method of local unequal codes comprises two preferred embodiments:

First preferred embodiment: when the locality of the information symbols is unequal, an information symbol locality unequal code is constructed. The parity check bit matrix of a maximum distance separable (MDS) code is split, and each subset obtained by the split is split further, yielding the generator matrix of an information symbol locality unequal code with code symbol length n; a codeword generated by this matrix belongs to an information symbol locality unequal code whose minimum distance d reaches the upper bound;

Second preferred embodiment: when the locality of all symbols, comprising information symbols and parity check symbols, is unequal, Gabidulin codes are used for encoding, and the encoding result is encoded again with maximum distance separable (MDS) codes, yielding an (n, k, d) full symbol locality unequal code with codeword length n whose minimum distance d reaches the upper bound.

Further, in the method for constructing local unequal codes:

F_q denotes the q-ary field;

F_{q^m} denotes the degree-m extension field of F_q, where the highest degree of the polynomial over the field is k-1;

locality: for a code of length n containing k information symbols, if a symbol i can be recovered from r_i other symbols in the code, the locality of i is r_i; if the locality of each of the k information symbols of a code is at most r, the locality of the code is r;

for an (n, k, d) systematic code, n denotes the code length, k the number of information symbols, and d the minimum distance; if the information symbols can be divided into disjoint subsets such that the information symbols of different subsets have different locality, the code is an information symbol locality unequal code;

the information symbol locality profile of the (n, k, d) systematic code is (k_1, …, k_r), where k_j is the number of information symbols with locality j (1 ≤ j ≤ r);

the full symbol locality unequal code is a further extension of the information symbol locality unequal code and denotes a code whose parity check symbols also have locality constraints: its coded symbols, comprising the information symbols and the parity check symbols, can be divided into disjoint subsets, and the coded symbols of different subsets have different locality;

for the (n, k, d) systematic code, if the parity check symbols also have locality constraints, a full symbol locality profile is defined similarly to the information symbol locality profile: with r_i denoting the locality of the i-th symbol of the code, 1 ≤ i ≤ n, and r_a = max(r₁, r₂, …, r_n), the full symbol locality profile is (n_1, …, n_{r_a}), where n_j is the number of symbols with locality j (1 ≤ j ≤ r_a);

let X^n be an n-dimensional vector space over the finite field GF(q), q being a prime or a prime power; the rank over GF(q) of a vector x in X^n is denoted R(x), and the rank distance d_R(x, y) between two elements x, y of X^n is defined as d_R(x, y) = R(x - y); the minimum of the rank distances between all pairs of distinct codewords of a code c is the minimum rank distance of c, denoted d_R(c); a linear code with code length N, K information symbols, and minimum rank distance D is called an (N, K, D) rank distance code;

the Gabidulin code over F_{q^m} is denoted as an (N, K, N-K+1) code, where N is the number of coding symbols, K is the number of information symbols, and N-K+1 is the minimum distance of the codewords; a codeword is defined as (f(g₁), …, f(g_N)), where f(x) is a linearized polynomial over the finite field F_{q^m} (a field with q^m elements) whose coefficients are the information symbols, and g₁, …, g_N are specific points in F_{q^m};

the Gabidulin code belongs to the rank distance codes;

an (N, N-Y, Y+1) code, i.e. a linear code with code length N, redundancy Y, and minimum distance Y+1, is a maximum distance separable (MDS) code; when N-Y information symbols are expanded into N symbols by encoding and any one symbol is lost or damaged, it can be recovered from K = N-Y of the remaining N-1 symbols;

if Gabidulin encoding is followed by encoding with maximum distance separable (MDS) codes, the final encoding result can reach the Singleton bound, i.e. the upper bound of the minimum distance of the codewords;

the Singleton bound is a bound on the size of a code: given the codeword length and the minimum distance, it bounds the number of codewords from above; with A_q(n, d) denoting the maximum possible number of codewords of a q-ary code of length n and minimum distance d, A_q(n, d) ≤ q^{n-d+1}; a code that reaches the Singleton bound attains the maximum possible minimum distance; for an (N, K, D) rank distance code over the field F_{q^m}, with N coding symbols, K information symbols, and minimum rank distance D, the Singleton bound can be restated in terms of the minimum distance, namely D ≤ N-K+1;

Further, in the method for constructing local unequal codes, when the information symbols are unequal in locality as in the first preferred embodiment:

given a (k+d-1, k, d) code, where k+d-1 is the code length, d-1 is the number of parity check bits, and d is the minimum distance: since the number of parity check bits is exactly one less than the minimum distance, the code satisfies the construction condition of a maximum distance separable (MDS) code; the generator matrix of the MDS code is constructed from an identity matrix and a parity check bit matrix, where the number of columns of the identity matrix equals the number of information symbols of the code;

the generator matrix G' of this code consists of the k column vectors of the k × k identity matrix followed by the d-1 column vectors of the k × (d-1) parity check bit matrix;

according to the locality of each information symbol, the coordinates of G' are divided into subsets: let j_p denote that the coordinate point in the j-th column of the parity check bit matrix has locality p (1 ≤ p ≤ m, m ≤ r); all coordinates with locality p are placed in the same subset s_p, so that the coordinates are divided into m disjoint subsets s₁, …, s_m, where |s_p| denotes the number of elements in the set s_p, i.e. the number of coordinates j_p;

each s_p is then arbitrarily divided into disjoint subsets, the number of elements in each subset not exceeding the locality p;

for a k-dimensional vector and a set S of coordinates contained in its index range, the part of the vector indexed by S consists of the |S| rows of the vector taken at the coordinates in S;

the resulting codeword generator matrix is G:

the decomposed vector is taken as an example only and is not limiting;

if the original information code word is

Obtain a code word of

The obtained codeword length n:

upper bound of minimum distance d:

an information symbol locality unequal code whose minimum distance d reaches the upper bound is thereby constructed.

Further, in the method for constructing local unequal codes, when all symbols, comprising information symbols and parity check symbols, are unequal in locality as in the second preferred embodiment:

a vector of length k over the field F_{q^m} is selected; the full symbol locality profile of the code to be constructed is (n_1, …, n_{r_a}), i.e. the number of symbols with locality j is n_j, 1 ≤ j ≤ r_a;

the value N_j corresponding to each locality j is determined, and N is then solved for to obtain the codeword length N of the Gabidulin code;

according to the codeword length N, the information symbol length k, and the minimum distance N-k+1, the vector is Gabidulin encoded to obtain a codeword;

the symbols of the codeword are divided according to the locality of each symbol into r_a disjoint groups, i.e. the locality of the elements in the j-th group is j and the number of symbols in the j-th group is N_j;

if N_j = 0, the corresponding group is empty and is not divided further;

if N_j > 0, i.e. the corresponding group is non-empty, the symbols of the group are further arbitrarily divided into N_j/j disjoint locality groups of j symbols each;

each locality group of j symbols is then encoded again with the (j+1, j, 2) maximum distance separable (MDS) code over F_q, so that the number of symbols in each group changes from j to j+1;

obtaining (n, k, d) full symbol locality inequality codes reaching a minimum distance upper bound, a codeword length n:

upper bound of minimum distance d:

a full symbol locality unequal code whose minimum distance d reaches the upper bound is thereby constructed.

Compared with the prior art, the invention has the following advantages:

(1) The invention adopts the construction method of local unequal codes so that the repair locality of the codewords reaches a minimum, thereby reducing I/O operations and the repair bandwidth.

(2) Because the local unequal codes partition the symbols by unequal locality, a locality profile is formed, which provides the basis for dividing the codeword symbols into subsets multiple times so that the minimum distance d reaches the upper bound.

Drawings

Fig. 1 shows the two preferred embodiments of the local unequal codes.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, some definitions and parameters related to the method are first explained:

F_q: denotes the q-ary field.

F_{q^m}: denotes the degree-m extension field of F_q, where the highest degree of the polynomial over the field is k-1.

Locality: let a code of length n contain k information symbols. If a symbol i can be recovered from r_i other symbols in the code, the locality of i is r_i. If the locality of each of the k information symbols of a code is at most r, the locality of the code is said to be r.

Information symbol locality unequal code: for an (n, k, d) systematic code, n denotes the code length, k the number of information symbols, and d the minimum distance. If the information symbols can be divided into disjoint subsets such that the information symbols of different subsets have different locality, the code is called an information symbol locality unequal code.

Information symbol locality profile: for an (n, k, d) systematic code, the information symbol locality profile is expressed as (k_1, …, k_r), where k_j is the number of information symbols with locality j (1 ≤ j ≤ r).

Full symbol locality unequal code: a full symbol locality unequal code is a further extension of an information symbol locality unequal code and denotes a code whose parity check symbols also have locality constraints. For a systematic code, if its code symbols (including information symbols and parity check symbols) can be divided into disjoint subsets such that the code symbols of different subsets have different locality, the code is called a full symbol locality unequal code.

Full symbol locality profile: when the parity check symbols also have locality constraints, a full symbol locality profile can be defined similarly to the information symbol locality profile. For an (n, k, d) systematic code, let r_i denote the locality of the i-th symbol of the code, 1 ≤ i ≤ n, and let r_a = max(r₁, r₂, …, r_n); the full symbol locality profile is then expressed as (n_1, …, n_{r_a}), where n_j is the number of symbols with locality j (1 ≤ j ≤ r_a).
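As a small illustration of the locality profile, the following Python sketch counts how many symbols have each locality value. The function name `locality_profile` and the sample localities are assumptions made for this example only.

```python
from collections import Counter

def locality_profile(localities):
    """Return (c_1, ..., c_r), where c_j is the number of symbols with locality j
    and r is the largest locality that occurs."""
    r = max(localities)
    counts = Counter(localities)
    return tuple(counts.get(j, 0) for j in range(1, r + 1))

# Hypothetical per-symbol localities of five information symbols.
info_localities = [2, 2, 3, 3, 3]
profile = locality_profile(info_localities)
```

Applied to the localities of the information symbols only, the result plays the role of the information symbol locality profile; applied to the localities of all n symbols, it plays the role of the full symbol locality profile.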

Rank distance: let X^n denote an n-dimensional vector space over the finite field GF(q), where q is a prime or a prime power. The rank over GF(q) of a vector x in X^n is denoted R(x). The rank distance d_R(x, y) between two elements x, y of X^n is defined as d_R(x, y) = R(x - y). The minimum of the rank distances between all pairs of distinct codewords of a code c is the minimum rank distance of c, denoted d_R(c). A linear code with code length N, K information symbols, and minimum rank distance D is called an (N, K, D) rank distance code.
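The rank distance can be made concrete with a short Python sketch. It assumes q = 2 and represents each element of the extension field as an integer whose bits are its coordinates over GF(2), so that subtraction is a bitwise XOR; the names `rank_gf2` and `rank_distance` are illustrative and not taken from the patent.

```python
def rank_gf2(elements):
    """Rank over GF(2) of a list of field elements given as bit-mask integers."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for v in elements:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v      # new independent direction
                break
            v ^= pivots[h]         # eliminate the leading bit and continue
    return len(pivots)

def rank_distance(x, y):
    """d_R(x, y) = R(x - y); in characteristic 2 the difference is a XOR."""
    return rank_gf2([a ^ b for a, b in zip(x, y)])

x = [1, 2, 3]
zero = [0, 0, 0]
d_rank = rank_distance(x, zero)
```

Here x differs from the zero vector in all three positions, yet its entries span only a two-dimensional space over GF(2) because 3 = 1 XOR 2, so the rank distance is smaller than the Hamming distance.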

Gabidulin code: the Gabidulin code over F_{q^m} is denoted as an (N, K, N-K+1) code. A codeword is defined as (f(g₁), …, f(g_N)), where f(x) is a linearized polynomial over the finite field F_{q^m} (i.e. a finite field with q^m elements) whose coefficients are the information symbols, and g₁, …, g_N are specific points in F_{q^m}. The Gabidulin code belongs to the rank distance codes.
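A minimal Python sketch of Gabidulin encoding is given below under explicit assumptions that are not stated in the patent: q = 2, m = 4, the field F_{2^4} is built with the modulus x^4 + x + 1, the linearized polynomial is f(x) = Σ a_i x^(2^i) with the information symbols a_i as coefficients, and the evaluation points 1, 2, 4, 8 are chosen because they are linearly independent over GF(2). All function names are hypothetical.

```python
M = 4            # extension degree m (assumed)
MOD = 0b10011    # x^4 + x + 1, irreducible over GF(2) (assumed modulus)

def gf_mul(a, b):
    """Multiply two elements of GF(2^M) given as bit-mask integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:               # reduce once the degree reaches M
            a ^= MOD
    return result

def linearized_eval(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^(2^i) in GF(2^M)."""
    acc, power = 0, x
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)   # Frobenius map: x^(2^i) -> x^(2^(i+1))
    return acc

def gabidulin_encode(info, points):
    """Codeword (f(g_1), ..., f(g_N)) for the information symbols in `info`."""
    return [linearized_eval(info, g) for g in points]

def rank_gf2(elements):
    """Rank over GF(2) of field elements given as bit-mask integers."""
    pivots = {}
    for v in elements:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

POINTS = [1, 2, 4, 8]                    # N = 4 points, independent over GF(2)
codeword = gabidulin_encode([3, 7], POINTS)   # K = 2 information symbols
```

With N = 4 and K = 2, every nonzero codeword of this sketch has rank at least N-K+1 = 3, which can be confirmed by enumerating all 255 nonzero messages with `rank_gf2`.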

MDS code: an (N, N-Y, Y+1) code, i.e. a linear code with code length N, redundancy Y, and minimum distance Y+1, is called a maximum distance separable (MDS) code. It has the following property: when N-Y information symbols are expanded into N symbols by encoding and any one symbol is lost or damaged, it can be recovered using only K = N-Y of the remaining N-1 symbols.
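The (N, N-Y, Y+1) relation can be checked by brute force on a tiny example. The sketch below assumes a (4, 3, 2) single-parity-check code over the prime field GF(5); the field size, the generator matrix, and the helper `min_distance` are choices made for illustration only.

```python
from itertools import product

def min_distance(gen_rows, p):
    """Minimum Hamming weight over all nonzero codewords of a linear code over GF(p)."""
    k, n = len(gen_rows), len(gen_rows[0])
    best = n
    for msg in product(range(p), repeat=k):
        if not any(msg):
            continue  # skip the all-zero message
        word = [sum(m * row[c] for m, row in zip(msg, gen_rows)) % p
                for c in range(n)]
        best = min(best, sum(1 for s in word if s))
    return best

P = 5
# Systematic generator: three information symbols, last symbol = -(their sum).
G = [[1, 0, 0, P - 1],
     [0, 1, 0, P - 1],
     [0, 0, 1, P - 1]]
N, Y = 4, 1
d = min_distance(G, P)
```

The computed minimum distance equals Y+1, so this code meets the MDS condition with a single redundant symbol.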

If Gabidulin encoding is followed by encoding with maximum distance separable (MDS) codes, the final encoding result can reach the Singleton bound, i.e. the upper bound of the minimum distance of the codewords.

Singleton upper bound: the Singleton bound is a bound on the size of a code: given the codeword length and the minimum distance, it bounds the number of codewords from above, and thereby relates the codeword length to the minimum distance. With A_q(n, d) denoting the maximum possible number of codewords of a q-ary code of length n and minimum distance d, A_q(n, d) ≤ q^{n-d+1}. A code that reaches the Singleton bound attains the maximum possible minimum distance. For an (N, K, D) rank distance code over the field F_{q^m}, with N coding symbols, K information symbols, and minimum rank distance D, the Singleton bound can be restated in terms of the minimum distance of the codewords, namely D ≤ N-K+1.
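The step from the bound on the number of codewords to a bound on the minimum distance can be written out as follows, assuming a linear q-ary code with k information symbols, which therefore has exactly q^k codewords:

```latex
\begin{align*}
q^{k} &\le A_q(n, d) \le q^{\,n-d+1} \\
\Rightarrow\quad k &\le n - d + 1 \\
\Rightarrow\quad d &\le n - k + 1 .
\end{align*}
```

A code that meets the last inequality with equality is maximum distance separable.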

To make the objects, technical solutions, and advantages of the present invention more apparent, the method for constructing local unequal codes is described in detail below with reference to the accompanying drawing. As shown in Fig. 1, the method comprises a first preferred embodiment and a second preferred embodiment.

First preferred embodiment:

The first preferred embodiment applies when the locality of the information symbols is unequal. To construct an information symbol locality unequal code, the parity check bit matrix of a maximum distance separable (MDS) code is first split, and each subset obtained by the split is then split further, yielding the generator matrix of an information symbol locality unequal code with code symbol length n; a codeword generated by this matrix belongs to an information symbol locality unequal code whose minimum distance d reaches the upper bound.

Given a (k+d-1, k, d) code, where k+d-1 is the code length, d-1 is the number of parity check bits, and d is the minimum distance: since the number of parity check bits is exactly one less than the minimum distance, the code satisfies the construction condition of a maximum distance separable (MDS) code. The generator matrix of the MDS code is constructed from an identity matrix and a parity check bit matrix, where the number of columns of the identity matrix equals the number of information symbols of the code. The generator matrix G' of this code therefore consists of the k column vectors of the k × k identity matrix followed by the d-1 column vectors of the k × (d-1) parity check bit matrix.

According to the locality of each information symbol, the coordinates of G' are divided into subsets. Let j_p denote that the coordinate point in the j-th column of the parity check bit matrix has locality p (1 ≤ p ≤ m, m ≤ r). Specifically, all coordinates with locality p are placed in the same subset s_p, so that the coordinates are divided into m disjoint subsets s₁, …, s_m, where |s_p| denotes the number of elements in the set s_p, i.e. the number of coordinates j_p.

Each s_p is then arbitrarily divided into disjoint subsets, the number of elements in each subset not exceeding the locality p.

Here, for a k-dimensional vector and a set S of coordinates contained in its index range, the part of the vector indexed by S consists of the |S| rows of the vector taken at the coordinates in S. Taking the decomposition of this vector as an example, and not by way of limitation, the finally obtained codeword generator matrix is G:

if the original information code word is

Then the coded codeword is obtained as

The obtained codeword length n:

the upper bound of the minimum distance d is:
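As an illustration of the first preferred embodiment, the Python sketch below builds a small systematic MDS generator [I | P], splits one parity column according to locality groups, and checks the minimum distance by brute force. Several choices are assumptions made only for this example and are not taken from the patent: the prime field GF(7), a Cauchy matrix as the parity check bit matrix, splitting the first parity column, and the grouping {0, 1} (locality 2) and {2, 3, 4} (locality 3).

```python
from itertools import product

P = 7  # prime field GF(7), a hypothetical choice for illustration

def inv(a):
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a, P - 2, P)

def cauchy_parity(k, r):
    """k x r Cauchy matrix over GF(P); [I | C] then generates an MDS code."""
    xs, ys = range(k), range(k, k + r)   # k + r distinct field elements
    return [[inv((x - y) % P) for y in ys] for x in xs]

def split_generator(k, d, groups):
    """Systematic generator whose first parity column is split into one
    local parity column per locality group; the other columns are kept."""
    par = cauchy_parity(k, d - 1)
    rows = []
    for i in range(k):
        identity = [1 if c == i else 0 for c in range(k)]
        local = [par[i][0] if i in g else 0 for g in groups]
        rows.append(identity + local + par[i][1:])
    return rows

def min_distance(rows):
    """Brute-force minimum Hamming weight of the nonzero codewords."""
    k, n = len(rows), len(rows[0])
    best = n
    for msg in product(range(P), repeat=k):
        if any(msg):
            word = [sum(m * row[c] for m, row in zip(msg, rows)) % P
                    for c in range(n)]
            best = min(best, sum(1 for s in word if s))
    return best

k, d = 5, 3
groups = [{0, 1}, {2, 3, 4}]       # information symbols with locality 2 and 3
G = split_generator(k, d, groups)
n = len(G[0])                      # k + (d - 2) + number of local groups
dist = min_distance(G)
```

In this sketch each information symbol can be rebuilt from the other members of its group and the group's local parity column, while the brute-force check shows that splitting the column leaves the minimum distance of the underlying (k+d-1, k, d) MDS code unchanged.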

Second preferred embodiment:

The second preferred embodiment applies when the locality of all symbols, including information symbols and parity check symbols, is unequal. Encoding is first performed with the Gabidulin code, and the encoding result is then encoded again with maximum distance separable (MDS) codes.

A vector of length k over the field F_{q^m} is selected. The full symbol locality profile of the code to be constructed is (n_1, …, n_{r_a}), i.e. the number of symbols with locality j is n_j, 1 ≤ j ≤ r_a. The value N_j corresponding to each locality j is determined, and N is then solved for to obtain the codeword length N of the Gabidulin code. According to the codeword length N, the information symbol length k, and the minimum distance N-k+1, the vector is Gabidulin encoded to obtain a codeword.

The symbols of the codeword are divided according to the locality of each symbol into r_a disjoint groups, i.e. the locality of the elements in the j-th group is j and the number of symbols in the j-th group is N_j. If N_j = 0, the corresponding group is empty and is not divided further; if N_j > 0, i.e. the corresponding group is non-empty, the symbols of the group are further arbitrarily divided into N_j/j disjoint locality groups of j symbols each.

Each locality group of j symbols is then encoded again with the (j+1, j, 2) maximum distance separable (MDS) code over F_q, so that the number of symbols in each group changes from j to j+1. This yields an (n, k, d) full symbol locality unequal code reaching the upper bound of the minimum distance, with codeword length n:

upper bound of minimum distance d:
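The second-round division and re-encoding of the second preferred embodiment can be sketched in Python as follows. The sketch assumes q = 2, so that the (j+1, j, 2) MDS code over F_q is a single XOR parity, and it uses arbitrary integers as stand-ins for Gabidulin codeword symbols; the function name `add_local_parities` and the sample data are hypothetical.

```python
from functools import reduce
from operator import xor

def add_local_parities(groups):
    """groups maps a locality j to the list of N_j symbols assigned to it.
    Each non-empty list is cut into N_j / j locality groups of j symbols,
    and every locality group gets one XOR parity (j symbols become j + 1)."""
    out = {}
    for j, symbols in groups.items():
        if not symbols:              # N_j = 0: nothing to divide further
            out[j] = []
            continue
        if len(symbols) % j != 0:
            raise ValueError("N_j must be a multiple of j")
        local = [symbols[i:i + j] for i in range(0, len(symbols), j)]
        out[j] = [g + [reduce(xor, g)] for g in local]
    return out

# Stand-in symbols of a length-5 outer codeword, already grouped by locality.
first_round = {1: [3], 2: [5, 9, 12, 7], 3: []}
encoded = add_local_parities(first_round)

# Number of symbols per locality after re-encoding, and the total length.
n_by_locality = {j: sum(len(g) for g in gs) for j, gs in encoded.items()}
n_total = sum(n_by_locality.values())
```

Because every locality group of j symbols grows to j+1 symbols, the count of symbols with locality j after re-encoding is N_j (j+1)/j in this sketch, and any single symbol of a group can be rebuilt from the other j symbols of the same group.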

The above examples are given only to illustrate the present invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments exhaustively here, and obvious variations or modifications can be made without departing from the scope of the invention.

Claims

1. A method for constructing locally unequal codes, comprising:

step one: when the locality of the information symbols is unequal, an information symbol locality unequal code is constructed: the parity check bit matrix of a maximum distance separable (MDS) code is split, and each subset obtained by the split is split further, yielding the generator matrix of an information symbol locality unequal code with code symbol length n, a codeword generated by this matrix belonging to an information symbol locality unequal code whose minimum distance d reaches the upper bound:

for a (k+d-1, k, d) code, when the number of parity check bits is exactly one less than the minimum distance, the construction condition of a maximum distance separable (MDS) code is met, and the generator matrix of the MDS code is constructed from an identity matrix and a parity check bit matrix, where the number of columns of the identity matrix equals the number of information symbols of the MDS code;

the generator matrix G' of this code consists of the k column vectors of the k × k identity matrix followed by the d-1 column vectors of the k × (d-1) parity check bit matrix;

according to the locality of each information symbol, the coordinates of G' are divided into subsets: let j_p denote that the coordinate point in the j-th column of the parity check bit matrix has locality p (1 ≤ p ≤ m, m ≤ r); all coordinates with locality p are placed in the same subset s_p, the number of elements of which is the number of coordinates j_p;

each s_p is then arbitrarily divided into disjoint subsets, the number of elements in each subset not exceeding the locality p;

for a k-dimensional vector and a set S of coordinates contained in its index range, the part of the vector indexed by S consists of the |S| rows of the vector taken at the coordinates in S;

taking the decomposition of this vector as an example, the finally obtained codeword generator matrix is G:

if the original information code word is

Obtain a code word of

The obtained codeword length n:

upper bound of minimum distance d:

the information symbol locality unequal code whose minimum distance d reaches the upper bound is thereby constructed;

step two: when all symbols, comprising information symbols and parity check symbols, are unequal in locality, Gabidulin codes are used for encoding and the encoding result is encoded again with maximum distance separable (MDS) codes, yielding an (n, k, d) full symbol locality unequal code that reaches the upper bound of the minimum distance, with codeword length n and information symbol length k, a codeword so generated belonging to a code whose minimum distance d reaches the upper bound:

a vector of length k over the field F_{q^m} is selected; the full symbol locality profile of the code to be constructed is (n_1, …, n_{r_a}), i.e. the number of symbols with locality j is n_j, 1 ≤ j ≤ r_a;

the value N_j corresponding to each locality j is determined, and N is then solved for to obtain the codeword length N of the Gabidulin code; according to the codeword length N, the information symbol length k, and the minimum distance N-k+1, the vector is Gabidulin encoded to obtain a codeword;

the symbols of the codeword are divided according to locality into disjoint groups, i.e. the locality of the elements in the j-th group is j and the number of symbols in the j-th group is N_j;

if N_j = 0, the corresponding group is empty and is not divided further;

if N_j > 0, i.e. the corresponding group is non-empty, the symbols of the group are arbitrarily divided into N_j/j disjoint locality groups, and each locality group is encoded again so that the number of symbols in each group changes from j to j+1;

upper bound of minimum distance d:

the full symbol locality unequal code whose minimum distance d reaches the upper bound is thereby constructed;

wherein F_q denotes the q-ary field;

F_{q^m} denotes the degree-m extension field of F_q;

for an (n, k, d) systematic code, n denotes the code length, k the number of information symbols, and d the minimum distance; if the information symbols can be divided into disjoint subsets such that the information symbols of different subsets have different locality, the code is an information symbol locality unequal code;

the information symbol locality profile of the (n, k, d) systematic code is (k_1, …, k_r), k_j being the number of information symbols with locality j;

for the (n, k, d) systematic code, if the parity check symbols also have locality constraints, a full symbol locality profile is defined: with r_i denoting the locality of the i-th symbol of the code, 1 ≤ i ≤ n, and r_a = max(r₁, r₂, …, r_n), the full symbol locality profile is expressed as (n_1, …, n_{r_a});

the Gabidulin code over F_{q^m} is denoted as an (N, K, N-K+1) code, where N is the number of coding symbols, K is the number of information symbols, and N-K+1 is the minimum distance of the codewords;

a codeword is defined as (f(g₁), …, f(g_N)), where f(x) is a linearized polynomial over the finite field F_{q^m}, i.e. a finite field with q^m elements, whose coefficients are the information symbols, and g₁, …, g_N are specific points in F_{q^m};

the Gabidulin code belongs to the rank distance codes;

a linear code with code length M, redundancy Y, and minimum distance Y+1, i.e. an (M, M-Y, Y+1) code, is a maximum distance separable (MDS) code; when M-Y information symbols are expanded into M symbols by encoding and any one symbol is lost or damaged, it can be recovered from K = M-Y of the remaining M-1 symbols;

the Singleton bound is a bound on the size of a code, i.e. an upper bound on the number of codewords when the codeword length and the minimum distance are given; with A_q(n, d) denoting the maximum possible number of codewords of a q-ary code of length n and minimum distance d, A_q(n, d) ≤ q^{n-d+1}; a code that reaches the Singleton bound attains the maximum possible minimum distance; for an (N, K, D) rank distance code over the field F_{q^m}, with N coding symbols, K information symbols, and minimum rank distance D, the Singleton bound can be restated in terms of the minimum distance of the codewords, that is: D ≤ N-K+1.