WO2023134130A1

WO2023134130A1 - Galois field multiplier and erasure coding and decoding system

Info

Publication number: WO2023134130A1
Application number: PCT/CN2022/102524
Authority: WO
Inventors: 张磊; 王明明; 王凛
Original assignee: 苏州浪潮智能科技有限公司
Priority date: 2022-01-14
Filing date: 2022-06-29
Publication date: 2023-07-20
Also published as: CN114063973A; CN114063973B

Abstract

Disclosed in the present application are a Galois field multiplier and an erasure coding and decoding system. The Galois field multiplier comprises a plurality of basic operation units connected in series and a plurality of cyclic processing units connected in series, wherein the total number of basic operation units and the total number of cyclic processing units are determined according to a data bit width of input data of the Galois field multiplier. Each basic operation unit performs a Galois field multiplication operation on received data and a target generation element, and outputs a multiplication calculation result to the next operation unit and the corresponding cyclic processing unit. A cyclic processing unit group is used for determining the current number of cycles according to the input data, initialized data, and a Galois field multiplication operation result output by a basic operation unit group, and used for outputting a final calculation result.

Description

Galois Field Multiplier and Erasure Correction Codec System

Cross References to Related Applications

This application claims priority to Chinese patent application No. 202210039878.0, entitled "Galois Field Multiplier and Erasure Correction Codec System" and filed with the China Patent Office on January 14, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of computer technology, in particular to a Galois field multiplier and an erasure correction codec system.

Background

In data transmission and data storage, erasure codes are favored for their low storage cost. The RS (Reed-Solomon) code is a common EC (erasure code): from M data blocks it computes N check blocks, and any M of the resulting M+N blocks that remain intact are sufficient to recover all of the original data. In data storage in particular, erasure coding is an extremely important means of ensuring data reliability. The RS erasure coding process is shown in Figure 1, where B is the coding matrix, whose gray part (for example, the lower half B11) is a Cauchy or Vandermonde matrix, D is the data on the storage disks to be protected, and C is the resulting encoded data. When some data blocks are lost, the surviving blocks define a new matrix relationship, and multiplying by the inverse of that matrix recovers the original data; this is the RS erasure decoding process shown in Figure 2. There, Survivors is the normal data remaining after a storage failure, B' is the matrix formed from the rows of the coding matrix B that correspond to the surviving data, and B'^-1 is the inverse of B'.

GF (Galois field) multiplication is used extensively in RS encoding and decoding. As distributed storage systems contain ever more disks, and each disk stores ever more data, performing RS erasure calculations at high speed to achieve high-throughput distributed storage has become the main design challenge, which has led to Galois field multipliers implemented as hardware circuits. Multiplication over a Galois field relies on the linear-algebra technique of simplifying high-order operations by means of a minimal polynomial. The basic idea is to convert the two vectors into two polynomials, multiply the polynomials, reduce the product modulo the field's defining polynomial, and convert the remainder back into a vector. The traditional Galois field multiplier works exactly this way, multiplying first and then taking the modulus, which takes many cycles and is complicated to implement. To overcome these drawbacks, the related art replaces the modulus operation with table lookups, which greatly reduces the number of calculation cycles.

A generator is a special element of a field whose powers traverse all nonzero elements of the field (in this paragraph, ^ denotes exponentiation). For example, if g is a generator of the field GF(2^w), then the set {g^0, g^1, ..., g^(2^w-2)} contains all 2^w-1 nonzero elements of GF(2^w). When the field is defined by a primitive polynomial, the element 2 (the polynomial x) is a generator; for a polynomial that is irreducible but not primitive, a different generator must be chosen. Using a generator g, every nonzero element z of GF(2^w) can be obtained by exponentiation, that is, z = g^k. Because GF(2^w) is finite while the exponent k is unbounded, the powers must repeat with period 2^w-1, and no power of g equals 0. When k is greater than or equal to 2^w-1, g^k = g^(k mod (2^w-1)). For z = g^k there is a forward process and a reverse process: computing z from a known exponent k is the forward process, and computing k from a known z is the reverse process. For multiplication, if a = g^i and b = g^j, then a*b = g^i * g^j = g^(i+j). The table lookup method finds i and j from a and b, and then looks up g^(i+j). Two tables must therefore be constructed over GF(2^w), a positive table gflog and a negative table gfilog: gflog maps an element in binary form to its exponent (its logarithm), and gfilog maps an exponent back to the element in binary form. For nonzero a and b, the table-lookup GF multiplication is:

c = a*b = gfilog[(gflog[a] + gflog[b]) mod (2^w - 1)];
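Whether a given element is a generator depends on the defining polynomial, which can be checked by enumerating powers. The sketch below is illustrative only: the polynomials 0x11D (x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial often used for RS codes) and 0x11B (the AES polynomial used later in this description) and the helper names gf_mul and powers_of are assumptions chosen for the example.

```python
def gf_mul(a: int, b: int, poly: int, w: int) -> int:
    """Shift-and-add multiplication in GF(2^w) defined by the polynomial `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^w) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> w:           # degree reached w: reduce modulo poly
            a ^= poly
    return result


def powers_of(g: int, poly: int, w: int) -> list[int]:
    """Return [g^0, g^1, ...] up to, but not including, the first return to 1.

    The list length is the multiplicative order of g (g must be nonzero).
    """
    powers = [1]
    z = gf_mul(1, g, poly, w)
    while z != 1:
        powers.append(z)
        z = gf_mul(z, g, poly, w)
    return powers


# Primitive polynomial 0x11D: 2 generates every nonzero element.
print(len(powers_of(2, 0x11D, 8)))
# AES polynomial 0x11B is irreducible but not primitive: 2 generates only a
# subgroup, while 3 (the polynomial x + 1) is a generator.
print(len(powers_of(2, 0x11B, 8)) < 255, len(powers_of(3, 0x11B, 8)))
```

The cycle g^k = g^(k mod (2^w-1)) is visible directly: the enumeration stops exactly when the powers return to 1.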

When the hardware implements the look-up table GF multiplier, the calculation sequence can be divided into three steps:

Step 1: Look up the values corresponding to a and b in the positive table;

Step 2: Add the two looked-up values and take the remainder modulo 2^w - 1;

Step 3: Look up the remainder in the negative table; the value found is the final result.
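The three steps can be sketched in software as follows. This is a minimal model, assuming GF(2^8) with the primitive polynomial 0x11D so that 2 is a generator; the names build_tables and lut_mul are hypothetical. Zero has no logarithm, so it is handled before the lookups.

```python
W = 8
POLY = 0x11D             # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
ORDER = (1 << W) - 1     # 2^w - 1 nonzero elements


def build_tables(poly: int = POLY, w: int = W) -> tuple[list[int], list[int]]:
    """Build gflog (element -> exponent) and gfilog (exponent -> element) for generator 2."""
    order = (1 << w) - 1
    gflog = [0] * (1 << w)   # index 0 is unused: 0 has no logarithm
    gfilog = [0] * order
    z = 1
    for k in range(order):
        gfilog[k] = z
        gflog[z] = k
        z <<= 1              # multiply by the generator 2 (the polynomial x)
        if z >> w:
            z ^= poly
    return gflog, gfilog


GFLOG, GFILOG = build_tables()


def lut_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    # Step 1: look up the exponents of both operands in the positive table.
    s = GFLOG[a] + GFLOG[b]
    # Step 2: both exponents are at most 2^w - 2, so s < 2 * (2^w - 1) and
    # the modulus reduces to one conditional subtraction of a constant.
    if s >= ORDER:
        s -= ORDER
    # Step 3: the negative table turns the exponent back into an element.
    return GFILOG[s]
```

For example, lut_mul(2, 0x80) adds exponents 1 and 7 and returns x^8 reduced modulo 0x11D, which is 0x1D.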

Considering real-time performance, the hardware architecture of this multiplier is shown in Figure 3. A hardware look-up table GF multiplier needs to store two positive tables and one negative table. In general, the data bit width w is already fixed when the GF multiplier is used, so the remainder operation reduces to a subtraction of a constant, and resource consumption is dominated by the table lookups. As its algorithm and hardware implementation show, the look-up table GF multiplier is simple in principle, has low computational complexity and responds quickly, but because it uses several LUTs (lookup tables), it consumes considerable hardware resources and chip area. The hardware scheme of the erasure codec look-up table GF multiplier also shows that when the erasure codec system handles a large volume of data, a large number of GF multipliers is needed, and the number of tables grows sharply with it. The inventors realized that in this case the hardware resource overhead increases so much that the original advantage of the look-up table GF multiplier disappears, and a larger chip area and more space resources are required.
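A rough estimate makes the overhead concrete. The model below is hypothetical: it assumes each multiplier owns its own two positive tables (2^w entries each) and one negative table (2^w - 1 entries), every entry w bits wide, with no table sharing between multipliers.

```python
def lut_storage_bits(w: int, num_multipliers: int = 1) -> int:
    """Approximate table storage, in bits, for look-up table GF multipliers.

    Hypothetical model: two positive tables of 2^w entries plus one negative
    table of 2^w - 1 entries per multiplier, each entry w bits wide.
    """
    entries = 2 * (1 << w) + ((1 << w) - 1)
    return entries * w * num_multipliers


for w in (8, 16):
    print(w, lut_storage_bits(w), lut_storage_bits(w, 64))
```

Under these assumptions the storage per multiplier grows exponentially with w, and it then scales linearly with the number of multipliers an erasure codec instantiates.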

In view of this, how to reduce the hardware resources consumed in the table lookup process of the GF multiplier is a technical problem to be solved by those skilled in the art.

Summary

In one aspect, an embodiment of the present application provides a Galois field multiplier, including a basic operation unit group and a loop processing unit group;

The basic operation unit group includes an initial operation unit, a plurality of intermediate operation units and a termination operation unit connected in series, and the loop processing unit group includes a plurality of loop processing units connected in series; the total number of operation units in the basic operation unit group and the total number of loop processing units in the loop processing unit group are determined according to the data bit width of the input data of the Galois field multiplier;

The initial operation unit is used to perform Galois field multiplication on the first input data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding loop processing unit; each intermediate operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding loop processing unit; the termination operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the corresponding loop processing unit; and

The loop processing unit group is used to determine the current cycle number according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and to output the final calculation result.

In one embodiment, the total number of loop processing units in the loop processing unit group equals the data bit width of the input data of the Galois field multiplier, and each bit of the second input data corresponds to one loop processing unit.

In one embodiment, the total number of operation units in the basic operation unit group is one less than the data bit width of the input data of the Galois field multiplier.

In one embodiment, each loop processing unit of the loop processing unit group includes a register, an XOR gate and a selector; the register is connected to both the XOR gate and the selector, and the XOR gate is connected to the selector;

The register is used to store the original data received by its loop processing unit and to perform timing alignment; the original data is either the previous data result output by the previous loop processing unit or the initialization data;

The XOR gate is used to XOR the multiplication result output by the operation unit corresponding to the loop processing unit with the original data, and to output the XOR result to the selector; and

The selector is used to select target data as an output result from the original data and the XOR calculation result according to the bit value of the second input data.

In one embodiment, the register is a D-type flip-flop.

Another aspect of the embodiments of the present application provides an erasure correction codec system, including a data distribution module, an operation module and a reordering module;

The data distribution module is used to distribute the data to be erasure-coded into multiple rows of data to be calculated;

The operation module includes a plurality of operation sub-modules, each of which is used to multiply and accumulate one row of data to be calculated; each operation sub-module includes an adder and a plurality of Galois field multipliers according to any one of the above embodiments; the total number of Galois field multipliers is determined according to the number of bus bytes; the adder is used to accumulate the multiplication results output by the Galois field multipliers; and

The reordering module is used to splice the multiply-accumulate results output by the operation sub-modules according to the distribution order.

In one embodiment, the operation sub-module further includes a PE controller, which is connected to the adder and to each Galois field multiplier;

The PE controller is used to determine, according to the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of times the data to be erasure-coded is reused.

In one embodiment, the operation sub-module further includes an EC block output unit;

The EC block output unit is used to control, according to the operating state of the operation sub-module and the output state of the reordering module, the back pressure of the operation sub-module and whether the reordering module performs an output operation.

In one embodiment, the total number of Galois field multipliers in each operation sub-module equals the number of bus bytes, and the total number of Galois field multipliers in the operation module is the product of the number of matrix rows and the number of bus bytes.

In one embodiment, the adder is a Galois field adder.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features and advantages of the application will be apparent from the description, drawings, and claims.

Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present application or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an RS erasure coding process in an exemplary application scenario provided by the present application according to one or more embodiments;

FIG. 2 is a schematic diagram of an RS erasure correction decoding process in an exemplary application scenario provided by the present application according to one or more embodiments;

FIG. 3 is a schematic diagram of a hardware implementation method of a table lookup GF multiplier in an exemplary application scenario provided by the present application according to one or more embodiments;

FIG. 4 is a structural diagram of a specific implementation of a Galois field multiplier provided by the present application according to one or more embodiments;

FIG. 5 is a structural diagram of another specific implementation manner of a Galois field multiplier provided by the present application according to one or more embodiments;

FIG. 6 is a structural diagram of a specific implementation manner of a Galois field multiplier in a schematic example provided by the present application according to one or more embodiments;

FIG. 7 is a structural diagram of a specific implementation manner of a loop processing unit in a schematic example provided by the present application according to one or more embodiments;

FIG. 8 is a structural diagram of another specific implementation manner of a loop processing unit in a schematic example provided by the present application according to one or more embodiments;

FIG. 9 is a structural diagram of a specific implementation manner of an erasure correction coding and decoding system provided by the present application according to one or more embodiments;

FIG. 10 is a structural diagram of another specific implementation manner of an erasure correction coding and decoding system provided by the present application according to one or more embodiments.

Detailed Description

In order to enable those skilled in the art to better understand the solution of the present application, the present application will be further described in detail below in conjunction with the drawings and specific implementation methods. Apparently, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

The terms "first", "second", "third" and "fourth" in the specification, claims and drawings of this application are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but may include unlisted steps or units.

Having introduced the technical solutions of the embodiments of the present application, various non-limiting implementations of the present application are described in detail below.

First, refer to FIG. 4, which is a schematic structural diagram of one implementation of a Galois field multiplier provided in an embodiment of the present application. The embodiment may include the following:

The Galois field multiplier of this embodiment includes a basic operation unit group 41 and a loop processing unit group 42. All operation units in the basic operation unit group 41 have the same structure, and all loop processing units in the loop processing unit group 42 also have the same structure. To describe the connections and data flow of the units more clearly, the basic operation unit group 41 may be regarded as an initial operation unit, a plurality of intermediate operation units and a termination operation unit connected in series; correspondingly, the loop processing unit group 42, which includes a plurality of loop processing units connected in series, may be regarded as an initial loop processing unit, a plurality of intermediate loop processing units and a termination loop processing unit connected in series. The initial operation unit is the first of the series-connected operation units in the basic operation unit group 41; it receives the original data, that is, one of the two multiplicands (polynomials) of the Galois field calculation, which this embodiment calls the first input data. The termination operation unit is the last of the series-connected operation units, and the intermediate operation units are those connected in series between the initial operation unit and the termination operation unit. Similarly, the initial loop processing unit is the first of the series-connected loop processing units in the loop processing unit group 42, and what it receives is also original data, namely the other multiplicand (polynomial) of the Galois field calculation, which this embodiment calls the second input data. The termination loop processing unit is the last of the series-connected loop processing units, and the intermediate loop processing units are those connected in series between the initial loop processing unit and the termination loop processing unit.

The total number of operation units in the basic operation unit group 41 and the total number of loop processing units in the loop processing unit group 42 are determined by the data bit width of the input data of the Galois field multiplier, that is, of the first input data and the second input data. Optionally, the total number of loop processing units in the loop processing unit group 42 may equal the data bit width of the input data; correspondingly, each bit of the second input data corresponds to exactly one loop processing unit. The total number of operation units in the basic operation unit group 41 is one less than the data bit width of the input data. For example, if the input data is 8 bits wide, the loop processing unit group 42 includes 8 loop processing units and the basic operation unit group 41 includes 7 operation units; if the input data is 16 bits wide, they include 16 and 15 units respectively; and if the input data is N bits wide, the loop processing unit group 42 includes N loop processing units and the basic operation unit group 41 includes N−1 operation units.
If the operation units in the basic operation unit group 41 are gmul2 modules and the loop processing units in the loop processing unit group are cacu&sel modules, the structure of the Galois field multiplier may be as shown in Figure 5 and Figure 6. Figure 5 shows an 8-bit input width with 7 gmul2 modules and 8 cacu&sel modules, and Figure 6 shows an N-bit input width with N-1 gmul2 modules and N cacu&sel modules.

In this embodiment, the initial operation unit is used to perform Galois field multiplication on the first input data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding loop processing unit. The target generator may be any generator, for example 2. Each intermediate operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding loop processing unit; the data received by each intermediate operation unit is the multiplication result output by the preceding operation unit. The termination operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the corresponding loop processing unit. The loop processing unit group is used to determine the current cycle number according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and to output the final calculation result. The initialization data may be the initial value of the input data, and its bit width is the same as that of the input data; for example, for 8-bit input data the initialization data may be 8'd0. Specifically, the initial loop processing unit is used to determine the current cycle number according to the corresponding bit of the second input data, the initialization data and the first input data, and to output the current calculation result to the next loop processing unit.
Each intermediate loop processing unit is used to determine the current cycle number according to the calculation result input by the previous loop processing unit, the multiplication result input by the corresponding operation unit and the corresponding bit of the second input data, and to output the current calculation result to the next loop processing unit. The termination loop processing unit determines the final calculation result according to the calculation result input by the previous loop processing unit, the multiplication result input by the corresponding operation unit and the corresponding bit of the second input data, and outputs the final calculation result, which is the Galois field product of the first input data and the second input data.
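A minimal behavioral sketch of this structure may help. It is written in Python rather than a hardware description language, and several choices are assumptions for illustration: the function names gmul2, cacu_sel and pipelined_gf_mul, the default AES polynomial 0x11B, and the convention that data_b enters the operation-unit chain while the bits of data_a drive the loop processing units (following the data_a[n] labels of FIG. 7 and FIG. 8). Registers and clocking are omitted, so only the combinational result is modelled.

```python
def gmul2(v: int, poly: int, w: int) -> int:
    """Operation unit: multiply v by the target generator 2 (the polynomial x)."""
    v <<= 1
    if v >> w:              # degree reached w: reduce modulo the field polynomial
        v ^= poly
    return v


def cacu_sel(pre_result: int, gmul2_out: int, bit: int) -> int:
    """Loop processing unit: XOR in the incoming multiple only when its bit is 1."""
    return pre_result ^ gmul2_out if bit else pre_result


def pipelined_gf_mul(data_a: int, data_b: int, poly: int = 0x11B, w: int = 8) -> int:
    # The w-1 operation units produce data_b * 2^n for n = 1 .. w-1;
    # the initial loop processing unit uses data_b itself (n = 0).
    multiples = [data_b]
    for _ in range(w - 1):
        multiples.append(gmul2(multiples[-1], poly, w))
    result = 0              # initialization data, e.g. 8'd0
    for n in range(w):      # one loop processing unit per bit of data_a
        result = cacu_sel(result, multiples[n], (data_a >> n) & 1)
    return result
```

With the AES polynomial, pipelined_gf_mul(0x83, 0x57) reproduces the product {57} x {83} = {c1} given in the AES specification (FIPS 197). Changing poly (and w) reconfigures the field without rebuilding any table.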

In the technical solution provided by the embodiments of the present application, the Galois field multiplier is designed as a pipeline. The Galois field polynomial can be changed in real time according to usage requirements, that is, the polynomial is configurable instead of being fixed by the forward and reverse tables used for table lookup, which effectively improves the flexibility of the GF multiplier. Because the pipelined GF multiplier structure replaces the original look-up table structure, multiple look-up tables no longer consume hardware resources as they do in the traditional look-up table GF multiplier. This substantially reduces the space resources and area occupied by GF multipliers in an RS erasure codec, effectively reduces the hardware resources occupied by tables, and lowers the hardware resource consumption and area of the storage system without affecting the timeliness of the calculation.
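To illustrate why a pipelined structure preserves timeliness, the following hypothetical cycle-level model places one register after each loop processing unit stage, so that a new operand pair can enter on every clock and each product emerges w cycles later. The register placement, the function name simulate_pipeline and the AES polynomial default are assumptions; the actual register timing of FIG. 7 and FIG. 8 may differ.

```python
def gmul2(v: int, poly: int, w: int) -> int:
    """Operation unit: multiply v by 2 (the polynomial x) modulo poly."""
    v <<= 1
    if v >> w:
        v ^= poly
    return v


def simulate_pipeline(pairs, poly: int = 0x11B, w: int = 8):
    """Feed (data_a, data_b) pairs one per clock; return (cycle, result) as results exit."""
    regs = [None] * w        # regs[n]: (data_a, next multiple of data_b, partial result)
    pending = list(pairs)
    out = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_regs = [None] * w            # all registers update on the same clock edge
        for n in range(w):
            if n == 0:
                src = None
                if pending:
                    a, b = pending.pop(0)
                    src = (a, b, 0)      # initialization data is 0
            else:
                src = regs[n - 1]        # value latched by the previous stage last cycle
            if src is not None:
                a, v, acc = src
                if (a >> n) & 1:         # loop processing unit: XOR or pass through
                    acc ^= v
                # Only w-1 operation units exist, so the last stage does not double.
                nxt = gmul2(v, poly, w) if n < w - 1 else v
                new_regs[n] = (a, nxt, acc)
        regs = new_regs
        if regs[w - 1] is not None:      # the last stage now holds a finished product
            out.append((cycle, regs[w - 1][2]))
    return out


print(simulate_pipeline([(0x83, 0x57), (0x13, 0x57), (0x02, 0x57)]))
```

In this model the first product appears after w = 8 cycles and each later product follows one cycle after the previous one, i.e. a latency of w cycles with a throughput of one multiplication per clock.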

The above embodiment places no limitation on the structure of the loop processing unit. This embodiment further provides an optional implementation of the loop processing unit. As shown in FIG. 7, the loop processing unit uses the bit value (0 or 1) of the second input data to determine the cycle count, that is, whether to use the result of the corresponding gmul2 operation unit in Figure 5, and may include the following:

In this embodiment, each loop processing unit of the loop processing unit group 42 includes a register, an XOR gate and a selector; the register is respectively connected to the XOR gate and the selector, and the XOR gate is connected to the selector.

The register is used to store the original data received by its loop processing unit and to perform timing alignment. The original data is either the previous data result output by the previous loop processing unit or the initialization data: for the register of the initial loop processing unit, the original data is the initialization data, while for the registers of the intermediate and termination loop processing units, the original data is the calculation result output by the previous loop processing unit, which this embodiment calls the previous data result. The register may be, for example, a D-type flip-flop (DFF). Besides serving as a storage element in the GF multiplier, a DFF implements sequential logic in the hardware design, so it can also be used for timing alignment in the pipeline. Other types of register may also be used without affecting the implementation of this application.

The XOR gate is used to XOR the multiplication result output by the operation unit corresponding to the loop processing unit with the original data, and to output the XOR result to the selector. In this embodiment, the original data reaches the XOR gate from the register delayed by one clock cycle; this one-cycle wait gives the corresponding operation unit time to compute its result.

The selector is used to select, according to the bit value of the second input data, either the original data or the XOR result as the target data to output. Optionally, when the bit value of the second input data is 0 the original data is selected as the target data, and when it is 1 the XOR result is selected. The opposite assignment may also be used, selecting the original data when the bit value is 1 and the XOR result when it is 0; those skilled in the art may choose flexibly according to actual needs. The selector in this embodiment is a two-to-one selector that chooses whether the calculation uses the XOR of the previous data result with the operation unit's result, or only the previous data result. Optionally, referring to FIG. 7 and FIG. 8, the two-to-one selector chooses between pre_result XOR gmul2 and pre_result alone: when the bit data_a[n] is 1, pre_result XOR gmul2 is selected; when data_a[n] is 0, pre_result is selected.
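The datapath of one loop processing unit can be sketched at the gate level as below, with the register left out. The 2-to-1 selector is written as AND/OR masking over a w-bit bus to mirror a simple gate implementation. The names mux2 and cacu_sel and the invert flag, which models the alternative bit polarity mentioned above, are hypothetical.

```python
def mux2(sel: int, in1: int, in0: int, w: int = 8) -> int:
    """2-to-1 selector on w-bit buses built from AND/OR: in1 when sel is 1, else in0."""
    full = (1 << w) - 1
    mask = full if sel else 0            # replicate the select bit across the bus
    return (in1 & mask) | (in0 & ~mask & full)


def cacu_sel(pre_result: int, gmul2_out: int, bit: int,
             w: int = 8, invert: bool = False) -> int:
    """XOR gate feeding a 2-to-1 selector (register omitted).

    invert=False: bit 1 selects pre_result ^ gmul2_out, bit 0 selects pre_result.
    invert=True:  the opposite polarity.
    """
    xor_out = pre_result ^ gmul2_out     # XOR gate
    sel = bit ^ int(invert)              # polarity choice for the select line
    return mux2(sel, xor_out, pre_result, w)
```

If the inverted polarity is chosen, the bits of the second input data must also be inverted before they reach the selectors, otherwise the multiplier computes a different product.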

As can be seen from the above, the pipelined Galois field multiplier of this embodiment can be built from simple logic gates and selectors, which improves the flexibility of the GF multiplier, reduces the hardware resources occupied by look-up tables, and supports a configurable polynomial.

In order to make those skilled in the art clearly understand the technical solution of the present application, the present application provides a schematic example of a Galois field multiplier in conjunction with FIG. 6, which may include the following:

In this example, the irreducible polynomial P(x) = x⁸ + x⁴ + x³ + x + 1 specified by AES (Advanced Encryption Standard, a symmetric encryption algorithm) is used for analysis. For convenience of programming, the pattern is derived first. Let the function GMul(u, v) denote Galois field multiplication; since it is commutative, the order of u and v does not matter. First consider Galois field multiplication by 2, that is, GMul(2, v):

For v = 7, 2*7 = x*(x² + x + 1) = x³ + x² + x; that is, multiplying a number by 2 in the Galois field is equivalent to shifting it left by one bit. If the highest bit of v is 1, that is, if v>>7 == 1, the shifted polynomial has degree 8, which exceeds 7, so it must be reduced modulo P(x), for example:

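$$
\begin{aligned}
2 \cdot 129 &= x\,(x^7 + 1) = x^8 + x \equiv (x^8 + x) + P(x) \\
&= (x^8 + x) + (x^8 + x^4 + x^3 + x + 1) = x^4 + x^3 + 1 \\
&= 00011001_2 = \mathtt{0x19} = 00000010_2 \oplus 00011011_2 = \mathtt{0x02} \oplus \mathtt{0x1B} \\
&= \big((129 \ll 1) \bmod 2^8\big) \oplus \mathtt{0x1B}
\end{aligned}
$$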

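$$
\begin{aligned}
2 \cdot 176 &= x\,(x^7 + x^5 + x^4) = x^8 + x^6 + x^5 \equiv (x^8 + x^6 + x^5) + P(x) \\
&= (x^8 + x^6 + x^5) + (x^8 + x^4 + x^3 + x + 1) = x^6 + x^5 + x^4 + x^3 + x + 1 \\
&= 01111011_2 = \mathtt{0x7B} = 01100000_2 \oplus 00011011_2 = \mathtt{0x60} \oplus \mathtt{0x1B} \\
&= \big((176 \ll 1) \bmod 2^8\big) \oplus \mathtt{0x1B}
\end{aligned}
$$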
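The worked examples can be reproduced with a short function. This is a sketch of GMul(2, v) for the AES polynomial only; the function name gmul2 follows the module name in Figure 5 but is otherwise an assumption.

```python
def gmul2(v: int) -> int:
    """GMul(2, v) in GF(2^8) with the AES polynomial P(x) = x^8 + x^4 + x^3 + x + 1."""
    shifted = (v << 1) & 0xFF    # multiply by x: shift left one bit, keep 8 bits
    if v >> 7 == 1:              # the shift produced an x^8 term: reduce modulo P(x)
        shifted ^= 0x1B          # low 8 bits of P(x) = 0x11B
    return shifted


for v in (7, 129, 176):
    print(v, hex(gmul2(v)))
```

Truncating to 8 bits and XORing with 0x1B is the same as XORing the 9-bit shifted value with the full polynomial 0x11B, which is what the reductions above do.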

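From the examples above, the rule for GMul(2, v) can be summarized as follows: if the highest bit of v is 0 (v>>7 == 0), GMul(2, v) = v<<1; if the highest bit of v is 1 (v>>7 == 1), GMul(2, v) = (v<<1) ^ 0x1B, keeping only the low 8 bits.

Multiplication by other constants can then be calculated from GMul(2, v):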

GMul(3,v)

3*v=(2+1)*v=GMul(2, v)+v=GMul(2, v)^v

GMul(4,v)

4*v=2*2*v=GMul(2,GMul(2,v))

GMul(7,v)

7*v=(2*2+2+1)*v=GMul(2,GMul(2,v))^GMul(2,v)^v

GMul(8,v)

8*v=(2*2*2)*v=GMul(2,GMul(2,GMul(2,v)))

GMul(11111111b,v)

11111111b = 2⁷ + 2⁶ + 2⁵ + 2⁴ + 2³ + 2² + 2 + 1

It is therefore equivalent to: GMul(2, v) applied 7 times ^ GMul(2, v) applied 6 times ^ ... ^ GMul(2, v) applied once ^ v, where ^ denotes XOR, << denotes a left shift and >> denotes a right shift. Based on the above derivation of Galois field polynomial multiplication, the module is designed as the pipelined Galois field multiplier architecture shown in Figure 5 and Figure 6. That is, in this embodiment the GF multiplier computes result = data_a*data_b, where data_a and data_b are the two multiplicands, that is, the two input data of the GF multiplier, and the external interface of the GF multiplier has only two 8-bit inputs. The initialization value 8'd0 is defined according to the 8-bit width of the multiplicands; if the bit width were 16 bits, the initialization data would be 16'd0. The initialization value 0 is fixed and is not visible at the external interface. With 8-bit input data and initialization data 8'd0, the Galois field multiplier includes 7 series-connected GMul(2, v) units, which are the operation units, and 8 series-connected cacu&sel units, which are the loop processing units, and each bit of the 8-bit input data is fed to one cacu&sel unit.
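The decompositions above can be checked exhaustively against the traditional multiply-then-modulo method described in the background. The sketch below assumes the AES polynomial 0x11B; the helper names clmul, poly_mod, gmul_reference and gmul_by_doubling are illustrative and not taken from the original design.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product over GF(2), without reduction."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    return prod


def poly_mod(p: int, m: int = 0x11B) -> int:
    """Reduce polynomial p modulo m (bit i is the coefficient of x^i)."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        p ^= m << (p.bit_length() - 1 - deg_m)
    return p


def gmul_reference(u: int, v: int) -> int:
    """Traditional method: multiply first, then take the modulus."""
    return poly_mod(clmul(u, v))


def gmul2(v: int) -> int:
    """GMul(2, v): shift left, reduce with 0x1B when the top bit was set."""
    return ((v << 1) & 0xFF) ^ (0x1B if v >> 7 else 0)


def gmul_by_doubling(u: int, v: int) -> int:
    """XOR of GMul(2, v) applied k times, for every set bit k of u."""
    result, power = 0, v
    for k in range(8):
        if (u >> k) & 1:
            result ^= power
        power = gmul2(power)
    return result


print(all(gmul_by_doubling(u, v) == gmul_reference(u, v)
          for u in range(256) for v in range(256)))
```

The loop in gmul_by_doubling is the software counterpart of the chain of seven GMul(2, v) units and eight cacu&sel units: each iteration doubles once and conditionally XORs once.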

It can be seen from the above that the pipelined GF multiplier implemented in the embodiments of the present application has low resource consumption, high performance, high throughput and good flexibility, and is simple and practical.

The embodiments of the present application further provide a system that applies the Galois field multiplier in a corresponding application scenario, which makes the Galois field multiplier more practical. The erasure correction codec system provided by the embodiments of the present application is introduced below with reference to Figure 9; it may include the following:

The erasure correction codec system may include a data distribution module 91, an operation module 92 and a reordering module 93. The data distribution module 91, the operation module 92 and the reordering module 93 are connected to each other through a bus.

The data distribution module 91 may be used to distribute the data to be erasure-corrected so as to obtain multiple rows of data to be calculated. The data to be erasure-corrected includes matrix data and payload data. For erasure encoding, it consists of the encoding matrix and the original disk data, shown as the B matrix and the D data in Figure 1. For erasure decoding, it consists of the inverse of the matrix re-formed from the rows of the encoding matrix that correspond to the surviving (normal) data, together with the normal data remaining after some storage disks fail, shown as the B′^-1 matrix and the Survivors data in Figure 2. The whole erasure correction codec can be viewed as matrix multiplication, whose fundamental operation is multiply-accumulate over corresponding rows. Accordingly, the data distribution module 91 splits the matrix data into multiple rows, the multiply-accumulate calculation of each row is performed separately, and the per-row results are finally combined to obtain the final result.
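A minimal Python sketch of this row-wise matrix multiply-accumulate over GF(2^8) follows, assuming the field polynomial 0x11B from the earlier examples. The 2 × 2 coding matrix, its inverse and the helper names `gmul`, `row_mac` and `ec_matmul` are hypothetical illustrations; a real code would use the B and B′^-1 matrices of Figures 1 and 2.

```python
from typing import List


def gmul(a: int, b: int, poly_low: int = 0x1B) -> int:
    """Shift-and-XOR multiplication in GF(2^8) (0x1B -> x^8 + x^4 + x^3 + x + 1)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # GF addition is XOR
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF      # multiply a by x ...
        if carry:
            a ^= poly_low        # ... and reduce modulo P(x)
    return result


def row_mac(row: List[int], data: List[int]) -> int:
    """Multiply-accumulate one matrix row with the data vector."""
    if len(row) != len(data):
        raise ValueError("row and data lengths differ")
    acc = 0
    for coef, byte in zip(row, data):
        acc ^= gmul(coef, byte)
    return acc


def ec_matmul(matrix: List[List[int]], data: List[int]) -> List[int]:
    """Split the matrix into rows, compute each row separately, keep row order."""
    return [row_mac(row, data) for row in matrix]


# Encoding: parity = B x D, with a hypothetical 2 x 2 coding matrix.
B = [[1, 1], [1, 2]]
D = [0x81, 0xB0]
parity = ec_matmul(B, D)              # [0x31, 0xFA]

# Decoding: if both data bytes are lost, the survivors are the two parity
# bytes and B' is made of those two B rows. In this field 1/3 = 0xF6 and
# 2/3 = 0xF7, so B'^-1 = [[0xF7, 0xF6], [0xF6, 0xF6]].
B_inv = [[0xF7, 0xF6], [0xF6, 0xF6]]
recovered = ec_matmul(B_inv, parity)  # [0x81, 0xB0]
```

Encoding and decoding use the same multiply-accumulate routine; only the matrix fed to it changes, which is why a single datapath can serve both directions.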

The operation module 92 may include a plurality of operation sub-modules, each of which multiplies and accumulates one row of data to be calculated. Each operation sub-module performs its multiplications with Galois field multipliers, and a Galois field multiplier does not process all the bytes of a row at the same time; it processes only as many bytes as its operation width supports. Each operation sub-module therefore includes an adder and a plurality of Galois field multipliers according to any one of the above embodiments, and the adder accumulates the multiplication results output by the Galois field multipliers. The total number of Galois field multipliers is determined according to the number of bus bytes. Optionally, if each Galois field multiplier performs a single-byte operation, the number of Galois field multipliers in each operation sub-module equals the number of bus bytes, and the total number of Galois field multipliers in the operation module equals the number of matrix rows of the data to be erasure-corrected multiplied by the number of bus bytes. The adder may be, for example, a Galois field adder. The reordering module 93 concatenates the multiply-accumulate results output by the operation sub-modules in the order in which the data distribution module 91 distributed the data to be erasure-corrected.
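One possible reading of an operation sub-module is sketched below in Python. It assumes a 16-byte bus with one single-byte multiplier per bus byte, and that when a matrix row is longer than the bus it is processed over several bus words, with the GF adder (XOR) accumulating across them; that iteration count is the kind of figure the PE controller would schedule. The names `operation_submodule`, `BUS_BYTES` and `multipliers_in_operation_module` are hypothetical.

```python
from typing import List, Tuple

BUS_BYTES = 16  # assumed bus width; one single-byte GF multiplier per bus byte


def gmul(a: int, b: int, poly_low: int = 0x1B) -> int:
    """Shift-and-XOR multiplication in GF(2^8) (0x1B -> x^8 + x^4 + x^3 + x + 1)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) & 0xFF) ^ (poly_low if a & 0x80 else 0)
    return result


def operation_submodule(matrix_row: List[int], data_row: List[int]) -> Tuple[int, int]:
    """Return (multiply-accumulate result, number of accumulation iterations)."""
    if len(matrix_row) != len(data_row):
        raise ValueError("matrix row and data row lengths differ")
    iterations = -(-len(data_row) // BUS_BYTES)  # ceil(len / BUS_BYTES)
    acc = 0
    for it in range(iterations):
        lo, hi = it * BUS_BYTES, (it + 1) * BUS_BYTES
        # BUS_BYTES multipliers process one bus word in parallel (modelled as a list).
        products = [gmul(c, d) for c, d in zip(matrix_row[lo:hi], data_row[lo:hi])]
        for p in products:
            acc ^= p  # Galois field adder: addition is XOR
    return acc, iterations


def multipliers_in_operation_module(matrix_rows: int) -> int:
    """Rows x bus bytes single-byte multipliers across all operation sub-modules."""
    return matrix_rows * BUS_BYTES


print(operation_submodule([1, 2], [0x81, 0xB0]))  # (250, 1), i.e. 0xFA after one iteration
print(multipliers_in_operation_module(4))         # 64
```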

Further, since each operation sub-module needs to perform multiple multiply-accumulate calculations, each operation sub-module may, on the basis of the above embodiment, further include a PE controller to ensure that the calculation is executed successfully. The PE controller is connected to the adder and to each Galois field multiplier, and determines, from the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of times the data to be erasure-corrected is used.

Since the operation module includes a plurality of operation sub-modules and the reordering module 93 outputs the final calculation result, each operation sub-module may, on the basis of the above embodiment, further include an EC block output unit to ensure that the output is correct. The EC block output unit controls, according to the operation state of the operation sub-module and the output state of the reordering module, the back pressure applied to the operation sub-module and whether the reordering module performs an output operation.

To help those skilled in the art understand the technical solution of the present application more clearly, the present application further provides a schematic example with reference to Figure 10, which may include the following:

In this embodiment, the bus is 16 bytes wide, each Galois field multiplier performs a single-byte operation, and the adder is a Galois field adder. As described above, the erasure correction codec reduces to matrix multiplication, whose fundamental operation is multiply-accumulate over corresponding rows. Specifically, the bus data of each row is subdivided into 16 bytes; since each multiplier handles a single byte, 16 multipliers computing in parallel are needed to process one bus word. That is, each operation sub-module includes 16 Galois field multipliers executing in parallel, and the operation module as a whole contains (number of matrix rows of the data to be erasure-corrected) × (number of bus bytes) multipliers.

In this embodiment, the data distribution module distributes the matrix and data of the erasure correction calculation according to the number of rows of the matrix calculation. Specifically, the data to be erasure-corrected is RS/RS^-1: the 4×16-byte RS/RS^-1 matrix is divided into 4 rows of 16 bytes each, and the same 16-byte data block is fed to every row, so the input of each row is a 16-byte data block and a 16-byte matrix block. Sixteen GF multipliers are used, and each byte of the data block and the corresponding byte of the matrix block serve as the two inputs data_a and data_b of one GF multiplier. From the number of multiply-accumulate operations, the PE controller calculates the number of accumulation iterations and the time at which the operation completes; under its scheduling, the number of uses of RS/RS^-1 and the number of accumulations of the adder are determined. The GF adder is essentially an XOR operation, and in this embodiment it performs the accumulation step of the matrix multiply-accumulate. The EC block output unit, based on the operation state of the preceding operation module and the output state of the following stage, controls the back pressure on the preceding stage and whether the following stage outputs. Each row of the matrix operation has its own EC block output feeding the EC data block reordering module; after the calculation is complete, the module determines the position of each row and re-splices the rows before sending them.
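A Python sketch of this four-row example follows, focusing on distribution and reordering. It assumes that each row's 16 products are XOR-accumulated into one output byte (so a 4×16 RS matrix turns one 16-byte data block into 4 bytes), and it shuffles the row results to imitate rows finishing out of order. `distribute`, `compute_row`, `reorder`, `ec_block` and the selection matrix are hypothetical names and values used only for illustration.

```python
import random
from typing import List, Tuple

ROWS, BUS_BYTES = 4, 16  # 4 x 16-byte RS/RS^-1 matrix, 16-byte bus


def gmul(a: int, b: int, poly_low: int = 0x1B) -> int:
    """Shift-and-XOR multiplication in GF(2^8) (0x1B -> x^8 + x^4 + x^3 + x + 1)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) & 0xFF) ^ (poly_low if a & 0x80 else 0)
    return result


def distribute(rs_matrix: List[List[int]], data_block: List[int]):
    """Tag each matrix row with its index; every row receives the same data block."""
    if len(rs_matrix) != ROWS or any(len(r) != BUS_BYTES for r in rs_matrix):
        raise ValueError("expected a 4 x 16 matrix")
    if len(data_block) != BUS_BYTES:
        raise ValueError("expected a 16-byte data block")
    return [(idx, row, data_block) for idx, row in enumerate(rs_matrix)]


def compute_row(job) -> Tuple[int, int]:
    """Sixteen single-byte multipliers plus an XOR adder for one row."""
    idx, matrix_block, data_block = job
    acc = 0
    for d, m in zip(data_block, matrix_block):
        acc ^= gmul(d, m)  # data byte -> data_a, matrix byte -> data_b
    return idx, acc


def reorder(results: List[Tuple[int, int]]) -> bytes:
    """Re-splice row results in distribution order, whatever order they finished in."""
    return bytes(value for _, value in sorted(results))


def ec_block(rs_matrix, data_block, seed: int = 0) -> bytes:
    results = [compute_row(job) for job in distribute(rs_matrix, data_block)]
    random.Random(seed).shuffle(results)  # simulate rows completing out of order
    return reorder(results)


# A selection matrix whose row r picks data byte r, so the output is easy to check.
select = [[1 if c == r else 0 for c in range(BUS_BYTES)] for r in range(ROWS)]
print(ec_block(select, list(range(0x10, 0x20))).hex())  # 10111213
```

Carrying the row index with each result is what lets the reordering step restore the distribution order regardless of when each row completes.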

For the functions of the functional modules of the Galois field multiplier in the embodiment of the present application, reference may be made to the relevant description of the implementation process in the foregoing embodiments, and details are not repeated here.

It can be seen from the above that, from the perspective of the overall system, the pipelined GF multiplier of the embodiments of the present application fully meets the functional requirements of erasure correction coding and decoding, is relatively easy to implement in hardware, and ensures high computing efficiency and data throughput. Compared with a look-up-table GF multiplier, it greatly reduces hardware resource consumption, and the Galois field polynomial used for calculation can be changed flexibly in real time according to the requirements of different system applications.

Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Professionals will further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the components and steps of each example have been described above generally in terms of their functions. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope described in this specification.

The above-mentioned embodiments only express several implementation modes of the present application, and the description thereof is relatively specific and detailed, but it should not be construed as limiting the scope of the patent for the invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the scope of protection of the patent application should be based on the appended claims.

Claims

A Galois field multiplier, characterized in that it comprises a basic operation unit group and a cyclic processing unit group;

The basic operation unit group includes a start operation unit, a plurality of intermediate operation units and a termination operation unit connected in series, and the cyclic processing unit group includes a plurality of cyclic processing units connected in series; the total number of operation units included in the basic operation unit group and the total number of cyclic processing units included in the cyclic processing unit group are determined according to the data bit width of the input data of the Galois field multiplier;

The start operation unit is used to perform Galois field multiplication on the first input data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding cyclic processing unit; each intermediate operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the next operation unit and to the corresponding cyclic processing unit; the termination operation unit is used to perform Galois field multiplication on the received data and the target generator, and to output the multiplication result to the corresponding cyclic processing unit; and

The cyclic processing unit group is used to determine the current cycle number according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and to output the final calculation result.
The Galois field multiplier according to claim 1, wherein the total number of cyclic processing units included in the cyclic processing unit group is equal to the data bit width of the input data of the Galois field multiplier, and each bit of the second input data corresponds to one cyclic processing unit.
The Galois field multiplier according to claim 2, wherein the total number of operation units included in the basic operation unit group is equal to the data bit width of the input data of the Galois field multiplier minus 1.
The Galois field multiplier according to any one of claims 1 to 3, wherein each cyclic processing unit of the cyclic processing unit group includes a register, an exclusive OR (XOR) gate and a selector; the register is connected to both the XOR gate and the selector, and the XOR gate is connected to the selector;

The register is used to store the original data received by the associated cyclic processing unit and to perform timing alignment; the original data is the data result output by the previous cyclic processing unit, or the initialization data;

The XOR gate is used to XOR the multiplication result output by the operation unit corresponding to the cyclic processing unit with the original data, and to output the XOR result to the selector; and

The selector is used to select target data as an output result from the original data and the XOR calculation result according to the bit value of the second input data.
The Galois field multiplier according to claim 4, wherein the register is a D-type flip-flop.
An erasure correction codec system, characterized in that it comprises a data distribution module, an operation module and a reordering module;

The data distribution module is used to distribute the data to be erasure-corrected to obtain multiple rows of data to be calculated;

The operation module includes a plurality of operation sub-modules, and each operation sub-module is used to multiply and accumulate one row of data to be calculated; each operation sub-module includes an adder and a plurality of Galois field multipliers according to any one of claims 1 to 5; the total number of the Galois field multipliers is determined according to the number of bus bytes; the adder is used to accumulate the multiplication results output by the Galois field multipliers; and

The reordering module is used to splice the multiply-accumulate results output by the operation sub-modules according to the distribution order.
The erasure correction codec system according to claim 6, wherein the operation sub-module further includes a PE controller, and the PE controller is connected to the adder and to each of the Galois field multipliers;

The PE controller is configured to determine, according to the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of uses of the data to be erasure-corrected.
The erasure correction codec system according to claim 7, wherein the operation sub-module further includes an EC block output unit;

The EC block output unit is used to control, according to the operation state of the operation sub-module and the output state of the reordering module, the back pressure on the operation sub-module and whether the reordering module performs an output operation.
The erasure correction codec system according to any one of claims 6 to 8, wherein the total number of Galois field multipliers included in the operation sub-module is equal to the number of bus bytes, and the total number of Galois field multipliers included in the operation module is the product of the number of matrix rows of the data to be erasure-corrected and the number of bus bytes.
The erasure correction codec system according to claim 9, wherein the adder is a Galois field adder.