WO2023134130A1 - Galois field multiplier and erasure coding and decoding system - Google Patents

Galois field multiplier and erasure coding and decoding system

Info

Publication number
WO2023134130A1
WO2023134130A1 PCT/CN2022/102524 CN2022102524W WO2023134130A1 WO 2023134130 A1 WO2023134130 A1 WO 2023134130A1 CN 2022102524 W CN2022102524 W CN 2022102524W WO 2023134130 A1 WO2023134130 A1 WO 2023134130A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
galois field
module
processing unit
multiplication
Prior art date
Application number
PCT/CN2022/102524
Other languages
French (fr)
Chinese (zh)
Inventor
张磊
王明明
王凛
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Publication of WO2023134130A1 publication Critical patent/WO2023134130A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/523Multiplying only

Definitions

  • the present application relates to the field of computer technology, in particular to a Galois field multiplier and an erasure correction codec system.
  • RS code: Reed-Solomon code
  • EC code: erasure code
  • the RS erasure coding process is shown in Figure 1, where B is the matrix used for coding, the gray parts such as B11 in the lower half are Cauchy or Vandermonde matrices, D is the stored disk data that needs erasure protection, and the resulting C is the encoded data. When some data blocks are lost, the surviving rows form a new matrix relationship, and multiplying by the inverse of that matrix recovers the original data.
  • This process is also the RS erasure decoding process shown in Figure 2.
  • Survivors is the intact data that remains after a storage failure
  • B' is the matrix formed from the rows of the encoding matrix B that correspond to the intact data
  • B'^-1 is the inverse matrix of B'.
  • Galois field (GF) multiplication is widely used in RS encoding and decoding. As the number of storage disks and the amount of data on each disk in distributed storage systems keep growing, achieving high-speed, high-throughput RS erasure calculation is the main challenge of current designs, which has led to Galois field multipliers implemented as hardware circuits.
  • the multiplication operation on the Galois field uses the minimal-polynomial theory that simplifies high-order matrix operations in linear algebra. The basic idea is to convert the two vectors into two polynomials, multiply the two polynomials, reduce the product modulo the primitive polynomial, and convert the remainder back into a vector.
  • the traditional Galois field multiplier is implemented by multiplying first and then taking modulus, which takes more cycles and is more complicated to implement.
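The multiply-then-reduce procedure described above can be sketched in software as follows. This is a minimal illustration, not the circuit of the application: the function name `gf_mul_then_mod`, the bit width `w = 8`, and the primitive polynomial `0x11D` (x^8 + x^4 + x^3 + x^2 + 1) are assumptions chosen for the example.

```python
def gf_mul_then_mod(a: int, b: int, poly: int = 0x11D, w: int = 8) -> int:
    """Multiply a and b in GF(2^w): polynomial product first, modulo second.

    poly and w are illustrative assumptions (0x11D is a common choice
    for GF(2^8) in storage codes); the source does not fix them here.
    """
    # Phase 1: carry-less multiplication of the two polynomials.
    product = 0
    for i in range(w):
        if (b >> i) & 1:
            product ^= a << i
    # Phase 2: reduce the product (degree up to 2w-2) modulo poly,
    # clearing one high-order bit per step.
    for bit in range(2 * w - 2, w - 1, -1):
        if (product >> bit) & 1:
            product ^= poly << (bit - w)
    return product


print(hex(gf_mul_then_mod(0x03, 0x07)))  # (x+1)(x^2+x+1) = x^3+1 -> 0x9
```

The two separate loops mirror why this approach needs more cycles: the full product must exist before the reduction can start.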
  • related techniques replace the modulo operation with table lookups, which greatly reduces the number of calculation cycles.
  • a generator is a special element of a field whose powers traverse all non-zero elements of the field.
  • if g is a generator of the field GF(2^w), then the set {g^0, g^1, ..., g^(2^w - 1)} contains all non-zero elements of GF(2^w).
  • 2 is always a generator.
  • GF(2^w) is a finite field, but the exponent k is unbounded, so the powers of g must cycle; the cycle period is 2^w - 1, and no power of g yields the polynomial 0.
  • when k is greater than or equal to 2^w - 1, g^k = g^(k % (2^w - 1)).
  • ⊕ represents an exclusive OR operation.
  • computing the value z from a known exponent k is the forward process
  • computing the exponent k from a known value z is the reverse process.
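The cycling of generator powers and the forward and reverse processes can be checked numerically. The sketch below assumes GF(2^8) with the primitive polynomial `0x11D`, for which 2 is a generator; the helper names `gf_pow_of_2` and `gf_log_of_2` are hypothetical.

```python
W = 8                    # assumed bit width
POLY = 0x11D             # assumed primitive polynomial for GF(2^8)
PERIOD = (1 << W) - 1    # cycle period 2^w - 1


def gf_pow_of_2(k: int) -> int:
    """Forward process: compute z = 2^k in GF(2^W) by repeated doubling."""
    z = 1
    for _ in range(k):
        z <<= 1
        if z >> W:       # degree reached W, so reduce by the polynomial
            z ^= POLY
    return z


def gf_log_of_2(z: int) -> int:
    """Reverse process: find k in [0, PERIOD) with 2^k == z (z must be non-zero)."""
    for k in range(PERIOD):
        if gf_pow_of_2(k) == z:
            return k
    raise ValueError("0 is not a power of the generator")


powers = [gf_pow_of_2(k) for k in range(PERIOD)]
print(len(set(powers)), 0 in powers)                 # 255 False
print(gf_pow_of_2(300) == gf_pow_of_2(300 % PERIOD))  # True
```

The linear search in `gf_log_of_2` is deliberately naive; it is what the lookup tables discussed next replace.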
  • the table lookup method obtains the exponents i and j from a and b respectively, and then looks up g^(i+j). It therefore requires a forward table and a reverse table, denoted gflog and gfilog respectively, over the field GF(2^w).
  • the forward table gflog maps the binary form to the polynomial (exponent) form
  • the reverse table gfilog maps the polynomial (exponent) form to the binary form.
  • the calculation formula of look-up table GF multiplication is:
  • Step 1: look up the values corresponding to a and b in the forward table;
  • Step 2: add the two values from the forward table and take the remainder;
  • Step 3: look up the remainder in the reverse table; the value found is the final result.
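The three lookup steps above can be sketched as follows. The table names `gflog` and `gfilog` come from the text; the table construction, the polynomial `0x11D`, the width `w = 8`, and the explicit zero check are assumptions added so the example runs on its own.

```python
def build_tables(poly: int = 0x11D, w: int = 8):
    """Build gflog (value -> exponent) and gfilog (exponent -> value) for generator 2."""
    period = (1 << w) - 1
    gflog = [0] * (1 << w)      # gflog[0] is unused: 0 has no logarithm
    gfilog = [0] * period
    z = 1
    for k in range(period):
        gfilog[k] = z
        gflog[z] = k
        z <<= 1
        if z >> w:
            z ^= poly
    return gflog, gfilog


GFLOG, GFILOG = build_tables()


def gf_mul_lut(a: int, b: int, w: int = 8) -> int:
    """Multiply via tables: two forward lookups, one remainder, one reverse lookup."""
    if a == 0 or b == 0:
        return 0                                   # zero cannot be looked up
    exponent_sum = GFLOG[a] + GFLOG[b]             # step 1
    remainder = exponent_sum % ((1 << w) - 1)      # step 2
    return GFILOG[remainder]                       # step 3


print(hex(gf_mul_lut(0x02, 0x80)))  # 0x1d
```

For w = 8 the two tables hold roughly 256 entries each, which hints at the resource cost once many multipliers run in parallel.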
  • the hardware implementation of the multiplier architecture is shown in Figure 3.
  • the hardware implementation of the look-up table GF multiplier needs to store two forward tables and one reverse table.
  • once the data bit width w has been determined, the remainder operation becomes a subtraction of a constant, so resources are consumed mainly by the table lookups.
  • as its algorithm and hardware implementation show, the look-up table GF multiplier is simple in principle, low in computational complexity and fast, but because it uses multiple LUTs (lookup tables) it consumes a large amount of hardware resources and chip area.
  • LUT: lookup table
  • an embodiment of the present application provides a Galois field multiplier, including a group of basic operation units and a group of cyclic processing units;
  • the basic operation unit group includes an initial operation unit, a plurality of intermediate operation units and a termination operation unit connected in series
  • the loop processing unit group includes a plurality of series-connected loop processing units; the total number of operation units in the basic operation unit group and the total number of loop processing units in the loop processing unit group are determined according to the data bit width of the input data of the Galois field multiplier;
  • the initial operation unit is used to perform Galois field multiplication on the first input data and the target generator, and output the multiplication result to the next operation unit and the corresponding loop processing unit; each intermediate operation unit is used to perform Galois field multiplication on the received data and the target generator, and output the multiplication result to the next operation unit and the corresponding loop processing unit; the termination operation unit is used to perform Galois field multiplication on the received data and the target generator, and output the multiplication result to the corresponding loop processing unit; and
  • the cyclic processing unit group is used to determine the current cycle number according to the second input data, the initialization data and the Galois field multiplication operation result output by the basic operation unit group, and output the final calculation result.
  • the total number of cyclic processing units included in the cyclic processing unit group is the same as the data bit width value of the input data of the Galois field multiplier; each bit data of the second input data corresponds to a cyclic processing unit .
  • the total number of arithmetic units included in the basic arithmetic unit group is the difference between the data bit width value of the input data of the Galois field multiplier and 1.
  • each loop processing unit of the loop processing unit group includes a register, an exclusive OR gate and a selector; the register is connected to the exclusive OR gate and the selector respectively, and the exclusive OR gate is connected to the selector;
  • the register is used to store the original data received by the associated loop processing unit and for timing alignment; the original data is the previous data result or initialization data output by the previous loop processing unit;
  • the XOR gate is used to perform XOR calculation on the multiplication calculation result output by the operation unit corresponding to the loop processing unit and the original data, and output the XOR calculation result to the selector;
  • the selector is used to select target data as an output result from the original data and the XOR calculation result according to the bit value of the second input data.
  • the register is a D-type flip-flop.
  • an erasure correction codec system including:
  • the data distribution module is used to distribute the data to be erased to obtain multiple rows of data to be calculated;
  • the operation module includes a plurality of operation sub-modules, and each operation sub-module is used to multiply and accumulate a row of data to be calculated; each operation sub-module includes an adder and a plurality of Galois field multipliers as in any one of the preceding items; The total number of Galois field multipliers is determined according to the number of bus bytes; the adder is used to accumulate the multiplication results output by each Galois field multiplier; and
  • the reordering module is used for splicing the multiplication and accumulation calculation results output by each operation sub-module according to the distribution order.
  • the operation sub-module also includes a PE controller, and the PE controller is respectively connected with the adder and each Galois field multiplier;
  • the PE controller is used to determine the number of accumulation iterations of the adder and the number of uses of the data to be erased according to the total number of multiplication and accumulation calculations.
  • the operation sub-module also includes an EC block output unit
  • the EC block output unit is used to control the back pressure of the operation sub-module and whether the reordering module performs an output operation, according to the operation state of the operation sub-module and the output state of the reordering module.
  • the total number of Galois field multipliers included in each operation sub-module is the same as the number of bus bytes; the total number of Galois field multipliers included in the operation module is the product of the number of matrix rows of the data to be erased and the number of bus bytes.
  • the adder is a Galois field adder.
  • FIG. 1 is a schematic diagram of an RS erasure coding process in an exemplary application scenario provided by the present application according to one or more embodiments;
  • FIG. 2 is a schematic diagram of an RS erasure correction decoding process in an exemplary application scenario provided by the present application according to one or more embodiments;
  • FIG. 3 is a schematic diagram of a hardware implementation method of a table lookup GF multiplier in an exemplary application scenario provided by the present application according to one or more embodiments;
  • FIG. 4 is a structural diagram of a specific implementation of a Galois field multiplier provided by the present application according to one or more embodiments;
  • FIG. 5 is a structural diagram of another specific implementation manner of a Galois field multiplier provided by the present application according to one or more embodiments;
  • FIG. 6 is a structural diagram of a specific implementation manner of a Galois field multiplier in a schematic example provided by the present application according to one or more embodiments;
  • Fig. 7 is a structural diagram of a specific implementation manner of a cycle processing unit in a schematic example provided by the present application according to one or more embodiments;
  • Fig. 8 is a structural diagram of another specific implementation manner of a cycle processing unit in a schematic example provided by the present application according to one or more embodiments;
  • Fig. 9 is a structural diagram of a specific implementation of an erasure correction coding and decoding system provided by the present application according to one or more embodiments;
  • Fig. 10 is a structural diagram of another specific implementation manner of an erasure correction coding and decoding system provided by the present application according to one or more embodiments.
  • FIG. 4 is a schematic structural diagram of a Galois field multiplier provided in an embodiment of the present application in an implementation manner.
  • the embodiment of the present application may include the following:
  • the Galois field multiplier of the present embodiment includes a basic operation unit group 41 and a loop processing unit group 42. The operation units in the basic operation unit group 41 all have the same structure, and the loop processing units in the loop processing unit group 42 also all have the same structure.
  • the basic operation unit group 41 may include an initial operation unit, a plurality of intermediate operation units and a termination operation unit connected in series
  • the loop processing unit group 42 may include a plurality of loop processing units connected in series; correspondingly, it may include an initial loop processing unit, a plurality of intermediate loop processing units and a termination loop processing unit connected in series.
  • the initial operation unit is the first of the series-connected operation units in the basic operation unit group 41. It receives original data, namely one of the multiplicands (polynomials) of the Galois field calculation, which this embodiment calls the first input data. The termination operation unit is the last of the series-connected operation units, and the intermediate operation units are those connected in series between the initial operation unit and the termination operation unit.
  • the initial loop processing unit is the first of the series-connected loop processing units in the loop processing unit group 42. It receives original data, namely the other multiplicand (polynomial) of the Galois field calculation, which this embodiment calls the second input data. The termination loop processing unit is the last of the series-connected loop processing units, and the intermediate loop processing units are those connected in series between the initial loop processing unit and the termination loop processing unit.
  • the total number of operation units included in the basic operation unit group 41 and the total number of loop processing units included in the loop processing unit group 42 are determined according to the data bit width of the input data of the Galois field multiplier, that is, of the first input data and the second input data.
  • the total number of loop processing units included in the loop processing unit group 42 may be the same as the data bit width of the input data of the Galois field multiplier; correspondingly, each bit of the second input data uniquely corresponds to one loop processing unit.
  • the total number of arithmetic units included in the basic arithmetic unit group 41 is the difference between the data bit width value of the input data of the Galois field multiplier and 1. For example, if the input data is 8-bit wide, the total number of cyclic processing units included in the cyclic processing unit group 42 is 8, and the total number of operation units included in the basic operation unit group 41 is 7.
  • the structure of the Galois field multiplier can be as shown in Figure 5 and Figure 6,
  • Figure 5 shows that the input data is 8bit wide, the number of gmul2 modules is 7, and the number of cacu&sel modules is 8.
  • Figure 6 shows that the input data is Nbit wide, the number of gmul2 modules is N-1, and the number of cacu&sel modules is N.
  • the initial operation unit is configured to perform Galois field multiplication on the first input data and the target generator, and output the result of the multiplication to the next operation unit and the corresponding loop processing unit.
  • the target generator can be any generator, for example, it can be 2.
  • Each intermediate operation unit is used to perform Galois field multiplication on the received data and the target generator, and output the result of the multiplication to the next operation unit and the corresponding loop processing unit; the data received by each intermediate operation unit is The multiplication calculation result output by the previous intermediate operation unit.
  • the termination operation unit is used to perform Galois field multiplication on the received data and the target generator, and output the multiplication result to the corresponding loop processing unit; the loop processing unit group is used to determine the current cycle number according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and to output the final calculation result.
  • the initialization data can be the initialization value of the input data, and the data bit width of the initialization data is the same as the data bit width of the input data. For example, for the input data whose data bit width is 8 bits, the initialization data can be 8'd0.
  • the initial cycle processing unit is used to determine the current cycle number according to the corresponding bit value of the second input data, the initialization data and the first input data, and output the current calculation result to the next cycle processing unit.
  • each intermediate loop processing unit is used to determine the current cycle number according to the calculation result input by the previous loop processing unit, the multiplication result input by the corresponding operation unit, and the corresponding bit value of the second input data, and to output the current calculation result to the next loop processing unit.
  • the termination loop processing unit is used to determine the final calculation result according to the calculation result input by the previous loop processing unit, the multiplication result input by the corresponding operation unit, and the corresponding bit value of the second input data, and to output the final calculation result.
  • the so-called final calculation result is the Galois field multiplication value of the first input data and the second input data.
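The data flow through the two chains can be modelled behaviourally as below. This is a software sketch, not a cycle-accurate model of the circuit: it assumes the target generator is 2, that bits of the second input are consumed least-significant first, and that the primitive polynomial is `0x11D`. The names `gmul2` and `cacu_sel` follow the module labels in Figures 5 and 6; everything else is hypothetical.

```python
def gmul2(value: int, poly: int, w: int) -> int:
    """One basic operation unit: multiply by the generator 2 in GF(2^w)."""
    value <<= 1
    if (value >> w) & 1:
        value ^= poly
    return value


def cacu_sel(pre_result: int, operand: int, bit: int) -> int:
    """One loop processing unit: XOR in the operand only when the bit is 1."""
    return pre_result ^ operand if bit else pre_result


def gf_mul_pipeline(data_a: int, data_b: int, poly: int = 0x11D, w: int = 8) -> int:
    operand = data_a     # the initial loop unit sees the first input directly
    result = 0           # initialization data, e.g. 8'd0 for w = 8
    for i in range(w):   # w loop processing units, one per bit of data_b
        result = cacu_sel(result, operand, (data_b >> i) & 1)
        if i < w - 1:    # only w - 1 operation units are needed
            operand = gmul2(operand, poly, w)
    return result


print(hex(gf_mul_pipeline(0x80, 0x02)))  # 0x1d
```

For w = 8 the loop body runs `cacu_sel` 8 times and `gmul2` 7 times, matching the unit counts given for the 8-bit case.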
  • the Galois field multiplier is designed as a pipeline, so the Galois field polynomial can be changed in real time according to usage requirements. The polynomial is configurable, and the calculation no longer depends on fixed forward and reverse lookup tables, which effectively improves the flexibility of the GF multiplier.
  • the pipelined GF multiplier structure replaces the original look-up table structure, so there is no need to spend hardware resources on multiple look-up tables as the traditional look-up table GF multiplier does. This substantially reduces the resources and area that the GF multiplier occupies in the RS erasure codec, reduces the hardware resource and area consumption of the storage system, and does not affect the timeliness of calculation.
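Polynomial configurability can be illustrated by passing the field polynomial as an ordinary argument instead of baking it into tables. In the sketch below, `0x11D` is an assumed choice typical of RS storage codes and `0x11B` is the AES polynomial; the function name `gf_mul` is hypothetical.

```python
def gf_mul(a: int, b: int, poly: int, w: int = 8) -> int:
    """Shift-and-XOR GF(2^w) multiplication with a runtime-selectable polynomial."""
    result = 0
    for i in range(w):
        if (b >> i) & 1:
            result ^= a          # accumulate a * 2^i when bit i of b is set
        a <<= 1                  # multiply the running operand by 2
        if (a >> w) & 1:
            a ^= poly            # reduce with whichever polynomial was supplied
    return result


RS_POLY = 0x11D    # x^8 + x^4 + x^3 + x^2 + 1 (assumed for the RS use case)
AES_POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1 (AES)

print(hex(gf_mul(0x57, 0x83, AES_POLY)))  # 0xc1, the worked example in FIPS 197
print(gf_mul(0x57, 0x83, RS_POLY) == gf_mul(0x57, 0x83, AES_POLY))  # False
```

A table-based multiplier would need a fresh pair of tables for each polynomial, whereas here only one constant changes.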
  • the loop processing unit is used to calculate the number of cycles according to the bit values 0 and 1, and to select whether to use the result of the gmul2 operation unit shown in Fig. 5. It may include the following:
  • each loop processing unit of the loop processing unit group 42 includes a register, an XOR gate and a selector; the register is respectively connected to the XOR gate and the selector, and the XOR gate is connected to the selector.
  • the register is used to store the original data received by the loop processing unit to which it belongs and to perform timing alignment.
  • the original data is either the initialization data or the previous data result output by the previous loop processing unit: for the register of the initial loop processing unit, the original data is the initialization data; for the registers of the intermediate loop processing units and the termination loop processing unit, the original data is the calculation result output by the previous loop processing unit, which is called the previous data result.
  • the register can be, for example, a D-type flip-flop DFF.
  • the DFF register can also implement sequential logic in hardware design, that is, it can also be used for timing alignment in a pipelined design. Other types of devices may be used instead without affecting the implementation of this application.
  • the XOR gate is used to perform XOR calculation on the multiplication calculation result output by the operation unit corresponding to the loop processing unit and the original data, and output the XOR calculation result to the selector.
  • the register supplies the original data to the XOR gate, delayed by one clock cycle; that one-cycle wait is the time the corresponding operation unit needs to calculate its result.
  • the selector is used to select target data as an output result from the original data and the XOR calculation result according to the bit value of the second input data.
  • for example, when the bit value of the second input data is 0, the original data is selected as the target data, and when the bit value is 1, the XOR calculation result is selected as the target data.
  • alternatively, the original data can be selected when the bit value is 1 and the XOR calculation result when the bit value is 0; those skilled in the art can decide this flexibly according to actual needs.
  • the selector in this embodiment is a two-to-one selector, which selects whether the calculation uses the XOR of the previous data result with the operation unit's result, or only the previous data result.
  • that is, the two-to-one selector outputs either pre_result XOR gmul2 or pre_result alone.
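A single loop processing unit (register, XOR gate, two-to-one selector) can be modelled as a small class. This is an illustrative sketch: the class name `LoopProcessingUnit` and its method names are hypothetical, and it assumes the convention in which a bit value of 1 selects the XOR result.

```python
class LoopProcessingUnit:
    """Software stand-in for one cacu&sel unit: D flip-flop, XOR gate, 2:1 selector."""

    def __init__(self) -> None:
        self.register = 0                  # D-type flip-flop contents

    def clock(self, original_data: int) -> None:
        """Rising clock edge: latch the previous result or the initialization data."""
        self.register = original_data

    def output(self, gmul2_result: int, bit: int) -> int:
        """Combinational path evaluated after the register has been loaded."""
        xor_result = self.register ^ gmul2_result          # XOR gate
        return xor_result if bit else self.register        # two-to-one selector


unit = LoopProcessingUnit()
unit.clock(0x0F)                      # pre_result arrives from the previous unit
print(hex(unit.output(0xF0, 1)))      # 0xff: pre_result XOR gmul2
print(hex(unit.output(0xF0, 0)))      # 0xf: pre_result only
```

Separating `clock` from `output` mirrors the one-cycle delay between storing the original data and combining it with the operation unit's result.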
  • the pipelined Galois field multiplier of this embodiment can be realized with simple AND-OR gates and selectors, which improves the flexibility of the GF multiplier, reduces the hardware resources consumed by look-up tables, and supports polynomial configurability.
  • a schematic example of a Galois field multiplier is described below in conjunction with FIG. 6, and may include the following:
  • AES Advanced Encryption Standard, symmetric encryption algorithm
  • the module is designed as a Galois field multiplier architecture of the pipeline method as shown in Figure 5 and Figure 6 .
  • data_a and data_b are the two multiplicands, that is, the two input data of the GF multiplier; the external interface of the GF multiplier has only two 8-bit inputs
  • the initialization value 8'd0 is defined according to the 8-bit width of the multiplicand; if the bit width is 16 bits, the initialization data is 16'd0. The initialization value of 0 is fixed and is not visible on the external interface.
  • the data bit width of the input data is 8 bits, and the initialization data is 8'd0.
  • the Galois field multiplier includes 7 serially connected GMul(2, v) units and 8 serially connected cacu&sel units, where GMul(2, v) is the operation unit and cacu&sel is the loop processing unit; each bit of the 8-bit input data is input to one cacu&sel.
  • the GF multiplier of the pipelined method implemented in the embodiment of the present application has low resource consumption, fast performance, high throughput and good flexibility, and has the characteristics of novelty, creativity, simplicity and practicality.
  • the embodiment of the present application also provides a system in a corresponding application scenario for the Galois field multiplier, which further makes the Galois field multiplier more practical.
  • the following is an introduction to the erasure correction codec system provided by the embodiment of this application, please refer to Figure 9, which may include the following content:
  • the erasure correction codec system may include a data distribution module 91 , an operation module 92 and a reordering module 93 .
  • the data distribution module 91 , the computing module 92 and the reordering module 93 are connected to each other through a bus.
  • the data distribution module 91 can be used to perform data distribution on the data to be erased to obtain multiple rows of data to be calculated.
  • the data to be erased includes matrix data and data.
  • for encoding, the data to be erased is the encoding matrix data and the original disk data, shown as the B matrix and the D data in Figure 1.
  • for decoding, the data to be erased is the inverse of the matrix re-formed from the rows of the encoding matrix that correspond to the intact data, together with the intact data remaining after a storage failure, shown as the B'^-1 matrix and the Survivors data in Figure 2.
  • the entire erasure correction encoding and decoding can be understood as a matrix multiplication, and the fundamental calculation of matrix multiplication is multiply-accumulate.
  • the corresponding rows need to be multiplied and accumulated.
  • the matrix data needs to be split into multiple rows of data by the data distribution module 91, and the multiply-accumulate calculation of each row is performed separately. Finally, the multiply-accumulate results of each row are accumulated to obtain the final result.
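The split into rows, the per-row multiply-accumulate, and the final reassembly can be sketched as follows. The helper names `distribute`, `multiply_accumulate` and `reorder`, the polynomial `0x11D`, and the tiny 4x2 matrix are assumptions for illustration; each "block" is a single byte to keep the example short.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """GF(2^8) multiplication (shift-and-XOR), polynomial assumed to be 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def distribute(matrix):
    """Data distribution: tag each matrix row with its position."""
    return [(index, row) for index, row in enumerate(matrix)]


def multiply_accumulate(row, data):
    """One operation sub-module: GF multiply element-wise, accumulate with XOR."""
    acc = 0
    for coefficient, value in zip(row, data):
        acc ^= gf_mul(coefficient, value)
    return acc


def reorder(tagged_results):
    """Reordering: splice results back together in distribution order."""
    return [value for _, value in sorted(tagged_results)]


matrix = [[1, 0], [0, 1], [1, 1], [1, 2]]   # identity rows plus two check rows
data = [5, 7]
# Rows may finish in any order; reversed() simulates out-of-order completion.
tagged = [(i, multiply_accumulate(row, data)) for i, row in reversed(distribute(matrix))]
print(reorder(tagged))  # [5, 7, 2, 11]
```

Because each row is independent, the rows can be processed in parallel and only the final splice needs to know the original order.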
  • the operation module 92 may include a plurality of operation sub-modules. Each operation sub-module performs multiplication with Galois field multipliers, and a Galois field multiplier does not process all bytes of a row of data at once; it processes only as many bytes as it supports per operation. Each operation sub-module therefore includes a plurality of Galois field multipliers and an adder that accumulates their results. That is, each operation sub-module is used to multiply and accumulate one row of data to be calculated, and includes an adder and a plurality of Galois field multipliers as described in any one of the above embodiments; the total number of Galois field multipliers is determined according to the number of bus bytes.
  • the Galois field multiplier is a single-byte operation
  • the total number of Galois field multipliers contained in each operation sub-module is the same as the number of bus bytes; the total number of Galois field multipliers contained in the operation module is the product of the number of matrix rows of the data to be erased and the number of bus bytes.
  • the adder is used for accumulating the multiplication calculation results output by each Galois field multiplier.
  • the adder may be, for example, a Galois field adder.
  • the reordering module 93 is used to concatenate the multiplication and accumulation calculation results output by each operation sub-module according to the distribution order of the data to be erased by the data distribution module 91 .
  • each operation sub-module can also include a PE controller, which is connected to the adder and to each Galois field multiplier; the PE controller is used to determine, according to the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of times the data to be erased is used.
  • each operation sub-module can also include an EC block output unit, which is used to control the back pressure of the operation sub-module and whether the reordering module performs an output operation, according to the operation state of the operation sub-module and the output state of the reordering module.
  • the bus is 16 bytes
  • the Galois field multiplier is a single-byte operation
  • the adder is a Galois field adder.
  • the bus data of each row is subdivided into 16 bytes
  • the current multiplier is a single-byte operation, so 16 multipliers working in parallel are needed to process the bus data in one pass. That is, each operation sub-module includes 16 Galois field multipliers executed in parallel, and the operation module contains (number of matrix rows of the data to be erased) × (number of bus bytes) multipliers in total.
  • the data distribution module distributes the matrix and data of the erasure correction calculation according to the number of rows corresponding to the matrix calculation.
  • the data to be erased is RS/RS^-1
  • the 4*16-byte RS/RS^-1 matrix is divided into 4 rows, each containing 16 bytes of data. The 16-byte data block is fed into all rows.
  • the input for each row is a data block of 16 bytes and a matrix block of 16 bytes.
  • 16 GF multipliers: each byte of the data block and each byte of the matrix block serve as the two inputs, data_a and data_b, of one GF multiplier.
  • the PE controller calculates the number of accumulation iterations and the time when the operation completes. The number of times RS/RS^-1 is used and the number of accumulations performed by the adder are determined by the scheduling of the PE controller.
  • GF adder: in essence the GF adder is an XOR operation, which in this embodiment performs the accumulation step of the matrix multiply-accumulate.
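That Galois field addition over GF(2^w) reduces to a bitwise XOR can be shown in a few lines. The function names `gf_add` and `gf_accumulate` are hypothetical; the product values passed in are arbitrary sample bytes.

```python
from functools import reduce
from operator import xor


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^w): coefficient-wise addition modulo 2, i.e. XOR."""
    return a ^ b


def gf_accumulate(products) -> int:
    """Accumulate a sequence of multiplier outputs the way the adder would."""
    return reduce(xor, products, 0)


print(hex(gf_add(0x53, 0xCA)))          # 0x99
print(gf_add(0x53, 0x53))               # 0: every element is its own negative
print(gf_accumulate([0x01, 0x02, 0x04]))  # 7
```

Since XOR has no carries, the accumulator never grows beyond the data bit width, however many products are summed.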
  • EC block output: controls the back pressure of the preceding stage and whether the following stage outputs, according to the operation state of the preceding-stage operation module and the output state of the following-stage module.
  • EC data block reordering module: each row of the matrix operation has its own EC block output module. After a row's calculation is completed, the position of the row needs to be determined, and the results are re-spliced and sent.
  • the GF multiplier of the pipeline method is fully suitable for the functional requirements of erasure correction codec, and the hardware implementation is relatively easy, and can ensure high computing efficiency and data throughput. It greatly reduces the resource consumption of the look-up table GF multiplier in hardware implementation, and can flexibly change the Galois field polynomial in real time according to the application requirements of various system applications for calculation.

Abstract

Disclosed in the present application are a Galois field multiplier and an erasure coding and decoding system. The Galois field multiplier comprises a plurality of basic operation units connected in series and a plurality of cyclic processing units connected in series, wherein the total number of basic operation units and the total number of cyclic processing units are determined according to a data bit width of input data of the Galois field multiplier. Each basic operation unit performs a Galois field multiplication operation on received data and a target generation element, and outputs a multiplication calculation result to the next operation unit and the corresponding cyclic processing unit. A cyclic processing unit group is used for determining the current number of cycles according to the input data, initialized data, and a Galois field multiplication operation result output by a basic operation unit group, and used for outputting a final calculation result.

Description

Galois Field Multiplier and Erasure Coding and Decoding System
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210039878.0, entitled "Galois Field Multiplier and Erasure Coding and Decoding System", filed with the China Patent Office on January 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a Galois field multiplier and an erasure coding and decoding system.
Background
In data transmission and data storage, erasure codes are favored for their lower storage cost. The RS (Reed-Solomon) code is a common EC (erasure code) that computes N check data blocks from M data blocks. Of the M+N blocks in total, any M intact blocks are enough to recover all of the original data. In data storage especially, erasure coding is an extremely important means of ensuring data reliability. The RS erasure encoding process is shown in Figure 1, where B is the encoding matrix, the gray lower part (B11 and so on) is a Cauchy or Vandermonde matrix, D is the stored data to be protected, and C is the encoded data. When some data blocks are lost, the surviving blocks form a new matrix relation, and multiplying by the inverse of the corresponding matrix recovers the original data; this is the RS erasure decoding process shown in Figure 2. There, Survivors is the intact data remaining after a storage failure, B′ is the matrix formed from the rows of the encoding matrix B that correspond to the intact data, and B′^-1 is the inverse of B′.
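As a hedged restatement of Figures 1 and 2 in matrix form: the sketch below assumes, as is conventional for systematic RS codes, that the upper part of B is the M×M identity (the text only describes the lower part), and the block name P is a label introduced here for illustration. All arithmetic is over GF(2^w).

```latex
C = B \cdot D, \qquad
B = \begin{bmatrix} I_M \\ P \end{bmatrix}
\quad (P:\ N \times M \text{ Cauchy or Vandermonde block}), \qquad
D = B'^{-1} \cdot \text{Survivors}
```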
GF (Galois field) multiplication is widely used in RS encoding and decoding. As distributed storage systems grow in both the number of disks and the amount of data stored per disk, high-speed RS erasure computation for high-rate, high-throughput distributed storage has become a main design challenge, and Galois field multipliers implemented in hardware circuits have emerged in response. Multiplication over a Galois field draws on the linear-algebra theory of using the minimal polynomial to simplify high-order matrix operations. The basic idea is to convert the two vectors into two polynomials, multiply the polynomials, reduce the product modulo the primitive polynomial, and convert the result back into a vector. A traditional Galois field multiplier multiplies first and then takes the modulus, which takes many cycles and is complex to implement. To overcome these drawbacks, the related art replaces the modulus step with table lookup, which greatly reduces the number of computation cycles.
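The traditional multiply-then-reduce approach can be sketched in software as follows. This is a minimal illustration, not the circuit of the related art; the function name `gf_mul_naive`, the bit width w = 8, and the primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) are assumptions chosen for the example.

```python
def gf_mul_naive(a: int, b: int, poly: int = 0x11D, w: int = 8) -> int:
    """Multiply in GF(2^w): carry-less polynomial product, then reduce modulo poly."""
    # Step 1: carry-less (XOR) polynomial multiplication; the product has up to 2w-1 bits.
    product = 0
    for i in range(w):
        if (b >> i) & 1:
            product ^= a << i
    # Step 2: reduce modulo the primitive polynomial, clearing the highest set bit first.
    for bit in range(2 * w - 2, w - 1, -1):
        if (product >> bit) & 1:
            product ^= poly << (bit - w)
    return product


# (x + 1) * (x^2 + x + 1) = x^3 + 1, so 3 * 7 = 9 with no reduction needed.
print(gf_mul_naive(3, 7))
```

The two separate loops mirror the two phases (multiply, then modulus) that make this approach cycle-hungry in hardware.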
A generator is a special kind of field element whose powers run through all nonzero elements of the field. For example, if g is a generator of GF(2^w), then the set {g^0, g^1, ..., g^(2^w-1)} contains all nonzero elements of GF(2^w). In GF(2^w) defined by a primitive polynomial, 2 is always a generator. Applying generators to polynomials, every nonzero polynomial in GF(2^w) can be obtained as a power of the polynomial generator g; that is, any nonzero element z of the field can be written as z = g^k. GF(2^w) is finite while the exponent k is unbounded, so the powers must cycle; the period is 2^w-1, and g cannot generate the polynomial 0. When k is greater than or equal to 2^w-1, g^k = g^(k % (2^w-1)), where ^ denotes exponentiation and % denotes the remainder. For z = g^k there is a forward process (computing z from a known exponent k) and an inverse process (computing k from a known z). For multiplication, if a = g^i and b = g^j, then a*b = g^i * g^j = g^(i+j). The table-lookup method looks up i and j from a and b, and then looks up g^(i+j). This requires a forward table and an inverse table, denoted gflog and gfilog over GF(2^w). The forward table gflog maps the binary form to the polynomial (exponent) form, and the inverse table gfilog maps the polynomial (exponent) form back to the binary form. Table-lookup GF multiplication is computed as:
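The cycling behaviour of a generator can be checked with a short sketch. It assumes w = 8 and the primitive polynomial 0x11D, neither of which is fixed by the text; `times_two` and `powers_of_two` are hypothetical helper names.

```python
def times_two(x: int, poly: int = 0x11D, w: int = 8) -> int:
    """Multiply x by the generator 2 in GF(2^w): shift left, reduce on overflow."""
    x <<= 1
    if x >> w:          # bit w is set, so the value left the field
        x ^= poly       # subtract (XOR) the primitive polynomial
    return x


def powers_of_two(poly: int = 0x11D, w: int = 8) -> list[int]:
    """Return [2^0, 2^1, ..., 2^(2^w - 2)] computed in GF(2^w)."""
    out, x = [], 1
    for _ in range((1 << w) - 1):
        out.append(x)
        x = times_two(x, poly, w)
    return out


powers = powers_of_two()
# Every nonzero element appears exactly once, and the next power wraps back to 1.
print(len(set(powers)), times_two(powers[-1]))
```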
c = a*b = gfilog[(gflog[a] + gflog[b]) mod (2^w - 1)];
When the look-up-table GF multiplier is implemented in hardware, the calculation proceeds in three steps:
Step 1: look up the forward-table values corresponding to a and b;
Step 2: add the two forward-table values and take the remainder;
Step 3: look up the remainder in the inverse table; the value found is the final result.
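The three steps above can be sketched in software as follows. The table names gflog and gfilog come from the text; the construction routine `build_tables`, the choice w = 8, the polynomial 0x11D, and the explicit zero check (the formula itself is undefined for a zero operand, since 0 has no logarithm) are assumptions of this sketch.

```python
def build_tables(poly: int = 0x11D, w: int = 8) -> tuple[list[int], list[int]]:
    """Build the forward table gflog (value -> exponent) and inverse table gfilog."""
    order = (1 << w) - 1
    gflog = [0] * (1 << w)      # gflog[0] is a placeholder; 0 has no logarithm
    gfilog = [0] * order
    x = 1
    for k in range(order):
        gfilog[k] = x           # 2^k
        gflog[x] = k
        x <<= 1                 # multiply by the generator 2
        if x >> w:
            x ^= poly
    return gflog, gfilog


def gf_mul_table(a: int, b: int, gflog: list[int], gfilog: list[int], w: int = 8) -> int:
    """c = gfilog[(gflog[a] + gflog[b]) mod (2^w - 1)], with zero handled separately."""
    if a == 0 or b == 0:
        return 0
    i, j = gflog[a], gflog[b]                   # step 1: two forward lookups
    remainder = (i + j) % ((1 << w) - 1)        # step 2: add and take the remainder
    return gfilog[remainder]                    # step 3: inverse lookup


gflog, gfilog = build_tables()
print(gf_mul_table(3, 7, gflog, gfilog))
```

Note that each multiplication performs two forward lookups and one inverse lookup, which is why a fully parallel hardware version stores two forward tables and one inverse table per multiplier.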
With real-time performance in mind, the hardware architecture of this multiplier is shown in Figure 3; a hardware look-up-table GF multiplier must store two forward tables and one inverse table. In general the data bit width w is already fixed when the GF multiplier is used, so the remainder operation reduces to subtracting a constant, and resource consumption is dominated by the table lookups. As its algorithm and hardware implementation show, the look-up-table GF multiplier is simple in principle, low in computational complexity, and fast, but because it uses several LUTs (lookup tables) it costs considerable hardware resources and chip area. The hardware scheme for a look-up-table GF multiplier in erasure coding and decoding shows that when the system handles a large volume of data, many GF multipliers are needed, and the number of lookup tables grows in proportion to the number of multipliers. The inventors realized that the hardware overhead then becomes large enough that the original advantage of the look-up-table GF multiplier is no longer evident, and chip area and space resources grow as well.
In view of this, how to reduce the hardware resources consumed by table lookup in a GF multiplier is a technical problem to be solved by those skilled in the art.
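The remark that the remainder reduces to subtracting a constant can be illustrated with a small sketch. It assumes both operands are valid logarithms in the range 0 to 2^w-2, so their sum is below 2*(2^w-1) and a single conditional subtraction is enough; `mod_by_constant` is a hypothetical name.

```python
def mod_by_constant(log_sum: int, w: int = 8) -> int:
    """Reduce gflog[a] + gflog[b] modulo 2^w - 1 with one compare and one subtract."""
    order = (1 << w) - 1            # the constant 255 when w = 8
    # log_sum <= 2 * (order - 1) < 2 * order, so at most one subtraction is needed.
    return log_sum - order if log_sum >= order else log_sum


# Largest possible sum for w = 8 is 254 + 254 = 508.
print(mod_by_constant(508))
```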
Summary
In one aspect, an embodiment of the present application provides a Galois field multiplier comprising a basic operation unit group and a cyclic processing unit group;
The basic operation unit group includes a start operation unit, a plurality of intermediate operation units, and a termination operation unit connected in series, and the cyclic processing unit group includes a plurality of cyclic processing units connected in series; the total number of operation units in the basic operation unit group and the total number of cyclic processing units in the cyclic processing unit group are determined by the data bit width of the input data of the Galois field multiplier;
The start operation unit performs a Galois field multiplication of the first input data and a target generator, and outputs the product to the next operation unit and to the corresponding cyclic processing unit; each intermediate operation unit performs a Galois field multiplication of the data it receives and the target generator, and outputs the product to the next operation unit and to the corresponding cyclic processing unit; the termination operation unit performs a Galois field multiplication of the data it receives and the target generator, and outputs the product to the corresponding cyclic processing unit; and
The cyclic processing unit group determines the current cycle count from the second input data, the initialization data, and the Galois field multiplication results output by the basic operation unit group, and outputs the final calculation result.
In one embodiment, the total number of cyclic processing units in the cyclic processing unit group equals the data bit width of the input data of the Galois field multiplier; each bit of the second input data corresponds to one cyclic processing unit.
In one embodiment, the total number of operation units in the basic operation unit group is the data bit width of the input data of the Galois field multiplier minus 1.
In one embodiment, each cyclic processing unit of the cyclic processing unit group includes a register, an XOR gate, and a selector; the register is connected to both the XOR gate and the selector, and the XOR gate is connected to the selector;
The register stores the original data received by its cyclic processing unit and provides timing alignment; the original data is either the previous data result output by the preceding cyclic processing unit or the initialization data;
The XOR gate XORs the multiplication result output by the operation unit corresponding to its cyclic processing unit with the original data, and outputs the XOR result to the selector; and
The selector selects, according to a bit value of the second input data, either the original data or the XOR result as the target data to output.
In one embodiment, the register is a D-type flip-flop.
In another aspect, an embodiment of the present application provides an erasure coding and decoding system, comprising:
a data distribution module, an operation module, and a reordering module;
The data distribution module distributes the data to be erasure-coded, producing multiple rows of data to be calculated;
The operation module includes a plurality of operation submodules, each of which performs the multiply-accumulate calculation for one row of data to be calculated; each operation submodule includes an adder and a plurality of Galois field multipliers as described in any of the preceding embodiments; the total number of Galois field multipliers is determined by the number of bus bytes; the adder accumulates the multiplication results output by the Galois field multipliers; and
The reordering module splices together the multiply-accumulate results output by the operation submodules in distribution order.
In one embodiment, the operation submodule further includes a PE controller connected to the adder and to each Galois field multiplier;
The PE controller determines, from the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of times the data to be erasure-coded is used.
In one embodiment, the operation submodule further includes an EC block output unit;
The EC block output unit controls back pressure on the operation submodule and whether the reordering module performs an output operation, according to the operating state of the operation submodule and the output state of the reordering module.
In one embodiment, the total number of Galois field multipliers in an operation submodule equals the number of bus bytes; the total number of Galois field multipliers in the operation module is the product of the number of matrix rows of the data to be erasure-coded and the number of bus bytes.
In one embodiment, the adder is a Galois field adder.
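The multiply-accumulate performed by one operation submodule, and the multiplier count stated above, can be sketched as follows. This is a software illustration for a single byte lane under assumed parameters (w = 8, polynomial 0x11D); `gf_mul`, `row_mac`, and `multiplier_count` are hypothetical names, and the example dimensions are made up.

```python
from functools import reduce


def gf_mul(a: int, b: int, poly: int = 0x11D, w: int = 8) -> int:
    """Shift-and-add multiplication in GF(2^w)."""
    result = 0
    for _ in range(w):
        if b & 1:
            result ^= a          # GF addition is XOR
        b >>= 1
        a <<= 1                  # multiply a by 2
        if a >> w:
            a ^= poly
    return result


def row_mac(coeffs: list[int], data: list[int]) -> int:
    """One matrix row times the data column: GF products accumulated by XOR."""
    return reduce(lambda acc, pair: acc ^ gf_mul(*pair), zip(coeffs, data), 0)


def multiplier_count(matrix_rows: int, bus_bytes: int) -> int:
    """Total GF multipliers: one per bus byte in each of the per-row submodules."""
    return matrix_rows * bus_bytes


print(row_mac([1, 2, 4], [5, 6, 7]))    # 5 ^ 12 ^ 28
print(multiplier_count(4, 16))          # hypothetical 4 rows, 16-byte bus
```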
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application or of the related art more clearly, the drawings needed in describing the embodiments or the related art are briefly introduced below. The drawings described below are evidently only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of the RS erasure encoding process in an exemplary application scenario provided by the present application according to one or more embodiments;
Figure 2 is a schematic diagram of the RS erasure decoding process in an exemplary application scenario provided by the present application according to one or more embodiments;
Figure 3 is a schematic diagram of a hardware implementation of a look-up-table GF multiplier in an exemplary application scenario provided by the present application according to one or more embodiments;
Figure 4 is a structural diagram of one implementation of a Galois field multiplier provided by the present application according to one or more embodiments;
Figure 5 is a structural diagram of another implementation of a Galois field multiplier provided by the present application according to one or more embodiments;
Figure 6 is a structural diagram of one implementation of a Galois field multiplier in an illustrative example provided by the present application according to one or more embodiments;
Figure 7 is a structural diagram of one implementation of a cyclic processing unit in an illustrative example provided by the present application according to one or more embodiments;
Figure 8 is a structural diagram of another implementation of a cyclic processing unit in an illustrative example provided by the present application according to one or more embodiments;
Figure 9 is a structural diagram of one implementation of an erasure coding and decoding system provided by the present application according to one or more embodiments;
Figure 10 is a structural diagram of another implementation of an erasure coding and decoding system provided by the present application according to one or more embodiments.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the application is described in further detail below with reference to the drawings and specific implementations. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", "fourth", and so on in the description, the claims, and the above drawings distinguish different objects and do not describe a particular order. The terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units and may include steps or units that are not listed.
Having introduced the technical solutions of the embodiments of the present application, various non-limiting implementations of the present application are described in detail below.
Referring first to Figure 4, which is a schematic structural diagram of one implementation of a Galois field multiplier provided by an embodiment of the present application, the embodiment may include the following:
The Galois field multiplier of this embodiment includes a basic operation unit group 41 and a cyclic processing unit group 42. All operation units in the basic operation unit group 41 have the same structure, as do all cyclic processing units in the cyclic processing unit group 42. To describe the connections and the data flow more clearly, the basic operation unit group 41 may be viewed as a start operation unit, a plurality of intermediate operation units, and a termination operation unit connected in series, and the cyclic processing unit group 42 as a plurality of cyclic processing units connected in series, namely a start cyclic processing unit, a plurality of intermediate cyclic processing units, and a termination cyclic processing unit. The start operation unit is the first operation unit in the series chain of group 41; it receives original data, that is, one of the multiplicands (or polynomials) of the Galois field calculation, referred to in this embodiment as the first input data. The termination operation unit is the last operation unit in the chain, and the intermediate operation units are those connected between the start and termination operation units. Likewise, the start cyclic processing unit is the first cyclic processing unit in the series chain of group 42; it receives original data, namely the other multiplicand (or polynomial) of the Galois field calculation, referred to in this embodiment as the second input data. The termination cyclic processing unit is the last cyclic processing unit in the chain, and the intermediate cyclic processing units are those connected between the start and termination cyclic processing units.
The total number of operation units in the basic operation unit group 41 and the total number of cyclic processing units in the cyclic processing unit group 42 are determined by the data bit width of the input data of the Galois field multiplier, that is, of the first input data and the second input data. Optionally, the number of cyclic processing units in group 42 equals the data bit width of the input data, so that each bit of the second input data corresponds to exactly one cyclic processing unit, and the number of operation units in group 41 is the data bit width minus 1. For example, for 8-bit input data, group 42 contains 8 cyclic processing units and group 41 contains 7 operation units. For 16-bit input data, group 42 contains 16 cyclic processing units and group 41 contains 15 operation units. For N-bit input data, group 42 contains N cyclic processing units and group 41 contains N-1 operation units. If the operation units of group 41 are gmul2 modules and the cyclic processing units of group 42 are cacu&sel modules, the structure of the Galois field multiplier is as shown in Figures 5 and 6. Figure 5 shows the case of 8-bit input data, with 7 gmul2 modules and 8 cacu&sel modules. Figure 6 shows the case of N-bit input data, with N-1 gmul2 modules and N cacu&sel modules.
In this embodiment, the start operation unit performs a Galois field multiplication of the first input data and the target generator, and outputs the product to the next operation unit and to the corresponding cyclic processing unit. The target generator may be any generator, for example 2. Each intermediate operation unit performs a Galois field multiplication of the data it receives and the target generator, and outputs the product to the next operation unit and to the corresponding cyclic processing unit; the data an intermediate operation unit receives is the product output by the preceding operation unit. The termination operation unit performs a Galois field multiplication of the data it receives and the target generator, and outputs the product to the corresponding cyclic processing unit. The cyclic processing unit group determines the current cycle count from the second input data, the initialization data, and the Galois field multiplication results output by the basic operation unit group, and outputs the final calculation result. The initialization data may be an initial value with the same data bit width as the input data; for example, for 8-bit input data the initialization data may be 8'd0.
Specifically, the start cyclic processing unit determines the current cycle count from the corresponding bit of the second input data, the initialization data, and the first input data, and outputs its current result to the next cyclic processing unit. Each intermediate cyclic processing unit determines the current cycle count from the result supplied by the preceding cyclic processing unit, the product supplied by the corresponding operation unit, and the corresponding bit of the second input data, and outputs its current result to the next cyclic processing unit. The termination cyclic processing unit determines the final calculation result from the result supplied by the preceding cyclic processing unit, the product supplied by the corresponding operation unit, and the corresponding bit of the second input data, and outputs it. The final calculation result is the Galois field product of the first input data and the second input data.
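The data flow described above can be modelled in software as a chain of w cacu&sel stages fed by w-1 gmul2 stages. This is a behavioural sketch only: it ignores registers and clocking, assumes the least significant bit of the second input drives the first stage (the text does not fix the bit order), and uses w = 8 with polynomial 0x11D as example defaults. The names `gmul2` and `cacu_sel` follow the module names in the text; `gf_mul_pipeline` is hypothetical.

```python
def gmul2(x: int, poly: int = 0x11D, w: int = 8) -> int:
    """Operation unit: multiply by the target generator 2 in GF(2^w)."""
    x <<= 1
    if x >> w:
        x ^= poly
    return x


def cacu_sel(pre_result: int, term: int, bit: int) -> int:
    """Cyclic processing unit: XOR the term in when the bit is 1, else pass through."""
    return pre_result ^ term if bit else pre_result


def gf_mul_pipeline(first: int, second: int, poly: int = 0x11D, w: int = 8) -> int:
    """Model of the multiplier: w cacu&sel stages and w-1 gmul2 stages."""
    result = 0                      # initialization data, e.g. 8'd0
    term = first                    # the start stage sees the first input directly
    for n in range(w):
        result = cacu_sel(result, term, (second >> n) & 1)
        if n < w - 1:               # only w-1 gmul2 units exist
            term = gmul2(term, poly, w)
    return result


print(gf_mul_pipeline(3, 7))                     # same field as the table example
print(hex(gf_mul_pipeline(0x57, 0x83, 0x11B)))   # polynomial swapped at call time
```

Because the polynomial is just a parameter of `gmul2`, switching fields needs no table rebuild, which is the flexibility the pipelined design aims for.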
在本申请实施例提供的技术方案中,伽罗华域乘法器基于流水化方法进行结构设计,可根据使用需求实时改变伽罗华域多项式进行计算,支持多项式可配,不再固定使用查表法固定的正表和反表,有效提升GF乘法器的灵活性。采用流水化GF乘法器的结构来替换原始查表的结构,无需像传统查表GF乘法器一样使用多个查表消耗硬件资源,从而使得在RS纠删编解码中GF乘法器空间资源及面积的大幅减少,有效降低了查表所占用的硬件资源消耗,缩减了存储系统的硬件资源及面积消耗,且不会影响计算的时效性。In the technical solution provided by the embodiment of the present application, the Galois field multiplier is structurally designed based on the pipeline method, and the Galois field polynomial can be changed in real time according to the usage requirements for calculation, and the polynomial can be configured, and the table lookup is no longer fixed. The fixed forward and reverse tables effectively improve the flexibility of the GF multiplier. The structure of the pipelined GF multiplier is used to replace the structure of the original look-up table, and there is no need to use multiple look-up tables to consume hardware resources like the traditional look-up table GF multiplier, thus making the space resources and area of the GF multiplier in the RS erasure codec The substantial reduction of the table lookup effectively reduces the consumption of hardware resources occupied by the table, reduces the consumption of hardware resources and area of the storage system, and does not affect the timeliness of calculation.
上述实施例对循环处理单元的结构并不做任何限定,本实施例还给出循环处理单元的一种可选的实施方式,如图7所示,循环处理单元用于根据第二输入数据的比特值0和1计算循环的次数和选择运算单元如图5中的gmul2的结果是否使用,可包括下述内容:The above embodiment does not make any limitation on the structure of the loop processing unit. This embodiment also provides an optional implementation of the loop processing unit. As shown in FIG. 7, the loop processing unit is used to Bit value 0 and 1 calculate the number of times of circulation and whether to use the result of the gmul2 among the selection operation unit such as Fig. 5, can comprise following content:
在本实施例中,循环处理单元组42的每个循环处理单元均包括寄存器、异或门和选择器;寄存器分别与异或门和选择器相连,异或门与选择器相连。In this embodiment, each loop processing unit of the loop processing unit group 42 includes a register, an XOR gate and a selector; the register is respectively connected to the XOR gate and the selector, and the XOR gate is connected to the selector.
寄存器用于存储所属循环处理单元接收到的原始数据以及用于进行时序对齐。其中,原始数据为前一个循环处理单元输出的前数据结果或初始化数据;对于起始循环处理单元的寄存器来说,其原始数据即为初始化数据,对于中间循环处理单元和终止循环处理单元的寄存器来说,其原始数据即为前一个循环处理单元输出的计算结果,本实施 例称前一个循环处理单元输出的计算结果为前数据结果。寄存器例如可为D类型触发器DFF,DFF寄存器在整个GF乘法器中作为存储单元存储数据之外,还可在硬件设计中可以实现时序逻辑,也即用于流水线设计的时序对齐,当然也可为其他类型器件,这均不影响本申请的实现。The register is used to store the original data received by the loop processing unit to which it belongs and to perform timing alignment. Among them, the original data is the previous data result or initialization data output by the previous loop processing unit; for the register of the initial loop processing unit, the original data is the initialization data, for the register of the intermediate loop processing unit and the termination loop processing unit In other words, the original data is the calculation result output by the previous cyclic processing unit. In this embodiment, the calculation result output by the previous cyclic processing unit is called the previous data result. The register can be, for example, a D-type flip-flop DFF. In addition to storing data as a storage unit in the entire GF multiplier, the DFF register can also implement sequential logic in hardware design, that is, it can also be used for timing alignment of pipeline design. For other types of devices, this does not affect the implementation of this application.
The XOR gate XORs the multiplication result output by the operation unit corresponding to its loop processing unit with the original data, and outputs the XOR result to the selector. In this embodiment the original data reaches the XOR gate from the register after a delay of one clock cycle; this one-cycle wait is the time the corresponding operation unit needs to produce its result.
The selector chooses the target data to output, either the original data or the XOR result, according to a bit value of the second input data. Optionally, a bit value of 0 selects the original data and a bit value of 1 selects the XOR result. The opposite assignment, in which 1 selects the original data and 0 selects the XOR result, is also possible, and those skilled in the art can decide according to actual needs. The selector in this embodiment is a 2-to-1 selector: it chooses whether the output is the previous data result XORed with the result of the operation unit, or the previous data result alone. With reference to FIG. 7 and FIG. 8, the selector chooses between pre_result XOR gmul2 and pre_result. When the bit data_a[n] is 1, pre_result XOR gmul2 is selected; when data_a[n] is 0, pre_result is selected.
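The behaviour of one loop processing unit can be sketched in software as follows. This is only a combinational model written for illustration: the function name `cacu_sel` and its parameter names are assumptions borrowed from the labels in the figures, and the one-cycle register delay is not modelled.

```python
def cacu_sel(pre_result: int, gmul_out: int, bit: int) -> int:
    """Model of one loop processing unit (register value, XOR gate, 2-to-1 selector).

    pre_result: value held in the register (initialization data or previous result)
    gmul_out:   multiplication result from the corresponding operation unit
    bit:        the bit of the second input data assigned to this unit
    """
    xor_result = pre_result ^ gmul_out              # XOR gate
    return xor_result if bit == 1 else pre_result   # selector: 1 -> XOR result, 0 -> pass through


# bit = 1: the operation unit's result is folded into the running value
print(hex(cacu_sel(0x00, 0x57, 1)))
# bit = 0: the running value passes through unchanged
print(hex(cacu_sel(0x57, 0xAE, 0)))
```

With the bit assignment described above, a 0 bit simply forwards the previous data result, so the chain of units accumulates only the terms whose bits are set.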
As can be seen from the above, the pipelined Galois field multiplier of this embodiment can be built from simple logic gates and selectors. This improves the flexibility of the GF multiplier, reduces the hardware resources otherwise consumed by look-up tables, and supports a configurable polynomial.
To make the technical solution of this application clearer to those skilled in the art, a schematic example of the Galois field multiplier is given with reference to FIG. 6. It may include the following:
This example uses the irreducible polynomial $P(x) = x^8 + x^4 + x^3 + x + 1$ specified by AES (Advanced Encryption Standard, a symmetric encryption algorithm). To find a pattern that is convenient to program, let the function GMul(u, v) denote Galois field multiplication, where the order of u and v does not matter. First consider multiplication by 2 in the Galois field, that is, GMul(2, v):
For v = 7, $2 \cdot 7 = x \cdot (x^2 + x + 1) = x^3 + x^2 + x$, so multiplying a number by 2 in the Galois field amounts to shifting it left by one bit. If the highest bit of v is 1, that is, v >> 7 == 1, the product has degree 8, which exceeds 7, and must be reduced modulo $P(x)$. For example:
$$
\begin{aligned}
2 \cdot 129 &= x \cdot (x^7 + 1) = x^8 + x = (x^8 + x) + P(x) \\
&= (x^8 + x) + (x^8 + x^4 + x^3 + x + 1) = x^4 + x^3 + 1 \\
&= \texttt{00011001} = \texttt{0x19} = \texttt{00000010} \oplus \texttt{00011011} = \texttt{0x02} \oplus \texttt{0x1B} \\
&= (129 \ll 1) \oplus \texttt{0x1B}
\end{aligned}
$$

$$
\begin{aligned}
2 \cdot 176 &= x \cdot (x^7 + x^5 + x^4) = x^8 + x^6 + x^5 = (x^8 + x^6 + x^5) + P(x) \\
&= (x^8 + x^6 + x^5) + (x^8 + x^4 + x^3 + x + 1) = x^6 + x^5 + x^4 + x^3 + x + 1 \\
&= \texttt{01111011} = \texttt{0x7B} = \texttt{01100000} \oplus \texttt{00011011} = \texttt{0x60} \oplus \texttt{0x1B} \\
&= (176 \ll 1) \oplus \texttt{0x1B}
\end{aligned}
$$

Here $\oplus$ is bitwise XOR, the equalities hold modulo $P(x)$, and the shifted value is kept to its low 8 bits.
The examples above give the general rule: GMul(2, v) performs the following calculation:
Figure PCTCN2022102524-appb-000001 (formula image defining GMul(2, v))
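The formula image itself is not reproduced in the text, but the worked examples for v = 7, 129 and 176 suggest a rule along the following lines. This is a hedged sketch rather than a transcription of the figure; the name `gmul2` follows the figure label, and `poly_low` (the low 8 bits of $P(x)$, 0x1B for the AES polynomial) is an illustrative parameter.

```python
def gmul2(v: int, poly_low: int = 0x1B) -> int:
    """Multiply an 8-bit value by 2 in GF(2^8).

    If the top bit of v is 0, the product is just v shifted left by one.
    If the top bit is 1, the shifted value (kept to 8 bits) is XORed with
    the low byte of the irreducible polynomial to reduce it modulo P(x).
    """
    shifted = (v << 1) & 0xFF
    if (v >> 7) == 1:
        return shifted ^ poly_low
    return shifted


# The three examples from the text
for v in (7, 129, 176):
    print(v, hex(gmul2(v)))
```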
From GMul(2, v), the following can be derived:
GMul(3, v): 3*v = (2+1)*v = GMul(2, v) + v = GMul(2, v) ^ v
GMul(4, v): 4*v = 2*2*v = GMul(2, GMul(2, v))
GMul(7, v): 7*v = (2*2+2+1)*v = GMul(2, GMul(2, v)) ^ GMul(2, v) ^ v
GMul(8, v): 8*v = (2*2*2)*v = GMul(2, GMul(2, GMul(2, v)))
GMul(11111111b, v): $11111111_b = 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2 + 1$
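These identities can be checked with a few lines of code. The sketch below assumes the AES polynomial used in the example; the helper names (`gmul2_times`, `gmul3`, and so on) are hypothetical and chosen only to mirror the derivations.

```python
def gmul2(v: int) -> int:
    """Multiply by 2 in GF(2^8) with P(x) = x^8 + x^4 + x^3 + x + 1."""
    shifted = (v << 1) & 0xFF
    return shifted ^ 0x1B if v & 0x80 else shifted


def gmul2_times(v: int, n: int) -> int:
    """Apply gmul2 n times, i.e. multiply v by 2^n in the field."""
    for _ in range(n):
        v = gmul2(v)
    return v


def gmul3(v: int) -> int:
    return gmul2(v) ^ v                      # (2 + 1) * v


def gmul4(v: int) -> int:
    return gmul2(gmul2(v))                   # 2 * 2 * v


def gmul7(v: int) -> int:
    return gmul2(gmul2(v)) ^ gmul2(v) ^ v    # (2*2 + 2 + 1) * v


def gmul8(v: int) -> int:
    return gmul2_times(v, 3)                 # 2 * 2 * 2 * v


def gmul_ff(v: int) -> int:
    """Multiply by 11111111b: XOR of v * 2^n for n = 0..7."""
    acc = 0
    for n in range(8):
        acc ^= gmul2_times(v, n)
    return acc


print(hex(gmul3(0x57)), hex(gmul4(0x57)), hex(gmul7(0x57)), hex(gmul8(0x57)))
```

Every multiplier constant decomposes into powers of 2 in the same way, which is why a chain of multiply-by-2 units plus XOR is sufficient.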
GMul(11111111b, v) is therefore equivalent to: (GMul(2, v) applied 7 times) ^ (GMul(2, v) applied 6 times) ^ ... ^ (GMul(2, v) applied once) ^ v. Here ^ denotes XOR, << denotes left shift and >> denotes right shift. Based on this derivation, the module is designed as the pipelined Galois field multiplier architecture shown in FIG. 5 and FIG. 6.

In this embodiment the GF multiplier computes result = data_a * data_b, where data_a and data_b are the two multiplicands, that is, the two input data of the GF multiplier. The external interface of the GF multiplier has only these two 8-bit inputs. The initialization data 8'd0 is defined by the 8-bit width of the multiplicand; for a 16-bit width it would be 16'd0. The initialization value 0 itself is fixed and is not visible at the external interface. With an input data width of 8 bits and initialization data 8'd0, the Galois field multiplier includes 7 GMul(2, v) units connected in series and 8 cacu&sel units connected in series, where GMul(2, v) is the operation unit and cacu&sel is the loop processing unit. Each bit of the 8-bit input data is fed to one cacu&sel.
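A behavioural sketch of this structure is given here for illustration. It is not cycle-accurate and makes two assumptions that the text does not state outright: data_b feeds the chain of multiply-by-2 units while the bits of data_a drive the selectors, and the first cacu&sel unit receives data_b directly, which would account for 7 operation units serving 8 loop processing units. The names `gf_multiply`, `poly_low` and `width` are illustrative.

```python
def gmul2(v: int, poly_low: int = 0x1B, width: int = 8) -> int:
    """Multiply by 2 in GF(2^width); poly_low is P(x) without its top term."""
    mask = (1 << width) - 1
    shifted = (v << 1) & mask
    return shifted ^ poly_low if (v >> (width - 1)) & 1 else shifted


def gf_multiply(data_a: int, data_b: int, poly_low: int = 0x1B, width: int = 8) -> int:
    """Behavioural model: (width - 1) chained gmul2 units feed `width` cacu&sel units."""
    # Operation unit chain: taps[n] holds data_b * 2^n; taps[0] is data_b itself.
    taps = [data_b]
    for _ in range(width - 1):                   # 7 gmul2 units for 8-bit data
        taps.append(gmul2(taps[-1], poly_low, width))

    pre_result = 0                               # initialization data, e.g. 8'd0
    for n in range(width):                       # 8 cacu&sel units for 8-bit data
        bit = (data_a >> n) & 1
        xor_result = pre_result ^ taps[n]        # XOR gate
        pre_result = xor_result if bit else pre_result   # 2-to-1 selector
    return pre_result


print(hex(gf_multiply(0x57, 0x83)))                  # AES polynomial
print(hex(gf_multiply(0x80, 0x02, poly_low=0x1D)))   # a different polynomial
```

Passing a different `poly_low` changes the field without touching the structure, which matches the configurable-polynomial property claimed for the design.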
As can be seen from the above, the pipelined GF multiplier of this embodiment has low resource consumption, high speed, high throughput and good flexibility, and is novel, inventive, simple and practical.
The embodiments of this application also provide a system for a corresponding application scenario of the Galois field multiplier, which makes the multiplier more practical. The erasure coding and decoding system provided by the embodiments is described below with reference to FIG. 9 and may include the following:
The erasure coding and decoding system may include a data distribution module 91, an operation module 92 and a reordering module 93, which are connected to one another through a bus.
The data distribution module 91 distributes the data to be erasure-processed to obtain multiple rows of data to be calculated. The data to be erasure-processed comprises matrix data and data. In erasure encoding, it is the encoding matrix data and the original disk data, such as matrix B and data D in FIG. 1. In erasure decoding, it is the inverse of the matrix re-formed from the encoding-matrix rows that correspond to the surviving data, together with the data that remains intact after a storage disk fails, such as matrix B′⁻¹ and the Survivors data in FIG. 2. The whole erasure encoding and decoding process can be understood as a matrix multiplication, whose basic operation is multiply-accumulate, and every row requires a multiply-accumulate calculation. The data distribution module 91 therefore splits the matrix data into rows; each row is multiplied separately, and the products of each row are accumulated to obtain the final result.
The operation module 92 may include multiple operation sub-modules. Each operation sub-module performs multiplication with Galois field multipliers, and a Galois field multiplier does not process all bytes of a row at once but only as many bytes as it supports per operation. Each operation sub-module therefore includes multiple Galois field multipliers and an adder that accumulates their results; that is, each operation sub-module performs the multiply-accumulate calculation for one row of data to be calculated. Each operation sub-module includes an adder and multiple Galois field multipliers according to any of the above embodiments, and the total number of Galois field multipliers is determined by the number of bus bytes. Optionally, if the Galois field multiplier operates on a single byte, the number of Galois field multipliers in each operation sub-module equals the number of bus bytes, and the total number of Galois field multipliers in the operation module is the product of the number of matrix rows of the data to be erasure-processed and the number of bus bytes. The adder accumulates the multiplication results output by the Galois field multipliers and may be, for example, a Galois field adder. The reordering module 93 splices the multiply-accumulate results output by the operation sub-modules in the order in which the data distribution module 91 distributed the data to be erasure-processed.
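One way to picture a single operation sub-module is sketched here. The text does not spell out how the adder combines the multiplier outputs, so this sketch assumes a per-lane accumulator: on each iteration every multiplier handles one byte lane of the bus, and the adder XORs the new product into that lane's running value. The names `gf_mul`, `sub_module_mac` and `beats` are hypothetical, and `gf_mul` is an ordinary software shift-and-XOR multiply standing in for the hardware multiplier.

```python
def gf_mul(a: int, b: int, poly_low: int = 0x1B) -> int:
    """Single-byte GF(2^8) multiplication by shift-and-XOR (software stand-in)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= poly_low
        b >>= 1
    return result


def sub_module_mac(beats, bus_bytes: int = 16):
    """Multiply-accumulate for one row.

    beats: sequence of (matrix_block, data_block) pairs, each bus_bytes long.
    Returns the bus_bytes accumulated result bytes.
    """
    acc = [0] * bus_bytes                        # accumulators start at zero
    for matrix_block, data_block in beats:       # one accumulation iteration per pair
        for lane in range(bus_bytes):            # bus_bytes multipliers work in parallel
            acc[lane] ^= gf_mul(matrix_block[lane], data_block[lane])  # GF add = XOR
    return acc


# Two iterations with coefficients 2 and 3 on the same data: (2 + 3) * v = 1 * v
beats = [([2] * 16, [0x57] * 16), ([3] * 16, [0x57] * 16)]
print([hex(b) for b in sub_module_mac(beats)])
```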
Further, because each operation sub-module performs multiple multiply-add calculations, each operation sub-module may, based on the above embodiment, also include a PE controller to ensure that the calculations complete successfully. The PE controller is connected to the adder and to each Galois field multiplier. From the total number of multiply-accumulate calculations, it determines the number of accumulation iterations of the adder and the number of times the data to be erasure-processed is used.
Because the operation module includes multiple operation sub-modules and the reordering module 93 outputs the final calculation result, each operation sub-module may, based on the above embodiment, also include an EC block output unit to ensure that the result is output correctly. According to the operation state of the operation sub-module and the output state of the reordering module, the EC block output unit controls back pressure on the operation sub-module and whether the reordering module performs an output operation.
To make the technical solution of this application clearer to those skilled in the art, a schematic example is also given with reference to FIG. 10. It may include the following:
In this embodiment the bus is 16 bytes wide, the Galois field multiplier operates on a single byte, and the adder is a Galois field adder. The whole erasure encoding and decoding process can be understood as a matrix multiplication, whose basic operation is multiply-accumulate, and every row requires a multiply-accumulate calculation. The bus data for each row is 16 bytes and the multiplier handles one byte at a time, so 16 multipliers must operate in parallel to process one bus word. That is, each operation sub-module includes 16 Galois field multipliers operating in parallel, and the operation module as a whole contains (number of matrix rows of the data to be erasure-processed) × (number of bus bytes) multipliers.
In this embodiment, the data distribution module distributes the matrix and data of the erasure calculation according to the number of rows of the matrix calculation. Specifically, the data to be erasure-processed is RS/RS⁻¹: the 4×16-byte RS/RS⁻¹ matrix is split into 4 rows of 16 bytes each, and the 16-byte data block is sent to all rows, so the input of each row is a 16-byte data block and a 16-byte matrix block. Sixteen GF multipliers are used, with each byte of the data block and each byte of the matrix block serving as the two inputs data_a and data_b of one GF multiplier. The PE controller calculates, from the number of multiply-accumulate operations, the number of accumulation iterations and the time at which the operation completes; its scheduling determines how many times RS/RS⁻¹ is used and how many times the adder accumulates. The GF adder is in essence an XOR operation and is used in this embodiment as the accumulation step of the matrix multiply-accumulate. The EC block output controls back pressure on the preceding stage and whether the following stage outputs, according to the operation state of the preceding operation module and the output state of the following module. In the EC data block reordering module, every row of the matrix operation has an EC block output module; after a row's calculation completes, its position must be determined and the results re-spliced and sent out.
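The dataflow of this example can be sketched end to end as follows. This is a loose software analogy under several assumptions: the same data block is broadcast to all four rows, each row accumulates per byte lane across iterations, sub-modules may finish in any order, and the reordering step restores row order from a row tag. The coefficient values and all function names (`distribute`, `sub_module`, `reorder`, `run`) are hypothetical.

```python
BUS_BYTES = 16


def gf_mul(a: int, b: int, poly_low: int = 0x1B) -> int:
    """Single-byte GF(2^8) multiplication by shift-and-XOR (software stand-in)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= poly_low
        b >>= 1
    return result


def distribute(matrix_rows, data_block):
    """Data distribution: send the same data block to every row, tagged by row index."""
    return [(idx, row, data_block) for idx, row in enumerate(matrix_rows)]


def sub_module(matrix_block, data_block, acc):
    """16 byte-wise multiplies, XOR-accumulated into the row's running result."""
    return [a ^ gf_mul(m, d) for a, m, d in zip(acc, matrix_block, data_block)]


def reorder(tagged_results):
    """Reordering: splice row results back together in distribution order."""
    return [result for _, result in sorted(tagged_results, key=lambda t: t[0])]


def run(beats, rows: int = 4):
    accs = {r: [0] * BUS_BYTES for r in range(rows)}
    for matrix_rows, data_block in beats:
        for idx, row, block in distribute(matrix_rows, data_block):
            accs[idx] = sub_module(row, block, accs[idx])
    finished = list(accs.items())[::-1]          # pretend rows completed out of order
    return reorder(finished)


# Hypothetical 4-row coefficient matrix applied to two data values
coeffs = [[1, 0], [0, 1], [1, 1], [1, 2]]
data = [0x57, 0x83]
beats = [([[c[j]] * BUS_BYTES for c in coeffs], [data[j]] * BUS_BYTES) for j in range(2)]
out = run(beats)
multiplier_count = len(coeffs) * BUS_BYTES       # matrix rows x bus bytes
print([hex(row[0]) for row in out], multiplier_count)
```

The first two rows act as an identity and return the data unchanged, while the last two rows behave like parity rows; the multiplier count follows the rows × bus bytes rule stated in the text.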
For the functions of the functional modules of the Galois field multiplier in this embodiment, refer to the description of the implementation in the above embodiments; details are not repeated here.
As can be seen from the above, at the system level the pipelined GF multiplier fully meets the functional requirements of erasure encoding and decoding, is relatively easy to implement in hardware, and ensures high computational efficiency and data throughput. It greatly reduces the hardware resources consumed compared with a look-up-table GF multiplier, and the Galois field polynomial can be changed flexibly in real time to suit the needs of different system applications.
The embodiments in this specification are described progressively; each focuses on its differences from the others, and the same or similar parts may be cross-referenced. Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of this application.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make modifications and improvements without departing from the concept of this application, and these all fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims (10)

  1. A Galois field multiplier, characterized by comprising a basic operation unit group and a loop processing unit group;
    the basic operation unit group comprises a start operation unit, a plurality of intermediate operation units and a termination operation unit connected in series, and the loop processing unit group comprises a plurality of loop processing units connected in series; the total number of operation units in the basic operation unit group and the total number of loop processing units in the loop processing unit group are determined by the data bit width of the input data of the Galois field multiplier;
    the start operation unit is configured to perform Galois field multiplication on first input data and a target generator, and to output the multiplication result to the next operation unit and the corresponding loop processing unit; each intermediate operation unit is configured to perform Galois field multiplication on the data it receives and the target generator, and to output the multiplication result to the next operation unit and the corresponding loop processing unit; the termination operation unit is configured to perform Galois field multiplication on the data it receives and the target generator, and to output the multiplication result to the corresponding loop processing unit; and
    the loop processing unit group is configured to determine the current number of loop iterations according to second input data, initialization data and the Galois field multiplication results output by the basic operation unit group, and to output the final calculation result.
  2. The Galois field multiplier according to claim 1, characterized in that the total number of loop processing units in the loop processing unit group equals the data bit width of the input data of the Galois field multiplier; each bit of the second input data corresponds to one loop processing unit.
  3. The Galois field multiplier according to claim 2, characterized in that the total number of operation units in the basic operation unit group is the data bit width of the input data of the Galois field multiplier minus 1.
  4. The Galois field multiplier according to any one of claims 1 to 3, characterized in that each loop processing unit of the loop processing unit group comprises a register, an XOR gate and a selector; the register is connected to both the XOR gate and the selector, and the XOR gate is connected to the selector;
    the register is configured to store the original data received by its loop processing unit and to perform timing alignment; the original data is the previous data result output by the preceding loop processing unit, or the initialization data;
    the XOR gate is configured to XOR the multiplication result output by the operation unit corresponding to its loop processing unit with the original data, and to output the XOR result to the selector; and
    the selector is configured to select, according to a bit value of the second input data, the target data to output from the original data and the XOR result.
  5. The Galois field multiplier according to claim 4, characterized in that the register is a D-type flip-flop.
  6. An erasure coding and decoding system, characterized by comprising a data distribution module, an operation module and a reordering module;
    the data distribution module is configured to distribute the data to be erasure-processed to obtain multiple rows of data to be calculated;
    the operation module comprises a plurality of operation sub-modules, each configured to perform a multiply-accumulate calculation on one row of data to be calculated; each operation sub-module comprises an adder and a plurality of Galois field multipliers according to any one of claims 1 to 5; the total number of Galois field multipliers is determined by the number of bus bytes; the adder is configured to accumulate the multiplication results output by the Galois field multipliers; and
    the reordering module is configured to splice the multiply-accumulate results output by the operation sub-modules in distribution order.
  7. The erasure coding and decoding system according to claim 6, characterized in that the operation sub-module further comprises a PE controller connected to the adder and to each of the Galois field multipliers;
    the PE controller is configured to determine, from the total number of multiply-accumulate calculations, the number of accumulation iterations of the adder and the number of times the data to be erasure-processed is used.
  8. The erasure coding and decoding system according to claim 7, characterized in that the operation sub-module further comprises an EC block output unit;
    the EC block output unit is configured to control, according to the operation state of the operation sub-module and the output state of the reordering module, back pressure on the operation sub-module and whether the reordering module performs an output operation.
  9. The erasure coding and decoding system according to any one of claims 6 to 8, characterized in that the total number of Galois field multipliers in each operation sub-module equals the number of bus bytes; the total number of Galois field multipliers in the operation module is the product of the number of matrix rows of the data to be erasure-processed and the number of bus bytes.
  10. The erasure coding and decoding system according to claim 9, characterized in that the adder is a Galois field adder.
PCT/CN2022/102524 2022-01-14 2022-06-29 Galois field multiplier and erasure coding and decoding system WO2023134130A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210039878.0 2022-01-14
CN202210039878.0A CN114063973B (en) 2022-01-14 2022-01-14 Galois field multiplier and erasure coding and decoding system

Publications (1)

Publication Number Publication Date
WO2023134130A1 true WO2023134130A1 (en) 2023-07-20

Family

ID=80230873

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102524 WO2023134130A1 (en) 2022-01-14 2022-06-29 Galois field multiplier and erasure coding and decoding system

Country Status (2)

Country Link
CN (1) CN114063973B (en)
WO (1) WO2023134130A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114063973B (en) * 2022-01-14 2022-04-22 苏州浪潮智能科技有限公司 Galois field multiplier and erasure coding and decoding system
CN114416424B (en) * 2022-03-30 2022-06-17 苏州浪潮智能科技有限公司 RAID encoding and decoding method, device, equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040153722A1 (en) * 2002-12-25 2004-08-05 Heng-Kuan Lee Error correction code circuit with reduced hardware complexity
TW201135477A (en) * 2010-04-01 2011-10-16 Ind Tech Res Inst Sequential Galois field multiplication architecture and method
CN106066784A (en) * 2015-04-23 2016-11-02 阿尔特拉公司 For realizing circuit and the method for galois field yojan
CN106533452A (en) * 2016-11-14 2017-03-22 中国电子科技集团公司第五十四研究所 Multi-ary LDPC coding method and coder
CN111694692A (en) * 2020-06-24 2020-09-22 山东云海国创云计算装备产业创新中心有限公司 Data storage erasure method, device and equipment and readable storage medium
CN114063973A (en) * 2022-01-14 2022-02-18 苏州浪潮智能科技有限公司 Galois field multiplier and erasure coding and decoding system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177891B2 (en) * 2002-10-09 2007-02-13 Analog Devices, Inc. Compact Galois field multiplier engine
WO2006120691A1 (en) * 2005-05-06 2006-11-16 Analog Devices Inc. Galois field arithmetic unit for error detection and correction in processors
CN101650644B (en) * 2009-04-10 2012-07-04 北京邮电大学 Galois field multiplying unit realizing device
TWI529614B (en) * 2014-03-27 2016-04-11 衡宇科技股份有限公司 Serial multiply accumulator for galois field
DE102018113475A1 (en) * 2018-06-06 2019-12-12 Infineon Technologies Ag READY TO CALCULATE WITH MASKED DATA

Also Published As

Publication number Publication date
CN114063973A (en) 2022-02-18
CN114063973B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
WO2023134130A1 (en) Galois field multiplier and erasure coding and decoding system
KR101616478B1 (en) Implementation of Arbitrary Galois Field Arithmetic on a Programmable Processor
JP4460047B2 (en) Galois field multiplication system
CN101227194B (en) Circuit, encoder and method for encoding parallel BCH
US20200036517A1 (en) Secure hash algorithm implementation
Talapatra et al. Low Complexity Digit Serial Systolic Montgomery Multipliers for Special Class of ${\rm GF}(2^{m}) $
CN112039535A (en) Code rate compatible LDPC encoder based on quasi-cyclic generator matrix
CN107992283B (en) Method and device for realizing finite field multiplication based on dimension reduction
CN101296053A (en) Method and system for calculating cyclic redundancy check code
US9065482B1 (en) Circuit for forward error correction encoding of data blocks
CN114895870A (en) Efficient reconfigurable SM2 dot product method and system based on FPGA
KR100478974B1 (en) Serial finite-field multiplier
CN102891689B (en) A kind of error location polynomial method for solving and device
US8862968B1 (en) Circuit for forward error correction encoding of data blocks
US6912558B2 (en) Multiplication module, multiplicative inverse arithmetic circuit, multiplicative inverse arithmetic control method, apparatus employing multiplicative inverse arithmetic circuit, and cryptographic apparatus and error correction decoder therefor
Delgado-Mohatar et al. Performance evaluation of highly efficient techniques for software implementation of LFSR
CN112819168B (en) Ring polynomial multiplier circuit in encryption and decryption of lattice cipher
Drescher et al. VLSI architecture for non-sequential inversion over GF (2m) using the euclidean algorithm
CN113285725A (en) QC-LDPC encoding method and encoder
Shum et al. Network coding based on byte-wise circular shift and integer addition
CN108268243B (en) Composite domain multiplication device based on search
WO2019174263A1 (en) Multi-addend addition circuit for stochastic calculus
CN116893800A (en) Montgomery modular multiplier based on parallel structure and coding
US11750222B1 (en) Throughput efficient Reed-Solomon forward error correction decoding
CN114626537B (en) Irreducible polynomial and quantum secure hash value calculation method based on x86 platform SIMD

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919767

Country of ref document: EP

Kind code of ref document: A1