AN ENHANCED CIRCUIT FOR PERFORMING ERROR CORRECTION AND DETECTION OF BCH CODEWORDS
HELD AND BACKGROUND OF THE INVENTION
The present invention relates generally to a syndrome computing apparatus used for the purpose of error detection and correction, and more particularly for the puipose of error detection and correction of data coded in accordance with a Bose-Chaudhυri- Hocquenghem (BCH) code.
In a digital data communication system, digital information is transmitted through a channel to a receiver. Due to noises and/or distortions, the received digital information often contains a number of bit errors. A bit error is considered to have happened when the value received is different from the value sent, for example if a logical '0' is received when the transmitted value was a logical '1'. One of the strategies used to overcome this problem is the introduction of error correcting codes. A well- known member of the large family of such codes is the BCH encoding and decoding technique. A BCH code can be used for correction of scattering of single error bits within an input data word. The BCH code is used in satellite communication links, optical networks, etc., where error correction codes are often employed to mitigate the effects of noise interference. An error correction procedure is described herein for a BCH code that corrects up to 't' bit errors, where 't' is a predetermined number. A selection of a large 't' leads to increased length of the redundancy bits in a codeword, and therefore to a more complex decoding process.
A typical BCH decoder accomplishes the following steps:
(a) calculation of the syndrome coefficient values Sj, for j= 0, 1, 2, . . . , t-1, where 't' is the maximum number of bit errors to be corrected, within each received word;
(b) determination of an error-locator polynomial σ(x) from the syndrome values Sj, for j=0,l,2, . . . , t-l ; and
(c) determination of the location of the erroneous bit(s) by finding the roots of σ(x)=0.
For step (b), a Berlekamp algorithm (US Patent Nos. 4,162,480 and 4,410,989) may be used. For step (c), a search algorithm proposed by Chien, as disclosed in US
Patents No. 3,278,729 and 3,418,629, is known to be one of the popular methods. Other relevant prior art may be found in US Patents 4,644,543 to Davis, 5,583,499 to Oh et al., 5,974,582 to Ly, 6,192,497 to Yang et al., 6,279,137 to Poeppelman et al., and 6,374,383 to Weng. In a BCH code, c(x) is a codeword if and only if α;°+1 , ah+3 , ai +5 , ..., ah+2t~l aιe roots of c(x), where α is a primitive element of a Galois field of power 2 (GF(2m)), and where 'm' is the length of the syndrome coefficient (Sj). A Galois field is an algebraic field having a finite number of elements. The number of elements is always of the form pm, where 'p' is a prime number and 'm' is a positive integer. The maxima] length of c(x), including the redundancy bits, is (2m-l) bits. The value jo is a positive integer between 0 and 2m-2. A detailed description of the Galois field may be found in "Error Correcting Codes" by W. Wesley Peterson and E. J. Weldon, Jr., MIT 1972, pages 155-160.
The mathematics underlying the BCH decoding are known to those skilled in the art of coding, and are described in detail in "Error Correcting Codes" by W. Wesley Peterson and E. J. Weldon, Jr., MIT 1972 pages 283-288.
As mentioned above, the determination of the location of the erroneous bit(s) is achieved by finding the roots of σ(x). The polynomial σ(x) is defined as follows:
σ{x) = l + γjσixi ; (1)
)'=!
where 't' is as defined above. The roots of the polynomial σ(x) are αJ , which are elements in the GF(2m).
Reference is now made to Fig. 1, which shows an exemplary block diagram of a conventional search circuit 100 for finding a set of error locations according to Chien above. Circuit 100 includes 't' registers 110-1 through 110-t, coupled to constant multipliers 120-1 through 120-t. A multiplier 120-j multiplies the content of a register 110-j by a corresponding constant αJ, and provides feedback into register 110-j. Circuit 100 further includes 't-1' adders 130-1 through 130-t, serially connected to each other. Adders 130 are also connected to the outputs of registers 110 and to a comparator 140. The corresponding
coefficients of the error locator polynomial (i.e., σ(x)) are parallel-loaded into their corresponding registers 110. Subsequently, each coefficient η is multiplied by the appropriate αJ using the corresponding multiplier 120? and the resulting value is stored in corresponding register 110. The stored values are summed up by the corresponding adder 130 and compared to zero by comparator 140. If the result of comparator 140 indicates a value of zero, then that value of x is a root and therefore corresponds to an error at location 2m-l-j bit in the received data word polynomial representation.
Circuitry 100 is clocked N times to evaluate the locator polynomial at all possible values of x, i.e., x = (αj)°, (αJ)!' (α )2 . . . , (αJ)n"!, for all powers of α. N is the length of the received input codeword. Thus, the next clocking of coefficients through the circuit multiplies αJ times the contents of the particular register 110, e.g., α1 is multiplied by the stored σι(α') to give σι(α2).
Circuit 100 is a serial implementation of a Chien search algorithm. Therefore, a single error bit is detected in each pass. In order to detect more than one bit simultaneously, a parallel Chine search circuit is used.
Reference is now made to Fig. 2, which shows a schematic block diagram of a parallel Chien search circuit 200. Circuit 200 includes 't' ranks 210-1 through 210-t, capable of detecting 'p' error bits concurrently over a different set of values of αj through αi+p . Each rank 210 includes a single register 220, 'p' constant multipliers 240-1 through 240-p, and 'p' adders 260-1 through 260-p. Initially, the coefficients of σ(x) are loaded to registers 220. The length of each σ(x) coefficient is 'm' bits. As mentioned above, each coefficient is also an element in GF(2m) .
In each rank, constant multiplier 240-1 is coupled to register 220, to receive the contents of register 220 for multiplication with the constant αJ, where j equals p, 2*p, 3*p,..., p*t. The product is fed back to register 220. In addition, the content of register 220 is fed to multipliers 240-2 through 240-p. Each of multipliers 240-2 through 240-p multiplies the content of register 220 with its corresponding value of αr'k, for V starting at '2' and ending at 'p\ where 'k' is the rank 210 index. The multiplication result of each of multipliers 240-2 through 240-p is fed to its respective adder 260. Namely, the product of multiplier 240-2 is fed to adder 260-2, the product of multiplier 240-3 is fed to adder 260-3, and so on.
Subsequently, the result of adders 260-1 in ranks 210-1 through 210-t are aggregated and sent to comparator 280-1, the result of adders 260-2 in ranks 210-1 through 210-t are aggregated and sent to comparator 280-2, etc. The same is true for the other comparators in circuit 200. Comparator 280-i indicates if the ith bit in the received block is an error bit. The parameters sp', 'm', and ct' mentioned above determine the decoder performance.
Specifically, parameter 'p' defines the number of error bits that can be detected in each cycle, while parameter 't' defines the total number of error bits that can be corrected in the entire received word. Hence, in order to ensure good performance, a selection of large 'p\ 'm' and 't' is essential. However, the number of logic gates required to implement a circuit such as circuit 200, for large values of 'p\ 't', and 'm', is practically impossible in current chip design and manufacturing technologies, due to the large number of gates involved. For example, the number of XOR gates required to implement the logic elements of circuit 200 is
at least: p - m - (m-\X - t XOR gates. Selecting 'p\ 't\ and 'm' equal to 128, 73, and 14 v 2 J respectively, requires close to one million XOR gates, which are more complex gates than regular logic gates, either requiring multiple transistors or a lower level implementation of several basic logic gates.
Therefore, it would be advantageous to provide a parallel error-locator (i.e., Chien search) circuit with a minimal number of logic gates. In particular, it would be advantageous to provide an apparatus and method capable of determining roots of an error-locator polynomial, and to concurrently detect 'p' error bits in a received codeword.
SUMMARY OF THE INVENTION
The present invention is of a Chien type search apparatus used for the purpose of error detection and correction, and more particularly for the purpose of error detection and correction of data coded in accordance with a Bose-Chaudhuri-Hocquenghem (BCH) code.
According to the present invention there is provided an error-locator circuit capable of detecting and correcting error bits in an input word, the circuit detecting, per cycle, an input block of 'p' bits belonging to the input word, the circuit comprising: a plurality of 't' logic ranks of index 'k' wherein k=l, 2,...t, each logic rank including a multiplication logic unit, each multiplication logic unit outputting a plurality of i=l,2,..p outputs; a plurality of p'
adders included in each logic rank 'k', each of the adders operative to receive a respective output Y(k, i) of the plurality of outputs; and a plurality of 'p' comparators, each of the comparators operative to receive an aggregated input from all adders of the 't' ranks having the same 'i', whereby each comparator indicates if the received il bit in the input block is an error bit.
According to one feature in the error-locator circuit of the present invention, each logic rank further includes a constant multiplier used for multiplying a register content with the value of α F to obtain a product fed to the multiplication logic unit, and a register for holding the product. According to another feature in the error-locator circuit of the present invention, each of the adders is a modulo-2 adder.
According to yet another feature in the error-locator circuit of the present invention, the α *pis an element in the Galois filed GF(2m).
According to yet another feature in the error-locator circuit of the present invention, each multiplication logic unit has 'm' inputs, wherein the plurality of outputs equals 'p*m' outputs, and wherein the outputs are constant functions of the inputs.
According to yet another feature in the error-locator circuit of the present invention, the multiplication logic unit includes at least a first combinational logic unit having at least m/2 inputs and at most 2mJ2 outputs, wherein m/2 is an integer number, a second combinational logic unit having at least m/2 inputs and at most 2 /2 outputs, wherein m/2 is an integer number and a third combinational logic unit connected to both the first combinational logic unit and the second combinational logic unit, the third combination logic including p*m outputs.
According to yet another feature in the error-locator circuit of the present invention, the multiplication logic unit comprises a combinational logic at least
• 2 + p * m of exclusive-or (XOR) gates.
According to yet another feature in the error-locator circuit of the present invention, the error-locator circuit is configured to operate with a Bose-Chaudhuri-Hocquenghem decoder.
According to the present invention there is provided an integrated circuit (IC) capable of detecting and correcting error bits in an input word, the integrated circuit detecting, per cycle, an input block of 'p' bits belonging to the input word, comprising: a plurality of 't' logic ranks of index 'k' wherein k=l, 2,...t, each of the logic ranks including a multiplication logic unit, each multiplication logic unit outputting a plurality of i=l,2,..p outputs, each logic rank further including a register for receiving a respective coefficient of an error correcting polynomial to produce a register content, the register coupled to the multiplication logic unit, a constant multiplier coupled to the register and used for multiplying the register content with an appropriate value to obtain a product fed to the multiplication logic unit, and a plurality of 'p' adders, each of the adders operative to receive a respective output Y(k, i) of the plurality of outputs; and a plurality of p' comparators, each of the comparators operative to receive an aggregated input from all adders of the 't' ranks having the same 'i', whereby each comparator indicates if the received ilh bit in the input block is an error bit.
According to a feature in the integrated circuit of the present invention, the error- locator circuit is configured to operate with a Bose-Chaudhuri-Hocquenghem decoder.
According to the present invention there is provided a multiplication logic unit (MLU) comprising: at least one first combinational logic unit having at least m/2 first inputs and at most 2m 2 first outputs, wherein m/2 is an integer number at least one second combinational logic unit having at least m/2 second inputs and at most 2m/2 second outputs, and at least one third combinational logic unit connected to the first and the second combinational logic units, the third combination logic including 'p*m' outputs, whereby the MLU is capable of multiplying between 'm' input bits and j= p-l ' different values of an element αk*j.
According to the present invention there is provided a method for detecting and correcting error bits in an input word, comprising the steps of: per cycle, detecting an input block of p' bits belonging to the input word, multiplying between a respective value αk*p and a register content to obtain a multiplication product, and saving the multiplication product in the register; providing a plurality of 't' logic ranks of index 'k' wherein k=l, 2,...t, each of the
logic ranks operative to output a plurality of p' outputs, each output including a respective multiplication product; adding a respective output Y(k, i) of the plurality of outputs in an adder 'i' of a plurality of p' adders included in each logic rank 'k'; forming an aggregate input from all adders of the 't' ranks having the same 'i'; and providing the aggregate input to a comparator operative to indicate if the received il bit in the input block is an error bit.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein: Figure 1 is an exemplary block diagram of a conventional serial Chien search circuit (prior art);
Figure 2 is an exemplary block diagram of a conventional parallel serial Chien search circuit
(prior art);
Figure 3 is a block diagram of a preferred embodiment of an error-locator circuit, in accordance with one embodiment of the present invention; '
Figure 4 shows an exemplary block diagram of a multiplication logic unit in accordance with another embodiment of the present invention;
Figure 5 is a detailed example showing a preferred implementation of the multiplication logic unit.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides an apparatus and method capable of determining roots of an error-locator polynomial, where the possible roots are elements of a Galois field. The present invention can be utilized to concurrently detect p' error bits in a received codeword. In accordance with the present invention, the number of logic gates to implement the preferred apparatus may be significantly reduced by coupling parallel constant multipliers in each rank, and by replacing the constant multipliers with a unified combinational logic, hence reducing the number logic gates in each rank.
Reference is now made to Fig. 3, which shows a block diagram of an error-locator circuit 300, implemented in accordance with the present invention. Circuit 300 includes 't' ranks 310-1 through 310-t. Each rank 310 is correlated with a different value of αJ and
includes: a register 320, a multiplication logic unit (MLU) 330, and p' adders 340-1 through 340-p. Adders 340 are modulo-2 adders, i.e., capable of performing a XOR operation. Further, each of ranks 310 includes a single constant multiplier 360 used for multiplying the content of register 320 with the appropriate value of αp*\ where 'k' is the rank index. In order to detect an input codeword having 6N' bits, circuit 300 is clocked N/p times. Each clock, register 320 holds the product of the multiplication between the respective value of αk p and the previous content of register 320. At the first clock, register 320 loads the respective coefficient of an error correction polynomial σ;, i.e., the register is initially loaded with the value of σ_. MLU 330 replaces the 'p-l' parallel multipliers of the prior art circuit 200 in Fig. 2 (e.g. 240-2 through 240-t), with a unified combinational logic of XOR gates. MLU 330 performs a constant multiplication between the content of register 320 and the value of αk where 'j' is an integer starting at 0 and ending at 'p-l', and 'k' is the rank index. The number of XOR gates in MLU 330 is significantly lower than the number of gates required to implement multipliers 240-1 through 240-p. The inventors have found that replacing the 'p-l' constant multipliers by an improved combinational logic (i.e., MLU 330) is possible, since the outputs of MLU 330 are constant functions of its inputs. MLU 330 is a key innovative element of the system of the present invention and is described in greater detail below.
The products of MLU 330 are grouped to 'p' outputs Y(k, 1) through Y(k, p), where each output includes 'm' bits and is coupled to one of adders 340. As mentioned, parameter 'k' defines the rank index. Specifically, output Y(k, 1) is coupled to adder 340-1, output Y(k, 2) is coupled to adder 340-2 of rank 310-k and so on. The results from the respective adders 340 in each rank are aggregated and fed to the respective comparator 350. That is, outputs Y(l, 1) through Y(t, 1) are aggregated by adders 340-1 of ranks 310-1 through 310-t, and the result is fed to comparator 350-1; outputs Y(l, 2) through Y(t, 2) are aggregated using adders 340-2 of ranks 310-1 through 310-t, and the result is fed to comparator 350-2, and so on. Comparator 350-i indicates if the ith bit in the received block is an error bit.
Circuit 300 is specially designed to operate at speeds of 10 GBPS and beyond. Furthermore, the design is suitable for any input codeword length of size of at least 1,000 bits.
Reference is now made to Fig. 4, which shows a block diagram of MLU 330, in accordance with one embodiment of the present invention. MLU 330 is preferably comprised of three combinational logic units, 410, 420, and 430. MLU 330 has 'm' inputs marked as ai through a
m and 'p*m' outputs marked as i through b
p-
m. These outputs are divided into 'p' groups, each having 'm' bits. The inputs, i.e., ai through a
m are divided into two groups where the first group
through am
/2 is connected to logic 410, and the second group a„,
/2+ι through a
m is connected to logic 420. Each of logics 410 and 420 includes 2
(m/2) outputs marked as ci through c
r and di through d
r, respectively, where r=2
( '. Each of these outputs is a binary combination of the 'm/2' inputs. A combination is defined as a XOR operation from a subset of inputs. Hence, the number of XOR gates required to implement each of logics 410 and 420 is:
(or 420) inputs, i.e. the number of '1' in a truth table for the inputs i _ am/2- Since, two
inputs require a single XOR gate, the quantity
■ 1 is subtracted from the first quantity.
Notice that only the 1st row in the truth table does not have any '1'. This is the reason for the '-1' in the formula.
The outputs of combinational logic units 410 and 420 are connected to combinational logic unit 430 at its inputs. Each of combinational logic unit's 430 outputs is a combination of at most two inputs, one from logic 410 and the other from logic 420. That is, b = Cj + d] where 'bk', 'CJ', and 'di' are one of combinational logic units' 430, 410, and 420 outputs respectively. Since each output of combinational logic unit 430 requires one XOR gate, a total of at most 'p*m' XOR gates are required to utilize combinational logic unit 430. A detailed example describing a preferred implementation of MLU 330 is provided below.
The total number of XOR gates required to implement MLU is:
where 'p' and 'm' are as defined above.
Reference is now made to Fig. 5A, which shows a detailed example of an exemplary preferred implementation of MLU 330. The example shows an exemplary MLU 500 having six inputs (i.e., m=6) marked as aj through a6. With a 'p' equal to 10, there are 60 outputs (i.e. p*m= 10*6=60') marked as d] through d60.
The implementation of MLU 500 is performed as follows: first, dividing inputs ai through a6 into two groups aι_ a and a3) and a^ as and a^ where the first group is connected to combinational logic unit 510 and the second group is connected to combinational logic unit 520. Next, computing for each group of inputs all the possible combinations within the group. For each group there are eight possible combinations shown in Fig. 5B. Fig 5B shows the outputs of combinational logic unit 510 (i.e., C] through cg) and combinational logic unit 520 (i.e., dj through d8) as a function of their inputs. The XOR operations derived from the truth table shown in Fig. 5B are as follows:
For combinational logic unit 510,
cι = 0; c
2 = a
3; c
3 = a
2; c
4= a
2 θ a
3; c
6= a] © a
3; c
7 = aι © a
2;
For combinational logic unit 520,
d4= 0; d5 = a6;
6 = a5; d = a5 θ a6; 5= ^; d6= a4 θ a6; d7= a4θ a5; d8 = a4θ a5θ ag.
Hence, in order to implement combinational logic units 510 and 520 ten XOR gates are required. Finally, combinational logic unit 530 is composed. As mentioned above, each of combinational logic unit's 530 outputs is a combination of two elements, one from combinational logic unit's 510 and the other from combinational logic unit's 520 outputs. For example, b may be a combination of al , a3, a5 and a6, i.e., b = aj θ a3 θ a5 θ a6. As, &\ θ a3 equals to c6 and a5 θ a6 equals d , b may be presented as c6 Φ d , i.e., b = c6 θ d . This process repeats for all the outputs of combinational logic unit 530. The number of XOR gates used to implement combinational logic unit 530 is at most the number of outputs, i.e., equals 60. The total number of XOR gates required to implement a single MLU is at most 70 (i.e., 5*2+60) XOR gates.
In order to implement an error-locator circuit (e.g., circuit 300) using MLU 500, only 70*t + C XOR gates are required, as opposed to prior art approaches where [10*6*(6- l)/2]*t + C = 150*t + C XOR gates. The parameter 'C is a constant number of logic gates required to implement the other components of the circuit, e.g., registers, adders, and comparators. As can be seen the total saving is about 80*t XOR gates. It should be appreciated by a person skilled in the art that the total saving of XOR gates is extensively increased for larger values of p', 'm' and 't\
In another embodiment of the present invention, an error locator circuit 300 is implemented for 'p', 'm', and 't' equaling 128, 14, and 73 respectively. In such an embodiment the total number of XOR gates for circuit 300 is: 2434*t + C = 177,682 + C. In order to implement circuit 300 using the approach shown in the prior art
p - m - fm -lλ \ - t + C = 850, 304 +C XOR gates are required. Obviously, the apparatus
\ ' ) disclosed herein significantly reduces the required XOR gates count.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.