CN106941356B

CN106941356B - Circuit arrangement and method for a decomposable decoder

Info

Publication number: CN106941356B
Application number: CN201610440126.XA
Authority: CN
Inventors: M. Langhammer
Original assignee: Altera Corp
Current assignee: Altera Corp
Priority date: 2016-01-04
Filing date: 2016-06-17
Publication date: 2020-10-27
Anticipated expiration: 2036-06-17
Also published as: CN106941356A

Abstract

An input channel decoder circuit arrangement is provided for an input channel having a data rate, where a codeword on the input channel comprises a plurality of symbols. The arrangement includes an option to provide a first output channel having that data rate and an option to provide a plurality of second output channels having a slower data rate. The decoder circuit arrangement includes syndrome computation circuitry, polynomial computation circuitry, and search and correction circuitry. The syndrome computation circuitry includes finite field multipliers that multiply each symbol by a power of the root of the field. Each multiplier except the first multiplies its symbol by a higher power of the root than the adjacent multiplier. First-stage adders add the outputs of groups of multipliers. A second-stage adder adds the outputs of the first-stage adders, and the result is accumulated as the syndrome of the first output channel. A plurality of other accumulators accumulate the outputs of the first-stage adders; after scaling, these are the syndromes of the second output channels.

Description

Circuit arrangement and method for a decomposable decoder

Cross Reference to Related Applications

This patent document claims the benefit of co-pending, commonly assigned U.S. provisional patent application No. 62/181,470, filed June 18, 2015, which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to decomposing a Forward Error Correction (FEC) decoder into a plurality of slower FEC decoders.

Background

In some applications of FEC decoders, including BCH-type decoders such as Reed-Solomon decoders, decoders of different sizes or total processing capacity may be required. These decoders may have different numbers of check symbols per codeword. Heretofore, a different decoder "engine" or circuit has been required for each decoder size.

Disclosure of Invention

Consistent with embodiments of the present invention, a faster FEC decoder, such as a Reed-Solomon decoder or other BCH-type decoder, may be decomposed into multiple slower FEC decoders. For example, a system may require multiple different FEC decoders, as when the system receives data at one rate but internally processes data at a different, slower rate. In this example, the system may have a faster FEC decoder for its outer interface and a slower inner FEC decoder. If the faster FEC decoder is decomposed into parallel slower FEC decoders, a common decoder engine can be used for all of the FEC decoders.

The number of parity or check symbols supported per codeword may differ between larger (i.e., faster) and smaller (i.e., slower) FEC decoders, so the number of syndromes to be calculated may also differ. It is generally assumed that codewords of larger FEC decoders have more check symbols than codewords of smaller FEC decoders, although the opposite is also possible.

The present invention provides a structure that can be used for any combination of larger and smaller FEC decoders having different codeword sizes and different numbers of check symbols per codeword, including cases where codeword boundaries do not coincide with clock boundaries. Although the structure is flexible, any particular implementation of it is fixed for a given combination of decoder sizes and must include resources for the maximum number of check symbols that the implementation supports.

It should be noted that the invention is best suited to implementations in which the field size (the number of bits in the Galois field) and the irreducible polynomial (which defines the field) are the same for all decoder decompositions. Implementations of the invention may also be used where the field definition varies between decoder types, but the additional resources required may then make the decoder larger than a straightforward implementation of separate decoders for the different cases.

The appropriate decoder decomposition will depend on the application. In one example, a 400-gigabit-per-second (Gbps) Ethernet channel may be connected to devices that support no more than 100 Gbps. One solution is to decompose the 400-Gbps channel into four 100-Gbps channels. However, implementations of the present invention are scalable, so a 400-Gbps channel can also be decomposed into eight 50-Gbps channels or sixteen 25-Gbps channels. In a 400-Gigabit Ethernet scenario in which the 400-Gbps channel is provided as two parallel 200-Gbps channels, a two-to-one decomposition would produce two 100-Gbps channels from each 200-Gbps channel.

Thus, consistent with an embodiment of the present invention, decoder circuitry is provided for an input channel having a first data rate, where a codeword on the input channel comprises a plurality of symbols in parallel. The decoder circuitry includes both the option of providing a first output channel having the first data rate and the option of providing a plurality of second output channels having a data rate less than the first data rate. The decoder circuitry includes syndrome computation circuitry, polynomial computation circuitry, and search and correction circuitry. The syndrome computation circuitry includes a plurality of finite field multipliers, equal in number to the plurality of symbols, for multiplying the symbols by powers of the root of the finite field; each multiplier other than the first multiplies a corresponding one of the symbols by a higher power of the root than the adjacent multiplier. First-stage adder circuits add the outputs of respective groups of the multipliers. A second-stage adder adds the outputs of the first-stage adder circuits. A first accumulator accumulates the output of the second-stage adder as a syndrome for the first output channel. A plurality of second accumulators, equal in number to the multiplier groups, accumulate the outputs of the first-stage adder circuits. Respective scaling multipliers operate on all but one of the second accumulators. The output of each second accumulator is a syndrome of one of the second output channels.

A method of operating such a circuit arrangement is also provided.

Drawings

Other features, nature, and various advantages of the present invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 illustrates a typical codeword input pattern for a first channel and four slower channels where the codeword boundaries coincide with clock cycle boundaries;

FIG. 2 shows a comparison of codeword input patterns for different speed first channels, where the codeword boundaries at the different speed first channels do not coincide with clock cycle boundaries;

FIG. 3 shows a schematic diagram of a generic FEC decoder implemented according to the present invention;

FIG. 4 is a schematic diagram illustrating one embodiment for computing a syndrome, according to an embodiment of the present invention;

FIG. 5 is a schematic representation of a first summing device;

FIG. 6 is a schematic representation of a second summing device;

FIG. 7 is a schematic representation of a shift circuit arrangement for calculating codeword boundary positions in accordance with an embodiment of the present invention;

FIG. 8 illustrates a first arrangement of a plurality of shift circuit devices corresponding to a plurality of lanes;

FIG. 9 illustrates a second arrangement of a plurality of shift circuit devices corresponding to a plurality of lanes;

FIG. 10 is a simplified block diagram of an exemplary system employing a programmable logic device configured in accordance with the present invention; and

FIG. 11 is a flow chart of the syndrome calculation portion of the method according to the present invention.

Detailed Description

As noted above, the present disclosure describes a structure by which a faster FEC decoder, such as a BCH decoder and in particular a Reed-Solomon decoder, can be decomposed into multiple slower FEC decoders. For example, a system may require multiple different FEC decoders, such as a system that receives data at one rate but internally processes data at a different, slower rate. In this example, the system may have a faster FEC decoder for its outer interface and a slower inner FEC decoder. Both the faster outer decoder and the slower inner decoder may be configured to use a common slower decoder engine, provided that the faster FEC decoder is decomposed into parallel slower FEC decoders.

FIG. 1 shows an example of the decomposition between a faster channel 101 in the upper half of the figure and four slower parallel channels 102 in the lower half of the figure. For simplicity, the codewords are identical (e.g., RS(544, 528)). However, it is also possible for the codewords in the faster channel 101 to have one size (e.g., RS(544, 514)) and the codewords in the slower channels 102 to have a different size (e.g., RS(528, 514)).

It can be seen that each individual codeword 111 in the faster channel 101 is input over four clocks 100 (e.g., 136 symbols per clock). The codewords 112 in the parallel slower channels 102 are input over 16 clocks 100 (e.g., 34 symbols per clock). The codeword boundaries coincide with the clock boundaries. It can also be seen that the lower-speed configuration has a longer input latency than the higher-throughput configuration, even though the aggregate throughput is the same.
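The FIG. 1 numbers can be checked with a little arithmetic. The following Python sketch is illustrative only; the helper name `clocks_per_codeword` is hypothetical, and it assumes, as in FIG. 1, that the codeword length is an exact multiple of the bus width.

```python
def clocks_per_codeword(codeword_symbols, symbols_per_clock):
    """Clocks needed to input one codeword whose boundaries align with clock boundaries."""
    if codeword_symbols % symbols_per_clock:
        raise ValueError("codeword boundary would not coincide with a clock boundary")
    return codeword_symbols // symbols_per_clock

N = 544                           # codeword length in the FIG. 1 example
LANES = 4                         # slower parallel channels 102
fast_width = 136                  # symbols per clock on faster channel 101
slow_width = fast_width // LANES  # symbols per clock on each slower channel: 34

fast_clocks = clocks_per_codeword(N, fast_width)   # 4 clocks per codeword
slow_clocks = clocks_per_codeword(N, slow_width)   # 16 clocks per codeword
# Same aggregate throughput, but each slow codeword takes four times longer to arrive
print(N / fast_clocks, LANES * N / slow_clocks)     # 136.0 136.0
```

The equal per-clock throughput alongside the 4-versus-16-clock input time is exactly the latency trade-off described above.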

FIG. 2 shows only the input lanes for a more complex case in which the codeword length is not evenly divisible by the number of symbols input per clock, so that the start and end points of the codewords vary with respect to the clock boundaries. For example, lane 201 in the upper portion of the figure may correspond to RS(544, 514) codewords 211 being input 128 symbols at a time, and lane 202 in the lower portion of the figure may show the same codewords 212 being input 64 symbols at a time at a faster clock rate. The slower and faster clock rates may correspond, for example, to different circuit implementations, such as Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations, respectively. For simplicity, the decomposed slower channels are not shown in this figure.
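To see why the start positions in FIG. 2 vary, note that back-to-back codewords of n symbols on a w-symbol-wide bus start at bus offsets (k·n) mod w. The sketch below assumes a continuous stream of 544-symbol codewords with no idle cycles; `start_offsets` is a hypothetical helper.

```python
def start_offsets(n, width, num_codewords):
    """Bus position (0..width-1) at which each successive codeword begins,
    for back-to-back codewords of n symbols on a width-symbol bus."""
    return [(k * n) % width for k in range(num_codewords)]

upper = start_offsets(544, 128, 8)  # lane 201: 128 symbols per clock
lower = start_offsets(544, 64, 8)   # lane 202: 64 symbols per clock
print(sorted(set(upper)))  # [0, 32, 64, 96]
print(sorted(set(lower)))  # [0, 32]
```

With 128 symbols per clock a codeword can start at four different offsets, while with 64 symbols per clock only two occur.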

An implementation for handling codeword lengths that do not correspond to clock boundaries is disclosed in co-pending, commonly assigned U.S. patent application No. 14/844,551, filed September 3, 2015, which is incorporated herein by reference in its entirety. After techniques such as those disclosed in that co-pending application are applied to the inputs shown in FIG. 2, the inputs can be processed as disclosed in the remainder of this disclosure. In particular, the following discussion details the processing of multiple codewords, each with potentially different end and start positions.

Fig. 3 shows an overall decoder structure 300 in accordance with an embodiment of the invention, in which one incoming lane 301 is broken up into four outgoing lanes 302 in order to allow four smaller (i.e., slower) decoders to replace one larger (i.e., faster) decoder. The structure 300 includes a syndrome computation stage 310, a key equation solver stage 320, and a search and correction stage 330.

The key equation solver stage 320 includes several key equation solver blocks 321 that compute the error locator polynomial λ and the error evaluator polynomial Ω. The key equation solver blocks 321 may be conventional. Further, although the number of key equation solver blocks 321 shown in the figure equals the number of output lanes 302, the two numbers may differ, depending on the processing capacity of the key equation solver blocks 321.

For example, if each key equation solver block 321 is twice as fast as needed for a one-to-one correspondence between key equation solver blocks 321 and output lanes 302, the number of key equation solver blocks 321 may be half the number of output lanes 302, provided that appropriate buffers (not shown) are provided. Conversely, if the key equation solver blocks 321 are only half as fast as needed for a one-to-one correspondence, the number of key equation solver blocks 321 may be twice the number of output lanes 302.

Syndrome computation stage 310 may include parallel syndrome computation circuitry, such as that described in commonly assigned U.S. Patent No. 8,347,192, which is incorporated herein by reference in its entirety. That circuitry multiplies the incoming symbols by increasing powers of α to provide terms that are then summed.

Consistent with embodiments of the present invention, the summation may be implemented as a two-stage process. The first stage sums the terms within each of several subgroups, the number of subgroups corresponding to the number of lanes into which the decoder is to be decomposed. Because this number varies from decoder to decoder, any particular implementation must provide as many subgroups as the maximum number of independent lanes into which the decoder can be decomposed. The subgroup sums may be used individually for the individual lanes, or added together if the decoder is not decomposed.

An implementation 400 for computing a syndrome is shown in FIG. 4. The decoder contains as many instances of circuit 400 as the maximum number of syndromes that may be encountered, which is determined by the number of check symbols in the codeword; that is, one instance of circuit 400 is used for each syndrome of a particular codeword. The index s in circuit 400 identifies the particular syndrome for which that instance of circuit 400 is used.

As depicted, circuit 400 shows three subgroups, but ellipses 401 indicate additional subgroups that are not shown. For example, assume four subgroups and 12 symbol inputs per clock cycle. Taking the third syndrome, s = 2, the input coefficients to multipliers 402 will be α⁰, α², α⁴, α⁶, α⁸, α¹⁰, α¹², α¹⁴, α¹⁶, α¹⁸, α²⁰ and α²². Each adder 403 adds the multiplier terms for one of the subgroups. For the single-lane case, the subgroup sums are added by adder 404 (if there are additional subgroups represented by ellipses 401, they are also added at adder 404). The output of adder 404 is accumulated with the running syndrome sum in accumulator 405, the running sum being scaled at 415 by a shift value equal to α raised to the product of the parallelism p and the syndrome exponent s. In this example, where p = 12 (12 parallel input symbols per clock) and s = 2, the shift value is α²⁴. The result is the s-th syndrome for the higher-speed lane, denoted S_{s.n.1}, where "1" indicates the number of lanes (only one lane in the higher-speed case) and n is the speed multiple (which is the same as the lane multiple). In the example where s is 2, there are four lanes, and the higher-speed lane is four times the speed of the lower-speed lanes, the syndrome is denoted S_{2.4.1}.

For the subgroups, the outputs of the respective adders 403 are accumulated at respective accumulators 413, with the running sums scaled at 423 by a shift value equal to α raised to the product of the subgroup parallelism p/n (where n is the number of subgroups) and the syndrome exponent s. Thus, for four subgroups, the shift value is α^{(p/4)s} = α^{ps/4}. For every subgroup except the first, the terms must be scaled back down so that each subgroup's terms start at α⁰. Therefore, at each accumulator 413 except the first, a multiplier 433 multiplies the subgroup sum, prior to accumulation, by the appropriate inverse syndrome power α^{-(p/n)s}, α^{-2(p/n)s}, …, α^{-(n-1)(p/n)s} (for p = 12 and n = 4: α^{-3s}, α^{-6s} and α^{-9s}). The result for each lower-speed lane is the corresponding s-th syndrome, denoted S_{s.1.m}, where "1" (i.e., one n-th of the higher speed) indicates the lower lane speed and m is the lane index. In the example, s is 2 and m is 1, …, 4, and the syndromes are denoted S_{2.1.1}, S_{2.1.2} (not shown), S_{2.1.3} and S_{2.1.4}.
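The two-stage summation of FIG. 4 can be modeled in software to check that one set of multipliers yields both the single-lane syndrome and the per-lane syndromes. This is a minimal Python sketch, not the circuit itself. It assumes GF(2^10) defined by the primitive polynomial x^10 + x^3 + 1 with α = x (the document does not fix the field), that symbol j of each clock word is the coefficient of the lower-degree term (matching the multipliers α^{js}), and that p is divisible by the number of subgroups n. All function names are hypothetical; `direct_syndrome` evaluates the received polynomial at α^s directly as a reference.

```python
M = 10
POLY = (1 << 10) | (1 << 3) | 1   # x^10 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1              # size of the multiplicative group
ALPHA = 2                         # the root alpha, i.e. the element x

def gf_mul(a, b):
    """Carry-less multiply in GF(2^M), reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    """a**e for nonzero a; negative e works because a**ORDER == 1."""
    e %= ORDER
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def decomposable_syndrome(clocks, p, n, s):
    """Model of FIG. 4 for syndrome index s. clocks: list of p-symbol words.
    Returns (single fast-lane syndrome, list of n per-lane syndromes)."""
    w = p // n                                              # symbols per subgroup
    coeff = [gf_pow(ALPHA, j * s) for j in range(p)]        # multipliers 402: alpha^{js}
    shift_full = gf_pow(ALPHA, p * s)                       # scaling at 415: alpha^{ps}
    shift_sub = gf_pow(ALPHA, w * s)                        # scaling at 423: alpha^{(p/n)s}
    unscale = [gf_pow(ALPHA, -g * w * s) for g in range(n)] # multipliers 433
    acc_full, acc_sub = 0, [0] * n
    for word in clocks:
        terms = [gf_mul(x, c) for x, c in zip(word, coeff)]
        sub_sums = []
        for g in range(n):                                  # first-stage adders 403 (XOR)
            t = 0
            for x in terms[g * w:(g + 1) * w]:
                t ^= x
            sub_sums.append(t)
        total = 0
        for t in sub_sums:                                  # second-stage adder 404
            total ^= t
        acc_full = gf_mul(acc_full, shift_full) ^ total     # accumulator 405
        for g in range(n):                                  # accumulators 413
            v = sub_sums[g] if g == 0 else gf_mul(sub_sums[g], unscale[g])
            acc_sub[g] = gf_mul(acc_sub[g], shift_sub) ^ v
    return acc_full, acc_sub

def direct_syndrome(clocks, width, s):
    """Reference: R(alpha^s), where symbol j of clock t has degree (T-1-t)*width + j."""
    T, r = len(clocks), 0
    for t, word in enumerate(clocks):
        for j, x in enumerate(word):
            r ^= gf_mul(x, gf_pow(ALPHA, ((T - 1 - t) * width + j) * s))
    return r

# Example with p = 12, four subgroups, s = 2 (arbitrary test data)
clocks = [[(37 * t + 11 * j + 5) % 1024 for j in range(12)] for t in range(5)]
full, lanes = decomposable_syndrome(clocks, p=12, n=4, s=2)
print(full == direct_syndrome(clocks, 12, 2))  # True
```

Because GF(2^m) addition is XOR, adders 403 and 404 reduce to XOR trees here. Applying multiplier 433 after accumulation instead of before it would give the same per-lane result, since the scaling distributes over the accumulation.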

Any number of subgroups may be decomposed in this way. As another example, if the input is 64 symbols wide, one lane, four lanes (four subgroups, each sixteen symbols wide), eight lanes (eight subgroups, each eight symbols wide), sixteen lanes (sixteen subgroups, each four symbols wide), or 32 lanes (32 subgroups, each two symbols wide) may be implemented. Other combinations or decompositions may also be created. In this case, the subgroup additions may be nested.

For illustration purposes, a simple summing arrangement 500, which may be referred to as "nested," is shown in FIG. 5 for an eight-symbol-wide input. Each symbol 501 is input to a multiplier 502. Each pair of multipliers 502 is summed by a first-stage adder 503, whose output may provide a two-symbol-wide output 513. The outputs of the first-stage adders 503 may also be added by second-stage adders 504 to provide four-symbol-wide outputs 514. The outputs of the second-stage adders 504 may also be added by a third-stage adder 505 to provide an eight-symbol-wide output 515. The outputs are labeled using the following notation:

index_of_a_certain_group_size:group_size

Thus, for example, 2:4 indicates the second group of group size four.
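The nested arrangement of FIG. 5 and its index:group_size labels can be sketched as a pairwise reduction tree. In this hypothetical Python helper the inputs stand in for the multiplier 502 outputs, and XOR plays the role of GF(2^m) addition.

```python
def nested_group_sums(terms):
    """Sketch of the FIG. 5 nested adder tree: returns {"index:group_size": sum}.
    terms: multiplier outputs (length a power of two); GF(2^m) addition is XOR."""
    if len(terms) & (len(terms) - 1):
        raise ValueError("length must be a power of two")
    out, level, size = {}, list(terms), 1
    while len(level) > 1:
        # Next adder stage: add adjacent pairs of the previous stage's outputs
        level = [level[i] ^ level[i + 1] for i in range(0, len(level), 2)]
        size *= 2
        for k, v in enumerate(level, start=1):
            out[f"{k}:{size}"] = v
    return out

sums = nested_group_sums([1 << i for i in range(8)])  # one distinct bit per input
print(sums["2:4"])  # 240, i.e. inputs 4..7 combined
```

For eight inputs the helper returns four two-wide sums (stage 503), two four-wide sums (504) and one eight-wide sum (505).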

In other cases, a recursively nested arrangement such as that of FIG. 5 cannot be used, and fully or partially independent additions are used instead. The twelve-symbol input case of FIG. 6 has groups three, four and twelve symbols wide. Again, each symbol 601 is input to a multiplier 602. Respective pairs of multipliers 602 are added by first-stage adders 603. It should be appreciated that, in addition to the indicated three-, four- and twelve-symbol-wide groups, these adders can support two-symbol-wide groups (not shown) if desired. The outputs of respective pairs of first-stage adders 603 are then added at respective second-stage adders 604 to provide respective four-symbol-wide groups 614. At each third-stage adder 605, the output of a first-stage adder 603 is added to the output of a multiplier 602 to provide a three-symbol-wide group 615. The outputs of two second-stage adders 604 are added to each other at fourth-stage adder 606, and the output of fourth-stage adder 606 is then added to the output of the remaining second-stage adder 604 at fifth-stage adder 607 to provide a twelve-symbol-wide group 617.
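The shared, non-nested arrangement of FIG. 6 can be sketched the same way. The exact wiring below is an assumption chosen to be consistent with the description: each of the four three-wide groups reuses a pair sum from a first-stage adder 603 plus one single multiplier output, and the twelve-wide group is built from the three four-wide groups. The function name is hypothetical, and XOR again stands in for GF(2^m) addition.

```python
def fig6_group_sums(t):
    """One possible adder network for FIG. 6 (wiring is an assumption).
    t: the 12 multiplier 602 outputs. Returns (three-wide, four-wide, twelve-wide sums)."""
    if len(t) != 12:
        raise ValueError("expected 12 multiplier outputs")
    pairs = [t[2 * i] ^ t[2 * i + 1] for i in range(6)]          # first-stage adders 603
    fours = [pairs[2 * i] ^ pairs[2 * i + 1] for i in range(3)]  # second-stage adders 604 -> groups 614
    threes = [pairs[0] ^ t[2], t[3] ^ pairs[2],                  # third-stage adders 605 -> groups 615
              pairs[3] ^ t[8], t[9] ^ pairs[5]]
    twelve = (fours[0] ^ fours[1]) ^ fours[2]                    # adders 606, then 607 -> group 617
    return threes, fours, twelve

threes, fours, twelve = fig6_group_sums([1 << i for i in range(12)])
print(threes, fours, twelve)  # [7, 56, 448, 3584] [15, 240, 3840] 4095
```

In this sketch the three-, four- and twelve-wide outputs cost 15 two-input XORs in total, against 28 if each group were summed independently (4 × 2 + 3 × 3 + 11).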

The number of key equation solver blocks 321 will depend on the number of cycles required to solve the polynomial. As noted above, the aggregate processing capacity of the key equation solver blocks 321 in key equation solver stage 320 should be equal to or greater than the processing capacity required by the lanes having the greatest number of check symbols. In the example, there is one 400-Gbps syndrome set 322 and four 100-Gbps syndrome sets 342. The 400-Gbps syndrome set 322 is distributed to the key equation solver blocks 321 in round-robin fashion via multiplexers 352. Each 100-Gbps syndrome set 342 is sent to only one of the key equation solver blocks 321 (always the same one, in this embodiment). The multiplexing pattern used to map the syndromes to the key equation solver blocks 321 will differ between implementations, but can be determined by one of ordinary skill in the art.
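The round-robin distribution, and why the number of key equation solver blocks depends on the solver's cycle count, can be illustrated with a small scheduling sketch. It assumes, hypothetically, that a new syndrome set arrives every `arrival_interval` clocks and that each block needs `solve_cycles` clocks per codeword. The numbers in the usage line (a codeword every 4 clocks, as in FIG. 1, and a 16-cycle solver) are illustrative only.

```python
import math

def round_robin_schedule(arrival_interval, solve_cycles, num_codewords):
    """Assign syndrome sets to key equation solver blocks in round-robin order.
    Returns (number of blocks, list of (codeword, block, start_clock))."""
    # Enough blocks that aggregate solver capacity meets the arrival rate
    blocks = math.ceil(solve_cycles / arrival_interval)
    free_at = [0] * blocks            # clock at which each block becomes idle
    schedule = []
    for k in range(num_codewords):
        start = k * arrival_interval  # syndrome set k is ready
        b = k % blocks                # round-robin selection (multiplexers 352)
        if free_at[b] > start:
            raise RuntimeError(f"block {b} still busy at clock {start}")
        free_at[b] = start + solve_cycles
        schedule.append((k, b, start))
    return blocks, schedule

# Single 400-Gbps lane: a codeword every 4 clocks, hypothetical 16-cycle solver
blocks, schedule = round_robin_schedule(4, 16, 8)
print(blocks)                       # 4
print([b for _, b, _ in schedule])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In the decomposed mode each 100-Gbps lane delivers a codeword only every 16 clocks, so with the same assumed solver one block per lane suffices, which matches the fixed mapping described above.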

Similarly, the output polynomials of the key equation solver blocks 321 must be distributed to the search and correction blocks 331. As noted above, each key equation solver block 321 outputs both the error locator polynomial λ and the error evaluator polynomial Ω. However, to avoid cluttering the diagram, only the error locator polynomial λ is shown in FIG. 3; the circuitry is repeated for each Ω_m (not shown). For the n lower-speed lanes, as with the syndrome sets, each polynomial λ_{100_lane} is sent to only one of the search and correction blocks 331 (always the same one, in this embodiment). For the higher-speed lane, each key equation solver block 321 outputs a respective polynomial segment λ_{Segment400_c}; these segments are multiplexed at 362 and distributed to the search and correction blocks 331 via shift circuit arrangement 700 and multiplexers 323. For each search and correction block 331, multiplexer 323 selects either the corresponding polynomial segment λ_{Segment400_c} for the single higher-speed lane case or the polynomial λ_{100_lane} for the multiple lower-speed lane case. For the single higher-speed lane case, the mapping again differs between implementations, but can be determined by one of ordinary skill in the art.

The shift circuit arrangement 700 is used to align each polynomial to the correct starting position, depending on which of the search and correction blocks 331 receives the polynomial. In the 4:1 example used above, the case of multiple lower-speed lanes has one search and correction circuit per lane with a constant mapping, so no shifting is required. In the higher-speed single-lane case, however, one of the four search and correction blocks 331 will be used for the start of the codeword, and each successive quarter of the codeword width will be mapped to the next block 331, modulo the number of blocks 331.

Furthermore, in most cases the codeword will be shortened, i.e., it will have fewer symbols than the maximum number supported by the field size. This requires the polynomial to be shifted to the first search position before use. For the first polynomial coefficient, the shift is α^i; for the second, third, and subsequent coefficients, it is α^{2i}, α^{3i}, α^{4i}, and so on. Because the search and correction circuits are p-parallel and there are four possible start/end positions (see the upper part of FIG. 2), each circuit's input must be shifted by a further p/4 positions relative to the previous circuit. For the n-th coefficient at the input of the q-th circuit, with a total of p parallel symbols distributed over g groups, the shift value is α^{n(i+(q-1)p/g)}.
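The shift value α^{n(i+(q-1)p/g)} can be checked numerically. Scaling the degree-n coefficient by α^{n·offset} makes the shifted polynomial, evaluated at α^j, equal the original polynomial evaluated at α^{offset+j}, so each search circuit starts at its own position. This Python sketch assumes the same illustrative GF(2^10) field (x^10 + x^3 + 1, α = x), that q counts circuits from 1, and that p is divisible by g. The function names and example coefficients are hypothetical.

```python
M = 10
POLY = (1 << 10) | (1 << 3) | 1   # x^10 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1
ALPHA = 2                         # alpha = x

def gf_mul(a, b):
    """Carry-less multiply in GF(2^M), reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    """a**e for nonzero a; negative e works because a**ORDER == 1."""
    e %= ORDER
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def shift_polynomial(coeffs, i, q, p, g):
    """Scale the degree-n coefficient by alpha^{n(i + (q-1)p/g)}."""
    offset = i + (q - 1) * (p // g)
    return [gf_mul(c, gf_pow(ALPHA, n * offset)) for n, c in enumerate(coeffs)]

def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[n] * x**n) in GF(2^M)."""
    r = 0
    for c in reversed(coeffs):
        r = gf_mul(r, x) ^ c
    return r

lam = [1, 5, 77, 300]                                    # arbitrary example coefficients
shifted = shift_polynomial(lam, i=100, q=3, p=128, g=4)  # offset = 100 + 2 * 32 = 164
same = all(poly_eval(shifted, gf_pow(ALPHA, j)) == poly_eval(lam, gf_pow(ALPHA, 164 + j))
           for j in range(8))
print(same)  # True
```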

An implementation of the shift circuit arrangement 700 is shown in FIG. 7. The shift for each coefficient takes one of g values, which may be selected by multiplexers. The multiplexer selection is straightforward. As shown in FIG. 7, for each codeword type, counter 701 counts the symbols input per lane for that codeword type, modulo the codeword length n. The result takes a small number of values, which can be decoded by lane select circuit 702 to generate the multiplexer selections.
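A software model of counter 701 and the decoding in lane select circuit 702 might look as follows. This is a hypothetical sketch: it assumes the codeword length n is at least the bus width p (so at most one codeword starts per clock), and it maps a start offset to one of g selections by dividing by p/g.

```python
def codeword_start_selects(n, p, g, num_clocks):
    """Hypothetical model of counter 701 plus lane select decoding 702.
    n: codeword length; p: symbols per clock; g: number of selectable shifts.
    Returns the multiplexer selection for each codeword that starts."""
    count = 0                           # symbols received so far, modulo n (counter 701)
    selects = []
    for _ in range(num_clocks):
        offset = (n - count) % n        # bus position at which the next codeword starts
        if offset < p:                  # a codeword starts during this clock
            selects.append(offset // (p // g))  # decode to one of g selections
        count = (count + p) % n
    return selects

print(codeword_start_selects(544, 128, 4, 18))  # [0, 1, 2, 3, 0]
print(codeword_start_selects(544, 64, 4, 35))   # [0, 2, 0, 2, 0]
```

With the FIG. 2 parameters, a 128-symbol bus cycles through all four selections, while a 64-symbol bus only ever produces two of them, which corresponds to the sharing opportunity described for FIG. 9.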

There is one shift select circuit 703 per lane. The shift select circuit 703 includes m 4-input multiplexers 710, one for each of the m coefficients λ_m. The circuit is repeated for each Ω_m (not shown). The select control signal 702 selects the same input for each multiplexer 710, shifted by the same multiple of p/4. FIG. 8 shows how a separate circuit 700 is provided for each lane, with the multiplexer selections for all lanes generated from the same counter 701. However, the selection control signals generated by lane select circuitry 702 will differ between lanes to accommodate the different mappings for different lanes.

In some cases, the number of start/end positions is less than the number of lanes. For example, the bottom pattern in FIG. 2 has only two distinct possibilities, but four lanes are nevertheless still required because of the ratio between the faster and slower codeword rates. In such a case, illustrated in FIG. 9, lane select circuitry 902 may be shared between two lane shift selection blocks, although the mapping of inputs to the respective multiplexers in the two lanes that share one lane select circuit 902 will differ.

Each search and correction block 331 may perform its search in any known manner, such as a Chien search. For example, a method for initializing multiple Chien search groups for varying codeword start positions is described in commonly assigned U.S. Patent No. 8,621,331, which is hereby incorporated by reference in its entirety. As is well known, in a Reed-Solomon decoder the search and correction block 331 also uses the Forney algorithm to calculate the correction values.
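A p-parallel Chien search evaluates λ at p consecutive powers of α per clock. The sketch below is one common formulation, not necessarily the one in the incorporated patent. Register n holds λ_n·α^{npt} at clock t, fixed multipliers α^{nj} produce the p evaluations, and each register is then updated by α^{np}. In a decoder the registers would be loaded with the shifted coefficients from shift circuit arrangement 700; here they are loaded with λ itself. The GF(2^10) field and all names are illustrative assumptions, and the Forney step is not shown.

```python
M = 10
POLY = (1 << 10) | (1 << 3) | 1   # x^10 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1
ALPHA = 2                         # alpha = x

def gf_mul(a, b):
    """Carry-less multiply in GF(2^M), reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    """a**e for nonzero a; negative e works because a**ORDER == 1."""
    e %= ORDER
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[n] * x**n), used here as a reference."""
    r = 0
    for c in reversed(coeffs):
        r = gf_mul(r, x) ^ c
    return r

def chien_parallel(lam, p, num_clocks):
    """Evaluate lam(alpha^k) for k = 0 .. p*num_clocks-1, p positions per clock."""
    regs = list(lam)                                        # regs[n] = lam_n * alpha^{n p t}
    step = [gf_pow(ALPHA, n * p) for n in range(len(lam))]  # per-clock update alpha^{n p}
    out = []
    for _ in range(num_clocks):
        for j in range(p):                                  # p evaluations in one clock
            v = 0
            for n, r in enumerate(regs):
                v ^= gf_mul(r, gf_pow(ALPHA, n * j))        # fixed multipliers alpha^{n j}
            out.append(v)
        regs = [gf_mul(r, st) for r, st in zip(regs, step)]
    return out

# Error locator with roots at alpha^3 and alpha^5: (1 + alpha^-3 x)(1 + alpha^-5 x)
a, b = gf_pow(ALPHA, -3), gf_pow(ALPHA, -5)
lam = [1, a ^ b, gf_mul(a, b)]
vals = chien_parallel(lam, p=4, num_clocks=4)
print([k for k, v in enumerate(vals) if v == 0])  # [3, 5]
```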

The circuit arrangement described above may be implemented in fixed circuitry, such as an ASIC, or in a programmable logic device (PLD), such as an FPGA, where each user instance may be adapted to specific requirements. Such circuitry may also be provided as hard logic blocks on an FPGA or other PLD. An integrated circuit device such as PLD 140, configured to include circuitry in accordance with implementations of the invention, may be used in a wide variety of electronic devices. One possible use is in the exemplary data processing system 1400 shown in FIG. 10. Data processing system 1400 may include one or more of the following components: a processor 1401; memory 1402; I/O circuitry 1403; and peripheral devices 1404. These components are coupled together by a system bus and are included on a circuit board 1406 in an end-user system 1407.

The system 1400 may be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, digital signal processing, Remote Radio Heads (RRHs), or any other application where the advantage of using programmable or reprogrammable logic is desirable. PLD 140 may be used to perform a variety of different logic functions. For example, PLD 140 may be configured as a processor or controller that works in cooperation with processor 1401. PLD 140 may also be used as an arbiter for arbitrating access to a shared resource in system 1400. In another example, PLD 140 can be configured as an interface between processor 1401 and one of the other components in system 1400. It should be noted that system 1400 is merely exemplary, and that the true scope and spirit of the invention should be indicated by the following claims.

Various techniques may be used to implement a PLD 140 as described above and incorporating the present invention.

The syndrome computation portion 1100 of a method according to an embodiment of the present invention is illustrated in FIG. 11. At 1101, a plurality of finite field multiplication operations, equal in number to the plurality of symbols, are performed, each multiplication operation including multiplying one of the symbols by a power of a root of the finite field, where each multiplication operation other than the first multiplies a respective one of the symbols by a higher power of the root than the adjacent multiplication operation. At 1102, the multiplication operations are grouped into multiplication groups, and first-stage addition operations are performed to add together the results of the multiplications in each multiplication group. At 1103, a second-stage addition operation is performed to add together the results of the first-stage addition operations. At 1104, the output of the second-stage addition operation is accumulated as the syndrome for the first output channel. At 1105, the outputs of the first-stage addition operations are accumulated in a plurality of additional accumulation operations equal in number to the multiplication groups. At 1106, the outputs of all but one of the additional accumulation operations are scaled; these scaled outputs, together with the output of the remaining additional accumulation operation, are the respective syndromes of the respective second output channels.

In accordance with one aspect, decoder circuitry is provided for an input channel having a first data rate, where a codeword on the input channel has a plurality of symbols. The decoder circuitry includes both the option of providing a first output channel having the first data rate and the option of providing a plurality of second output channels having a data rate less than the first data rate. The decoder circuitry comprises syndrome computation circuitry, polynomial computation circuitry, and search and correction circuitry. The syndrome computation circuitry includes a plurality of finite field multipliers, first-stage adder circuits, a second-stage adder, a first accumulator, a plurality of second accumulators, and respective scaling multipliers for all but one of the second accumulators, and the output of each of the second accumulators is a syndrome of one of the second output channels. The finite field multipliers, equal in number to the symbols, multiply the symbols by powers of the root of the finite field. Each multiplier other than the first multiplies a respective one of the symbols by a higher power of the root than the adjacent multiplier. The first-stage adder circuits add the outputs of respective groups of the multipliers, the second-stage adder adds the outputs of the first-stage adder circuits, and the first accumulator accumulates the output of the second-stage adder as a syndrome of the first output channel. The second accumulators, equal in number to the multiplier groups, accumulate the outputs of the first-stage adder circuits.

In some embodiments, the decoder circuitry includes a respective instance of the syndrome computation circuitry for each syndrome of the codeword, and each finite field multiplier multiplies a respective one of the symbols by the root raised to the product of an exponent associated with that symbol and the exponent of the syndrome.

In some embodiments, before a respective one of the second accumulators, the respective scaling multiplier divides out the lowest root power applied by the finite field multipliers that contribute to that second accumulator.

In some embodiments, the first accumulator applies a first scaling factor equal to the root raised to the product of the total number of symbols and the exponent of the syndrome.

In some embodiments, each of the second accumulators applies a second scaling factor whose exponent equals the exponent of the first scaling factor divided by the number of multiplier groups.

In some embodiments, the plurality of multipliers are equally divided into multiplier groups.

In some embodiments, the plurality of multipliers are divided unequally into multiplier groups.

In some embodiments, the search and correction circuitry comprises a plurality of search and correction circuits equal in number to the plurality of second output channels, and the decoder circuitry further comprises mapping circuitry for routing outputs of the polynomial computation circuitry to the search and correction circuitry, the mapping circuitry comprising shift circuitry to compensate for codeword boundaries whose positions vary with respect to clock cycle boundaries.

In some embodiments, the shifting circuitry includes a respective shift selection circuit for each of the search and correction circuits, a modulo codeword length counter for generating shift control signals for the respective shift selection circuits, and a plurality of shift select decoders for decoding the shift control signals to control the search and correction circuits.
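One possible behavior of the modulo codeword length counter and a shift select decoder is sketched below. It is a hypothetical model, not the patented circuit: it assumes a codeword length of n symbols with n ≥ N, a counter holding the number of symbols of the current codeword already received (mod n) at the start of each clock cycle, a shift control value equal to the offset within the cycle at which a new codeword starts, and one-hot decoder outputs.

```python
def shift_controls(n, N, cycles):
    """Hypothetical modulo codeword length counter.

    pos is (symbols received so far) mod n at the start of each clock cycle.
    Returns, per cycle, the offset within the cycle at which a new codeword
    starts, or None if no codeword starts in that cycle. Assumes n >= N, so
    at most one codeword starts per cycle."""
    if n < N:
        raise ValueError("sketch assumes codeword length n >= symbols per cycle N")
    pos, controls = 0, []
    for _ in range(cycles):
        start = (n - pos) % n       # symbols remaining until the next codeword starts
        controls.append(start if start < N else None)
        pos = (pos + N) % n         # counter advances by N symbols per cycle, modulo n
    return controls

def decode_shift(start, N):
    """Hypothetical shift select decoder: one-hot select lines for a shift control value."""
    return [int(start == i) for i in range(N)]
```

For example, `shift_controls(10, 4, 6)` returns `[0, None, 2, None, None, 0]`: 10-symbol codewords start at symbols 0, 10 and 20, which fall at offsets 0, 2 and 0 of cycles 0, 2 and 5. `decode_shift(2, 4)` returns `[0, 0, 1, 0]`.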

In some embodiments, the plurality of shift select decoders are equal in number to the plurality of search and correction circuits.

In some embodiments, the number of positions at which a codeword boundary can fall away from a clock cycle boundary is smaller than the number of search and correction circuits; the shift select decoders are then fewer in number than the search and correction circuits, and at least one shift select decoder is shared by more than one shift select circuit.
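Why fewer positions than circuits can occur follows from modular arithmetic: codewords of n symbols start at symbol positions k·n, so within an N-symbol clock cycle they can only start at offsets that are multiples of gcd(n, N). The short sketch below (hypothetical function names; n and N are illustrative values) enumerates those offsets. Only the nonzero offsets lie away from a clock cycle boundary, and there are N/gcd(n, N) - 1 of them.

```python
from math import gcd

def boundary_offsets(n, N):
    """Offsets within an N-symbol clock cycle at which an n-symbol codeword can start."""
    return sorted({(k * n) % N for k in range(N)})

def misaligned_offsets(n, N):
    """Nonzero offsets, i.e. codeword boundaries that do not coincide with a clock boundary."""
    return [o for o in boundary_offsets(n, N) if o != 0]
```

With n = 10 and N = 4, codewords can only start at offsets 0 and 2, so there is a single misaligned position; with n = 7 and N = 4 every offset 0 to 3 can occur.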

In some embodiments, each of the shift select decoders is shared by two of the shift select circuits.

According to another aspect, a method is presented for decoding an input channel having a first data rate, a codeword on the input channel comprising a plurality of symbols. The method offers both the option of providing a first output channel having the first data rate and the option of providing a plurality of second output channels having data rates less than the first data rate. The method comprises calculating syndromes, calculating a polynomial, and performing search and correction operations. Calculating a syndrome comprises performing a plurality of finite field multiplication operations, equal in number to the symbols, each of which multiplies one of the symbols by a power of a root of the finite field; each multiplication operation other than the first multiplies its respective symbol by a higher power of the root than the adjacent multiplication operation. Calculating the syndrome further comprises grouping the multiplication operations into multiplication sets, performing a first stage addition operation that adds together the results of the multiplications in each set, and performing a second stage addition operation that adds together the results of the first stage addition operation. The output of the second stage addition operation is accumulated as the syndrome of the first output channel. The outputs of the first stage addition operation are accumulated in a plurality of additional accumulation operations equal in number to the multiplication sets, and the outputs of all but one of the additional accumulation operations are scaled. The scaled outputs of all but one of the additional accumulation operations, together with the output of the remaining additional accumulation operation, are the respective syndromes of the second output channels.
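The following derivation, in our own notation rather than the patent's, sketches why the method yields correct syndromes in both modes. It assumes N symbols per clock cycle split into G equal sets of M = N/G, with b_k^{(c)} the symbol multiplied by α^{jk} in cycle c (k = 0 being the first multiplication operation), S_j the first-output-channel syndrome, P_{g,j} the first stage sum of set g, and T_{g,j} the g-th additional accumulation.

```latex
\begin{align}
S_j^{(c)} &= \alpha^{jN} S_j^{(c-1)} + \sum_{k=0}^{N-1} b_k^{(c)} \alpha^{jk}
           = \alpha^{jN} S_j^{(c-1)} + \sum_{g=0}^{G-1} P_{g,j}^{(c)}, \\
P_{g,j}^{(c)} &= \sum_{k=gM}^{gM+M-1} b_k^{(c)} \alpha^{jk}
              = \alpha^{jgM} \sum_{m=0}^{M-1} b_{gM+m}^{(c)} \alpha^{jm}, \\
T_{g,j}^{(c)} &= \alpha^{jM} T_{g,j}^{(c-1)} + \alpha^{-jgM} P_{g,j}^{(c)}
              = \alpha^{jM} T_{g,j}^{(c-1)} + \sum_{m=0}^{M-1} b_{gM+m}^{(c)} \alpha^{jm}.
\end{align}
```

The last line is the ordinary M-symbols-per-cycle syndrome recursion for the symbols b_{gM}, ..., b_{gM+M-1}, so T_{g,j} is the syndrome of the g-th second output channel. For g = 0 the factor α^{-jgM} equals 1, which is why one additional accumulation operation needs no scaling, and the feedback constant α^{jM} = α^{jN/G} corresponds to the second scaling factor.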

In some embodiments, each finite field multiplication operation multiplies its respective symbol by the root raised to the product of the exponent associated with that symbol and the exponent of the syndrome.

In some embodiments, the scaling comprises, before each respective additional accumulation operation, factoring out the lowest power of the root applied by the finite field multiplication operations that contribute to that additional accumulation operation.

In some embodiments, the accumulation operation for the first output channel applies a first scaling factor equal to the root raised to the product of the total number of symbols and the exponent of the syndrome.

In some embodiments, each of the additional accumulation operations applies a second scaling factor equal to the root raised to the exponent of the first scaling factor divided by the number of multiplication sets.

In some embodiments, the plurality of multiplication operations are equally divided into sets of multiplication operations.

In some embodiments, the plurality of multiplication operations are divided unequally into sets of multiplication operations.

In some embodiments, performing the search and correction operations comprises performing a plurality of search and correction operations equal in number to the second output channels, and the decoding method further comprises mapping the output of the polynomial computation operation to the search and correction operations, the mapping comprising a shift to compensate for codeword boundaries whose position varies relative to clock cycle boundaries.

In some embodiments, the shifting includes generating a shift control signal from a modulo codeword length counter operation and decoding the shift control signal to control the search and correction operations.

It will be understood that the foregoing is only illustrative of the principles of the invention and that various modifications can be made by those skilled in the art without departing from the scope and spirit of the invention. For example, the various elements of the invention may be provided on a PLD in any desired number and/or arrangement. Those skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration rather than of limitation, and the present invention is limited only by the claims that follow.

Claims

1. A decoder circuit apparatus for an input channel having a first data rate, a codeword on the input channel comprising a plurality of symbols, wherein:

the input channel includes both an option to provide a first output channel having the first data rate and an option to provide a plurality of second output channels having data rates less than the first data rate;

the decoder circuit means includes syndrome computation circuit means, polynomial computation circuit means, and search and correction circuit means; and

the syndrome computation circuitry includes:

a plurality of finite field multipliers corresponding in number to the plurality of symbols for multiplying the symbols by powers of a root of the finite field, each respective multiplier of the plurality of finite field multipliers other than a first multiplier multiplying a respective symbol of the plurality of symbols by a higher power of the root than an adjacent multiplier of the plurality of finite field multipliers,

first stage adder circuit means for adding the outputs of a number of multiplier groups of said plurality of finite field multipliers,

a second stage adder for adding outputs of said first stage adder circuit means,

a first accumulator for accumulating an output of said second stage adder as a syndrome for said first output channel,

a plurality of second accumulators equal in number to said multiplier groups for accumulating outputs of said first stage adder circuit means, and

a respective scaling multiplier for all but one of said second accumulators, whereby the output of each of said second accumulators is a syndrome of one of said second output channels.

2. The decoder circuitry of claim 1, comprising a respective instance of the syndrome computation circuitry for each syndrome of the codeword; wherein:

each respective one of the plurality of finite field multipliers multiplies a respective one of the symbols by the root raised to the product of the exponent associated with the one of the symbols and the exponent of the syndrome.

3. The decoder circuit arrangement of claim 2, wherein the respective scaling multiplier, placed before the respective one of the second accumulators, factors out the lowest power of the root applied by the finite field multipliers contributing to that second accumulator.

4. The decoder circuit means of claim 1, wherein the first accumulator applies a first scaling factor equal to the root raised to the product of the total number of symbols and the exponent of the syndrome.

5. The decoder circuit device of claim 4, wherein each of the second accumulators applies a second scaling factor equal to the root raised to the exponent of the first scaling factor divided by the number of multiplier groups.

6. The decoder circuit arrangement of claim 1, wherein the plurality of finite field multipliers are equally divided into the multiplier groups.

7. The decoder circuit arrangement of claim 1, wherein the plurality of finite field multipliers are unequally divided into the multiplier banks.

8. The decoder circuitry of claim 1, wherein the search and correction circuitry comprises a plurality of search and correction circuits equal in number to the plurality of second output channels; the decoder circuit arrangement further comprises:

mapping circuitry for conducting the output of said polynomial computation circuitry to said search and correction circuitry, said mapping circuitry including shifting circuitry to compensate for codeword boundaries that vary in position about clock cycle boundaries.

9. The decoder circuit device of claim 8, wherein the shift circuit device comprises:

a respective shift selection circuit for each of the search and correction circuits;

a modulo word length counter for generating a shift control signal for the corresponding shift selection circuit; and

a plurality of shift select decoders for decoding the shift control signals to control the search and correction circuits.

10. The decoder circuit device of claim 9, wherein the plurality of shift select decoders are equal in number to the plurality of search and correction circuits.

11. The decoder circuit arrangement of claim 9, wherein the number of positions at which a codeword boundary can fall away from a clock cycle boundary is smaller than the number of search and correction circuits, the plurality of shift select decoders are fewer in number than the plurality of search and correction circuits, and at least one shift select decoder of the plurality of shift select decoders is shared by more than one shift select circuit.

12. The decoder circuit arrangement of claim 11, wherein each of the shift select decoders is shared by two of the shift select circuits.

13. A method of decoding an input channel having a first data rate, a codeword on the input channel comprising a plurality of symbols, wherein:

the method includes both the option of providing a first output channel having the first data rate and the option of providing a plurality of second output channels having data rates less than the first data rate;

the method includes calculating a syndrome, calculating a polynomial, and performing search and correction operations;

the calculating the syndrome comprises:

performing a plurality of finite field multiplication operations corresponding in number to the plurality of symbols, each multiplication operation comprising multiplying one of the symbols by a power of a root of the finite field, each multiplication operation other than a first multiplication operation multiplying a respective symbol of the plurality of symbols by a higher power of the root than an adjacent multiplication operation,

grouping the plurality of finite field multiplication operations into multiplication sets and performing a first stage addition operation to add together the results of the multiplications in each multiplication set,

performing a second stage addition operation to add together the results of the first stage addition operation,

accumulating an output of the second stage of addition as a syndrome of the first output channel in an accumulation operation,

accumulating the output of said first stage addition operation in a plurality of additional accumulation operations equal in number to said set of multiplications, and

scaling the outputs of all but one of the additional accumulation operations; and

the scaled outputs of all but one of the additional accumulation operations, together with the output of the remaining additional accumulation operation, are the respective syndromes of the second output channels.

14. The method of claim 13, wherein:

each respective finite field multiplication operation of the plurality of finite field multiplication operations multiplies a respective one of the symbols by the root raised to the product of the exponent associated with the one of the symbols and the exponent of the syndrome.

15. The method of claim 14, wherein the scaling comprises: before each respective one of the additional accumulation operations, factoring out the lowest power of the root applied by the finite field multiplication operations that contribute to that additional accumulation operation.

16. The method of claim 13, wherein the accumulation operation applies a first scaling factor equal to the root raised to the product of the total number of symbols and the exponent of the syndrome.

17. The method of claim 16, wherein each of the additional accumulation operations applies a second scaling factor equal to the root raised to the exponent of the first scaling factor divided by the number of multiplication sets.

18. The method of claim 13, wherein the plurality of finite field multiplication operations are equally divided into the multiplication sets.

19. The method of claim 13, wherein the plurality of finite field multiplication operations are divided unequally into the multiplication sets.

20. The method of claim 13, wherein said performing search and correction operations comprises performing a plurality of search and correction operations equal in number to the plurality of second output channels; the method further comprises the following steps:

mapping the output of the polynomial computation operation to the search and correction operation, the mapping including a shift to compensate for codeword boundaries that vary in position with respect to clock cycle boundaries.

21. The method of claim 20, wherein the shifting comprises:

generating a shift control signal from a modulo word length counter operation; and

decoding the shift control signal to control the search and correction operations.