WO2023129261A1 - Using AND/OR reduce carry chains on programmable hardware - Google Patents

Using AND/OR reduce carry chains on programmable hardware

Publication number
WO2023129261A1
Authority
WO
WIPO (PCT)
Prior art keywords
logic
bit
input
vector
adder
Prior art date
Application number
PCT/US2022/047954
Other languages
English (en)
Inventor
Skand Hurkat
Aaron Michael Landy
Original Assignee
Microsoft Technology Licensing, Llc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/740,831 (US20230214180A1)
Application filed by Microsoft Technology Licensing, Llc.
Publication of WO2023129261A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting

Definitions

  • FPGAs field programmable gate arrays
  • FIG. 1A illustrates an example environment including an example carry chain logic system in accordance with one or more embodiments.
  • FIG. 1B illustrates an example logic chain implemented within the carry chain logic system in accordance with one or more embodiments.
  • FIG. 2 illustrates an example implementation of a logic unit implementable within a logic chain in accordance with one or more embodiments.
  • FIG. 3-1 illustrates a bit vector and an input vector in accordance with one or more embodiments.
  • FIG. 3-2 illustrates a generic carry chain in accordance with one or more embodiments.
  • FIG. 3-3 illustrates an example implementation of an OR reduce logic chain.
  • FIG. 3-4 illustrates an example implementation of an AND reduce logic chain.
  • FIG. 4-1 through FIG. 4-3 illustrate other example implementations of logic chains, including an OR reduce logic chain and an AND reduce logic chain in accordance with one or more embodiments.
  • FIG. 5 illustrates another example implementation of an OR reduce logic chain and an AND reduce logic chain in accordance with one or more embodiments.
  • FIG. 6 illustrates an example configuration of logic functions implemented in combination with AND/OR reduce chain(s) in accordance with one or more embodiments.
  • FIG. 7 illustrates a flowchart of a method implemented on programmable hardware in accordance with one or more embodiments.
  • the present disclosure relates to features and functionality of a carry chain logic system that leverages carry in and carry out signals from logic blocks to implement AND/OR logic functions on programmable hardware (e.g., FPGA hardware).
  • embodiments of the carry chain logic system described herein enable implementation of large (e.g., high number of inputs) AND/OR logic gates within the framework of additional logic functions without incurring a significant delay as a result of routing inputs between multiple series of logic modules (e.g., multiple logic levels).
  • the carry chain logic system may involve a method or series of acts being implemented on a carry chain of logic modules that are implemented on programmable hardware.
  • the carry chain logic system may receive an input vector including a plurality of input bits.
  • the carry chain logic system may further receive a bit vector, separate from the input vector, including a plurality of vector bits.
  • the carry chain logic system may cause an input bit to be an input to a first adder in the carry chain of logic modules.
  • the bit vector includes a primer bit (e.g., a least significant bit (LSB) of the bit vector) that is used as a first or primer input to the first adder to begin the logic chain.
  • the carry chain logic system may then provide each vector bit and an associated bit from the plurality of inputs as inputs to additional adders in the carry chain.
  • carry out signals from the adders can be provided as carry in signals to subsequent adders in the carry chain to generate an output based on a carry out signal from a last adder in the carry chain.
  • the carry chain logic system may include a carry chain logic function being implementable on programmable hardware.
  • the carry chain logic system may include a first logic module having a first adder and a second adder, a second logic module having a third adder and a fourth adder, and any number of additional logic modules having additional adder components.
  • the first logic module may be configured to receive, at a first adder, a first input and a primer bit of a bit vector to generate a carry out signal that is provided to the second adder.
  • the first logic module may be further configured to receive, at a second adder, the carry out signal from the first adder in combination with a second input and an associated bit value from the bit vector to generate another carry out signal.
  • the carry out signal(s) may be fed to additional adders on other logic modules in conjunction with additional inputs and additional bit values of bit vectors to implement a logic function in accordance with one or more embodiments described herein.
  • the present disclosure includes a number of practical applications that provide benefits and/or solve problems associated with configuring logic functions on programmable hardware. Examples of some of these benefits are discussed in further detail below.
  • features of the carry chain logic system enable large input logic gates without incurring propagation delays as a result of multiple logic levels being coupled together in series on programmable hardware.
  • conventional logic module configurations often involve logic modules that receive inputs and feed outputs from different logic levels in a typical hardware configuration. These additional logic levels result in longer propagation delays that require additional pipelining or a lower clock frequency.
  • the carry chain logic system additionally provides features that enable logic modules to be combined on the same logic level to create AND/OR logic of a higher number of inputs than any individual logic module(s).
  • a logic module may include hardware that limits a specific logic module to receive and process six inputs.
  • the carry chain logic system utilizes carry in and carry out signals to increase the number of inputs that can be considered by a given logic gate without routing input and output signals between logic levels.
  • a lookup table (LUT) in a conventional logic module may only support a fixed number of inputs (e.g., six inputs)
  • any operation such as an AND/OR reduce over more than six inputs may require multiple logic levels if implemented in the LUTs alone.
  • implementing an operation that requires more than the fixed number of inputs that an LUT is preconfigured to receive may involve routing inputs between logic levels and incurring delays caused as a result of routing inputs via routing fabric of the programmable logic.
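The growth in logic levels for a LUT-only reduction can be estimated with a short sketch. It assumes a balanced tree of fixed-width LUTs, which is a common mapping but is not stated in the source; the function name `lut_levels` is hypothetical.

```python
def lut_levels(num_inputs: int, lut_width: int = 6) -> int:
    """Count the LUT levels of a balanced reduction tree (assumed mapping)."""
    levels = 0
    remaining = num_inputs
    while remaining > 1:
        # Each level groups up to `lut_width` signals into one LUT output.
        remaining = -(-remaining // lut_width)  # ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (6, 7, 24, 36, 37):
        print(n, "inputs ->", lut_levels(n), "LUT level(s)")
```

Under this assumption a six-input reduce fits in one level, while anything from 7 to 36 inputs needs a second level and the routing between the two.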
  • the carry chain logic system improves upon this limitation of conventional systems by utilizing the carry in and carry out signals as inputs to additional logic functions (e.g., on a same logic level as the logic modules associated with the respective carry in and carry out functions). Indeed, as will be discussed in further detail herein, the carry chain logic system may decrease a reliance on input and output signals being routed via routing fabric of the programmable logic hardware. As routing inputs between logic levels can take hundreds of picoseconds or even nanoseconds, this delay can cause logic functions to fail or require complicated pipeline configurations and/or lower clock frequencies.
  • the carry chain logic system may treat an input to be reduced as a bit vector, which can be implemented within an OR reduce function by adding the input bit vector to a bit vector of all 1s.
  • a 1 may not be seen as the carry out signal unless all bits on the input are 1, which may provide an AND reduce operation.
  • because carry chains produce faster signals than LUTs, this enables performance of reductions on a larger number of inputs using carry chains rather than LUTs. This example and additional variations will be discussed below in connection with FIGS. 3-5.
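The arithmetic behind both reductions can be checked with plain integers. This is a minimal sketch, not the hardware mapping: it assumes the input is a `width`-bit unsigned value and reads the carry out as bit `width` of the sum. The function names are hypothetical.

```python
def or_reduce_by_add(value: int, width: int) -> int:
    """Carry out of value + 0b11...1; it is 1 when any input bit is 1."""
    all_ones = (1 << width) - 1
    return ((value & all_ones) + all_ones) >> width


def and_reduce_by_add(value: int, width: int) -> int:
    """Carry out of value + 0b00...1; it is 1 only when every input bit is 1."""
    all_ones = (1 << width) - 1
    return ((value & all_ones) + 1) >> width


if __name__ == "__main__":
    for value in (0b0000, 0b0100, 0b1111):
        print(f"{value:04b}",
              or_reduce_by_add(value, 4),
              and_reduce_by_add(value, 4))
```

Adding all 1s overflows for any nonzero input, which gives the OR. Adding a vector whose only 1 is the least significant bit overflows only for an all-1s input, which gives the AND.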
  • a “logic module” may refer to any discrete component of hardware capable of receiving a plurality of inputs and producing an output based on a logic function implemented thereon.
  • a logic module may include components such as LUTs, adders, registers, multiplexors, and routing components.
  • a logic module refers to a component that allows configuration of an N-input (e.g., 4 input, 6 input, 8 input) logic function from any of a number of manufacturers that can be implemented in combination with additional logic modules on a programmable hardware device (e.g., an FPGA device).
  • a “logic level” may refer to a grouping of one or more logic modules separated by another grouping of one or more logic modules via a routing fabric between the modules.
  • a logic level may refer to a carry chain of logic modules having carry signals fed as carry signal inputs between the logic modules.
  • Logic modules of the logic level may provide output signals to additional logic layers directly or via registers capable of capturing data from one clock cycle to the next.
  • a logic chain of logic modules may refer to logic modules within a common logic level.
  • a series of logic modules or logic module series may refer to logic modules that feed signals to one another across different logic levels (e.g., via a routing fabric).
  • a bit vector may be a pre-determined vector or set of bit values that may be used as an input to one or more logic modules.
  • the bit vector is composed of multiple vector bits.
  • the bit values of the vector bits may be based on the type of logic function the carry chain logic system is performing.
  • a carry chain logic system that is configured to perform an OR reduction may include a bit vector having vector bits with bit values of one.
  • a carry chain logic system that is configured to perform an AND reduction may include a bit vector having vector bits with bit values of zero, and a first vector bit with a bit value of one.
  • a first vector bit is referred to as a primer bit.
  • a primer bit may be an initial bit that may be added to the first logic module of the carry chain logic system.
  • the primer bit may have a bit value that is based on the configuration of the carry chain logic system. For example, for an AND or an OR reduction, the primer bit may have a bit value of one to allow the first logic module to perform the AND or the OR reduction.
  • the primer bit may be included as a vector bit in the bit vector.
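The bit vector definitions above can be summarized in a small helper. This is an illustrative sketch only: the function name `make_bit_vector` and the string labels `"or"` and `"and"` are assumptions, and the primer bit is placed at index 0 of the returned list.

```python
from typing import List


def make_bit_vector(reduction: str, length: int) -> List[int]:
    """Build the predetermined bit vector; index 0 is the primer bit."""
    if length < 1:
        raise ValueError("the bit vector needs at least the primer bit")
    if reduction == "or":
        return [1] * length               # every vector bit is one
    if reduction == "and":
        return [1] + [0] * (length - 1)   # primer bit one, remaining bits zero
    raise ValueError(f"unsupported reduction: {reduction!r}")


if __name__ == "__main__":
    print(make_bit_vector("or", 4))
    print(make_bit_vector("and", 4))
```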
  • an input vector may include a plurality of input bits for the carry chain logic system. Each logic module may receive an input bit from the input vector. As discussed in further detail herein, the input bits of the input vector may be received from other functions in the programmable hardware. For example, the input bits may be outputs from a logic function on a different logic level than the carry chain logic system. In some examples, the input bits may be outputs from other logic functions, combinations, subtractions, or any other function performed on the programmable hardware. In some embodiments, the input vector has a quantity of input bits. The number of logic modules in the carry chain logic system may be equal to the quantity of input bits.
  • FIG. 1A illustrates an example environment showing a programmable hardware device(s) 100 having a carry chain logic system 102 implemented thereon.
  • the carry chain logic system 102 may include multiple logic chains 104, which may individually include any number of logic modules 106 chained together in accordance with one or more examples discussed herein.
  • the programmable hardware device refers to an FPGA device having any number of logic modules 106 implemented thereon.
  • FIG. 1B illustrates an example implementation of a logic chain 104 on the carry chain logic system 102 shown in FIG. 1A.
  • the logic chain 104 may refer to any of the logic chains 104 on the carry chain logic system 102 implemented on the programmable hardware device(s) 100 shown in FIG. 1A.
  • the example logic chain 104 may include any number of logic modules (collectively 106) configured to receive an input (collectively 110) (e.g., multiple input signals) and generate an output (collectively 112) (e.g., one or multiple outputs).
  • the logic modules 106 may be configured to receive and provide a carry signal (collectively 108) between the respective logic modules 106.
  • a first logic module 106-1 may receive a first input 110-1, a first carry signal 108-1, and produce a first output 112-1 and a second carry signal 108-2.
  • the first input(s) 110-1 and/or the first carry signal 108-1 may be received from any source, such as an input vector or another logic module 106 (e.g., from a different logic stage).
  • the first output 112-1 may also be provided to another logic module.
  • the logic modules 106 may additionally receive as input a bit vector with a first bit referring to a primer bit and additional bits that cause the logic modules 106 to simulate a particular logic gate.
  • the logic modules 106 may receive and provide carry signals 108 between logic modules 106 of the logic chain 104.
  • each logic module 106 may receive a carry in signal and provide a carry out signal.
  • a carry signal 108 may refer to both a carry out signal for one logic module 106 and a carry in signal for another logic module 106.
  • the second carry signal 108-2 may be a carry out signal for the first logic module 106-1 and a carry in signal for the second logic module 106-2.
  • the carry in signal is provided as an input for a logic module 106 within the same or different logic chain 104.
  • while each of the carry signals 108 may feed as a carry input to another logic module 106, one or more embodiments may involve the carry signal being used as an input to another logic function.
  • the carry signal 108 may serve as an AND/OR reduce function output 114 to expand a number of inputs of a logic gate implemented by the logic chain.
  • the carry signal 108 may be fed as an input to another logic function altogether. Additional examples will be discussed in connection with FIGS. 3-5.
  • the logic chain 104 may be of any length, with the carry signal 108 being propagated through any number of logic modules 106.
  • the logic chain 104 shown in FIG. 1B has a first logic module 106-1, receiving a first carry signal 108-1 and a first input 110-1.
  • the first input 110-1 may include a first vector bit from a bit vector and a first input bit from an input vector.
  • the first logic module 106-1 may receive three inputs.
  • the first logic module 106-1 may generate a first output 112-1 and a second carry signal 108-2 (e.g., the carry out signal for the first logic module 106-1).
  • the second carry signal 108-2 may be the MSB of the results of the first logic module 106-1 and the first output 112-1 may be the LSB of the results.
  • the second logic module 106-2 may receive as inputs the second carry signal 108-2 (e.g., the carry in signal) and a second input 110-2.
  • the second input 110-2 may include a second vector bit from the bit vector and a second input from the input vector.
  • the second logic module 106-2 may receive three inputs.
  • the second logic module 106-2 may generate a second output 112-2 and a carry signal 108.
  • the logic chain 104 may be continued indefinitely, producing n carry signals 108-n. The n logic modules 106-n may receive the n carry signals 108-n and n inputs 110-n and produce n outputs 112-n. This may allow for long logic chains 104 to process large numbers of inputs on the same logic level with reduced processing delays.
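The chaining of carry signals between logic modules may be modeled as follows. This is a minimal sketch under two assumptions not fixed by the text: each logic module behaves as a one-bit full adder, and the first carry signal 108-1 is 0. The names `logic_module` and `run_chain` are hypothetical.

```python
from typing import List, Tuple


def logic_module(carry_in: int, input_bit: int, vector_bit: int) -> Tuple[int, int]:
    """Assumed one-bit full adder: returns (output, carry_out)."""
    total = carry_in + input_bit + vector_bit  # value in 0..3
    return total & 1, total >> 1


def run_chain(input_vector: List[int], bit_vector: List[int],
              first_carry: int = 0) -> Tuple[List[int], List[int]]:
    """Feed each carry out to the next module; return (outputs, carry signals)."""
    carries = [first_carry]
    outputs = []
    for input_bit, vector_bit in zip(input_vector, bit_vector):
        output, carry_out = logic_module(carries[-1], input_bit, vector_bit)
        outputs.append(output)
        carries.append(carry_out)
    return outputs, carries


if __name__ == "__main__":
    outputs, carries = run_chain([0, 1, 0], [1, 1, 1])
    print("outputs:", outputs)
    print("carry signals:", carries)
```

The returned carry list has one more entry than the number of modules, because it includes the first carry signal as well as each carry out.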
  • FIG. 2 illustrates a more detailed implementation of an example logic module in accordance with one or more embodiments.
  • the logic module may include a variety of components being configurable to provide a logic gate for a predetermined number of inputs.
  • the logic module can be configured to process up to eight inputs.
  • a logic module 206 may include a look up table (LUT) being programmable to provide any of a variety of Boolean functions for N-number of inputs up to the limit of the logic module.
  • the logic module 206 may include two logic portions (collectively 218), including a first logic portion 218-1 and a second logic portion 218-2. Each logic portion 218 may utilize the LUT 216 to process a four-input LUT table function. In this manner, two four-input LUT functions may process the respective groupings of inputs and generate two outputs.
  • the outputs may be fed from the LUT 216 to two separate logic modules.
  • the logic modules are adders (collectively 220) which may be used to add the two bits.
  • the adders 220 may be one-bit adders, and configured to process three one-bit inputs, including a carry in bit 222, an input bit 224, and a vector bit 226.
  • the adders 220 may process the carry in bit 222, the input bit 224, and the vector bit 226 to produce an output (collectively 212) and a carry out bit (collectively 228).
  • the output 212 may be passed to a register 230, where it may be routed to a different logic level or other portion of the programmable hardware device.
  • the carry out bit 228 may be used as a carry in bit 222 for another adder in the logic chain.
  • the logic module 206 may receive six LUT inputs 215 at the LUT 216.
  • the LUT 216 may output a first input 224-1 to the first adder 220-1 and provide the first adder 220-1 with a first vector bit 226-1.
  • the first adder 220-1 may receive a first carry in bit 222-1.
  • the first adder 220-1 may process the first carry in bit 222-1, the first input 224-1, and the first vector bit 226-1 and generate a first output 212-1 and a first carry out bit 228-1.
  • the LUT 216 may generate a second input 224-2.
  • the LUT 216 may provide the second adder 220-2 with the second input 224-2 and a second vector bit 226-2.
  • the second adder 220-2 may receive the first carry out bit 228-1 as a second carry in bit 222-2.
  • the second adder 220-2 may process the second carry in bit 222-2, the second input 224-2, and the second vector bit 226-2 to generate a second output 212-2 and a second carry out bit 228-2.
  • the second carry out bit 228-2 may be used at another logic module 206 on the same logic level as a carry in bit 222.
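The data flow through the logic module 206 may be sketched as below. The sketch assumes the LUT 216 can be represented as two arbitrary Boolean functions of the LUT inputs (one per adder) and that each adder is a one-bit full adder; `two_adder_module`, `lut_1`, and `lut_2` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple

LutFunction = Callable[[Sequence[int]], int]


def full_adder(carry_in: int, input_bit: int, vector_bit: int) -> Tuple[int, int]:
    """One-bit adder: returns (output, carry_out)."""
    total = carry_in + input_bit + vector_bit
    return total & 1, total >> 1


def two_adder_module(lut_inputs: Sequence[int], lut_1: LutFunction,
                     lut_2: LutFunction, vector_bits: Tuple[int, int],
                     carry_in: int) -> Tuple[Tuple[int, int], int]:
    """Model of one logic module: returns ((output_1, output_2), carry_out)."""
    input_1 = lut_1(lut_inputs)  # plays the role of input 224-1
    input_2 = lut_2(lut_inputs)  # plays the role of input 224-2
    output_1, carry_1 = full_adder(carry_in, input_1, vector_bits[0])
    # The first carry out becomes the second adder's carry in.
    output_2, carry_2 = full_adder(carry_1, input_2, vector_bits[1])
    return (output_1, output_2), carry_2


if __name__ == "__main__":
    low_or = lambda bits: int(any(bits[:3]))
    high_or = lambda bits: int(any(bits[3:]))
    print(two_adder_module([0, 0, 0, 0, 1, 0], low_or, high_or, (1, 1), 0))
```

The returned carry out corresponds to the second carry out bit 228-2, which would feed the next logic module on the same logic level.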
  • the logic module 206 may be configured as an AND gate or an OR gate and be configured to add inputs from the respective logic portions of the logic module.
  • the logic module 206 may include a plurality of registers 230. As mentioned above, the registers 230 can capture data from one clock cycle to the next clock cycle. In one or more embodiments, the logic module 206 may be configured to use the registers 230 to achieve a higher frequency than would be possible bypassing the adders 220 and/or registers 230.
  • the logic module 206 may be combined with one or more additional logic modules 206 to create a logic gate having a greater number of inputs than are available for a single logic module 206.
  • the logic module 206 may be combined with a plurality of additional logic modules 206 to provide a large AND gate having a significantly higher number of inputs than would be available in conventional configurations (e.g., thirty, fifty, or hundreds of inputs).
  • the logic module 206 may implement the larger logic gate by implementing a logic chain having a plurality of logic modules 206 that are configured to receive a bit vector, an input vector, and carry signal(s).
  • the bit vector may include a vector of 1s and/or 0s in addition to a primer bit fed to a first logic module 206 of the plurality of logic modules 206 in the logic chain.
  • the bit vector may be fed as inputs to the logic modules 206 of the logic chain together with an input vector of input bits for which the logic gate is to be applied.
  • the logic modules 206 may use the carry inputs fed to the adders to create a ripple adder and produce two bit values having a most significant bit (MSB) and a least significant bit (LSB) based on a combination of a relevant bit vector value, input value, and carry signal provided to the adder(s).
  • the MSB becomes the carry out bit for an adder or logic module while the LSB becomes the output for the respective adder or logic module.
  • MSB and/or LSB may be used differently depending on the specific logic gate the logic modules are programmed to provide.
  • specific values of the bit vector, including the set of 0s and/or 1s as well as the primer bit, may be determined to be specific values based on the logic gate that the logic modules have been programmed to provide.
  • FIG. 3 illustrates a first example OR-reduce logic chain and a first example AND-reduce logic chain.
  • FIG. 3-1 is a representation of a bit vector 332, including a plurality of vector bits 326 and an input vector 334 including a plurality of input bits 324.
  • the bit vector 332 includes four vector bits 326, including b1, b2, b3, and b4, with b1 being referred to herein as a primer bit (Pr).
  • the primer bit Pr may be the initial bit used in the first logic module of the logic chain.
  • the input vector 334 includes four input bits, including a1, a2, a3, and a4.
  • FIG. 3-2 is a representation of a generic logic chain 304 including four adders (collectively 320), according to at least one embodiment of the present disclosure.
  • a first adder 320-1 receives the primer bit Pr and a first input bit a1 from the input vector 334.
  • the first adder outputs a first carry signal C1.
  • a second adder 320-2 receives as inputs the first carry signal C1, a second vector bit b2 from the bit vector 332, and a second input bit a2 from the input vector 334.
  • the second adder 320-2 may generate a second carry signal C2.
  • the first adder 320-1 and the second adder 320-2 may be part of a first logic module.
  • a third adder 320-3 may receive as inputs the second carry signal C2, a third vector bit b3 of the bit vector 332, and a third input bit a3 from the input vector 334.
  • the third adder 320-3 may generate a third carry signal C3.
  • a fourth adder 320-4 may receive as inputs the third carry signal C3, a fourth vector bit b4 of the bit vector 332, and a fourth input bit a4 of the input vector 334.
  • the fourth adder 320-4 may generate a fourth carry signal C4.
  • the third adder 320-3 and the fourth adder 320-4 may be part of a second logic module.
  • the implementation shown in FIG. 3-2 may represent two logic modules, each including two adders. Other implementations may include different groupings of adders and modules depending on unique hardware and specifications of the logic modules.
  • a reduction output 336 may be provided as an output of the logic chain 304.
  • a reduction output 336 may receive the fourth carry signal C4 and perform a reduction function on the fourth carry signal C4 and/or any other outputs of the logic chain 304. As discussed herein, because each of the adders 320 are on the same logic level, the reduction output 336 may be produced on the fourth carry signal C4 with fewer processing delays.
  • the logic chain 304 may include more or fewer than four adders 320.
  • an OR-reduce logic chain 304 includes four adders 320 that may be implemented within logic modules in accordance with one or more embodiments described herein.
  • a first adder 320-1 and/or second adder 320-2 may be implemented within a similar logic module as discussed above in connection with FIG. 2.
  • a third adder 320-3 and/or fourth adder 320-4 may be implemented within another logic module having similar features and functionality as discussed above in connection with FIG. 2.
  • an input vector 334 having the values a1, a2, a3, and a4 may be provided as input to the logic modules.
  • a1 and a2 may be provided as inputs to a first logic module while a3 and a4 may be provided as inputs to a second logic module.
  • a predefined bit vector 332 may be provided as inputs to the logic module.
  • the individual bits of the bit vector 332 may be provided to each of the adders in conjunction with a corresponding bit value of the input vector 334.
  • a first bit of the bit vector 332 may be provided as an input to a first adder 320-1 (e.g., on a first logic module) in conjunction with a1 while a second bit of the bit vector 332 may be provided as an input to a second adder 320-2 (e.g., on the first logic module) in conjunction with a2.
  • a third bit of the bit vector 332 may be provided as an input to a third adder 320-3 (e.g., on a second logic module) in conjunction with a3 while a fourth bit of the bit vector 332 may be provided as an input to a fourth adder 320-4 (e.g., on the second logic module) in conjunction with a4.
  • Other configurations may include any number of input bits and corresponding bit vector bits provided to additional adders on additional logic modules.
  • the specific values of the bit vectors 332 may be determined based on the specific logic gate to be implemented by the logic modules of the logic chain 304.
  • the bit vector 332 may include a first primer input (e.g., a “1” bit value) provided to the first adder in combination with additional “1” inputs provided to the additional adders of the OR-reduce logic chain.
  • the adders 320 will generate two-bit outputs. If exactly one of the input bits (e.g., the input bit and corresponding vector bit) is true, then the adder 320 would propagate the input carry. If both are true, then the adder 320 generates the output carry. In one or more embodiments, this configuration includes a ripple carry adder. In one or more embodiments, the LSBs can be ignored and the MSBs are provided as carry in signals to a next adder in the OR-reduce logic chain. In one or more embodiments, the LSBs are provided as outputs to one or more additional logic functions on the programmable hardware device(s).
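The propagate and generate behavior of a single adder can be written out directly. The sketch below uses the standard full-adder carry equation; the function names are hypothetical. It shows that fixing the vector bit to 1 turns the carry out into an OR of the input bit and the carry in, while fixing it to 0 turns the carry out into an AND.

```python
def carry_out(input_bit: int, vector_bit: int, carry_in: int) -> int:
    """Full-adder carry expressed with generate and propagate terms."""
    generate = input_bit & vector_bit    # both true: a carry is generated
    propagate = input_bit ^ vector_bit   # exactly one true: carry in passes through
    return generate | (propagate & carry_in)


def carry_table(vector_bit: int):
    """List (input_bit, carry_in, carry_out) rows for a fixed vector bit."""
    return [(a, c, carry_out(a, vector_bit, c))
            for a in (0, 1) for c in (0, 1)]


if __name__ == "__main__":
    print("vector bit 1:", carry_table(1))
    print("vector bit 0:", carry_table(0))
```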
  • FIG. 3-4 further shows an example AND-reduce logic chain 304.
  • the AND-reduce logic chain 304 includes four adders 320 similar to the four adders 320 described above in connection with the OR-reduce logic chain. These adders 320 may be implemented within logic modules similar to the OR-reduce logic chain.
  • the logic modules may receive an input vector 334 in combination with a bit vector 332 having specific values based on an intent to implement an AND gate functionality (as opposed to the OR gate).
  • the bit vector 332 may include a vector of all “0” values with a first vector bit of “1”.
  • the AND-reduce framework may generate the output of an AND gate for any number of inputs using a plurality of logic modules on the same logic chain of logic modules (e.g., on a common logic lane).
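The four-adder AND-reduce chain can be checked exhaustively. This is a sketch under the assumption that the carry in of the first adder is 0 and that the final carry out is taken as the reduction result; `chain_carry_out` and `AND_VECTOR` are hypothetical names.

```python
from itertools import product
from typing import Sequence


def chain_carry_out(input_bits: Sequence[int], vector_bits: Sequence[int],
                    first_carry: int = 0) -> int:
    """Ripple the carry through one adder per input bit; return the last carry."""
    carry = first_carry
    for input_bit, vector_bit in zip(input_bits, vector_bits):
        carry = (input_bit + vector_bit + carry) >> 1
    return carry


AND_VECTOR = [1, 0, 0, 0]  # primer bit of 1 followed by 0s

# Final carry out for every possible four-bit input vector.
and_table = {bits: chain_carry_out(bits, AND_VECTOR)
             for bits in product((0, 1), repeat=4)}

if __name__ == "__main__":
    for bits, result in and_table.items():
        print(bits, "->", result)
```

Only the all-1s input leaves a 1 on the final carry, which matches a four-input AND gate.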
  • FIG. 4-1 illustrates an example implementation showing a variation of the example discussed above in connection with FIG. 3, according to at least one embodiment of the present disclosure.
  • the example function described in FIG. 4-1 illustrates similar principles discussed in connection with similar components shown in FIG. 3.
  • FIG. 4-1 shows a generic logic chain 404 having a plurality of adders (collectively 420).
  • Each adder 420 may receive a plurality of inputs, including a carry in signal, an input bit a1, a2, a3, a4, and a vector bit b1, b2, b3, b4, from a bit vector.
  • the input bit may include the output of an LUT reduce function 438.
  • the LUT reduce function 438 may receive multiple bits and reduce the multiple inputs to a single input for the adder 420.
  • the LUT reduce function 438 may reduce six bits from a bit vector at a time to generate the various inputs a1, a2, a3, a4.
  • the LUT reduce function 438 may perform an initial reduction of the input bits, thereby reducing the total number of logic modules that may be used when performing a given operation.
  • the output of the adders 420 may then be used in a reduce output 436.
  • FIG. 4-2 illustrates an example OR-reduce logic function in a logic chain 404.
  • the OR-reduce logic chain 404 may include a chain of adders 420 implemented within logic modules, which may include similar features as the logic module discussed above in connection with FIG. 2.
  • each of the adders 420 may receive an input bit a1, a2, a3, a4, and a bit vector bit as well as a carry in value c1, c2, c3, c4, from another adder component.
  • the input bits may refer to outputs from LUTs. More specifically, in one or more embodiments, LUTs may be used to reduce six bits of a bit vector at a time and use adder chains to reduce the results. Thus, rather than having a single input bit for each bit of the bit vector, the OR-reduce logic chain may make use of LUTs to reduce up to six bits (or other predetermined number of inputs based on capabilities of the logic modules) for each bit from the bit vector. This can significantly reduce the number of logic modules in a given carry chain when configuring a logic gate having a large number of inputs.
  • a first LUT OR reduce may reduce inputs A1 through A6 to generate the first input bit a1.
  • a first adder 420-1 may receive the first input bit a1 and a vector bit (e.g., a primer bit) having a value of 1.
  • the output of the first adder 420-1 may include a first carry bit C1.
  • a second LUT OR reduce may reduce inputs A7 through A12 to generate the second input bit a2.
  • a second adder 420-2 may receive the second input bit a2, a vector bit having a value of 1, and the first carry bit C1. The output of the second adder 420-2 may include a second carry bit C2.
  • a third LUT OR reduce may reduce inputs A13 through A18 to generate the third input bit a3.
  • a third adder 420-3 may receive the third input bit a3, a vector bit having a value of 1, and the second carry bit C2. The output of the third adder 420-3 may include a third carry bit C3.
  • a fourth LUT OR reduce may reduce inputs A19 through A24 to generate the fourth input bit a4.
  • a fourth adder 420-4 may receive the fourth input bit a4, a vector bit having a value of 1, and the third carry bit C3. The output of the fourth adder 420-4 may include a fourth carry bit C4.
  • An OR reduce may receive the fourth carry bit C4 and perform an OR reduce.
  • the carry chain 404 may perform an OR reduce on a 24 bit input vector in a single logic level, thereby reducing the total number of logic modules used to perform the OR reduce and increasing the speed of the OR reduce.
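The 24-input OR reduce may be traced with a short model. It assumes each LUT OR reduce is an OR over six consecutive inputs, each adder is a one-bit full adder with a vector bit of 1, and the carry in of the first adder is 0. The names `lut_or_reduce` and `or_reduce_carries` are hypothetical.

```python
from typing import List, Sequence


def lut_or_reduce(group: Sequence[int]) -> int:
    """Model of one LUT OR reduce over up to six inputs."""
    return int(any(group))


def or_reduce_carries(inputs: Sequence[int], lut_width: int = 6) -> List[int]:
    """Return the carry bits C1, C2, ... produced along the chain."""
    carries = []
    carry = 0  # assumed carry in of the first adder
    for start in range(0, len(inputs), lut_width):
        input_bit = lut_or_reduce(inputs[start:start + lut_width])
        carry = (input_bit + 1 + carry) >> 1  # vector bit is always 1
        carries.append(carry)
    return carries


if __name__ == "__main__":
    inputs = [0] * 24
    inputs[13] = 1  # input A14, handled by the third LUT
    print(or_reduce_carries(inputs))
```

Once any group contains a 1, every later carry stays at 1, so the last carry bit reflects the OR of all 24 inputs.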
  • FIG. 4-3 is a representation of an example AND-reduce carry chain 404 logic function in accordance with one or more embodiments described herein. Similar features to the OR-reduce logic chain 404 may be applied to the AND-reduce logic chain 404.
  • LUTs may be used to reduce bits of a bit vector having values that are specific to the AND gate logic.
  • the four adders 420 may be used to process an AND-reduce gate for four LUTs that each receive up to six inputs (or other number of inputs based on the capabilities of the logic modules).
  • a first LUT AND reduce may reduce inputs A1 through A6 to generate the first input bit a1.
  • a first adder 420-1 may receive the first input bit a1 and a vector bit having a value of 1 (e.g., the primer bit).
  • the output of the first adder 420-1 may include a first carry bit C1.
  • a second LUT AND reduce may reduce inputs A7 through A12 to generate the second input bit a2.
  • a second adder 420-2 may receive the second input bit a2, a vector bit having a value of 0, and the first carry bit C1. The output of the second adder 420-2 may include a second carry bit C2.
  • a third LUT AND reduce may reduce inputs A13 through A18 to generate the third input bit a3.
  • a third adder 420-3 may receive the third input bit a3, a vector bit having a value of 0, and the second carry bit C2. The output of the third adder 420-3 may include a third carry bit C3.
  • a fourth LUT AND reduce may reduce inputs A19 through A24 to generate the fourth input bit a4.
  • a fourth adder 420-4 may receive the fourth input bit a4, a vector bit having a value of 0, and the third carry bit C3. The output of the fourth adder 420-4 may include a fourth carry bit C4.
  • An AND reduce may receive the fourth carry bit C4 and perform an AND reduce.
  • the carry chain 404 may perform an AND reduce on a 24 bit input vector in a single logic level, thereby reducing the total number of logic modules used to perform the AND reduce and increasing the speed of the AND reduce.
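A comparable behavioral sketch for the AND-reduce chain follows. The names are hypothetical and the carry-in of the first adder is assumed to be 0, so the primer bit of 1 on the first adder simply passes a1 through as C1. Every later adder has its vector bit at 0, which makes its carry-out equal to its input bit AND its carry-in.

```python
def full_adder_carry(a: int, b: int, carry_in: int) -> int:
    """Carry-out of a 1-bit full adder (majority of the three inputs)."""
    return (a & b) | (a & carry_in) | (b & carry_in)


def lut_and6(bits) -> int:
    """Model of one LUT configured to AND up to six inputs."""
    assert len(bits) <= 6
    return int(all(bits))


def and_reduce_24(inputs) -> int:
    """AND-reduce 24 input bits with four LUTs feeding a four-adder chain."""
    assert len(inputs) == 24
    lut_bits = [lut_and6(inputs[i:i + 6]) for i in range(0, 24, 6)]
    # Primer bit of 1 for the first adder, then 0 for the remaining adders.
    vector_bits = [1, 0, 0, 0]
    carry = 0  # assumption: the first adder has no incoming carry
    for a, v in zip(lut_bits, vector_bits):
        carry = full_adder_carry(a, v, carry)
    return carry  # C4, the AND of all 24 inputs
```

Under these assumptions a single 0 anywhere in the input clears the carry at its stage, and the cleared carry then stays at 0 through the rest of the chain.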
  • FIG. 5 is a representation of a carry chain 504 logic system, according to at least one embodiment of the present disclosure.
  • FIG. 5 illustrates an example configuration of a logic chain 504 in which the input vector includes inputs originating from any LUT combinational logic.
  • the logic chain 504 may utilize the combinational logic of the LUTs and feed the output of the LUTs to the logic chain 504. This enables the logic chain 504 to include the combinational logic and the AND/OR reduce into a single logic level instead of across multiple logic levels.
  • An example implementation of this configuration could include a counter configured to increment or decrement based on some detected condition.
  • a registered function could be configured and driven as an input to an adder (e.g., on the chain of adders). Where this would conventionally be performed by providing an output from a logic module of another logic layer, this framework enables feeding the carry signal as an input to the logic chain within the same logic layer and significantly reduces latency that would otherwise be caused by routing the signal via a routing fabric between logic layers.
  • this combination of any combinational logic with a framework of the reduce logic function may apply to both the AND-reduce and the OR-reduce configurations.
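As one illustration of combining arbitrary LUT logic with a reduce chain, consider a wide equality comparator. This example is hypothetical and not taken from the source: each modeled LUT compares three bits of `x` with three bits of `y` (six LUT inputs), and the adder chain AND-reduces the four per-slice results. The first adder's carry-in is assumed to be 0.

```python
def full_adder_carry(a: int, b: int, carry_in: int) -> int:
    """Carry-out of a 1-bit full adder (majority of the three inputs)."""
    return (a & b) | (a & carry_in) | (b & carry_in)


def lut_eq3(x_slice: int, y_slice: int) -> int:
    """Model of a 6-input LUT: 1 when two 3-bit slices are equal."""
    return int((x_slice & 0b111) == (y_slice & 0b111))


def equal_12bit(x: int, y: int) -> int:
    """Return 1 when the low 12 bits of x and y match, else 0."""
    # Combinational stage: one LUT per 3-bit slice.
    lut_bits = [lut_eq3(x >> (3 * i), y >> (3 * i)) for i in range(4)]
    # Reduce stage: AND-reduce the LUT outputs along the carry chain.
    vector_bits = [1, 0, 0, 0]  # primer bit, then zeros
    carry = 0  # assumption: no incoming carry at the first adder
    for a, v in zip(lut_bits, vector_bits):
        carry = full_adder_carry(a, v, carry)
    return carry
```

The point of the sketch is that the comparison logic and the reduction are evaluated in one pass along the chain, rather than as a comparison stage followed by a separate reduction stage.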
  • FIG. 6 illustrates another example logic chain 604 framework in accordance with one or more embodiments described herein.
  • FIG. 6 illustrates an example in which an AND/OR reduce is fed into an adder 520 (e.g., a counter).
  • the carry out signal may be fed from an adder 520 as the carry in value for an adder 520 of a counter.
  • a first logic function (e.g., a subtractor function) may produce a carry out indicating the result of a compare operation. This may be provided in conjunction with a different set of combinational signals that are used to provide a carry-in value to another counter.
  • this framework may be utilized to implement an entire logic of multiple logic functions using the carry chain configuration described herein.
  • T and R signals may refer to two inputs that refer to pointers of a set of values (e.g., a queue of inputs having a particular order, such as inputs issued by a processor).
  • the carry out signal may refer to a carry out of the most significant stage of the subtraction and may be combined with another signal to generate a logic function. This logic function may be fed as an input to another function.
  • the carry out signal may be used to drive additional logic of a common logic level (e.g., without routing signals via routing fabric between logic lanes).
  • the illustrated example shows a first logic function (e.g., a subtraction function).
  • the result of the logic function is fed to an abort stage (e.g., a second logic function), which is fed to an adder function (e.g., a third logic function).
  • This implementation enables a logic chain to be implemented as a more complex function that would normally involve a multiplexer (MUX) or other function spanning multiple logic stages, which has the potential to cause an unacceptable amount of delay.
  • the configurations described herein enable a logic chain to consider multiple conditions within a single logic stage.
  • Example implementations may include incrementing a queue, taking a comparison and using it as a logic function without a routing penalty, updating a read pointer, an abort function, or any other combinational logic.
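The subtract, abort, and adder stages can be sketched behaviorally as follows. This is an illustrative model under stated assumptions: the 4-bit width, the function names, and the exact way the abort signal gates the carry (`carry_out AND NOT abort`) are hypothetical. It relies on the standard identity that, for unsigned values, the carry out of the most significant stage of `T + ~R + 1` is 1 exactly when `T >= R`.

```python
WIDTH = 4  # assumed pointer and counter width


def ripple_add(a: int, b: int, carry_in: int, width: int = WIDTH):
    """Ripple-carry add; returns (sum mod 2**width, carry-out of MSB stage)."""
    carry = carry_in
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return total, carry


def compare_ge(t: int, r: int, width: int = WIDTH) -> int:
    """First stage: subtract R from T and keep only the carry out (T >= R)."""
    mask = (1 << width) - 1
    _, carry_out = ripple_add(t, ~r & mask, 1, width)
    return carry_out


def step_counter(counter: int, t: int, r: int, abort: int,
                 width: int = WIDTH) -> int:
    """Second and third stages: gate the carry with abort, then increment."""
    enable = compare_ge(t, r, width) & (abort ^ 1)
    # The gated carry is used directly as the carry-in of the counter adder.
    new_counter, _ = ripple_add(counter, 0, enable, width)
    return new_counter
```

For example, `step_counter(5, 3, 2, 0)` yields 6 because T >= R and abort is clear, while `step_counter(5, 3, 2, 1)` leaves the counter at 5.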
  • FIG. 7 is a flowchart of a method 740 implemented on programmable hardware, according to at least one embodiment of the present disclosure.
  • the method 740 includes receiving an input vector comprising a plurality of input bits at a logic module at 742.
  • the logic module receives a bit vector at 744.
  • the bit vector includes a primer bit and a plurality of vector bits.
  • the logic module provides the primer bit, a first vector bit of the plurality of vector bits, and a first input bit of the plurality of input bits to a first adder in the carry chain at 746.
  • the first adder generates a first carry out bit based on the primer bit, the first vector bit, and the first input bit at 748.
  • the logic module provides each vector bit from the plurality of vector bits and an associated input bit from the plurality of input bits as inputs to additional adders in the carry chain at 750.
  • the programmable hardware provides a carry out bit from each adder to a next adder in the carry chain to generate an output based on a last carry out bit from a last adder in the carry chain at 752.
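The flow of method 740 can be sketched as a single generic function. This is a minimal sketch under stated assumptions: the function names are hypothetical, each adder is modeled by the standard full-adder carry equation, and the primer bit is assumed to drive the carry-in of the first adder. The bit-vector values in the usage note were chosen so that the arithmetic works out under this particular wiring assumption; they are not asserted to match the values listed for any specific embodiment.

```python
def full_adder_carry(a: int, b: int, carry_in: int) -> int:
    """Carry-out of a 1-bit full adder (majority of the three inputs)."""
    return (a & b) | (a & carry_in) | (b & carry_in)


def carry_chain(input_bits, primer_bit: int, vector_bits) -> int:
    """Run input bits through a chain of adders and return the last carry.

    Steps 742/744: receive the input vector and the bit vector
    (primer bit plus one vector bit per adder).
    """
    if len(input_bits) != len(vector_bits):
        raise ValueError("one vector bit is required per input bit")
    # Step 746: the first adder sees the primer bit (assumed as carry-in),
    # the first vector bit, and the first input bit.
    carry = primer_bit
    for a, v in zip(input_bits, vector_bits):
        # Steps 748-752: each carry-out feeds the next adder in the chain.
        carry = full_adder_carry(a, v, carry)
    # The output is based on the last carry-out bit.
    return carry
```

Under this model, `carry_chain(bits, 1, [0] * len(bits))` behaves as an AND-reduce and `carry_chain(bits, 0, [1] * len(bits))` behaves as an OR-reduce, so the same chain of adders implements either function depending only on the bit vector supplied.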
  • the input vector includes input values based on outputs of combinational logic from additional logic modules implemented on programmable hardware.
  • additional logic modules are implemented on the same logic level as the carry chain of logic modules.
  • the input values include carry out signals from adders of the additional logic modules.
  • values of the bit vector are based on a configuration of the logic modules to act as a corresponding logic function.
  • the values of the bit vector include the primer bit and a set of one bit values based on the logic modules being configured to act as an AND-reduce logic function.
  • the values of the bit vector include the primer bit, a one bit value for the first vector bit, and a set of zero bit values based on the logic modules being configured to act as an OR-reduce logic function.
  • the primer bit is a one bit value.
  • any features described as modules, components, or the like may also be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a non-transitory processor-readable storage medium comprising instructions that, when executed by at least one processor, perform one or more of the methods described herein.
  • the instructions may be organized into routines, programs, objects, components, data structures, etc., which may perform particular tasks and/or implement particular data types, and which may be combined or distributed as desired in various embodiments.
  • The term “determining” encompasses a wide variety of actions and, therefore, “determining” can include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” can include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like.
  • Also, “determining” can include resolving, selecting, choosing, establishing and the like.
  • the terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. For example, any element or feature described in relation to an embodiment herein may be combinable with any element or feature of any other embodiment described herein, where compatible.

Abstract

The present disclosure relates to a carry chain logic system that leverages input and output signals from logic blocks to implement logic functions on programmable hardware (e.g., FPGA hardware). In particular, implementations of the carry chain logic system facilitate implementing logic gates (e.g., AND/OR gates) having a high number of input signals without incurring routing delays caused by routing output signals between logic components implemented across different logic stages. For example, implementations described herein involve feeding output signals between adders of a logic chain across multiple logic components on a common logic stage, thereby reducing routing penalties caused by routing signals via a routing fabric of the programmable hardware.
PCT/US2022/047954 2021-12-30 2022-10-27 Using and/or reduce carry chains on programmable hardware WO2023129261A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163295323P 2021-12-30 2021-12-30
US63/295,323 2021-12-30
US17/740,831 US20230214180A1 (en) 2021-12-30 2022-05-10 Using and/or reduce carry chains on programmable hardware
US17/740,831 2022-05-10

Publications (1)

Publication Number Publication Date
WO2023129261A1 true WO2023129261A1 (fr) 2023-07-06

Family

ID=84358214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/047954 WO2023129261A1 (fr) 2021-12-30 2022-10-27 Using and/or reduce carry chains on programmable hardware

Country Status (2)

Country Link
TW (1) TW202331575A (fr)
WO (1) WO2023129261A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7167022B1 (en) * 2004-03-25 2007-01-23 Altera Corporation Omnibus logic element including look up table based logic elements
US20190288688A1 (en) * 2019-06-06 2019-09-19 Intel Corporation Logic circuits with augmented arithmetic densities

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIM JIN HEE ET AL: "FPGA Architecture Enhancements for Efficient BNN Implementation", 2018 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT), IEEE, 10 December 2018 (2018-12-10), pages 214 - 221, XP033561523, DOI: 10.1109/FPT.2018.00039 *

Also Published As

Publication number Publication date
TW202331575A (zh) 2023-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22803129

Country of ref document: EP

Kind code of ref document: A1