US20240028295A1 - Efficient logic blocks architectures for dense mapping of multipliers - Google Patents
- Publication number
- US20240028295A1 (Application No. US 18/473,870)
- Authority
- US
- United States
- Prior art keywords
- inputs
- logic block
- configurable
- outputs
- receive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
- G06F7/575—Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/02—Digital function generators
- G06F1/03—Digital function generators working, at least partly, by table look-up
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/34—Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
Definitions
- This disclosure generally relates to integrated circuits, such as field-programmable gate arrays (FPGAs). More particularly, the present disclosure relates to performing mathematical operations, such as multiplication, implemented using circuitry elements of an integrated circuit (e.g., programmable logic of an FPGA).
- Integrated circuits increasingly carry out functions such as encryption and machine learning. Encryption and machine learning, as well as many other operations that may take place on integrated circuitry, may utilize multiplier circuitry (e.g., multipliers). For example, multipliers may be programmed onto logic of an integrated circuit and utilized to determine products of numbers being multiplied. However, more multiplier circuitry may be used than desired in some instances, which can limit the number of multiplication operations that can be performed. For instance, when too many logic blocks are used to perform multiplication, the resources of the integrated circuitry may be used inefficiently, and the integrated circuitry may not be able to perform a desired number of multiplication operations. Moreover, multiplication operations may take more time than desired to perform.
- FIG. 1 is a block diagram of a system for implementing arithmetic operations, in accordance with an embodiment
- FIG. 2 is a block diagram of an integrated circuit in which addition circuitry may be implemented, in accordance with an embodiment
- FIG. 3 is a schematic diagram of a logic block that may be implemented on the integrated circuit device of FIG. 1 , in accordance with an embodiment
- FIG. 4 illustrates an example of unsigned multiplication, in accordance with an embodiment
- FIG. 5 illustrates an example of signed multiplication, in accordance with an embodiment
- FIG. 6 is a flow diagram of a process for carrying out multiplication operations, in accordance with an embodiment
- FIG. 7 illustrates symbols used to discuss the multiplication operations described herein, in accordance with an embodiment
- FIG. 8 illustrates two patterns associated with multiplication operations, in accordance with an embodiment
- FIG. 9 illustrates two additional patterns associated with multiplication operations, in accordance with an embodiment
- FIG. 10 illustrates four patterns associated with multiplication operations, in accordance with an embodiment
- FIG. 11 illustrates four additional patterns associated with multiplication operations, in accordance with an embodiment
- FIG. 12 illustrates a mapping of a 3×3 multiplication operation, in accordance with an embodiment
- FIG. 13 A illustrates a first stage of mapping of a 4×4 multiplication operation, in accordance with an embodiment
- FIG. 13 B illustrates a second stage of mapping of a 4×4 multiplication operation, in accordance with an embodiment
- FIG. 14 A illustrates a first stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment
- FIG. 14 B illustrates a second stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment
- FIG. 15 A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 15 B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 16 A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 16 B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 17 A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 17 B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 18 A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 18 B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 19 A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 19 B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 19 C illustrates a third stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 20 A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 20 B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 20 C illustrates a third stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 21 A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 21 B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 21 C illustrates a third stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 22 A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 22 B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 22 C illustrates a third stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 23 is a schematic diagram of circuitry that may be included in the logic block of FIG. 3 , in accordance with an embodiment
- FIG. 24 is a schematic diagram of circuitry that may be included in the logic block of FIG. 3 , in accordance with an embodiment
- FIG. 25 is a schematic diagram of a logic block that can be implemented on the integrated circuit device of FIG. 1 , in accordance with an embodiment
- FIG. 26 illustrates two patterns associated with multiplication operations, in accordance with an embodiment
- FIG. 27 illustrates a mapping of a 4×4 multiplication operation, in accordance with an embodiment
- FIG. 28 A illustrates a first stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment
- FIG. 28 B illustrates a second stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment
- FIG. 29 A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 29 B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment
- FIG. 30 A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 30 B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment
- FIG. 31 A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 31 B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment
- FIG. 32 A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 32 B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment
- FIG. 33 is a block diagram of a data processing system, in accordance with an embodiment
- the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements.
- the terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
- references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features.
- the phrase A “based on” B is intended to mean that A is at least partially based on B.
- the term “or” is intended to be inclusive (e.g., logical OR) and not exclusive (e.g., logical XOR). In other words, the phrase A “or” B is intended to mean A, B, or both A and B.
- Integrated circuits, such as programmable logic devices, may be utilized to perform mathematical operations, such as addition and multiplication.
- logic e.g., reconfigurable logic
- programmable logic devices can be programmed to perform the mathematical operations.
- programmed logic utilized to perform multiplication can be referred to as a “multiplier.”
- Logic blocks, which may include particular circuit elements (e.g., look-up tables, adders, multiplexers, etc.), may be utilized to perform multiplication.
- the number of logic blocks of the programmable logic device used to perform multiplication may be undesirably large, which may reduce the amount of the programmable logic device that is available to be programmed (e.g., to perform other functions).
- the present application is generally directed to more efficient techniques for performing multiplication on programmable logic devices such as, but not limited to, field programmable gate arrays (FPGAs).
- various architectures for logic blocks are provided that enable fewer logic blocks to be utilized to perform multiplication operations, thereby enabling more multiplication operations to be performed on programmable logic devices.
- FIG. 1 illustrates a block diagram of a system 10 that may implement arithmetic operations.
- a designer may desire to implement functionality, such as the arithmetic operations of this disclosure, on an integrated circuit device 12 (e.g., a programmable logic device such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)).
- the designer may specify a high-level program to be implemented, such as an OpenCL program, which may enable the designer to more efficiently and easily provide programming instructions to configure a set of programmable logic cells for the integrated circuit device 12 without specific knowledge of low-level hardware description languages (e.g., Verilog or VHDL).
- Because OpenCL is quite similar to other high-level programming languages, such as C++, designers of programmable logic familiar with such programming languages may have a shorter learning curve than designers who are required to learn unfamiliar low-level hardware description languages to implement new functionalities in the integrated circuit device 12 .
- the designers may implement their high-level designs using design software 14 , such as a version of Intel® Quartus® by INTEL CORPORATION.
- the design software 14 may use a compiler 16 to convert the high-level program into a lower-level description.
- the compiler 16 may provide machine-readable instructions representative of the high-level program to a host 18 and the integrated circuit device 12 .
- the host 18 may receive a host program 22 which may be implemented by the kernel programs 20 .
- the host 18 may communicate instructions from the host program 22 to the integrated circuit device 12 via a communications link 24 , which may be, for example, direct memory access (DMA) communications or peripheral component interconnect express (PCIe) communications.
- the kernel programs 20 and the host 18 may enable configuration of a logic block 26 on the integrated circuit device 12 .
- the logic block 26 may include circuitry and/or other logic elements and may be configured to implement arithmetic operations, such as addition and multiplication.
- the designer may use the design software 14 to generate and/or to specify a low-level program, such as the low-level hardware description languages described above.
- the system 10 may be implemented without a separate host program 22 .
- the techniques described herein may be implemented in circuitry as a non-programmable circuit design. Thus, embodiments described herein are intended to be illustrative and not limiting.
- FIG. 2 illustrates an example of the integrated circuit device 12 as a programmable logic device, such as a field-programmable gate array (FPGA).
- the integrated circuit device 12 may be any other suitable type of programmable logic device (e.g., an ASIC and/or application-specific standard product).
- integrated circuit device 12 may have input/output circuitry 42 for driving signals off device and for receiving signals from other devices via input/output pins 44 .
- Interconnection resources 46 , such as global and local vertical and horizontal conductive lines and buses, may be used to route signals on integrated circuit device 12 .
- interconnection resources 46 may include fixed interconnects (conductive lines) and programmable interconnects (i.e., programmable connections between respective fixed interconnects).
- Programmable logic 48 may include combinational and sequential logic circuitry.
- programmable logic 48 may include look-up tables, registers, and multiplexers.
- the programmable logic 48 may be configured to perform a custom logic function.
- the programmable interconnects associated with interconnection resources may be considered to be a part of programmable logic 48 .
- Programmable logic devices such as the integrated circuit device 12 may contain programmable elements 50 with the programmable logic 48 .
- a designer e.g., a customer
- some programmable logic devices may be programmed by configuring their programmable elements 50 using mask programming arrangements, which is performed during semiconductor manufacturing.
- Other programmable logic devices are configured after semiconductor fabrication operations have been completed, such as by using electrical programming or laser programming to program their programmable elements 50 .
- programmable elements 50 may be based on any suitable programmable technology, such as fuses, antifuses, electrically-programmable read-only-memory technology, random-access memory cells, mask-programmed elements, and so forth.
- the programmable elements 50 may be formed from one or more memory cells.
- configuration data is loaded into the memory cells using pins 44 and input/output circuitry 42 .
- the memory cells may be implemented as random-access-memory (RAM) cells.
- CRAM configuration RAM cells
- These memory cells may each provide a corresponding static control output signal that controls the state of an associated logic component in programmable logic 48 .
- the output signals may be applied to the gates of metal-oxide-semiconductor (MOS) transistors within the programmable logic 48 .
- a user may utilize the design software 14 to implement the logic block 26 on the programmable logic 48 of the integrated circuit device 12 .
- the designer may specify in a high-level program that mathematical operations such as addition and multiplication be performed.
- the compiler 16 may convert the high-level program into a lower-level description that is used to program the programmable logic 48 to perform addition.
- FIG. 3 illustrates a logic block 26 A that may be utilized to perform mathematical operations such as multiplication.
- the logic block 26 A includes four lookup tables (LUTs) 60 (e.g., LUTs 60 A- 60 D) that may be four-input LUTs.
- each of the LUTs 60 may have four inputs (e.g., four single bit inputs), and the LUTs 60 may output one or more values (e.g., bit values) based on how each of the LUTs is programmed.
- a first LUT 60 A and second LUT 60 B may each receive inputs A, B, C 0 , and D 0 and output values based on the inputs A, B, C 0 , and D 0 .
- the outputted value may be partial products determined while performing multiplication.
- a third LUT 60 C and fourth LUT 60 D may each receive inputs A, B, D 1 and either C 0 or C 1 .
- the third LUT 60 C and fourth LUT 60 D may each output a bit value based on the input values received.
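- As an illustrative sketch only, a four-input LUT can be modeled as a 16-entry truth table indexed by its four single-bit inputs. The `Lut4` class, the input ordering, and the two example functions programmed below (a sum bit and a carry bit of two partial products) are assumptions chosen for illustration and are not taken from the disclosure.

```python
from itertools import product


class Lut4:
    """Four-input look-up table modeled as a 16-entry truth table."""

    def __init__(self, func):
        # Precompute the output bit for all 16 input combinations,
        # mimicking configuration bits being loaded into the LUT.
        self.table = [func(*bits) & 1 for bits in product((0, 1), repeat=4)]

    def __call__(self, a, b, c, d):
        # The four single-bit inputs form the table index (a is the MSB).
        return self.table[(a << 3) | (b << 2) | (c << 1) | d]


# Hypothetical programming: sum bit of the partial products A&C0 and B&D0.
lut_60a = Lut4(lambda a, b, c0, d0: (a & c0) ^ (b & d0))
# Hypothetical programming: carry bit of the same two partial products.
lut_60b = Lut4(lambda a, b, c0, d0: (a & c0) & (b & d0))
```

Reprogramming a LUT in this model amounts to passing a different function, which is the property that lets the same hardware produce either partial products or carry-related signals.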
- the logic block 26 A includes a multiplexer 62 that can receive inputs C 0 , C 1 , and a control signal.
- the integrated circuit device 12 may send control signals to cause the multiplexer 62 to output one value (e.g., C 0 or C 1 ) to be used as inputs for the LUTs 60 C, 60 D.
- the LUTs 60 may be utilized to perform various mathematical operations and logic operations.
- the LUTs 60 may perform logic operations on inputted values (e.g., A, B, C 0 , C 1 , D 0 , D 1 ) while operating as carry-lookahead logic (e.g., performing addition).
- the outputs (e.g., at P 0 and P 1 ) may be utilized as propagating carries, and the outputs (e.g., at G 0 and G 1 ) may be utilized as generating carries.
- the outputs may also be partial products of multiplication operations.
- the values generated by the LUTs 60 may be utilized as inputs into other circuitry included in the logic block 26 A, such as a multiplexer 64 , multiplexer 66 , multiplexer 68 , and multiplexer 70 .
- the multiplexer 64 may receive the output values from LUTs 60 A, 60 B as well as an input E.
- the multiplexer 64 may output a value (e.g., O 5 _ 0 ).
- the multiplexer 66 may receive the output values from the LUTs 60 C, 60 D as well as input E.
- the multiplexer 66 may output a value that may be used as an input of the multiplexer 68 , which may also receive the output of the multiplexer 64 and an input F.
- the multiplexer 68 may generate an output value O 6 .
- the multiplexer 70 may receive the outputs of the LUTs 60 C, 60 D as well as input F and generate an output O 5 _ 1 based on the values of these inputs.
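- The connectivity of the multiplexers 64 , 66 , 68 , and 70 described above can be sketched behaviorally as follows. This is a minimal model under stated assumptions: the select polarity (a select value of 0 picks the first data input) and the function names `mux2` and `output_muxes` are hypothetical and are not specified in the disclosure.

```python
def mux2(select, in0, in1):
    """Two-to-one multiplexer; select == 0 picks in0 (assumed polarity)."""
    return in1 if select else in0


def output_muxes(lut_a, lut_b, lut_c, lut_d, e, f):
    """Model outputs O5_0, O6, and O5_1 from the four LUT outputs."""
    o5_0 = mux2(e, lut_a, lut_b)   # multiplexer 64: LUTs 60A/60B, select E
    m66 = mux2(e, lut_c, lut_d)    # multiplexer 66: LUTs 60C/60D, select E
    o6 = mux2(f, o5_0, m66)        # multiplexer 68: select F
    o5_1 = mux2(f, lut_c, lut_d)   # multiplexer 70: LUTs 60C/60D, select F
    return o5_0, o6, o5_1
```

In this model, O6 depends on all four LUT outputs plus E and F, so the four four-input LUTs together can behave as one larger function of six inputs.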
- Outputs of the LUTs 60 may also be used as inputs into adder circuitry 80 , which may include two adders that are communicatively coupled to one another (e.g., two carry-lookahead adders in which one of the adders receives one or more outputs from the other adder).
- the adder circuitry 80 may also receive a carry-in value (e.g., Cin), for example, from other circuitry included in the integrated circuit device 12 , such as another logic block 26 . More specifically, inputs A, B, C 0 , D 0 , and D 1 may be used in generating four partial products, propagating carries (e.g., P 0 and P 1 ), and generating carries (e.g., G 0 and G 1 ). As discussed below, the adder circuitry 80 may reduce the partial products. Furthermore, the adder circuitry 80 may generate a carry-out value (e.g., Cout) that may be provided to other circuitry included in the integrated circuit, such as another logic block 26 .
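- For illustration, the roles of the propagating carries (P 0 , P 1 ), generating carries (G 0 , G 1 ), Cin, and Cout can be sketched with the textbook carry relations for a two-bit adder. The disclosure does not give these equations, so the relations and the function names below are assumptions based on standard carry-lookahead arithmetic.

```python
def two_bit_carry_chain(p0, g0, p1, g1, cin):
    """Textbook two-bit carry chain driven by propagate/generate signals."""
    s0 = p0 ^ cin                # sum bit of the lower position
    c1 = g0 | (p0 & cin)         # carry into the upper position
    s1 = p1 ^ c1                 # sum bit of the upper position
    cout = g1 | (p1 & c1)        # carry out of the two-bit group
    return s0, s1, cout


def add_two_bits(x, y, cin):
    """Add two 2-bit numbers and a carry-in via propagate/generate signals."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    p0, g0 = x0 ^ y0, x0 & y0    # propagate and generate, bit 0
    p1, g1 = x1 ^ y1, x1 & y1    # propagate and generate, bit 1
    s0, s1, cout = two_bit_carry_chain(p0, g0, p1, g1, cin)
    return s0 | (s1 << 1) | (cout << 2)
```

Because only Cin and Cout cross the boundary of the two-bit group, several such groups can be chained from one logic block to the next.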
- the logic block 26 A also includes circuitry 82 that, as illustrated, has gates 84 (e.g., logical AND gates, logical NAND gates) and programmable inverters.
- the circuitry 82 may use inputs E, D 0 , and C 1 to generate two partial products in addition to the four partial products generated by the adder circuitry 80 .
- the circuitry 82 may reduce these two partial products as well as the four partial products generated by the adder circuitry 80 . As such, the logic block 26 may generate and reduce six partial products.
- the gate 84 A may receive input E and an input En.
- the input En may be an enable/disable signal provided by the integrated circuit device 12 to disable the circuitry 82 when the circuitry 82 is not used and enable the circuitry 82 when the circuitry 82 is to be used. For example, by disabling the circuitry 82 , power usage that may otherwise be caused by toggling signals may be reduced or eliminated.
- the gate 84 A may generate an output, which may be used as an input for both of the gates 84 B, 84 C.
- the gate 84 B may receive input C 1 and output a value (e.g., that is provided to a logical NOT gate 86 A and a multiplexer 88 A), and the gate 84 C may receive input D 0 and output a value (e.g., that is provided to a logical NOT gate 86 B and a multiplexer 88 B).
- the multiplexers 88 may receive inverter signals (e.g., Inv 0 for multiplexer 88 A and Inv 1 for multiplexer 88 B) from the integrated circuit device 12 and be utilized when performing signed multiplication (e.g., multiplication that may include positive and negative values).
- While FIG. 3 includes programmable inverters (e.g., Inv 0 and Inv 1 ), other types of inverters may be utilized in other embodiments.
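- The behavior of the circuitry 82 can be sketched as below. This is a hedged model: it assumes the gates 84 A- 84 C are logical AND gates (the disclosure also mentions NAND gates as a possibility) and that an asserted Inv 0 or Inv 1 selects the inverted value at the multiplexers 88 A, 88 B. The function name `circuitry_82` is hypothetical.

```python
def circuitry_82(e, en, c1, d0, inv0=0, inv1=0):
    """Model of the two extra partial products, assuming AND gates."""
    gated_e = e & en                     # gate 84A: En disables the circuitry
    pp0 = gated_e & c1                   # gate 84B: partial product with C1
    pp1 = gated_e & d0                   # gate 84C: partial product with D0
    # NOT gates 86A/86B plus multiplexers 88A/88B act as programmable
    # inverters, used for signed multiplication.
    out0 = (pp0 ^ 1) if inv0 else pp0
    out1 = (pp1 ^ 1) if inv1 else pp1
    return out0, out1
```

With `en` held at 0, both gated partial products stay at 0 regardless of E, which corresponds to the described use of En to avoid unnecessary signal toggling.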
- the logic block 26 A includes two additional adders 90 A, 90 B.
- the adders 90 may each receive three inputs and output two values.
- adder 90 A may receive an output from the adder circuitry 80 , an output from the multiplexer 88 A, and a carry-in value Cin 2 to produce a value S 0 and a carry-out value.
- the carry-in value Cin 2 may be a carry-out value generated by another logic block 26 .
- the adder 90 B may receive the carry-out value from the adder 90 A, an output from the adder circuitry 80 , and an output from the multiplexer 88 B to determine a value S 1 and a carry-out value Cout 2 .
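- The chaining of the adders 90 A and 90 B can be sketched as two full adders in series. This is a minimal sketch assuming each adder 90 behaves as a conventional full adder and that the two adders handle adjacent bit positions; the names `full_adder` and `adders_90` are hypothetical.

```python
def full_adder(x, y, cin):
    """Conventional full adder: three one-bit inputs, sum and carry out."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def adders_90(sum0, sum1, pp0, pp1, cin2):
    """Chain adder 90A into adder 90B to produce S0, S1, and Cout2."""
    s0, carry = full_adder(sum0, pp0, cin2)    # adder 90A
    s1, cout2 = full_adder(sum1, pp1, carry)   # adder 90B
    return s0, s1, cout2
```

Here `sum0` and `sum1` stand for the outputs of the adder circuitry 80 , and `pp0` and `pp1` stand for the outputs of the multiplexers 88 A and 88 B.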
- the logic block 26 A may be utilized to perform multiplication operations. However, before progressing to specific examples of multiplication operations carried out by the logic block 26 A, a general discussion of multiplication and mapping is provided. As discussed below, mapping may be undertaken in order to determine how to program the programmable logic 48 of the integrated circuit device 12 to perform multiplication.
- FIG. 4 illustrates a diagram 100 showing an input 102 being multiplied by another input 104 .
- the input 102 is a five-bit input (e.g., input A having bits 0 - 4 ), and the input 104 is a three-bit input (e.g., input B having bits 0 - 2 ).
- Multiplying each bit of the input 102 with each bit of the input 104 results in partial products 106 .
- the partial products 106 can be summed (e.g., using adders) to determine an output 108 that is the sum of the partial products 106 and the product of the inputs 102 , 104 .
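- The unsigned multiplication described above can be sketched in software as follows. The function names are hypothetical; the sketch forms one partial product per pair of bits (the AND of bit i of input A and bit j of input B) and sums each partial product at weight i + j.

```python
def partial_products(a, b, a_width, b_width):
    """Return one row of partial-product bits per bit of input B."""
    rows = []
    for j in range(b_width):
        b_j = (b >> j) & 1
        # Each entry is the AND of bit i of A with bit j of B.
        rows.append([((a >> i) & 1) & b_j for i in range(a_width)])
    return rows


def sum_partial_products(rows):
    """Sum the partial products, weighting row j, column i by 2**(i + j)."""
    total = 0
    for j, row in enumerate(rows):
        for i, bit in enumerate(row):
            total += bit << (i + j)
    return total


# Example matching the widths of FIG. 4: a five-bit A and a three-bit B.
rows = partial_products(0b10101, 0b011, 5, 3)
product_value = sum_partial_products(rows)
```

For the five-bit by three-bit case, this yields three rows of five partial products each, or fifteen partial products in total.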
- FIG. 5 is a diagram 130 illustrating an example of signed multiplication. Most significant bits 132 , 134 may respectively indicate whether inputs 136 , 138 are positive or negative.
- partial products 140 are determined. Some of the partial products 140 may be inverted, as indicated by shading in FIG. 5 . For instance, a partial product may be inverted when it is a partial product involving one of the most significant bits 132 , 134 . A constant “1” may also be added to a first row of the partial products 140 . Additionally, as illustrated in FIG. 5 , a most significant bit 142 of an output 144 (i.e., the sum of the partial products 140 ) may be inverted.
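- The signed scheme described above resembles the well-known Baugh-Wooley arrangement, and can be sketched as follows for two n-bit two's complement inputs. The following are assumptions not stated in the disclosure: both inputs have the same width n, the constant "1" is added at bit position n, and the result is 2n bits wide. The function name `signed_multiply` is hypothetical.

```python
def signed_multiply(a, b, n):
    """Multiply two n-bit two's complement values via inverted partial products."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    total = 1 << n  # constant "1" (assumed to sit at bit position n)
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]
            # Invert partial products that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            total += pp << (i + j)

    total &= (1 << (2 * n)) - 1      # keep 2n result bits
    total ^= 1 << (2 * n - 1)        # invert the most significant output bit

    # Reinterpret the 2n-bit pattern as a signed integer.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

The inversions and the added constant replace the subtractions that a direct two's complement expansion would require, so the same unsigned adders can be reused for signed operands.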
- FIG. 6 illustrates a flow diagram of a process 160 that may be performed by the integrated circuit device 12 using one or more logic blocks 26 , for instance, to carry out multiplication operations.
- inputs may be received.
- the inputs may include one or more bits that are to be multiplied.
- the integrated circuit device 12 may determine a mapping for the inputs. In other words, the integrated circuit device 12 may determine how to carry out the multiplication operation involving the inputs. To determine a mapping, the integrated circuit device 12 may determine one or more patterns among the inputs as well as partial products that may be generated while determining a product of the two inputs. Examples of specific patterns are discussed below in more detail.
- the integrated circuit device 12 may multiply the two inputs based on the mapping.
- circuitry in the integrated circuit device 12 (e.g., programmable logic 48 , lookup tables 60 )
- components of each pattern (e.g., bits of an input or partial products)
- the logic blocks 26 may determine a product of two values being multiplied.
- FIG. 7 is provided to show types of symbols that are used to discuss the patterns.
- FIG. 7 includes a partial product 180 (indicated by a circle), a non-carry bit 182 (indicated by a square), and a carry bit 184 (indicated by a pentagon).
- the illustrated partial product 180 is a partial product of B j and A i , where i and j are bits of inputs A and B, respectively (e.g., the i th bit of input A and the j th bit of input B).
- FIG. 8 illustrates two patterns 200 A, 200 B.
- the patterns 200 A, 200 B respectively include symbols 202 A, 202 B that can be used and/or determined by a single logic block 26 (e.g., logic block 26 A) to generate outputs 204 A, 204 B.
- carry-in values may be utilized, while partial products may be generated.
- two carry-in values may be received, six partial products may be determined (based on values of bits of inputs A and B), and the output 204 A that includes two non-carry bits and two carry bits may be the output from the logic block 26 A.
- the logic block 26 A may receive two carry-in values (e.g., from another logic block 26 communicatively coupled to the logic block 26 A) and up to seven inputs, and the logic block 26 A may generate up to two carry bits and two non-carry bits. Table 1 is provided below to indicate how inputs A and B may be routed using the logic block 26 A.
- FIG. 9 illustrates two patterns 200 C, 200 D.
- the patterns 200 C, 200 D respectively include symbols 202 C, 202 D that can be used and/or determined by a single logic block 26 (e.g., logic block 26 A) to generate outputs 204 C, 204 D.
- four partial products may be determined (based on values of bits of inputs A and B), and the output 204 C that includes up to four non-carry bits may be the output from the logic block 26 A.
- three partial products (i.e., symbols 202 D )
- Table 2 is provided below to indicate how inputs A and B may be routed using the logic block 26 A when utilizing pattern 200 C
- Table 3 is provided to indicate how inputs A and B may be routed using the logic block 26 A when utilizing pattern 200 D.
- the output 204 C generated using the pattern 200 C may be at S 0 , S 1 , O 5 _ 0 and O 5 _ 1 .
- the output 204 D generated using the pattern 200 D may be at S 0 , S 1 , and O 5 _ 1 .
- FIG. 10 illustrates four patterns 200 E, 200 F, 200 G, 200 H that, in general, may be used to turn one or more carry-in signals into non-carry signals and generate a partial product. Additionally, partial products (e.g., produced by LUTs 60 or adder circuitry 80 ) may be inputs.
- the patterns 200 E, 200 F, 200 G, 200 H respectively include symbols 202 E, 202 F, 202 G, 202 H that can be used and/or determined by a single logic block 26 (e.g., logic block 26 A) to generate outputs 204 E, 204 F, 204 G, 204 H.
- one carry-in bit may be used as an input, and one partial product may be determined (based on values of bits of inputs A and B), and the output 204 E includes up to two non-carry bits.
- pattern 200 F two carry-in bits are received and one partial product is determined.
- the output 204 F may include up to three non-carry bits.
- pattern 200 G one carry bit may be received and two partial products may be determined; the output 204 G may include up to three non-carry bits.
- pattern 200 H two carry bits may be received and two partial products may be generated.
- the output 204 H may include up to three non-carry bits.
- partial product inputs may be connected to inputs C 0 and D 0 and input E may be set to “1.”
- the output will be generated at output O 5 _ 0 .
- the incoming carry bit (e.g., received from another logic block 26 ) may be received via carry line Cin, and a resulting output may be at S 0 .
- utilizing the pattern 200 E only uses a half logic block. In other words, when utilizing the pattern 200 E, only two adjacent LUTs 60 , one adder of the adder circuitry 80 , and one of the adders 90 A, 90 B is utilized to determine the output 204 E. Accordingly, the other half of the logic block 26 A may be utilized for other determinations (e.g., using the pattern 200 E on another set of inputs).
- partial product inputs may be connected to inputs C 1 and D 1 , and input F is set to “1.”
- the portion of the output 204 F arising from a partial product input may be output via output O 5 _ 1 .
- Carry bits may be received (e.g., from another logic block 26 ) via carry lines Cin and Cin 2 , and the corresponding portions of the output 204 F are output via outputs S 0 and S 1 .
- one of the partial product inputs is connected to inputs C 0 and D 0
- the other partial product input is connected to inputs C 1 and D 1 .
- Inputs E and F are set to “1.”
- the portions of the output 204 G associated with the partial products will be generated at outputs O 5 _ 0 and O 5 _ 1 .
- the portion of the output 204 G associated with a carry bit may turn into a portion of the output 204 G that is output at S 0 .
- Pattern 200 H may be used to generate a partial product at the same bit position as incoming carry bits and reduce the partial product.
- one partial product input is connected to inputs C 0 and D 0 .
- the outputs will be in S 0 and S 1 .
- the second partial product is connected to inputs C 1 and D 1 , and input F is set to “1”
- the corresponding portion of the output 204 H will be at output O 5 _ 1 .
- Carry bits may be received (e.g., from another logic block 26 ) via carry lines Cin and Cin 2 , and the corresponding portions of the output 204 H are output via outputs S 0 and S 1 .
- FIG. 11 illustrates four patterns 200 I, 200 J, 200 K, 200 L that, in general, may be used to add single bits together. Additionally, partial products (e.g., produced by LUTs 60 or adder circuitry 80 ) may be inputs.
- the patterns 200 I, 200 J, 200 K, 200 L respectively include symbols 202 I, 202 J, 202 K, 202 L that can be used by a single logic block 26 (e.g., logic block 26 A) to generate outputs 204 I, 204 J, 204 K, 204 L.
- the output 204 I may include up to four non-carry bits.
- pattern 200 J four non-carry bits are received, and the output 204 J may include up to three non-carry bits.
- pattern 200 K two carry bits and six non-carry bits may be received.
- the output 204 K may include up to two non-carry bits and two carry bits.
- pattern 200 L one carry bit and one non-carry bit may be received, and the output 204 L may include one non-carry bit.
- two inputs i.e., two of the non-carry bits included in the symbols 202 J
- the outputs will be at outputs S 0 , S 1 , O 5 _ 0 , and O 5 _ 1 .
- the inputs will be connected in the same manner as the inputs when using the pattern 200 I.
- the bits of the output 204 J will be at outputs S 0 , S 1 , and O 5 _ 1 .
- inputs may be connected according to Table 4 below:
- the received carry bits may be received via carry lines Cin and Cin 2 from another logic block 26 that is communicatively coupled to the logic block 26 A inputs C 1 and D 1 , and input F is set to “1.”
- the non-carry bits of the output 204 K may be generated at outputs S 0 and S 1 , and the carry bits may be output via Cout and Cout 2 .
- Pattern 200 L may be used when a carry bit and a non-carry bit are in the most significant bit position of an output. In this situation, the carry bit and the non-carry bit will not both be equal to one, meaning that an output generated by summing the carry bit and the non-carry bit will include a non-carry bit and no carry bits.
- the carry bit may be received via carry line Cin, and the non-carry bit may be connected to input C 0 .
- the outputs may be generated at S 0 .
- the pattern 200 L only uses half of a logic block 26 , meaning the other half of the logic block may be utilized to perform other determinations.
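One plausible way to check the claim about the most significant bit, for the unsigned case only (this argument is an assumption; the text does not spell it out): the product of two unsigned N-bit values never needs more than 2N bits, so adding the last carry bit and non-carry bit cannot produce a further carry. The helper name is hypothetical.

```python
def max_unsigned_product_bits(n: int) -> int:
    """Bit length of the largest product of two unsigned n-bit inputs."""
    largest = (2**n - 1) ** 2
    return largest.bit_length()


# For every input width discussed in the text, the product fits in 2N bits.
for n in range(2, 10):
    assert max_unsigned_product_bits(n) <= 2 * n
```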
- FIGS. 9 and 12 - 22 C will be discussed to show mappings of various examples of N×N multiplication operations.
- the mappings may include one or more of the patterns 200 discussed above.
- N is an integer ranging in value from two to nine indicative of the number of bits included in an input.
- a 2×2 multiplication operation involves multiplying two inputs that each include two bits.
- the integrated circuit device 12 may utilize pattern 200 C to generate the output 204 C as discussed above. Accordingly, 2×2 multiplication operations may be carried out using a single logic block 26 .
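A small sketch of the arithmetic behind a 2×2 multiplication: four partial products reduce to four output bits. It models only the bit-level arithmetic, not how the logic block 26 A routes signals or which physical output carries which bit; the function name is hypothetical.

```python
def multiply_2x2(a: int, b: int) -> list[int]:
    """Multiply two unsigned 2-bit values; return the four product bits, LSB first."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    # The four partial products of a 2x2 multiplication.
    p00, p10, p01, p11 = a0 & b0, a1 & b0, a0 & b1, a1 & b1
    r0 = p00
    r1 = p10 ^ p01
    c1 = p10 & p01          # carry out of bit position 1
    r2 = p11 ^ c1
    r3 = p11 & c1           # carry out of bit position 2
    return [r0, r1, r2, r3]
```

Because the largest result is 3 × 3 = 9, four output bits are always enough.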
- FIG. 12 illustrates a 3×3 multiplication operation.
- the 3×3 multiplication operation may be carried out using the pattern 200 A twice and the pattern 200 E once.
- partial product A 0 B 0 has been moved from the top right position to the left.
- the output bits may be moved around to create the correct output.
- 3×3 multiplication operations may be performed using two and one-half logic blocks 26 .
- N×N multiplication operations in which N is greater than 3 may be performed using more than one stage.
- a “stage” generally refers to the number of rows (or columns, depending on orientation) of logic blocks 26 used to perform a multiplication operation. For example, the 2×2 and 3×3 multiplication operations discussed above can be done with a single stage. As discussed below, N×N multiplication operations in which N ranges from 4 to 9 may be performed in two stages.
- bits may be determined using a first stage of logic blocks, and the bits may be provided as inputs to logic blocks 26 included in a second stage of logic blocks 26 (e.g., one or more logic blocks communicatively coupled to the logic blocks 26 of the first stage of logic blocks).
- FIG. 13 A illustrates a first stage of a 4×4 multiplication operation.
- pattern 200 C is used (which will generate non-carry bits of i 0 , i 1 , and i 2 ).
- the pattern 200 A may be used once, pattern 200 B may be used twice, and pattern 200 E may be used once. Accordingly, a total of four and one-half logic blocks 26 across two stages may be utilized to carry out 4×4 multiplication operations.
- FIG. 14 A illustrates a first stage of a 5×5 multiplication operation in which the pattern 200 A is used three times and pattern 200 E is used once.
- FIG. 14 B partial products not generated during the first stage can be determined and summed with the bits generated during a second stage using the pattern 200 B three times and the pattern 200 E once. Accordingly, 5×5 multiplication operations can be performed using seven logic blocks 26 across two stages.
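The logic-block totals quoted for the 2×2 through 5×5 mappings can be re-derived from the pattern counts. The sketch below assumes that patterns 200 E and 200 L occupy half a logic block (as the text states) and that every other pattern occupies one full logic block (an assumption inferred from the quoted totals). The names `logic_blocks` and `MAPPINGS` are hypothetical.

```python
from fractions import Fraction

# Assumption: only patterns 200E and 200L fit in half a logic block.
HALF_BLOCK_PATTERNS = {"E", "L"}

# Pattern usage per stage, as described in the text for logic block 26A.
MAPPINGS = {
    "2x2": [{"C": 1}],
    "3x3": [{"A": 2, "E": 1}],
    "4x4": [{"C": 1}, {"A": 1, "B": 2, "E": 1}],
    "5x5": [{"A": 3, "E": 1}, {"B": 3, "E": 1}],
}


def logic_blocks(stages: list[dict[str, int]]) -> Fraction:
    """Total logic blocks used by a mapping given per-stage pattern counts."""
    total = Fraction(0)
    for usage in stages:
        for pattern, count in usage.items():
            size = Fraction(1, 2) if pattern in HALF_BLOCK_PATTERNS else Fraction(1)
            total += size * count
    return total
```

Under these assumptions the 3×3, 4×4, and 5×5 mappings come to two and one-half, four and one-half, and seven logic blocks, matching the totals stated above.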
- N is two, three, four, or five
- the examples provided above include both the fewest number of stages and logic blocks 26 that can be used to complete multiplication operations.
- the mapping for a particular multiplication operation may be utilized to use the fewest number of stages or the fewest number of logic blocks 26 .
- FIGS. 15 A- 18 B relate to N×N multiplication operations in which the fewest number of stages is used
- FIGS. 19 A- 22 C relate to N×N multiplication operations in which the fewest number of logic blocks 26 is utilized.
- FIG. 15 A shows a first stage of a 6×6 multiplication operation in which the pattern 200 A is used five times and the pattern 200 H is used twice.
- patterns 200 B, 200 K may each be used twice, and pattern 200 E may be used once.
- 6×6 multiplication operations can be performed using eleven and one-half logic blocks 26 across two stages.
- FIG. 16 A shows a first stage of a 7×7 multiplication operation in which the pattern 200 A is used seven times, and patterns 200 F, 200 H are each used once.
- pattern 200 B is used twice
- pattern 200 K is used three times
- pattern 200 E is used once.
- 7×7 multiplication operations can be performed using fourteen and one-half logic blocks 26 across two stages.
- FIG. 17 A shows a first stage of an 8×8 multiplication operation in which the pattern 200 A is used twelve times, pattern 200 E is used twice, and pattern 200 H is used twice.
- pattern 200 B is used once
- pattern 200 K is used five times
- pattern 200 I is used once.
- 8×8 multiplication operations can be performed using twenty and one-half logic blocks 26 across two stages.
- FIG. 18 A shows a first stage of a 9×9 multiplication operation in which the pattern 200 A is used fifteen times and the pattern 200 F is used three times.
- pattern 200 K is used six times, and pattern 200 L is used once.
- 9×9 multiplication operations can be performed using twenty-five and one-half logic blocks 26 across two stages.
- FIGS. 19 A- 22 C provide examples of mappings for performing N×N multiplication operations in which the fewest number of logic blocks 26 is used.
- FIG. 19 A shows a first stage of a 6×6 multiplication operation in which the pattern 200 C is used once.
- pattern 200 A and pattern 200 B are each used twice, and pattern 200 E is used once.
- FIG. 19 C illustrates a third stage of a 6×6 multiplication operation in which the pattern 200 B is used four times and the pattern 200 E is used once.
- 6×6 multiplication operations can be performed using ten logic blocks 26 across three stages.
- FIG. 20 A shows a first stage of a 7×7 multiplication operation in which the pattern 200 A is used three times and the pattern 200 H is used once.
- pattern 200 A and pattern 200 H are each used once, and pattern 200 B is used four times.
- FIG. 20 C illustrates a third stage of a 7×7 multiplication operation in which the pattern 200 B is used four times and the pattern 200 E is used once.
- 7×7 multiplication operations can be performed using thirteen and one-half logic blocks 26 across three stages.
- FIG. 21 A shows a first stage of an 8×8 multiplication operation in which the pattern 200 A is used three times and pattern 200 H is used once.
- pattern 200 A and pattern 200 B are each used four times, and pattern 200 H is used twice.
- FIG. 21 C illustrates a third stage of an 8×8 multiplication operation in which pattern 200 B and pattern 200 K are each used twice, and the pattern 200 E is used once.
- 8×8 multiplication operations can be performed using nineteen and one-half logic blocks 26 across three stages.
- FIG. 22 A shows a first stage of a 9×9 multiplication operation in which the pattern 200 A is used four times and the pattern 200 H is used once.
- pattern 200 A is used six times
- pattern 200 B is used five times
- pattern 200 H is used twice.
- FIG. 22 C illustrates a third stage of a 9×9 multiplication operation in which pattern 200 B and pattern 200 L are each used once, and pattern 200 K is used five times.
- 9×9 multiplication operations can be performed using twenty-four logic blocks 26 across three stages.
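To make the trade-off between the two families of mappings concrete, the sketch below tabulates the totals quoted in the text for logic block 26 A and computes how many logic blocks are saved by accepting a third stage. The dictionary and function names are hypothetical; the numbers are only the totals stated above.

```python
# (logic blocks, stages) as stated in the text for logic block 26A.
FEWEST_STAGES = {6: (11.5, 2), 7: (14.5, 2), 8: (20.5, 2), 9: (25.5, 2)}
FEWEST_BLOCKS = {6: (10.0, 3), 7: (13.5, 3), 8: (19.5, 3), 9: (24.0, 3)}


def blocks_saved(n: int) -> float:
    """Logic blocks saved by the fewest-blocks mapping of an n x n multiplier."""
    return FEWEST_STAGES[n][0] - FEWEST_BLOCKS[n][0]


for n in sorted(FEWEST_STAGES):
    extra_stages = FEWEST_BLOCKS[n][1] - FEWEST_STAGES[n][1]
    print(f"{n}x{n}: {blocks_saved(n)} blocks saved for {extra_stages} extra stage")
```

Which mapping is preferable depends on whether area (logic blocks) or depth (stages) matters more for a given design.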
- FIG. 23 is a schematic diagram of circuitry 82 A that can be used as an alternative to the circuitry 82 and the adders 90 A, 90 B illustrated in FIG. 3 . More specifically, compared to the circuitry 82 illustrated in FIG. 3 , in FIG. 23 , the adders 90 A, 90 B have been replaced with XOR gates 220 A, 220 B and AND gates 222 A, 222 B that enable circuitry 82 A to generate propagating and generating signals that can be used, for example, with carry-propagate adders. Additionally, the XOR gate 220 A may receive an input from the adding circuitry 80 via line 224 , and the AND gate 222 B may receive an input from the adding circuitry via line 226 .
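For readers unfamiliar with propagate and generate signals, the following is a textbook sketch of how an XOR gate and an AND gate per bit position feed a carry-propagate adder. It illustrates the general technique only and does not model the specific wiring of circuitry 82 A; the function names are hypothetical.

```python
def propagate_generate(a: int, b: int) -> tuple[int, int]:
    """Per-bit propagate (XOR) and generate (AND) signals."""
    return a ^ b, a & b


def add_with_pg(x: int, y: int, width: int, carry_in: int = 0) -> int:
    """Add two unsigned values bit by bit using propagate/generate signals."""
    carry, result = carry_in, 0
    for k in range(width):
        p, g = propagate_generate((x >> k) & 1, (y >> k) & 1)
        result |= (p ^ carry) << k    # sum bit at position k
        carry = g | (p & carry)       # carry into position k + 1
    return result | (carry << width)
```

A bit position generates a carry when both inputs are one and propagates an incoming carry when exactly one input is one.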
- FIG. 24 is a schematic diagram of circuitry 82 B that can be used as an alternative to the circuitry 82 included in FIG. 3 . More specifically, compared to the circuitry 82 of FIG. 3 , the NOT gates 86 and multiplexers 88 are not included, and the gates 84 B, 84 C (e.g., NAND gates) have been replaced with gates 84 D, 84 E (e.g., AND gates). Additionally, while inputs for gate 84 D are the same as the gate 84 B, gate 84 E has inputs of the output of gate 84 A and input F compared to the output of gate 84 A and input E for the gate 84 C of the circuitry 82 of FIG. 3 .
- the circuitry 82 B may be used, for example, if the logic block 26 is used for unsigned multiplication. Additionally, it should be noted that, in some embodiments, input D 0 may be utilized instead of input F.
- logic block 26 A (and logic block 26 B discussed below) may be utilized to add three two-bit numbers together.
- the bits of one number may be provided as inputs D 0 and C 1
- the bits of another number may be C 0 and B
- the bits of the last number may be A and D 1 .
- input E may be set to “1.”
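As a simple arithmetic sketch of adding three two-bit numbers (the routing of bits to inputs A, B, C 0 , C 1 , D 0 , D 1 is not modeled, and the function name is hypothetical): the largest possible sum is 3 + 3 + 3 = 9, so the result always fits in four bits.

```python
def add_three_two_bit(x: int, y: int, z: int) -> list[int]:
    """Add three unsigned 2-bit values; return the four sum bits, LSB first."""
    for value in (x, y, z):
        if not 0 <= value <= 3:
            raise ValueError("each operand must fit in two bits")
    total = x + y + z    # at most 9, which fits in four bits
    return [(total >> k) & 1 for k in range(4)]
```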
- FIG. 25 is a schematic diagram of a logic block 26 B.
- the logic block 26 B is generally similar to the logic block 26 A, but the logic block 26 B includes additional circuitry 250 as well as an additional carry line (e.g., Cin 3 , Cout 3 ).
- the additional circuitry 250 includes gates 252 (e.g., AND gate 252 A and NAND gates, 252 B, 252 C), gates 254 (e.g., NOT gates 254 A, 254 B), multiplexers 256 (e.g., multiplexers 256 A, 256 B), and adders 90 C, 90 D.
- the logic block 26 B is able to use nine input bits to generate and reduce eight partial products using two non-carry outputs (e.g., S 0 and S 1 ) and up to three carry outputs (e.g., Cout, Cout 2 , Cout 3 ).
- the gate 252 A may receive input F and an input En 2 . Similar to input En, input En 2 may be an enable/disable signal provided by the integrated circuit device 12 to disable the additional circuitry 250 when the additional circuitry 250 is not used. For example, by disabling the additional circuitry 250 , power usage that may otherwise be caused by toggling signals may be reduced or eliminated.
- the gate 252 A may generate an output, which may be used as an input for both of the gates 252 B, 252 C.
- the gate 252 B may also receive input LSIM and output a value (e.g., that is provided to NOT gate 254 A and multiplexer 256 A), and the gate 252 C may also receive input C 1 and output a value (e.g., that is provided to a logical NOT gate 254 B and a multiplexer 256 B).
- the multiplexers 256 A, 256 B may receive inverter signals (e.g., Inv 2 for multiplexer 256 A and Inv 3 for multiplexer 256 B) from the integrated circuit device 12 and be utilized when performing signed multiplication.
- FIG. 3 includes programmable inverters (e.g., Inv 2 and Inv 3 of FIG. 25 ); other types of inverters may be utilized in other embodiments.
- Adder 90 C may receive an output from the adder 90 A, an output from the multiplexer 256 A, and a carry-in value Cin 3 to produce a value S 0 and a carry out value.
- the carry-in value Cin 3 may be a carry-out value generated by another logic block 26 (e.g., logic block 26 B).
- the adder 90 D may receive the carry out value from the adder 90 C, an output from the adder 90 B, and an output from the multiplexer 256 B to determine a value S 1 and a carry-out value Cout 3 .
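The chain formed by adders 90 C and 90 D can be sketched as two one-bit full adders. This assumes each adder is a standard full adder and that the values received from adders 90 A and 90 B are single sum bits; both are assumptions made for illustration, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def additional_adders(out_90a: int, out_90b: int,
                      mux_256a: int, mux_256b: int, cin3: int) -> tuple[int, int, int]:
    """Chain adders 90C and 90D; returns (S0, S1, Cout3)."""
    s0, carry = full_adder(out_90a, mux_256a, cin3)     # adder 90C
    s1, cout3 = full_adder(carry, out_90b, mux_256b)    # adder 90D
    return s0, s1, cout3
```

Under these assumptions, S 0 + 2·S 1 + 4·Cout 3 equals the sum of the inputs to adder 90 C plus twice the sum of the remaining inputs to adder 90 D.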
- the logic block 26 B can generate and reduce eight partial products.
- the logic block 26 B may utilize patterns 200 M, 200 N illustrated in FIG. 26 .
- the patterns 200 M, 200 N respectively include symbols 202 M, 202 N that can be used and/or determined by a single logic block 26 (e.g., logic block 26 B) to generate outputs 204 M, 204 N, which may each include up to two non-carry bits and up to three carry bits.
- carry-in e.g., values received via Cin, Cin 2 , Cin 3 in FIG. 25
- partial products may be generated.
- pattern 200 M three carry-in values may be received, eight partial products may be determined (based on values of bits of inputs A and B), and the output 204 M that includes up to two non-carry bits and up to three carry bits may be generated.
- the pattern 200 N three carry-in bits and two non-carry bits may be received, six partial products may be generated, and output 204 N that includes up to two non-carry bits and up to three carry bits is generated.
- Table 5 is provided below to indicate how inputs A and B may be routed using the logic block 26 B.
- any patterns 200 discussed above with respect to the logic block 26 A may be used with the logic block 26 B in the same manner as described above with respect to logic block 26 A.
- patterns 200 M, 200 N may respectively be utilized to perform multiplication operations (e.g., generating partial products) described above with respect to patterns 200 A, 200 B.
- mappings that can be utilized to perform N×N multiplication operations using the logic block 26 B will now be discussed.
- the logic block 26 B may perform N×N multiplication operations in which N is equal to two or three (i.e., 2×2 and 3×3 multiplication operations) using the pattern 200 C and the mapping illustrated in FIG. 12 , respectively. Accordingly, to perform 2×2 multiplication operations, a single logic block 26 B may be used. Also, to perform a 3×3 multiplication operation, two and one-half logic blocks 26 B may be used.
- FIG. 27 illustrates a mapping of a 4×4 multiplication operation. As illustrated, pattern 200 M is used three times, and pattern 200 E is used once. Accordingly, a total of three and one-half logic blocks 26 B in a single stage may be utilized to carry out 4×4 multiplication operations.
- FIG. 28 A illustrates a first stage of a 5×5 multiplication operation in which the pattern 200 M is used twice and pattern 200 H is used once.
- partial products not generated during the first stage can be determined and summed with the bits (e.g., i 0 , i 1 , i 2 , i 3 ) generated during a second stage using the pattern 200 N twice, pattern 200 M once, and pattern 200 E once.
- 5×5 multiplication operations can be performed using six and one-half logic blocks 26 B across two stages.
- FIG. 29 A shows a first stage of a 6×6 multiplication operation in which the pattern 200 M is used three times and pattern 200 H is used once.
- pattern 200 N may be used twice, and patterns 200 E, 200 M may each be used once.
- 6×6 multiplication operations can be performed using eight and one-half logic blocks 26 B across two stages.
- FIG. 30 A shows a first stage of a 7×7 multiplication operation in which the pattern 200 M is used four times, and the pattern 200 H is used once.
- pattern 200 M is used four times, and pattern 200 H is used twice.
- 7×7 multiplication operations can be performed using ten logic blocks 26 B across two stages.
- FIG. 31 A shows a first stage of an 8×8 multiplication operation in which the pattern 200 M is used nine times, pattern 200 E is used once, and pattern 200 H is used once.
- pattern 200 M is used once
- pattern 200 K is used three times.
- 8×8 multiplication operations can be performed using sixteen and one-half logic blocks 26 B across two stages.
- FIG. 32 A shows a first stage of a 9×9 multiplication operation in which the pattern 200 M is used twelve times, pattern 200 E is used once, and pattern 200 G is used once. In a second stage, as illustrated in FIG. 32 B , pattern 200 K is used seven times. As such, 9×9 multiplication operations can be performed using twenty and one-half logic blocks 26 B across two stages.
- because the mappings discussed above with respect to the logic block 26 B utilize either one or two stages, the mappings use the fewest number of logic blocks 26 B and stages.
- Table 6 is provided.
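Table 6 itself is not reproduced in this text. As a rough sketch under that caveat, the logic-block totals quoted in the text (not the contents of Table 6) can be collected to compare the logic block 26 A, using its fewest-blocks mappings, with the logic block 26 B. The names below are hypothetical.

```python
# Logic-block totals quoted in the text; keys are N for an N x N multiplier.
BLOCKS_26A = {2: 1.0, 3: 2.5, 4: 4.5, 5: 7.0, 6: 10.0, 7: 13.5, 8: 19.5, 9: 24.0}
BLOCKS_26B = {2: 1.0, 3: 2.5, 4: 3.5, 5: 6.5, 6: 8.5, 7: 10.0, 8: 16.5, 9: 20.5}


def savings(n: int) -> float:
    """Logic blocks saved by using logic block 26B instead of 26A."""
    return BLOCKS_26A[n] - BLOCKS_26B[n]


for n in sorted(BLOCKS_26A):
    print(f"{n}x{n}: 26A={BLOCKS_26A[n]}, 26B={BLOCKS_26B[n]}, saved={savings(n)}")
```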
- the technical effects of the techniques discussed herein enable limited space on integrated circuit devices to be more efficiently utilized by including high density circuitry that can be used to perform multiplication operations.
- the logic blocks 26 discussed herein enable many multiplication operations to be performed simultaneously.
- reduced amounts of stages may be used to perform certain multiplication operations. Accordingly, the techniques described herein enable integrated circuits to perform multiplication operations quickly and efficiently.
- the integrated circuit device 12 may be a data processing system or a component of a data processing system.
- the integrated circuit device 12 may be a component of a data processing system 450 , shown in FIG. 33 .
- the data processing system 450 may include a host processor 452 , memory and/or storage circuitry 454 , and a network interface 456 .
- the data processing system 450 may include more or fewer components (e.g., electronic display, user interface structures, application specific integrated circuits (ASICs)).
- the host processor 452 may include any suitable processor, such as an INTEL® Xeon® processor or a reduced-instruction processor (e.g., a reduced instruction set computer (RISC), an Advanced RISC Machine (ARM) processor) that may manage a data processing request for the data processing system 450 (e.g., to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or the like).
- the memory and/or storage circuitry 454 may include random access memory (RAM), read-only memory (ROM), one or more hard drives, flash memory, or the like. The memory and/or storage circuitry 454 may hold data to be processed by the data processing system 450 .
- the memory and/or storage circuitry 454 may also store configuration programs (bitstreams) for programming the integrated circuit device 12 .
- the network interface 456 may allow the data processing system 450 to communicate with other electronic devices.
- the data processing system 450 may include several different packages or may be contained within a single package on a single package substrate.
- the data processing system 450 may be part of a data center that processes a variety of different requests.
- the data processing system 450 may receive a data processing request via the network interface 456 to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or some other specialized task.
- the host processor 452 may cause the programmable logic fabric of the integrated circuit device 12 to be programmed with circuitry suitable to implement a requested task. For instance, the host processor 452 may instruct that configuration data (bitstream) stored on the memory and/or storage circuitry 454 be programmed into the programmable logic fabric of the integrated circuit device 12 .
- bitstream configuration data
- the configuration data may represent a circuit design for performing multiplication operations that utilize one or more of the logic blocks 26 , which may be mapped to the programmable logic according to the techniques described herein.
- the integrated circuit device 12 may assist the data processing system 450 in performing the requested task, such as performing multiplication operations.
- each DSP circuitry and/or DSP architecture may include any suitable number of elements (e.g., adders, multipliers 64, routing, and/or the like). Accordingly, it should be understood that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Logic Circuits (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
An integrated circuit includes a logic block configured to perform multiplication operations. The logic block includes a plurality of lookup tables configured to receive a plurality of inputs and generate a first plurality of outputs. Additionally, the logic block includes adding circuitry configured to receive the first plurality of outputs and generate a second plurality of outputs. Furthermore, the logic block includes circuitry configured to receive a portion of the plurality of inputs, determine one or more partial products, and generate a third plurality of outputs.
Description
- This application is a continuation of U.S. application Ser. No. 16/729,256, filed Dec. 27, 2019, entitled “Efficient Logic Block Architectures for Dense Mapping of Multipliers,” which is hereby incorporated by reference in its entirety for all purposes.
- This disclosure generally relates to integrated circuits, such as field-programmable gate arrays (FPGAs). More particularly, the present disclosure relates to performing mathematical operations, such as multiplication, implemented using circuitry elements of an integrated circuit (e.g., programmable logic of an FPGA).
- This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it may be understood that these statements are to be read in this light, and not as admissions of prior art.
- Integrated circuits increasingly carry out functions such as encryption and machine learning. Encryption and machine learning, as well as many other operations that may take place on integrated circuitry, may utilize multiplier circuitry (e.g., multipliers). For example, multipliers may be programmed onto logic of an integrated circuit and utilized to determine products of numbers being multiplied. However, more multiplier circuitry may be used than desired in some instances, which can limit the number of multiplication operations that can be performed. For instance, when too many logic blocks are used to perform multiplication, the resources of the integrated circuitry may be inefficiently used, and the integrated circuitry may not be able to perform a desired number of multiplication operations. Moreover, multiplication operations may take more time than desired to perform.
- Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
-
FIG. 1 is a block diagram of a system for implementing arithmetic operations, in accordance with an embodiment; -
FIG. 2 is a block diagram of an integrated circuit in which addition circuitry may be implemented, in accordance with an embodiment; -
FIG. 3 is a schematic diagram of a logic block that may be implemented on the integrated circuit device ofFIG. 1 , in accordance with an embodiment; -
FIG. 4 illustrates an example of unsigned multiplication, in accordance with an embodiment; -
FIG. 5 illustrates an example of signed multiplication, in accordance with an embodiment; -
FIG. 6 is a flow diagram of a process for carrying out multiplication operations, in accordance with an embodiment; -
FIG. 7 illustrated symbols used to discuss multiplication operations discussed herein, in accordance with an embodiment; -
FIG. 8 illustrates two patterns associated with multiplication operations, in accordance with an embodiment; -
FIG. 9 illustrates two additional patterns associated with multiplication operations, in accordance with an embodiment; -
FIG. 10 illustrates four patterns associated with multiplication operations, in accordance with an embodiment; -
FIG. 11 illustrates four additional patterns associated with multiplication operations, in accordance with an embodiment; and -
FIG. 12 illustrates a mapping of a 3×3 multiplication operation, in accordance with an embodiment; -
FIG. 13A illustrates a first stage of mapping of a 4×4 multiplication operation, in accordance with an embodiment; -
FIG. 13B illustrates a second stage of mapping of a 4×4 multiplication operation, in accordance with an embodiment; -
FIG. 14A is a illustrates a first stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment; -
FIG. 14B illustrates a second stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment; -
FIG. 15A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 15B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 16A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 16B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 17A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 17B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 18A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 18B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 19A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 19B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 19C illustrates a third stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 20A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 20B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 20C illustrates a third stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 21A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 21B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 21C illustrates a third stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 22A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 22B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 22C illustrates a third stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 23 is a schematic diagram of circuitry that may be included in the logic block of FIG. 3, in accordance with an embodiment; -
FIG. 24 is a schematic diagram of circuitry that may be included in the logic block of FIG. 3, in accordance with an embodiment; -
FIG. 25 is a schematic diagram of a logic block that can be implemented on the integrated circuit device of FIG. 1, in accordance with an embodiment; -
FIG. 26 illustrates two patterns associated with multiplication operations, in accordance with an embodiment; -
FIG. 27 illustrates a mapping of a 4×4 multiplication operation, in accordance with an embodiment; -
FIG. 28A illustrates a first stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment; -
FIG. 28B illustrates a second stage of mapping of a 5×5 multiplication operation, in accordance with an embodiment; -
FIG. 29A illustrates a first stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 29B illustrates a second stage of mapping of a 6×6 multiplication operation, in accordance with an embodiment; -
FIG. 30A illustrates a first stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 30B illustrates a second stage of mapping of a 7×7 multiplication operation, in accordance with an embodiment; -
FIG. 31A illustrates a first stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 31B illustrates a second stage of mapping of an 8×8 multiplication operation, in accordance with an embodiment; -
FIG. 32A illustrates a first stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; -
FIG. 32B illustrates a second stage of mapping of a 9×9 multiplication operation, in accordance with an embodiment; and -
FIG. 33 is a block diagram of a data processing system, in accordance with an embodiment. - One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It may be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it may be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
- When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Furthermore, the phrase A “based on” B is intended to mean that A is at least partially based on B. Moreover, unless expressly stated otherwise, the term “or” is intended to be inclusive (e.g., logical OR) and not exclusive (e.g., logical XOR). In other words, the phrase A “or” B is intended to mean A, B, or both A and B.
- Integrated circuits, such as programmable logic devices, may be utilized to perform mathematical operations, such as addition and multiplication. For example, logic (e.g., reconfigurable logic) on programmable logic devices can be programmed to perform the mathematical operations. For instance, programmed logic utilized to perform multiplication can be referred to as a “multiplier.” Logic blocks, which may include particular circuit elements (e.g., look-up tables, adders, multiplexers, etc.) may be utilized to perform multiplication. In some cases, the amount of logic blocks of the programmable logic device used to perform multiplication may be undesirably large, which may reduce the amount of the programmable logic device that is available to be programmed (e.g., to perform other functions). The present application is generally directed to more efficient techniques for performing multiplication on programmable logic devices such as, but not limited to, field programmable gate arrays (FPGAs). For example, as discussed below, various architectures for logic blocks are provided that enable fewer logic blocks to be utilized to perform multiplication operations, thereby enabling more multiplication operations to be performed on programmable logic devices.
- With the foregoing in mind,
FIG. 1 illustrates a block diagram of a system 10 that may implement arithmetic operations. A designer may desire to implement functionality, such as the arithmetic operations of this disclosure, on an integrated circuit device 12 (e.g., a programmable logic device such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)). In some cases, the designer may specify a high-level program to be implemented, such as an OpenCL program, which may enable the designer to more efficiently and easily provide programming instructions to configure a set of programmable logic cells for the integrated circuit device 12 without specific knowledge of low-level hardware description languages (e.g., Verilog or VHDL). For example, because OpenCL is quite similar to other high-level programming languages, such as C++, designers of programmable logic familiar with such programming languages may have a shorter learning curve than designers who are required to learn unfamiliar low-level hardware description languages to implement new functionalities in the integrated circuit device 12. - The designers may implement their high-level designs using
design software 14, such as a version of Intel® Quartus® by INTEL CORPORATION. The design software 14 may use a compiler 16 to convert the high-level program into a lower-level description. The compiler 16 may provide machine-readable instructions representative of the high-level program to a host 18 and the integrated circuit device 12. The host 18 may receive a host program 22 which may be implemented by the kernel programs 20. To implement the host program 22, the host 18 may communicate instructions from the host program 22 to the integrated circuit device 12 via a communications link 24, which may be, for example, direct memory access (DMA) communications or peripheral component interconnect express (PCIe) communications. In some embodiments, the kernel programs 20 and the host 18 may enable configuration of a logic block 26 on the integrated circuit device 12. The logic block 26 may include circuitry and/or other logic elements and may be configured to implement arithmetic operations, such as addition and multiplication. - While the techniques described herein relate to the application of a high-level program, in some embodiments, the designer may use the
design software 14 to generate and/or to specify a low-level program, such as the low-level hardware description languages described above. Further, in some embodiments, the system 10 may be implemented without a separate host program 22. Moreover, in some embodiments, the techniques described herein may be implemented in circuitry as a non-programmable circuit design. Thus, embodiments described herein are intended to be illustrative and not limiting. - Turning now to a more detailed discussion of the
integrated circuit device 12, FIG. 2 illustrates an example of the integrated circuit device 12 as a programmable logic device, such as a field-programmable gate array (FPGA). Further, it should be understood that the integrated circuit device 12 may be any other suitable type of programmable logic device (e.g., an ASIC and/or application-specific standard product). As shown, integrated circuit device 12 may have input/output circuitry 42 for driving signals off the device and for receiving signals from other devices via input/output pins 44. Interconnection resources 46, such as global and local vertical and horizontal conductive lines and buses, may be used to route signals on integrated circuit device 12. Additionally, interconnection resources 46 may include fixed interconnects (conductive lines) and programmable interconnects (i.e., programmable connections between respective fixed interconnects). Programmable logic 48 may include combinational and sequential logic circuitry. For example, programmable logic 48 may include look-up tables, registers, and multiplexers. In various embodiments, the programmable logic 48 may be configured to perform a custom logic function. The programmable interconnects associated with interconnection resources may be considered to be a part of programmable logic 48. - Programmable logic devices, such as the
integrated circuit device 12, may contain programmable elements 50 with the programmable logic 48. For example, as discussed above, a designer (e.g., a customer) may program (e.g., configure) the programmable logic 48 to perform one or more desired functions. By way of example, some programmable logic devices may be programmed by configuring their programmable elements 50 using mask programming arrangements, which is performed during semiconductor manufacturing. Other programmable logic devices are configured after semiconductor fabrication operations have been completed, such as by using electrical programming or laser programming to program their programmable elements 50. In general, programmable elements 50 may be based on any suitable programmable technology, such as fuses, antifuses, electrically-programmable read-only-memory technology, random-access memory cells, mask-programmed elements, and so forth. - Many programmable logic devices are electrically programmed. With electrical programming arrangements, the
programmable elements 50 may be formed from one or more memory cells. For example, during programming, configuration data is loaded into the memory cells using pins 44 and input/output circuitry 42. In one embodiment, the memory cells may be implemented as random-access-memory (RAM) cells. The use of memory cells based on RAM technology as described herein is intended to be only one example. Further, because these RAM cells are loaded with configuration data during programming, they are sometimes referred to as configuration RAM cells (CRAM). These memory cells may each provide a corresponding static control output signal that controls the state of an associated logic component in programmable logic 48. For instance, in some embodiments, the output signals may be applied to the gates of metal-oxide-semiconductor (MOS) transistors within the programmable logic 48. - Keeping the discussion of
FIG. 1 and FIG. 2 in mind, a user (e.g., designer) may utilize the design software 14 to implement the logic block 26 on the programmable logic 48 of the integrated circuit device 12. In particular, the designer may specify in a high-level program that mathematical operations such as addition and multiplication be performed. The compiler 16 may convert the high-level program into a lower-level description that is used to program the programmable logic 48 to perform addition. With this in mind, FIG. 3 illustrates a logic block 26A that may be utilized to perform mathematical operations such as multiplication. - As illustrated, the
logic block 26A includes four lookup tables (LUTs) 60 (e.g., LUTs 60A-60D) that may be four-input LUTs. In other words, each of the LUTs 60 may have four inputs (e.g., four single bit inputs), and the LUTs 60 may output one or more values (e.g., bit values) based on how each of the LUTs is programmed. For example, a first LUT 60A and second LUT 60B may each receive inputs A, B, C0, and D0 and output values based on the inputs A, B, C0, D0. For instance, the outputted values may be partial products determined while performing multiplication. Somewhat similarly, a third LUT 60C and fourth LUT 60D may each receive inputs A, B, D1 and either C0 or C1. The third LUT 60C and fourth LUT 60D may each output a bit value based on the input values received. Additionally, the logic block 26A includes a multiplexer 62 that can receive inputs C0, C1, and a control signal. The integrated circuit device 12 may send control signals to cause the multiplexer 62 to output one value (e.g., C0 or C1) to be used as an input for the LUTs. - The LUTs 60 may be utilized to perform various mathematical operations and logic operations. For example, the LUTs 60 may perform logic operations on inputted values (e.g., A, B, C0, C1, D0, D1) while operating as carry-lookahead logic (e.g., performing addition). The outputs (e.g., at P0 and P1) may be utilized as propagating carries, and the outputs may be utilized as generating carries (e.g., at G0 and G1). The outputs may also be partial products of multiplication operations. The values generated by the LUTs 60 may be utilized as inputs into other circuitry included in the
logic block 26A, such as a multiplexer 64, multiplexer 66, multiplexer 68, and multiplexer 70. The multiplexer 64 may receive output values from certain of the LUTs 60, and the multiplexer 64 may output a value (e.g., O5_0). Similarly, the multiplexer 66 may receive output values from certain of the LUTs 60, and the multiplexer 66 may output a value that may be used as an input of the multiplexer 68, which may also receive the output of the multiplexer 64 and an input F. The multiplexer 68 may generate an output value O6. The multiplexer 70 may receive outputs of certain of the LUTs 60. - Outputs of the LUTs 60 may also be used as inputs into
adder circuitry 80, which may include two adders that are communicatively coupled to one another (e.g., two carry-lookahead adders in which one of the adders receives one or more outputs from the other adder). The adder circuitry 80 may also receive a carry-in value (e.g., Cin), for example, from other circuitry included in the integrated circuit device 12, such as another logic block 26. More specifically, inputs A, B, C0, D0, and D1 may be used in generating four partial products, propagating carries (e.g., P0 and P1), and generating carries (e.g., G0 and G1). As discussed below, the adder circuitry 80 may reduce the partial products. Furthermore, the adder circuitry 80 may generate a carry-out value (e.g., Cout) that may be provided to other circuitry included in the integrated circuit, such as another logic block 26. - The
logic block 26A also includes circuitry 82 that, as illustrated, has gates 84 (e.g., logical AND gates, logical NAND gates) and programmable inverters. The circuitry 82 may use inputs E, D0, and C1 to generate two partial products in addition to the four partial products generated by the adder circuitry 80. The circuitry 82 may reduce these two partial products as well as the four partial products generated by the adder circuitry 80. As such, the logic block 26 may generate and reduce six partial products. - The
gate 84A may receive input E and an input En. The input En may be an enable/disable signal provided by the integrated circuit device 12 to disable the circuitry 82 when the circuitry 82 is not used and enable the circuitry 82 when the circuitry 82 is to be used. For example, by disabling the circuitry 82, power usage that may otherwise be caused by toggling signals may be reduced or eliminated. The gate 84A may generate an output, which may be used as an input for two further gates: the gate 84B may receive input C1 and output a value (e.g., that is provided to a logical NOT gate 86A and a multiplexer 88A), and the gate 84C may receive input D0 and output a value (e.g., that is provided to a logical NOT gate 86B and a multiplexer 88B). The multiplexers 88 may receive inverter signals (e.g., Inv0 for multiplexer 88A and Inv1 for multiplexer 88B) from the integrated circuit device 12 and be utilized when performing signed multiplication (e.g., multiplication that may include positive and negative values). Furthermore, while the embodiment illustrated in FIG. 3 includes programmable inverters (e.g., Inv0 and Inv1 of FIG. 3), other types of inverters may be utilized in other embodiments. - Additionally, the
logic block 26A includes two additional adders. The adder 90A may receive an output from the adder circuitry 80, an output from the multiplexer 88A, and a carry-in value Cin2 to produce a value S0 and a carry-out value. The carry-in value Cin2 may be a carry-out value generated by another logic block 26. The adder 90B may receive the carry-out value from the adder 90A, an output from the adder circuitry 80, and an output from the multiplexer 88B to determine a value S1 and a carry-out value Cout2. - As discussed below, the
logic block 26A may be utilized to perform multiplication operations. However, before progressing to specific examples of multiplication operations carried out by the logic block 26A, a general discussion of multiplication and mapping is provided. As discussed below, mapping may be undertaken in order to determine how to program the programmable logic 48 of the integrated circuit device 12 to perform multiplication. - Multiplication operations can generally be performed in two stages: partial product generation and partial product reduction. In partial product generation, each bit of one input is multiplied with each bit of another input. To help elaborate,
FIG. 4 illustrates a diagram 100 showing an input 102 being multiplied by another input 104. In particular, the input 102 is a five bit input (e.g., input A having bits 0-4), and the input 104 is a three bit input (e.g., input B having bits 0-2). Multiplying each bit of the input 102 with each bit of the input 104 results in partial products 106. In the partial product reduction stage, the partial products 106 can be summed (e.g., using adders) to determine an output 108 that is the sum of the partial products 106 and the product of the inputs. - The example provided in
FIG. 4 can be referred to as an example of unsigned multiplication, meaning whether a value is positive or negative is not taken into account. Signed multiplication, on the other hand, does take into account whether values being multiplied are positive or negative. FIG. 5 is a diagram 130 illustrating an example of signed multiplication. From the inputs, including their most significant bits, partial products 140 are determined. Some of the partial products 140 may be inverted, as indicated by shading in FIG. 5. For instance, one of the partial products 140 may be inverted when it is a partial product involving one of the most significant bits. Additionally, as illustrated in FIG. 5, a most significant bit 142 of an output 144 (i.e., the sum of the partial products 140) may be inverted. - To perform multiplication using the logic blocks 26, the
integrated circuit device 12 may perform a mapping process to determine how the bits of the values being multiplied are input into the inputs of the logic blocks 26. With this in mind, FIG. 6 illustrates a flow diagram of a process 160 that may be performed by the integrated circuit device 12 using one or more logic blocks 26, for instance, to carry out multiplication operations. - At
process block 162, inputs may be received. For example, the inputs may include one or more bits that are to be multiplied. - At
process block 164, the integrated circuit device 12 may determine a mapping for the inputs. In other words, the integrated circuit device 12 may determine how to carry out the multiplication operation involving the inputs. To determine a mapping, the integrated circuit device 12 may determine one or more patterns among the inputs as well as partial products that may be generated while determining a product of the two inputs. Examples of specific patterns are discussed below in more detail. - At
process block 166, the integrated circuit device 12 may multiply the two inputs based on the mapping. For instance, circuitry in the integrated circuit device 12 (e.g., programmable logic 48, lookup tables 60) may be programmed based on the mapping, and, as discussed below, components of each pattern (e.g., bits of an input or partial products) may be input to specific inputs of the logic blocks 26 based on the mapping. Accordingly, the logic blocks 26 may determine a product of two values being multiplied. - Keeping the foregoing in mind, different patterns will be discussed. However, before proceeding to discuss the patterns,
FIG. 7 is provided to show types of symbols that are used to discuss the patterns. In particular, FIG. 7 includes a partial product 180 (indicated by a circle), a non-carry bit 182 (indicated by a square), and a carry bit 184 (indicated by a pentagon). The illustrated partial product 180 is a partial product of Bj and Ai, where i and j index bits of inputs A and B, respectively (e.g., the ith bit of input A and the jth bit of input B). Various types of patterns and how bits of inputs being multiplied may be inputted into the logic blocks 26 will now be discussed. -
FIG. 8 illustrates two patterns. The patterns respectively include symbols that can be used and/or determined by a single logic block 26 (e.g., logic block 26A) to generate outputs. In pattern 200A, two carry-in values may be received, six partial products may be determined (based on values of bits of inputs A and B), and the output 204A that includes two non-carry bits and two carry bits may be the output from the logic block 26A. More specifically, when utilizing the patterns, the logic block 26A may receive two carry-in values (e.g., from another logic block 26 communicatively coupled to the logic block 26A) and up to seven inputs, and the logic block 26A may generate up to two carry bits and two non-carry bits. Table 1 is provided below to indicate how inputs A and B may be routed using the logic block 26A. -
TABLE 1
Input on logic block 26 in FIG. 3    Input Value
A                                    Ai
B                                    Ai+1
C0                                   Bj
D0                                   Bj−1
D1                                   Bj+1
C1                                   Bj−2
E                                    Ai+2
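The routing in Table 1 can be written down as data, which may make it easier to check which operand bit reaches which input of the logic block. The sketch below is only illustrative: the function names `route_pattern_200a`, `bit`, and `drive_inputs` are hypothetical, and reading a bit index below zero as 0 is an assumption made here so the example runs, not something stated in the description.

```python
# Table 1 expressed as data. All names here are hypothetical; only the
# input-to-bit assignments are taken from Table 1.

def route_pattern_200a(i, j):
    """Map each logic-block input to an (operand name, bit index) pair."""
    return {
        "A": ("A", i),
        "B": ("A", i + 1),
        "C0": ("B", j),
        "D0": ("B", j - 1),
        "D1": ("B", j + 1),
        "C1": ("B", j - 2),
        "E": ("A", i + 2),
    }


def bit(value, index):
    """Return bit `index` of `value`; negative indices read as 0 (assumption)."""
    return (value >> index) & 1 if index >= 0 else 0


def drive_inputs(a, b, i, j):
    """Resolve the routing for concrete operands a and b into bit values."""
    operands = {"A": a, "B": b}
    routing = route_pattern_200a(i, j)
    return {pin: bit(operands[name], idx) for pin, (name, idx) in routing.items()}


if __name__ == "__main__":
    for pin, value in drive_inputs(0b101, 0b110, i=0, j=2).items():
        print(pin, value)
```

Keeping the routing separate from the bit extraction means the same helper could be reused for the other tables by swapping in a different routing dictionary.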
Additionally, referring briefly to FIG. 3, carry bits may be received (e.g., from another logic block 26) via carry lines Cin and Cin2. With regard to the patterns of FIG. 8, it should also be noted that the pattern 200B may be utilized to perform multiply-accumulate operations. - Continuing with the discussion of patterns,
FIG. 9 illustrates two patterns 200C, 200D. The patterns 200C, 200D respectively include symbols 202C, 202D that can be used and/or determined by a single logic block 26 (e.g., logic block 26A) to generate outputs 204C, 204D. In pattern 200C, four partial products may be determined (based on values of bits of inputs A and B), and the output 204C that includes up to four non-carry bits may be the output from the logic block 26A. In pattern 200D, three partial products (i.e., symbols 202D) may be determined, and the output 204D may include up to three non-carry bits. Table 2 is provided below to indicate how inputs A and B may be routed using the logic block 26A when utilizing pattern 200C, and Table 3 is provided to indicate how inputs A and B may be routed using the logic block 26A when utilizing pattern 200D. -
TABLE 2
Input on logic block 26 in FIG. 3 for Pattern 200C    Input Value
A                                                     Ai
B                                                     Ai+1
C0                                                    Bj
D0                                                    Bj+1
C1                                                    Bj
D1                                                    Bj+1
E                                                     “1”
F                                                     “1”
-
TABLE 3
Input on logic block 26 in FIG. 3 for Pattern 200D    Input Value
A                                                     Ai
C0                                                    Bj
C1                                                    Bj+1
D1                                                    Bj+2
E                                                     “1”
F                                                     “1”
Referring to FIG. 3, the output 204C generated using the pattern 200C may be at S0, S1, O5_0, and O5_1. The output 204D generated using the pattern 200D may be at S0, S1, and O5_1. -
FIG. 10 illustrates four patterns. The patterns respectively include symbols that can be used and/or determined by a single logic block 26 (e.g., logic block 26A) to generate outputs. In pattern 200E, one carry-in bit may be used as an input, and one partial product may be determined (based on values of bits of inputs A and B), and the output 204E includes up to two non-carry bits. In pattern 200F, two carry-in bits are received and one partial product is determined. The output 204F may include up to three non-carry bits. In pattern 200G, one carry bit may be received and two partial products may be determined; the output 204G may include up to three non-carry bits. In pattern 200H, two carry bits may be received and two partial products may be generated. The output 204H may include up to three non-carry bits. - Referring now to
FIG. 3, when utilizing the pattern 200E, partial product inputs may be connected to inputs C0 and D0, and input E may be set to “1.” The output will be generated at output O5_0. Additionally, the incoming carry bit (e.g., received from another logic block 26) may be received via carry line Cin, and a resulting output may be at S0. It should also be noted that utilizing the pattern 200E only uses half of a logic block. In other words, when utilizing the pattern 200E, only two adjacent LUTs 60, one adder of the adder circuitry 80, and one of the additional adders are involved in producing the output 204E. Accordingly, the other half of the logic block 26A may be utilized for other determinations (e.g., using the pattern 200E on another set of inputs). - When utilizing the
pattern 200F, partial product inputs may be connected to inputs C1 and D1, and input F is set to “1.” The portion of the output 204F arising from a partial product input may be output via output O5_1. Carry bits may be received (e.g., from another logic block 26) via carry lines Cin and Cin2, and the corresponding portions of the output 204F are output via outputs S0 and S1. - When utilizing the
pattern 200G, one of the partial product inputs is connected to inputs C0 and D0, and the other partial product input is connected to inputs C1 and D1. Inputs E and F are set to “1.” The portions of the output 204G associated with the partial products will be generated at outputs O5_0 and O5_1. Similar to pattern 200E, the portion of the output 204G associated with a carry bit (e.g., received via carry line Cin from another logic block 26) may be output at S0. -
Pattern 200H may be used to generate a partial product at the same bit position as an incoming carry bit and reduce the partial product. When using the pattern 200H, one partial product input is connected to inputs C0 and D0. The outputs will be in S0 and S1. The second partial product is connected to inputs C1 and D1, and input F is set to “1.” The corresponding portion of the output 204H will be at output O5_1. Carry bits may be received (e.g., from another logic block 26) via carry lines Cin and Cin2, and the corresponding portions of the output 204H are output via outputs S0 and S1. - Continuing with the discussion of patterns,
FIG. 11 illustrates four patterns. The patterns respectively include symbols that can be used and/or determined by a single logic block 26 (e.g., logic block 26A) to generate outputs. In pattern 200J, four non-carry bits are received, and the output 204J may include up to three non-carry bits. In pattern 200K, two carry bits and six non-carry bits may be received. The output 204K may include up to two non-carry bits and two carry bits. In pattern 200L, one carry bit and one non-carry bit may be received, and the output 204L may include one non-carry bit. - Referring now to
FIG. 3, when utilizing the pattern 200I, two inputs (i.e., two of the non-carry bits included in the symbols 202J) are connected to inputs A and B, and the other two inputs are connected to inputs C0, C1, D0, and D1. The outputs will be at outputs S0, S1, O5_0, and O5_1. When utilizing the pattern 200J, the inputs will be connected in the same manner as the inputs when using the pattern 200I. However, the bits of the output 204J will be at outputs S0, S1, and O5_1. - When utilizing the
pattern 200K, inputs may be connected according to Table 4 below: -
TABLE 4
Input on logic block 26 in FIG. 3 for Pattern 200K    Input Value
A                                                     S0
B                                                     S1
C0                                                    S2
C1                                                    S4
D0                                                    S5
D1                                                    S3
E                                                     “1”
- The received carry bits may be received via carry lines Cin and Cin2 from another
logic block 26 that is communicatively coupled to the logic block 26A inputs C1 and D1, and input F is set to “1.” The non-carry bits of the output 204K may be generated at outputs S0 and S1, and the carry bits may be output via Cout and Cout2. -
Pattern 200L may be used when a carry bit and a non-carry bit are in the most significant bit position of an output. In this situation, the carry bit and the non-carry bit will not both be equal to one, meaning that summing the carry bit and the non-carry bit will generate a non-carry bit and no carry bits. When utilizing the pattern 200L, the carry bit may be received via carry line Cin, and the non-carry bit may be connected to input C0. The output may be generated at S0. Additionally, it should be noted that the pattern 200L only uses half of a logic block 26, meaning the other half of the logic block may be utilized to perform other determinations. - Bearing the discussion of the
patterns 200 in mind, FIGS. 9 and 12-22C will be discussed to show mappings of various examples of N×N multiplication operations. The mappings may include one or more of the patterns 200 discussed above. In these examples, N is an integer ranging in value from two to nine indicative of the number of bits included in an input. For example, a 2×2 multiplication operation involves multiplying two inputs that each include two bits. Returning briefly to FIG. 9, to perform 2×2 multiplication operations, the integrated circuit device 12 may utilize pattern 200C to generate the output 204C as discussed above. Accordingly, 2×2 multiplication operations may be carried out using a single logic block 26. -
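As a plain software illustration of the two steps described earlier (partial product generation followed by partial product reduction), the sketch below multiplies two N-bit unsigned values by forming every Ai AND Bj term and summing the terms at weight i + j. It models the arithmetic only and does not model the logic block 26A or any pattern; the function names `partial_products` and `multiply` are hypothetical.

```python
# Unsigned multiplication by explicit partial products. This is a
# behavioral sketch of the arithmetic, not a model of the logic block.

def partial_products(a, b, n):
    """Return (weight, bit) pairs, one per partial product Ai AND Bj."""
    products = []
    for i in range(n):
        for j in range(n):
            term = ((a >> i) & 1) & ((b >> j) & 1)
            products.append((i + j, term))
    return products


def multiply(a, b, n):
    """Reduce the partial products by summing each at its bit weight."""
    return sum(term << weight for weight, term in partial_products(a, b, n))


if __name__ == "__main__":
    # A 2x2 multiplication has four partial products.
    print(len(partial_products(3, 3, 2)))
    print(multiply(3, 3, 2))
```

For N equal to two, the four partial products correspond in number to the four partial products that the description associates with pattern 200C.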
FIG. 12 illustrates a 3×3 multiplication operation. As illustrated, the 3×3 multiplication operation may be carried out using the pattern 200A twice and the pattern 200E once. In this example, partial product A0B0 has been moved from the top right position to the left. The output bits may be moved around to create the correct output. As such, 3×3 multiplication operations may be performed using two and one-half logic blocks 26. - N×N multiplication operations in which N is greater than 3 may be performed using more than one stage. A “stage” generally refers to the number of rows (or columns, depending on orientation) of logic blocks 26 used to perform a multiplication operation. For example, the 2×2 and 3×3 multiplication operations discussed above can be done with a single stage. As discussed below, N×N multiplication operations in which N ranges from 4 to 9 may be performed in two stages. In these examples, bits may be determined using a first stage of logic blocks, and the bits may be provided as inputs to logic blocks 26 included in a second stage of logic blocks 26 (e.g., one or more logic blocks communicatively coupled to the logic blocks 26 of the first stage of logic blocks).
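One way to get a feel for why larger multipliers call for more reduction work is to count how many partial products fall at each output bit position. The sketch below does that count for an unsigned N×N multiplication. It is general arithmetic offered as background, not a restatement of any mapping in the figures, and the function name `column_heights` is hypothetical.

```python
# Count the partial products Ai*Bj that share each bit weight i + j
# in an unsigned n x n multiplication.

def column_heights(n):
    """Return a list whose k-th entry is the number of (i, j) with i + j == k."""
    heights = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            heights[i + j] += 1
    return heights


if __name__ == "__main__":
    for n in (2, 3, 4, 9):
        print(n, column_heights(n))
```

The tallest column holds N partial products, so the number of bits that must be summed at a single position grows with N.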
-
FIG. 13A illustrates a first stage of a 4×4 multiplication operation. As illustrated, pattern 200C is used (which will generate non-carry bits of i0, i1, and i2). During a second stage illustrated in FIG. 13B, the pattern 200A may be used once, pattern 200B may be used twice, and pattern 200E may be used once. Accordingly, a total of four and one-half logic blocks 26 across two stages may be utilized to carry out 4×4 multiplication operations. -
FIG. 14A illustrates a first stage of a 5×5 multiplication operation in which the pattern 200A is used three times and pattern 200E is used once. As shown in FIG. 14B, partial products not generated during the first stage can be determined and summed with the bits generated during a second stage using the pattern 200B three times and the pattern 200E once. Accordingly, 5×5 multiplication operations can be performed using seven logic blocks 26 across two stages.
For cases in which N is two, three, four, or five, the examples provided above use both the fewest stages and the fewest logic blocks 26 that can be used to complete multiplication operations. However, for N×N multiplication operations discussed herein in which N is six, seven, eight, or nine, the mapping for a particular multiplication operation may be selected to use either the fewest number of stages or the fewest number of logic blocks 26. FIGS. 15A-18B relate to N×N multiplication operations in which the fewest number of stages is used, and FIGS. 19A-22C relate to N×N multiplication operations in which the fewest number of logic blocks 26 is utilized.
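The logic block totals quoted for the 3×3, 4×4, and 5×5 mappings can be tallied from the pattern counts. The sketch below assumes that one use of pattern 200A, 200B, or 200C occupies one logic block 26 and that one use of pattern 200E occupies half of a logic block 26. These per-pattern costs are inferred from the quoted totals and are not stated in the text, and the names `ASSUMED_COST`, `MAPPINGS`, and `blocks_used` are hypothetical.

```python
from fractions import Fraction

# Assumed cost, in logic blocks 26, of one use of each pattern (inferred, not stated).
ASSUMED_COST = {
    "200A": Fraction(1),
    "200B": Fraction(1),
    "200C": Fraction(1),
    "200E": Fraction(1, 2),
}

# Pattern counts per stage, as described for FIGS. 12, 13A-13B, and 14A-14B.
MAPPINGS = {
    "3x3": [{"200A": 2, "200E": 1}],
    "4x4": [{"200C": 1}, {"200A": 1, "200B": 2, "200E": 1}],
    "5x5": [{"200A": 3, "200E": 1}, {"200B": 3, "200E": 1}],
}


def blocks_used(stages: list[dict[str, int]]) -> Fraction:
    """Total logic blocks across all stages under the assumed per-pattern costs."""
    return sum(
        (ASSUMED_COST[pattern] * count
         for stage in stages
         for pattern, count in stage.items()),
        Fraction(0),
    )


for name, stages in MAPPINGS.items():
    print(name, float(blocks_used(stages)), "logic blocks in", len(stages), "stage(s)")
```

Under these assumptions the tallies agree with the two and one-half, four and one-half, and seven logic blocks 26 stated above.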
FIG. 15A shows a first stage of a 6×6 multiplication operation in which the pattern 200A is used five times and the pattern 200H is used twice. In a second stage, as illustrated in FIG. 15B, patterns … pattern 200E may be used once. As such, 6×6 multiplication operations can be performed using eleven and one-half logic blocks 26 across two stages.
FIG. 16A shows a first stage of a 7×7 multiplication operation in which the pattern 200A is used seven times, and patterns … FIG. 16B, pattern 200B is used twice, pattern 200K is used three times, and pattern 200E is used once. As such, 7×7 multiplication operations can be performed using fourteen and one-half logic blocks 26 across two stages.
FIG. 17A shows a first stage of an 8×8 multiplication operation in which the pattern 200A is used twelve times, pattern 200E is used twice, and pattern 200H is used twice. In a second stage, as illustrated in FIG. 17B, pattern 200B is used once, pattern 200K is used five times, and pattern 200I is used once. As such, 8×8 multiplication operations can be performed using twenty and one-half logic blocks 26 across two stages.
FIG. 18A shows a first stage of a 9×9 multiplication operation in which the pattern 200A is used fifteen times and the pattern 200F is used three times. In a second stage, as illustrated in FIG. 18B, pattern 200K is used six times, and pattern 200L is used once. As such, 9×9 multiplication operations can be performed using twenty-five and one-half logic blocks 26 across two stages.
As noted above, FIGS. 19A-22C provide examples of mappings for performing N×N multiplication operations in which the fewest number of logic blocks 26 is used. FIG. 19A shows a first stage of a 6×6 multiplication operation in which the pattern 200C is used once. In a second stage, as illustrated in FIG. 19B, pattern 200A and pattern 200B are each used twice, and pattern 200E is used once. FIG. 19C illustrates a third stage of a 6×6 multiplication operation in which the pattern 200B is used four times and the pattern 200E is used once. As such, 6×6 multiplication operations can be performed using ten logic blocks 26 across three stages.
FIG. 20A shows a first stage of a 7×7 multiplication operation in which the pattern 200A is used three times and the pattern 200H is used once. In a second stage, as illustrated in FIG. 20B, pattern 200A and pattern 200H are each used once, and pattern 200B is used four times. FIG. 20C illustrates a third stage of a 7×7 multiplication operation in which the pattern 200B is used four times and the pattern 200E is used once. As such, 7×7 multiplication operations can be performed using thirteen and one-half logic blocks 26 across three stages.
FIG. 21A shows a first stage of an 8×8 multiplication operation in which the pattern 200A is used three times and pattern 200H is used once. In a second stage, as illustrated in FIG. 21B, pattern 200A and pattern 200B are each used four times, and pattern 200H is used twice. FIG. 21C illustrates a third stage of an 8×8 multiplication operation in which pattern 200B and pattern 200K are each used twice, and the pattern 200E is used once. As such, 8×8 multiplication operations can be performed using nineteen and one-half logic blocks 26 across three stages.
FIG. 22A shows a first stage of a 9×9 multiplication operation in which the pattern 200A is used four times and the pattern 200H is used once. In a second stage, as illustrated in FIG. 22B, pattern 200A is used six times, pattern 200B is used five times, and pattern 200H is used twice. FIG. 22C illustrates a third stage of a 9×9 multiplication operation in which pattern 200B and pattern 200L are each used once, and pattern 200K is used five times. As such, 9×9 multiplication operations can be performed using twenty-four logic blocks 26 across three stages.
Continuing with the drawings, FIG. 23 is a schematic diagram of circuitry 82A that can be used as an alternative to the circuitry 82 and the adders … FIG. 3. More specifically, compared to the circuitry 82 illustrated in FIG. 3, in FIG. 23, the adders … XOR gates … gates … circuitry 82A to generate propagating and generating signals that can be used, for example, with carry-propagate adders. Additionally, the XOR gate 220A may receive an input from the adding circuitry 80 via line 224, and the AND gate 222B may receive an input from the adding circuitry via line 226.
Somewhat similarly, FIG. 24 is a schematic diagram of circuitry 82B that can be used as an alternative to the circuitry 82 included in FIG. 3. More specifically, compared to the circuitry 82 of FIG. 3, the NOT gates 86 and multiplexers 88 are not included, and the gates … gates 84D, 84E (e.g., AND gates). Additionally, while inputs for gate 84D are the same as the gate 84B, gate 84E has inputs of the output of gate 84A and input F compared to the output of gate 84A and input E for the gate 84C of the circuitry 82 of FIG. 3. The circuitry 82B may be used, for example, if the logic block 26 is used for unsigned multiplication. Additionally, it should be noted that, in some embodiments, input D0 may be utilized instead of input F.
Before proceeding to discuss another embodiment of the logic block 26, it should be noted that the logic block 26A (and logic block 26B discussed below) may be utilized to add three two-bit numbers together. The bits of one number may be provided as inputs D0 and C1, the bits of another number may be C0 and B, and the bits of the last number may be A and D1. Additionally, input E may be set to “1.”
Continuing with the drawings, FIG. 25 is a schematic diagram of a logic block 26B. The logic block 26B is generally similar to the logic block 26A, but the logic block 26B includes additional circuitry 250 as well as an additional carry line (e.g., Cin3, Cout3). The additional circuitry 250 includes gates 252 (e.g., AND gate 252A and NAND gates 252B, 252C), gates 254 (e.g., NOT gates … adders … additional circuitry 250, the logic block 26B is able to use nine input bits to generate and reduce eight partial products using two non-carry outputs (e.g., S0 and S1) and up to three carry outputs (e.g., Cout, Cout2, Cout3).
The gate 252A may receive input F and an input En2. Similar to input En, input En2 may be an enable/disable signal provided by the integrated circuit device 12 to disable the additional circuitry 250 when the additional circuitry 250 is not used. For example, by disabling the additional circuitry 250, power usage that may otherwise be caused by toggling signals may be reduced or eliminated. The gate 252A may generate an output, which may be used as an input for both of the gates 252B, 252C. The gate 252B may also receive input LSIM and output a value (e.g., that is provided to NOT gate 254A and multiplexer 256A), and the gate 252C may also receive input C1 and output a value (e.g., that is provided to a logical NOT gate 254B and a multiplexer 256B). The multiplexers … multiplexer 256A and Inv3 for multiplexer 256B) from the integrated circuit device 12 and be utilized when performing signed multiplication. Furthermore, while the embodiment illustrated in FIG. 3 includes programmable inverters (e.g., Inv2 and Inv3 of FIG. 25), other types of inverters may be utilized in other embodiments.
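The gating and programmable inversion described above can be sketched at the bit level as follows. The function name is hypothetical, and the polarity of the multiplexer control (0 selecting the re-inverted value, 1 selecting the NAND output) is an assumption, since the text does not state which multiplexer input each control value selects.

```python
def additional_partial_product(f: int, en2: int, other: int, inv: int) -> int:
    """Bit-level model of one path through the additional circuitry 250.

    f, en2 -- input F and enable En2 (inputs of AND gate 252A)
    other  -- input LSIM (for NAND gate 252B) or input C1 (for NAND gate 252C)
    inv    -- control Inv2 or Inv3 of multiplexer 256A or 256B (assumed polarity)
    """
    gated = f & en2                  # AND gate 252A
    nand_out = 1 - (gated & other)   # NAND gate 252B or 252C
    not_out = 1 - nand_out           # NOT gate 254A or 254B
    # Assumption: inv == 0 passes the AND-like value, inv == 1 passes the NAND value.
    return nand_out if inv else not_out


# With En2 low, the non-inverted output stays at 0 regardless of F and the other
# input, so this path contributes nothing to the adders that follow.
assert all(additional_partial_product(f, 0, x, 0) == 0 for f in (0, 1) for x in (0, 1))
```

Selecting the inverted value under program control is one way a signed multiplication could obtain complemented partial products from the same gates used for unsigned multiplication.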
Adder 90C may receive an output from the adder 90A, an output from the multiplexer 256A, and a carry-in value Cin3 to produce a value S0 and a carry out value. The carry-in value Cin3 may be a carry-out value generated by another logic block 26 (e.g., logic block 26B). The adder 90D may receive the carry out value from the adder 90C, an output from the adder 90B, and an output from the multiplexer 256B to determine a value S1 and a carry-out value Cout3.
As noted above, the logic block 26B can generate and reduce eight partial products. To do so, the logic block 26B may utilize patterns … FIG. 26. The patterns … symbols … logic block 26B) to generate outputs … FIG. 25) values may be utilized, while partial products may be generated. For instance, in pattern 200M, three carry-in values may be received, eight partial products may be determined (based on values of bits of inputs A and B), and the output 204M that includes up to two non-carry bits and up to three carry bits may be generated. For the pattern 200N, three carry-in bits and two non-carry bits may be received, six partial products may be generated, and output 204N that includes up to two non-carry bits and up to three carry bits is generated. Table 5 is provided below to indicate how inputs A and B may be routed using the logic block 26B.
TABLE 5
Input on logic block 26 in FIG. 3 | Input Value |
---|---|
A | Ai |
B | Ai+1 |
C0 | Bj |
D0 | Bj−1 |
D1 | Bj+1 |
C1 | Bj−2 |
E | Ai+2 |
F | Ai+3 |
LSIM | Bi−3 |
Additionally, it should be noted that any patterns 200 discussed above with respect to the logic block 26A may be used with the logic block 26B in the same manner as described above with respect to logic block 26A. Moreover, patterns … patterns … logic block 26B will now be discussed.
The logic block 26B may perform N×N multiplication operations in which N is equal to two or three (i.e., 2×2 and 3×3 multiplication operations) using the pattern 200C and the mapping illustrated in FIG. 12, respectively. Accordingly, to perform 2×2 multiplication operations, a single logic block 26B may be used. Also, to perform a 3×3 multiplication operation, two and one-half logic blocks 26B may be used.
FIG. 27 illustrates a mapping of a 4×4 multiplication operation. As illustrated, pattern 200M is used three times, and pattern 200E is used once. Accordingly, a total of three and one-half logic blocks 26B in a single stage may be utilized to carry out 4×4 multiplication operations.
FIG. 28A illustrates a first stage of a 5×5 multiplication operation in which the pattern 200M is used twice and pattern 200H is used once. As shown in FIG. 28B, partial products not generated during the first stage can be determined and summed with the bits (e.g., i0, i1, i2, i3) generated during a second stage using the pattern 200N twice, pattern 200M once, and pattern 200E once. Accordingly, 5×5 multiplication operations can be performed using six and one-half logic blocks 26B across two stages.
FIG. 29A shows a first stage of a 6×6 multiplication operation in which the pattern 200M is used three times and pattern 200H is used once. In a second stage, as illustrated in FIG. 29B, pattern 200N may be used twice, and patterns …
FIG. 30A shows a first stage of a 7×7 multiplication operation in which the pattern 200M is used four times, and the pattern 200H is used once. In a second stage, as illustrated in FIG. 30B, pattern 200M is used four times, and pattern 200H is used twice. As such, 7×7 multiplication operations can be performed using ten logic blocks 26B across two stages.
FIG. 31A shows a first stage of an 8×8 multiplication operation in which the pattern 200M is used nine times, pattern 200E is used once, and pattern 200H is used once. In a second stage, as illustrated in FIG. 31B, pattern 200M is used once, and pattern 200K is used three times. As such, 8×8 multiplication operations can be performed using sixteen and one-half logic blocks 26B across two stages.
FIG. 32A shows a first stage of a 9×9 multiplication operation in which the pattern 200M is used twelve times, pattern 200E is used once, and pattern 200G is used once. In a second stage, as illustrated in FIG. 32B, pattern 200K is used seven times. As such, 9×9 multiplication operations can be performed using twenty and one-half logic blocks 26B across two stages.
It should be noted that because each of the mappings discussed above with respect to the logic block 26B utilizes either one or two stages, the mappings use the fewest number of logic blocks 26B and stages. To help summarize the mappings provided herein for the logic block 26A and the logic block 26B, Table 6 is provided.
TABLE 6
Multiplication Operation | Number of Logic Blocks 26A for lowest number of logic blocks (number of stages) | Number of Logic Blocks 26A for lowest number of stages (number of stages) | Number of Logic Blocks 26B (number of stages) |
---|---|---|---|
2 × 2 | 1 (1) | 1 (1) | 1 (1) |
3 × 3 | 2.5 (1) | 2.5 (1) | 2.5 (1) |
4 × 4 | 4.5 (2) | 4.5 (2) | 3.5 (1) |
5 × 5 | 7 (2) | 7 (2) | 6.5 (2) |
6 × 6 | 10 (3) | 11.5 (2) | 8.5 (2) |
7 × 7 | 13.5 (3) | 14.5 (2) | 10 (2) |
8 × 8 | 19.5 (3) | 20.5 (2) | 16.5 (2) |
9 × 9 | 24 (3) | 25.5 (2) | 20.5 (2) |
The technical effects of the techniques discussed herein enable limited space on integrated circuit devices to be more efficiently utilized by including high density circuitry that can be used to perform multiplication operations. For example, the logic blocks 26 discussed herein enable many multiplication operations to be performed simultaneously. Furthermore, a reduced number of stages may be used to perform certain multiplication operations. Accordingly, the techniques described herein enable integrated circuits to perform multiplication operations quickly and efficiently.
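The block counts summarized in Table 6 can be compared directly. The sketch below transcribes the table and computes how many logic blocks the logic block 26B mapping saves relative to the logic block 26A mapping that uses the fewest logic blocks. The names `TABLE_6` and `blocks_saved_by_26b` are hypothetical, and the figures are taken from Table 6 as given.

```python
# N -> ((blocks, stages) for 26A with fewest blocks,
#       (blocks, stages) for 26A with fewest stages,
#       (blocks, stages) for 26B), transcribed from Table 6.
TABLE_6 = {
    2: ((1.0, 1), (1.0, 1), (1.0, 1)),
    3: ((2.5, 1), (2.5, 1), (2.5, 1)),
    4: ((4.5, 2), (4.5, 2), (3.5, 1)),
    5: ((7.0, 2), (7.0, 2), (6.5, 2)),
    6: ((10.0, 3), (11.5, 2), (8.5, 2)),
    7: ((13.5, 3), (14.5, 2), (10.0, 2)),
    8: ((19.5, 3), (20.5, 2), (16.5, 2)),
    9: ((24.0, 3), (25.5, 2), (20.5, 2)),
}


def blocks_saved_by_26b(n: int) -> float:
    """Logic blocks saved by the 26B mapping versus the fewest-block 26A mapping."""
    fewest_blocks_26a, _fewest_stages_26a, mapping_26b = TABLE_6[n]
    return fewest_blocks_26a[0] - mapping_26b[0]


for n in sorted(TABLE_6):
    print(f"{n}x{n}: saves {blocks_saved_by_26b(n)} logic blocks, "
          f"using {TABLE_6[n][2][1]} stage(s) with logic block 26B")
```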
The integrated circuit device 12 may be a data processing system or a component of a data processing system. For example, the integrated circuit device 12 may be a component of a data processing system 450, shown in FIG. 33. The data processing system 450 may include a host processor 452, memory and/or storage circuitry 454, and a network interface 456. The data processing system 450 may include more or fewer components (e.g., electronic display, user interface structures, application specific integrated circuits (ASICs)). The host processor 452 may include any suitable processor, such as an INTEL® Xeon® processor or a reduced-instruction processor (e.g., a reduced instruction set computer (RISC), an Advanced RISC Machine (ARM) processor) that may manage a data processing request for the data processing system 450 (e.g., to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or the like). The memory and/or storage circuitry 454 may include random access memory (RAM), read-only memory (ROM), one or more hard drives, flash memory, or the like. The memory and/or storage circuitry 454 may hold data to be processed by the data processing system 450. In some cases, the memory and/or storage circuitry 454 may also store configuration programs (bitstreams) for programming the integrated circuit device 12. The network interface 456 may allow the data processing system 450 to communicate with other electronic devices. The data processing system 450 may include several different packages or may be contained within a single package on a single package substrate.
In one example, the data processing system 450 may be part of a data center that processes a variety of different requests. For instance, the data processing system 450 may receive a data processing request via the network interface 456 to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or some other specialized task. The host processor 452 may cause the programmable logic fabric of the integrated circuit device 12 to be programmed with circuitry suitable to implement a requested task. For instance, the host processor 452 may instruct that configuration data (bitstream) stored on the memory and/or storage circuitry 454 be programmed into the programmable logic fabric of the integrated circuit device 12. The configuration data (bitstream) may represent a circuit design for performing multiplication operations that utilize one or more of the logic blocks 26, which may be mapped to the programmable logic according to the techniques described herein. As such, the integrated circuit device 12 may assist the data processing system 450 in performing the requested task, such as performing multiplication operations.
While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. For example, any suitable combination of the embodiments and/or techniques described herein may be implemented. Moreover, any suitable combination of number formats (e.g., single-precision floating-point, half-precision floating-point, bfloat16, extended precision and/or the like) may be used. Further, each DSP circuitry and/or DSP architecture may include any suitable number of elements (e.g., adders, multipliers 64, routing, and/or the like). Accordingly, it should be understood that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims.
The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
Claims (20)
1. A logic block implementable on an integrated circuit device, the logic block comprising:
a plurality of lookup tables configurable to receive a plurality of inputs and generate a first plurality of outputs;
adding circuitry configurable to receive the first plurality of outputs and generate a second plurality of outputs;
circuitry configurable to receive a portion of the plurality of inputs, determine one or more partial products, and generate a third plurality of outputs; and
a first adder configurable to determine a sum of a first portion of the second plurality of outputs and a first portion of the third plurality of outputs.
2. The logic block of claim 1, further comprising additional circuitry configurable to receive the portion of the plurality of inputs and the sum and determine a second sum of the portion of the plurality of inputs and the sum.
3. The logic block of claim 2, wherein:
the adding circuitry is configurable to receive a first carry-in value from a second logic block;
the circuitry is configurable to generate a second carry-in value from the second logic block; and
the additional circuitry is configurable to generate a third carry-in value from the second logic block.
4. The logic block of claim 1, wherein the logic block is configurable to:
generate eight partial products; and
perform signed and unsigned multiplication operations.
5. The logic block of claim 1, wherein the circuitry is configurable to receive an enable/disable signal to disable the circuitry when the circuitry is not used.
6. The logic block of claim 1, wherein:
a first portion of the plurality of lookup tables is configurable to receive a first bit value of the plurality of inputs; and
the logic block further comprises a multiplexer communicatively coupled to a second portion of the plurality of lookup tables that is different than the first portion of the plurality of lookup tables.
7. The logic block of claim 6, wherein the multiplexer is configurable to:
receive the first bit value of the plurality of inputs, a second bit value of the plurality of inputs, and a control signal; and
provide either the first bit value or the second bit value to the second portion of the plurality of lookup tables based on the control signal.
8. The logic block of claim 1, wherein:
the first plurality of outputs comprises four partial products generated based on the plurality of inputs; and
the third plurality of outputs comprises two partial products.
9. The logic block of claim 8, wherein the first portion of the third plurality of outputs comprises a first partial product of the two partial products.
10. The logic block of claim 9, further comprising a second adder configurable to determine a second sum of a second portion of the second plurality of outputs and a second partial product of the two partial products.
11. An integrated circuit device, comprising a logic block, wherein the logic block comprises:
a plurality of lookup tables configurable to receive a plurality of inputs and generate a first plurality of outputs;
adding circuitry configurable to receive the first plurality of outputs and generate a second plurality of outputs;
circuitry configurable to receive a portion of the plurality of inputs, determine one or more partial products, and generate a third plurality of outputs; and
a first adder configurable to determine a sum of a first portion of the second plurality of outputs and a first portion of the third plurality of outputs.
12. The integrated circuit device of claim 11, wherein:
the logic block further comprises additional circuitry configurable to receive the portion of the plurality of inputs and the sum and determine a second sum of the portion of the plurality of inputs and the sum;
the adding circuitry is configurable to receive a first carry-in value from a second logic block;
the circuitry is configurable to generate a second carry-in value from the second logic block; and
the additional circuitry is configurable to generate a third carry-in value from the second logic block.
13. The integrated circuit device of claim 11, further comprising a second adder configurable to determine a second sum of the second plurality of outputs and a second portion of the third plurality of outputs.
14. The integrated circuit device of claim 13, further comprising additional circuitry configurable to receive the portion of the plurality of inputs and the sum and determine a third sum of the portion of the plurality of inputs and the sum.
15. The integrated circuit device of claim 11, wherein:
a first portion of the plurality of lookup tables is configurable to receive a first bit value of the plurality of inputs; and
the logic block further comprises a multiplexer communicatively coupled to a second portion of the plurality of lookup tables that is different than the first portion of the plurality of lookup tables, wherein the multiplexer is configurable to:
receive the first bit value of the plurality of inputs, a second bit value of the plurality of inputs, and a control signal; and
provide either the first bit value or the second bit value to the second portion of the plurality of lookup tables based on the control signal.
16. The integrated circuit device of claim 15, wherein:
the first portion of the plurality of lookup tables is configurable to receive a third bit value, a fourth bit value, and a fifth bit value of the plurality of inputs;
the second portion of the plurality of lookup tables is configurable to receive the third bit value, the fourth bit value, and a sixth bit value of the plurality of inputs; and
the circuitry is configurable to receive the second bit value or the fifth bit value of the plurality of inputs.
17. The integrated circuit device of claim 11, further comprising programmable logic.
18. A non-transitory computer-readable medium comprising instructions that, when executed, cause a logic block of an integrated circuit device to be configured to perform multiplication operations, wherein the logic block comprises:
a plurality of lookup tables configured to receive a plurality of inputs and generate a first plurality of outputs;
adding circuitry configured to receive the first plurality of outputs and generate a second plurality of outputs;
circuitry configured to receive a portion of the plurality of inputs, determine one or more partial products, and generate a third plurality of outputs; and
a first adder configured to determine a sum of a first portion of the second plurality of outputs and a first portion of the third plurality of outputs.
19. The non-transitory computer-readable medium of claim 18, wherein:
a first portion of the plurality of lookup tables is configurable to receive a first bit value of the plurality of inputs; and
the logic block further comprises a multiplexer communicatively coupled to a second portion of the plurality of lookup tables that is different than the first portion of the plurality of lookup tables, wherein the multiplexer is configurable to:
receive the first bit value of the plurality of inputs, a second bit value of the plurality of inputs, and a control signal; and
provide either the first bit value or the second bit value to the second portion of the plurality of lookup tables based on the control signal.
20. The non-transitory computer-readable medium of claim 18, wherein when the instructions are executed by a second integrated circuit device, the logic block of integrated circuit device is configured to perform the multiplication operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/473,870 US20240028295A1 (en) | 2019-12-27 | 2023-09-25 | Efficient logic blocks architectures for dense mapping of multipliers |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/729,256 US11768661B2 (en) | 2019-12-27 | 2019-12-27 | Efficient logic blocks architectures for dense mapping of multipliers |
US18/473,870 US20240028295A1 (en) | 2019-12-27 | 2023-09-25 | Efficient logic blocks architectures for dense mapping of multipliers |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/729,256 Continuation US11768661B2 (en) | 2019-12-27 | 2019-12-27 | Efficient logic blocks architectures for dense mapping of multipliers |
Publications (1)
Publication Number | Publication Date |
---|---|
US20240028295A1 true US20240028295A1 (en) | 2024-01-25 |
Family
ID=72521496
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/729,256 Active 2042-01-22 US11768661B2 (en) | 2019-12-27 | 2019-12-27 | Efficient logic blocks architectures for dense mapping of multipliers |
US18/473,870 Pending US20240028295A1 (en) | 2019-12-27 | 2023-09-25 | Efficient logic blocks architectures for dense mapping of multipliers |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/729,256 Active 2042-01-22 US11768661B2 (en) | 2019-12-27 | 2019-12-27 | Efficient logic blocks architectures for dense mapping of multipliers |
Country Status (3)
Country | Link |
---|---|
US (2) | US11768661B2 (en) |
EP (1) | EP3842925A1 (en) |
CN (1) | CN113050919A (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5436860A (en) * | 1994-05-26 | 1995-07-25 | Motorola, Inc. | Combined multiplier/shifter and method therefor |
US10715144B2 (en) | 2019-06-06 | 2020-07-14 | Intel Corporation | Logic circuits with augmented arithmetic densities |
-
2019
- 2019-12-27 US US16/729,256 patent/US11768661B2/en active Active
-
2020
- 2020-09-16 EP EP20196378.2A patent/EP3842925A1/en active Pending
- 2020-09-23 CN CN202011013049.2A patent/CN113050919A/en active Pending
-
2023
- 2023-09-25 US US18/473,870 patent/US20240028295A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US11768661B2 (en) | 2023-09-26 |
EP3842925A1 (en) | 2021-06-30 |
US20210200514A1 (en) | 2021-07-01 |
CN113050919A (en) | 2021-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: ALTERA CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:066353/0886 Effective date: 20231219 |