US10445067B2 - Configurable processor with in-package look-up table - Google Patents

Configurable processor with in-package look-up table

Info

Publication number
US10445067B2
Authority
US
United States
Prior art keywords
configurable
logic
lut
die
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/203,599
Other versions
US20190114138A1 (en
Inventor
Guobiao Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Haicun Information Technology Co Ltd
Original Assignee
Hangzhou Haicun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/588,642 external-priority patent/US20170322771A1/en
Application filed by Hangzhou Haicun Information Technology Co Ltd filed Critical Hangzhou Haicun Information Technology Co Ltd
Priority to US16/203,599 priority Critical patent/US10445067B2/en
Publication of US20190114138A1 publication Critical patent/US20190114138A1/en
Application granted granted Critical
Publication of US10445067B2 publication Critical patent/US10445067B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491Computations with decimal numbers radix 12 or 20.
    • G06F7/498Computations with decimal numbers radix 12 or 20. using counter-type accumulators
    • G06F7/4983Multiplying; Dividing
    • G06F7/4988Multiplying; Dividing by table look-up
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03KPULSE TECHNIQUE
    • H03K19/00Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/02Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components
    • H03K19/173Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
    • H03K19/177Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
    • H03K19/17724Structural details of logic blocks
    • H03K19/17728Reconfigurable logic blocks, e.g. lookup tables

Definitions

  • the present invention relates to the field of integrated circuits, and more particularly to processors.
  • LBC logic-based computation
  • Logic circuits are suitable for arithmetic functions, whose operations consist of basic arithmetic operations only, i.e. addition, subtraction and multiplication.
  • logic circuits are not suitable for non-arithmetic functions, whose operations are more than the above arithmetic operations performable by the conventional logic circuits.
  • Exemplary non-arithmetic functions include transcendental functions and special functions.
  • Non-arithmetic functions are computationally hard and their hardware implementation has been a major challenge.
  • a complex function is a non-arithmetic function with multiple independent variables (independent variable is also known as input variable or argument). It can be expressed as a combination of basic functions.
  • a basic function is a non-arithmetic function with a single independent variable.
  • Exemplary basic functions include basic transcendental functions, such as exponential function (exp), logarithmic function (log), trigonometric functions (sin, cos, tan, atan) and others.
  • a conventional processor 00 X generally comprises a logic circuit 100 X and a memory circuit 200 X.
  • the logic circuit 100 X comprises an arithmetic logic unit (ALU) for performing arithmetic operations, whereas the memory circuit 200 X stores an LUT for the built-in function.
  • ALU arithmetic logic unit
  • the built-in function is approximated to a polynomial of a sufficiently high order.
  • the LUT 200 X stores the coefficients of the polynomial; and the ALU 100 X calculates the polynomial. Because the ALU 100 X and the LUT 200 X are formed side-by-side on a semiconductor substrate 00 S, this type of horizontal integration is referred to as two-dimensional (2-D) integration.
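The division of labor described above (a small table of coefficients, with the ALU evaluating the polynomial) can be sketched as follows. This is only an illustration under stated assumptions: the built-in function is taken to be exp, the polynomial is its order-10 Taylor series about 0, and the names `COEFF_LUT` and `alu_polynomial` are hypothetical, not taken from the patent.

```python
import math

# Hypothetical stand-in for the LUT 200X: Taylor coefficients of exp(x)
# about 0, up to order 10 (function and order are assumptions).
ORDER = 10
COEFF_LUT = [1.0 / math.factorial(k) for k in range(ORDER + 1)]

def alu_polynomial(x, coeffs):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule (only + and *)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

approx = alu_polynomial(0.5, COEFF_LUT)
error = abs(approx - math.exp(0.5))
print(approx, error)
```

Horner's rule is used here because it needs only the additions and multiplications that the text attributes to the logic circuit.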
  • the 2-D integration puts stringent requirements on the manufacturing process.
  • the memory transistors in the LUT 200 X are vastly different from the logic transistors in the ALU 100 X.
  • the memory transistors have stringent requirements on leakage current, while the logic transistors have stringent requirements on drive current.
  • To form high-performance memory transistors and high-performance logic transistors at the same time is a challenge.
  • the 2-D integration also limits computational density and computational complexity. Computation has been developed towards higher computational density and greater computational complexity.
  • the computational density, i.e. the computational power (e.g. the number of floating-point operations per second) per die area, is a figure of merit for parallel computation.
  • the computational complexity, i.e. the total number of built-in functions supported by a processor, is a figure of merit for scientific computation.
  • inclusion of the LUT 200 X increases the die size of the conventional processor 00 X and lowers its computational density. This has an adverse effect on parallel computation.
  • FIG. 1B lists all built-in transcendental functions supported by an Intel Itanium (IA-64) processor (referring to Harrison et al. “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4 1999, hereinafter Harrison).
  • the IA-64 processor supports a total of 7 built-in transcendental functions, each using a relatively small LUT (from 0 to 24 kb) in conjunction with a relatively high-order Taylor series (from 5 to 22).
  • the LBC-based processor 00 X suffers from one drawback. Because different logic circuits are used to realize different built-in functions, the processor 00 X is fully customized. In other words, once its design is complete, the processor 00 X can only realize a fixed set of pre-defined built-in functions. Hence, configurable computation is more desirable, where the same hardware can realize different mathematical functions under the control of a set of configuration signals.
  • configurable logic i.e. a same hardware realizes different logics under the control of a set of configuration signals
  • a configurable gate array which is also known as field-programmable gate array (FPGA), complex programmable logic device (CPLD), or other names.
  • FPGA field-programmable gate array
  • CPLD complex programmable logic device
  • U.S. Pat. No. 4,870,302 issued to Freeman on Sep. 26, 1989 discloses a configurable gate array. It comprises an array of configurable logic elements and a hierarchy of configurable interconnects that allow the configurable logic elements to be wired together.
  • logic functions are configurable, but mathematical functions are not configurable. A small number of mathematical functions (i.e.
  • the present invention discloses a configurable processor.
  • the present invention discloses a configurable processor with an in-package look-up table (IP-LUT), i.e. an IP-LUT configurable processor.
  • IP-LUT in-package look-up table
  • the preferred IP-LUT configurable processor comprises a plurality of configurable computing elements.
  • Each configurable computing element comprises at least a programmable memory array on a memory die and at least an arithmetic logic circuit (ALC) on a logic die.
  • ALC arithmetic logic circuit
  • the programmable memory array stores at least a portion of a look-up table (LUT) for a mathematical function, which includes numerical values related to said mathematical function (e.g. functional values and/or derivative values thereof), while the ALC performs arithmetic operations on selected data from the LUT.
  • the logic die comprises the ALCs of a plurality of configurable computing elements, while the memory die comprises the programmable memory arrays of another plurality of configurable computing elements.
  • the logic die and memory die are located in a configurable computing-array package and communicatively coupled by a plurality of inter-die connections.
  • the LUT is referred to as in-package LUT (IP-LUT).
  • the preferred IP-LUT configurable processor uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT used by the conventional processor, the IP-LUT used by the preferred IP-LUT configurable processor has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IP-LUT as a starting point for computation. For the MBC, the fraction of computation done by the IP-LUT is larger than that done by the ALC.
  • MBC memory-based computation
  • Each usage cycle of the IP-LUT configurable processor comprises two stages: a configuration stage and a computation stage.
  • In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array.
  • In the computation stage, selected values of the mathematical function are read out from the programmable memory array.
  • the IP-LUT configurable processor can realize field-configurable computation and re-configurable computation.
  • a mathematical function is realized by writing its LUT into the programmable memory array in the field of use.
  • the programmable memory array is re-programmable and different mathematical functions can be realized by writing different LUTs for different mathematical functions thereto during different usage cycles. For example, during a first usage cycle, a first LUT for a first mathematical function is written into the re-programmable memory array; during a second usage cycle, a second LUT for a second mathematical function is written into the re-programmable memory array.
  • this type of vertical integration is referred to as 2.5-D integration.
  • the 2.5-D integration has a profound effect on the computational density and computational complexity.
  • the footprint of a conventional processor 00 X is roughly equal to the sum of those of the ALU 100 X and the LUT 200 X.
  • the IP-LUT configurable processor becomes smaller and computationally more powerful.
  • the total LUT capacity of the conventional processor 00 X is less than 100 Kb, whereas the total IP-LUT capacity for the IP-LUT configurable processor could reach 100 Gb.
  • IP-LUT configurable processor could support as many as 10,000 built-in functions (including various types of complex functions), far more than the conventional processor 00 X.
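The capacity figures quoted above can be put side by side with a short calculation. The sketch assumes decimal prefixes (1 Kb = 10^3 bits, 1 Gb = 10^9 bits) and, purely for illustration, an even split of the IP-LUT capacity across functions; the text does not state how capacity is actually allocated.

```python
# Figures quoted in the text; decimal prefixes are an assumption.
CONVENTIONAL_LUT_BITS = 100 * 10**3   # "less than 100 Kb" (upper bound)
IP_LUT_BITS = 100 * 10**9             # "could reach 100 Gb"
N_FUNCTIONS = 10_000                  # "as many as 10,000 built-in functions"

# How much larger the IP-LUT could be than the conventional LUT.
capacity_ratio = IP_LUT_BITS // CONVENTIONAL_LUT_BITS

# Average LUT budget per function under an even split (illustrative only).
avg_bits_per_function = IP_LUT_BITS // N_FUNCTIONS

print(capacity_ratio)          # 1000000
print(avg_bits_per_function)   # 10000000, i.e. 10 Mb per function
```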
  • the logic die and the memory die are separate dice, the logic transistors in the logic die and the memory transistors in the memory die are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
  • the present invention further discloses a preferred IP-LUT configurable computing array for implementing complex functions. It is a special type of the IP-LUT configurable processor and comprises an array of configurable computing elements, an array of configurable logic elements and a plurality of configurable interconnects. Each configurable computing element comprises at least a programmable memory array for storing the LUT for a mathematical function and at least an ALC for performing arithmetic operations on selected data from the LUT.
  • the configurable logic elements and configurable interconnects in the IP-LUT configurable computing array are similar to those in the conventional configurable gate array.
  • a complex function is first decomposed into a combination of basic functions. Each basic function is then realized by an associated configurable computing element. Finally, the complex function is realized by programming the corresponding configurable logic elements and configurable interconnects.
  • the present invention discloses a configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising: at least a programmable memory array on a memory level for storing at least a portion of a look-up table (LUT) for a mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said programmable memory array and said ALC; wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
  • LUT look-up table
  • the present invention further discloses another configurable processor for implementing a mathematical function, comprising: at least first and second programmable memory arrays on a memory level, wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; and, said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said first or second LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said first or second programmable memory array with said ALC; wherein said mathematical function is a combination of at least said first and second mathematical functions; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
  • the present invention further discloses a configurable computing array for implementing a mathematical function, comprising: at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function in a logic library; at least an array of configurable computing elements comprising at least a first programmable memory array, a second programmable memory array and an arithmetic logic circuit (ALC), wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; and, said ALC performs at least an arithmetic operation on selected data from said first or second LUT; means for communicatively coupling said configurable logic elements and said configurable computing elements; whereby said configurable computing array realizes said mathematical function by programming said configurable logic elements and said configurable computing elements, wherein said mathematical function is a combination of at least said first and
  • FIG. 1A is a schematic view of a conventional processor (prior art);
  • FIG. 1B lists all transcendental functions supported by an Intel Itanium (IA-64) processor (prior art);
  • FIG. 2A is a block diagram of a preferred IP-LUT configurable processor
  • FIG. 2B is a block diagram of a preferred configurable computing element
  • FIG. 2C is a perspective view of the preferred configurable computing element
  • FIGS. 3A-3C are the cross-sectional views of three preferred IP-LUT configurable processor packages
  • FIG. 4A is a circuit block diagram of a preferred configurable computing element showing more details
  • FIG. 4B is a circuit block diagram of the preferred configurable computing element realizing a single-precision function
  • FIG. 4C lists preferred LUT sizes and Taylor series required to realize mathematical functions with different precisions
  • FIG. 5 is a block diagram of a first preferred IP-LUT configurable computing array
  • FIG. 7 is a block diagram of a second preferred IP-LUT configurable computing array
  • FIGS. 8A-8B show two instantiations of the second preferred IP-LUT configurable computing array.
  • the phrase “mathematical functions” refers to non-arithmetic functions only; the phrase “memory” is used in its broadest sense to mean any semiconductor-based holding place for information, either permanent or temporary; the phrase “permanent” is used in its broadest sense to mean any long-term storage; the phrase “communicatively coupled” is used in its broadest sense to mean any coupling whereby information may be passed from one element to another element; the term “LUT” (or, “IP-LUT”) could refer to the logic look-up table (LUT) stored in the programmable memory array(s), or the physical LUT circuit in the form of the programmable memory array(s), depending on the context; the symbol “/” means a relationship of “and” or “or”.
  • a preferred IP-LUT configurable processor 300 comprises an array of configurable computing elements 300 - 1 , 300 - 2 . . . 300 - i . . . 300 -N ( FIG. 2A ). Each configurable computing element 300 - i could realize a same mathematical function or different mathematical functions. It has at least one input 150 and at least one output 190 .
  • the configurable computing element 300 - i comprises at least a programmable memory array 170 and an arithmetic logic circuit (ALC) 180 , which are communicatively coupled by connections 160 ( FIG. 2B ).
  • the programmable memory array 170 stores at least a portion of the LUT for a mathematical function. It may be a RAM array or a ROM array.
  • the RAM could be SRAM or DRAM, while the ROM could be OTP, EPROM, EEPROM, flash memory (e.g. planar NOR memory, planar NAND memory, or 3D-NAND memory), or 3D-XPoint memory.
  • the LUT includes numerical values related to said mathematical function. Examples of the numerical values include the functional values or the derivative values of said mathematical function.
  • the ALC 180 performs at least an arithmetic operation on selected data from the LUT. It may comprise an adder, a multiplier, and/or a multiply-accumulator (MAC).
  • the ALC 180 may operate on integer, fixed-point numbers, or floating-point numbers.
  • the mathematical function implemented by the programmable memory array 170 is a non-arithmetic function, which includes more operations than the arithmetic operations performable by the ALC 180 . As disclosed before, typical arithmetical operations performable by the ALC 180 consist of addition, subtraction and multiplication.
  • Each usage cycle of the IP-LUT configurable processor 300 comprises two stages: a configuration stage and a computation stage.
  • In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array 170 .
  • In the computation stage, selected values of the mathematical function are read out from the programmable memory array 170 .
  • the IP-LUT configurable processor 300 can be used to realize field-configurable computation and re-configurable computation. For the field-configurable computation, a mathematical function is realized by writing its LUT into the programmable memory array 170 in the field of use.
  • the programmable memory array 170 is re-programmable and different mathematical functions can be realized by writing different LUTs for different mathematical functions into the re-programmable memory array 170 . For example, during a first usage cycle, a first LUT for a first mathematical function is written into the re-programmable memory array 170 ; during a second usage cycle, a second LUT for a second mathematical function is written into the re-programmable memory array 170 .
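The two-stage usage cycle and the re-configuration between cycles can be modeled in software. This is a toy model, not the patented circuit: the class name, the 1024-entry table, the input ranges and the choice of exp and log are all assumptions, and the computation stage simply reads out the nearest stored value without any arithmetic correction.

```python
import math

class ConfigurableComputingElement:
    """Toy model of a re-programmable memory array holding one LUT."""

    def __init__(self, n_entries=1024):
        self.n_entries = n_entries
        self.memory = [0.0] * n_entries   # stands in for memory array 170
        self.lo = 0.0
        self.step = 1.0

    def configure(self, func, lo, hi):
        """Configuration stage: write the LUT for func over [lo, hi)."""
        self.lo = lo
        self.step = (hi - lo) / self.n_entries
        for i in range(self.n_entries):
            self.memory[i] = func(lo + i * self.step)

    def compute(self, x):
        """Computation stage: read out the stored value at or below x."""
        i = int((x - self.lo) / self.step)
        i = max(0, min(i, self.n_entries - 1))
        return self.memory[i]

element = ConfigurableComputingElement()

element.configure(math.exp, 0.0, 1.0)   # first usage cycle: exp
y_exp = element.compute(0.3)

element.configure(math.log, 1.0, 2.0)   # second usage cycle: log
y_log = element.compute(1.5)

print(y_exp, y_log)
```

The same `memory` list serves both functions, which is the point of the re-configurable case: only the table contents change between usage cycles.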
  • the ALC 180 is formed on a logic die 100 , while the programmable memory array 170 is formed on the memory die 200 ( FIG. 2C ).
  • the logic die 100 is formed on a first semiconductor substrate 100 S, while the memory die 200 is formed on a second semiconductor substrate 200 S.
  • a single logic die 100 comprises the ALCs 180 of a plurality of configurable computing elements, while a single memory die 200 comprises the programmable memory arrays 170 of another plurality of configurable computing elements.
  • the logic die 100 and memory die 200 are disposed in a same package and communicatively coupled by a plurality of inter-die connections 160 .
  • the programmable memory array 170 is represented by a dotted line in all figures.
  • the IP-LUT configurable processor 300 uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT 200 X used by the conventional processor 00 X, the IP-LUT 170 used by the IP-LUT configurable processor 300 has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IP-LUT 170 as a starting point for computation. For the MBC, the fraction of computation done by the IP-LUT 170 is larger than that done by the ALC 180 .
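The trade-off behind memory-based computation (a larger table allows a lower-order polynomial) can be checked numerically. The sketch below fixes the polynomial at first order and varies only the table size; the function (sin on [0, 1)), the table sizes and all names are assumptions chosen for illustration.

```python
import math

def build_lut(n_entries):
    """Store sin(a) and its first derivative cos(a) at n_entries points on [0, 1)."""
    step = 1.0 / n_entries
    table = [(math.sin(i * step), math.cos(i * step)) for i in range(n_entries)]
    return table, step

def mbc_sin(x, table, step):
    """Read the entry at or below x, then apply a first-order correction."""
    i = min(int(x / step), len(table) - 1)
    value, slope = table[i]
    return value + slope * (x - i * step)   # one multiply, one add

def max_error(n_entries, samples=1000):
    """Largest absolute error over evenly spaced test points in [0, 1)."""
    table, step = build_lut(n_entries)
    points = [k / samples for k in range(samples)]
    return max(abs(mbc_sin(x, table, step) - math.sin(x)) for x in points)

small_table_error = max_error(16)     # small LUT, first-order polynomial
large_table_error = max_error(4096)   # large LUT, same first-order polynomial

print(small_table_error, large_table_error)
```

With the polynomial order held constant, the error of a first-order expansion shrinks with the square of the table spacing, so the larger table reaches a given precision without raising the order.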
  • the IP-LUT configurable processor package 300 in FIG. 3A comprises two separate dice: a logic die 100 and a memory die 200 .
  • the memory die 200 is stacked on top of the logic die 100 , while the logic die 100 is stacked on the package substrate 110 .
  • the memory die 200 is flipped and bonded face-to-face with the logic die 100 .
  • Micro-bumps 116 act as the inter-die connections 160 and provide electrical coupling between the dice 100 , 200 . Both the memory die 200 and the logic die 100 are located in a same package 130 .
  • the memory die 200 comprises the programmable memory arrays 170 for a plurality of configurable computing elements, while the logic die 100 comprises the ALCs 180 for a plurality of configurable computing elements, as well as another plurality of configurable logic elements (as shown in FIGS. 5-6 ).
  • the logic die 100 may be stacked on top of the memory die 200 .
  • neither die 100 , 200 has to be flipped.
  • the IP-LUT configurable processor package 300 in FIG. 3B comprises a logic die 100 , an interposer 120 and a memory die 200 .
  • the interposer 120 comprises a plurality of through-silicon vias (TSV) 118 .
  • TSVs 118 provide electrical couplings between the logic die 100 and the memory die 200 , offer more freedom in design and facilitate heat dissipation.
  • the TSVs 118 and the micro-bumps 116 collectively form the inter-die connections 160 .
  • the IP-LUT configurable processor package 300 in FIG. 3C comprises at least two logic dice 100 A, 100 B, and at least a memory die 200 . These dice 100 A, 100 B, 200 are separate dice and they are located in a same package 130 .
  • the memory die 200 is stacked on top of the logic die 100 B, and the logic die 100 B is stacked on top of the logic die 100 A.
  • the dice 100 A, 100 B, 200 are electrically coupled with the TSVs 118 and the micro-bumps 116 .
  • the memory die 200 comprises the programmable memory arrays 170 for a plurality of configurable computing elements
  • the logic die 100 B comprises the ALCs 180 for a plurality of configurable computing elements
  • the logic die 100 A comprises another plurality of configurable logic elements (as shown in FIGS. 5-6 ).
  • the IP-LUT configurable processor 300 may comprise more than one memory die 200 . In this case, the IP-LUT will have a larger capacity than that in FIG. 3A .
  • the TSVs 118 and the micro-bumps 116 in this figure collectively form the inter-die connections 160 .
  • this type of vertical integration is referred to as 2.5-D integration.
  • the 2.5-D integration has a profound effect on the computational density and computational complexity.
  • the footprint of a conventional processor 00 X is roughly equal to the sum of those of the ALU 100 X and the LUT 200 X.
  • the IP-LUT configurable processor 300 becomes smaller and computationally more powerful.
  • the total LUT capacity of the conventional processor 00 X is less than 100 Kb, whereas the total IP-LUT capacity for the IP-LUT configurable processor 300 could reach 100 Gb.
  • the 2.5-D integration can improve the communication throughput between the IP-LUT 170 and the ALC 180 . Because they are physically close and coupled by a large number of inter-die connections 160 , the IP-LUT 170 and the ALC 180 have a larger communication throughput than that between the LUT 200 X and the ALU 100 X in the conventional processor 00 X. Lastly, the 2.5-D integration benefits manufacturing process. Because the logic die 100 and the memory die 200 are separate dice, the logic transistors in the logic die 100 and the memory transistors in the memory die 200 are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
  • a preferred configurable computing element 300 - i comprises a pre-processing circuit 180 R, a post-processing circuit 180 T and at least a programmable memory array 170 for storing the LUT(s) for a mathematical function.
  • the pre-processing circuit 180 R converts the input variable (X) 150 into an address (A) 160 A of the programmable memory array 170 .
  • the post-processing circuit 180 T converts the data read out from the programmable memory array 170 at the address (A) into the output value (Y) 190 .
  • a residue (R) of the input variable (X) is fed into the post-processing circuit 180 T to improve the computational precision.
  • the pre-processing circuit 180 R and the post-processing circuit 180 T are formed in the logic die 100 .
  • at least a portion of the pre-processing circuit 180 R and the post-processing circuit 180 T may be formed in the memory die 200 .
  • the ALC 180 comprises a pre-processing circuit 180 R (mainly comprising an address buffer) and a post-processing circuit 180 T (comprising an adder 180 A and a multiplier 180 M).
  • the inter-die connections 160 transfer data between the ALC 180 and the IP-LUT 170 .
  • a 32-bit input variable X (x 31 . . . x 0 ) is sent to the IP-LUT configurable processor 300 as an input 150 .
  • the pre-processing circuit 180 R extracts the higher 16 bits (x 31 . . . x 16 ) and sends it as a 16-bit address input A to the IP-LUT 170 .
  • the pre-processing circuit 180 R further extracts the lower 16 bits (x 15 . . . x 0 ) and sends it as a 16-bit input residue R to the post-processing circuit 180 T.
  • the post-processing circuit 180 T performs a polynomial interpolation to generate a 32-bit output value Y 190 .
  • a higher-order polynomial interpolation e.g. higher-order Taylor series
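The single-precision data path described above (upper 16 bits as address, lower 16 bits as residue, then a polynomial step in the post-processing circuit) can be sketched as follows. Several details are assumptions rather than statements from the text: the 32-bit input is interpreted as a fixed-point fraction x = X / 2^32 in [0, 1), the IP-LUT holds the function value and first derivative of sin at each address, the polynomial is first order, and the output is left as a Python float instead of being packed into 32 bits.

```python
import math

ADDR_BITS = 16
RES_BITS = 16
SCALE = 1 << (ADDR_BITS + RES_BITS)   # assumption: x = X / 2**32

# Hypothetical IP-LUT contents: (f(a), f'(a)) per 16-bit address, with f = sin.
LUT = [
    (math.sin(A / (1 << ADDR_BITS)), math.cos(A / (1 << ADDR_BITS)))
    for A in range(1 << ADDR_BITS)
]

def pre_process(X):
    """Split the 32-bit input into a 16-bit address A and a 16-bit residue R."""
    A = (X >> RES_BITS) & 0xFFFF   # x31 ... x16
    R = X & 0xFFFF                 # x15 ... x0
    return A, R

def post_process(D, R):
    """First-order step: one multiply (slope * residue) and one add."""
    value, slope = D
    return value + slope * (R / SCALE)

def compute(X):
    A, R = pre_process(X)
    return post_process(LUT[A], R)

X = 0x80004000
Y = compute(X)
print(pre_process(X), Y, math.sin(X / SCALE))
```

Because the residue spans at most 2^-16 of the input range, the first-order correction already brings the error of this sketch to roughly 1e-10; a higher-order step would add further multiply-add stages using more stored derivative values.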
  • FIGS. 4A-4B can be used to implement special functions.
  • Special functions can be defined by means of power series, generating functions, infinite products, repeated differentiation, integral representation, differential difference, integral, and functional equations, trigonometric series, or other series in orthogonal functions.
  • IP-LUT configurable processor will simplify the computation of special functions and promote their applications in scientific computation.
  • the first preferred IP-LUT configurable computing array 700 comprises first and second configurable slices 700 A, 700 B.
  • Each configurable slice (e.g. 700 A) comprises a first array of configurable computing elements (e.g. 300 AA- 300 AD) and a second array of configurable logic elements (e.g. 400 AA- 400 AD).
  • a configurable channel 620 is placed between the first array of configurable computing elements (e.g. 300 AA- 300 AD) and the second array of configurable logic elements (e.g. 400 AA- 400 AD).
  • the configurable channels 610 , 630 , 650 are also placed between different configurable slices 700 A, 700 B.
  • the configurable channels 610 - 650 comprise an array of configurable interconnects (represented by slashes at the cross-points in each configurable channel).
  • the sea-of-gates architecture may also be used.
  • the configurable computing elements 300 AA- 300 BD are similar to those in the IP-LUT configurable processor 300 ( FIG. 2B ).
  • Each configurable computing element 300 - i comprises at least a programmable memory array 170 and an arithmetic logic circuit (ALC) 180 . It can realize at least a basic function by loading the LUT for said basic function into the programmable memory array 170 .
  • the configurable logic elements 400 AA- 400 BD and the configurable interconnects 610 - 650 are similar to those disclosed in Freeman (U.S. Pat. No. 4,870,302). Each configurable logic element can selectively realize any one of a plurality of logic operations in a logic library.
  • a typical logic library includes a group of operations consisting of shift, logic NOT, logic AND, logic OR, logic NOR, logic NAND, logic XOR, addition “+”, and subtraction “−”.
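A configurable logic element that selects one operation from such a library can be modeled with a dispatch table. The 8-bit word width, the operation names and the class below are assumptions for illustration; the configuration signals of a real gate array are reduced here to a single string.

```python
WIDTH = 8                     # assumed word width
MASK = (1 << WIDTH) - 1

# The library listed in the text, expressed as two-input functions
# (NOT ignores its second input).
LOGIC_LIBRARY = {
    "SHIFT": lambda a, b: (a << b) & MASK,
    "NOT":   lambda a, b: ~a & MASK,
    "AND":   lambda a, b: a & b,
    "OR":    lambda a, b: a | b,
    "NOR":   lambda a, b: ~(a | b) & MASK,
    "NAND":  lambda a, b: ~(a & b) & MASK,
    "XOR":   lambda a, b: a ^ b,
    "ADD":   lambda a, b: (a + b) & MASK,
    "SUB":   lambda a, b: (a - b) & MASK,
}

class ConfigurableLogicElement:
    """Realizes whichever library operation it was last configured with."""

    def __init__(self):
        self.op = None

    def configure(self, name):
        self.op = LOGIC_LIBRARY[name]

    def evaluate(self, a, b=0):
        return self.op(a, b)

cle = ConfigurableLogicElement()
cle.configure("NAND")
nand_result = cle.evaluate(0b1100, 0b1010)
cle.configure("ADD")                        # same element, new function
add_result = cle.evaluate(200, 100)
print(nand_result, add_result)
```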
  • Each configurable interconnect can selectively couple or de-couple at least one interconnect line.
  • the first preferred IP-LUT configurable computing array 700 can realize a complex function by programming the configurable logic elements 400 AA- 400 BD and the configurable computing elements 300 AA- 300 BD.
  • the complex function is a combination of basic functions, which can be implemented by selected configurable computing elements.
  • the mathematical operations included in each basic function are not only more than the arithmetic operations included in the logic library of the configurable logic elements 400 AA- 400 BD, but also more than the arithmetic operations performable by the ALC 180 .
  • the arithmetic operations included in the logic library consist of addition and subtraction; and, the arithmetic operations performable by the ALC 180 consist of addition, subtraction and multiplication.
  • the programmable memory arrays 170 of the configurable computing elements 300 AA- 300 BD are located on a different physical level than the configurable logic elements 400 AA- 400 BD.
  • the programmable memory arrays 170 are located on a memory die 200
  • the configurable logic elements 400 are located on a logic die.
  • This logic die could be the same logic die 100 for the ALC 180 , as in the case of FIG. 3A .
  • it could be a different logic die, as in the case of FIG. 3C where a first logic die 100 A is used for the configurable logic elements 400 AA- 400 BD, and a second logic die 100 B is used for the ALCs 180 .
  • the memory die and the logic die are vertically stacked and preferably at least partially overlap.
  • the configurable interconnects in the configurable channels 610 - 650 use the same convention as Freeman: an interconnect with a dot is connected; an interconnect without a dot is not connected; a broken interconnect means that the two broken sections are un-coupled.
  • the configurable computing element 300 AA is configured to realize the function log( ), whose result log(a) is sent to a first input of the configurable logic element 400 AA.
  • the configurable computing element 300 AB is configured to realize the function log[sin( )], whose result log[sin(b)] is sent to a second input of the configurable logic element 400 AA.
  • the configurable logic element 400 AA is configured to realize addition, whose result log(a)+log[sin(b)] is sent to the configurable computing element 300 BA.
  • the results of the configurable computing elements 300 AC, 300 AD, the configurable logic elements 400 AC, and the configurable computing element 300 BC can be sent to a second input of the configurable logic element 400 BA.
  • the configurable logic element 400 BA is configured to realize addition, whose result a·sin(b)+c·cos(d) is sent to the output e.
  • the IP-LUT configurable computing array 700 can realize other complex functions.
  • a second preferred IP-LUT configurable computing array 700 is shown. Besides configurable computing elements 300 A, 300 B and configurable logic element 400 A, this preferred embodiment further comprises a multiplier 500 .
  • the configurable channels 660 - 680 comprise a plurality of configurable interconnects.
  • the second preferred IP-LUT configurable computing array 700 can realize more mathematical functions with more computational power.
  • FIGS. 8A-8B disclose two instantiations of the second preferred IP-LUT configurable computing array 700 .
  • the configurable computing element 300 A is configured to realize the function exp(f), while the configurable computing element 300 B is configured to realize the function inv(g).
  • the configurable computing element 300 A is configured to realize the function sin(f), while the configurable computing element 300 B is configured to realize the function cos(g).
  • the configurable channel 670 is configured in such a way that the outputs of 300 A, 300 B are fed into the configurable logic element 400 A, which is configured to realize arithmetic addition.
  • the IP-LUT configurable processor of the present invention could be a micro-controller, a controller, a central processing unit (CPU), a digital signal processor (DSP), a graphic processing unit (GPU), a network-security processor, an encryption/decryption processor, an encoding/decoding processor, a neural-network processor, or an artificial intelligence (AI) processor.
  • These IP-LUT configurable processors can be found in consumer electronic devices (e.g. personal computers, video game machines, smart phones) as well as engineering and scientific workstations and server machines. The invention, therefore, is not to be limited except in the spirit of the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Logic Circuits (AREA)

Abstract

A configurable processor comprises a memory die and a logic die. The memory die comprises a programmable memory array for storing a look-up table (LUT) for a mathematical function, while the logic die comprises an arithmetic logic circuit (ALC) for performing at least an arithmetic operation on selected data from the LUT, wherein said mathematical function includes more operations than the arithmetic operations performable by the ALC. Complex mathematical functions can be implemented and configured.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. patent application Ser. No. 15/588,642, filed May 6, 2017, which claims priority from Chinese Patent Application 201610301645.8, filed May 6, 2016, and Chinese Patent Application 201710310865.1, filed May 5, 2017, in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
BACKGROUND
1. Technical Field of the Invention
The present invention relates to the field of integrated circuits, and more particularly to processors.
2. Prior Art
Conventional processors use logic-based computation (LBC), which carries out computation primarily with logic circuits (e.g. XOR circuit). Logic circuits are suitable for arithmetic functions, whose operations consist of basic arithmetic operations only, i.e. addition, subtraction and multiplication. However, logic circuits are not suitable for non-arithmetic functions, whose operations are more than the above arithmetic operations performable by the conventional logic circuits. Exemplary non-arithmetic functions include transcendental functions and special functions. Non-arithmetic functions are computationally hard and their hardware implementation has been a major challenge.
A complex function is a non-arithmetic function with multiple independent variables (independent variable is also known as input variable or argument). It can be expressed as a combination of basic functions. A basic function is a non-arithmetic function with a single independent variable. Exemplary basic functions include basic transcendental functions, such as exponential function (exp), logarithmic function (log), trigonometric functions (sin, cos, tan, atan) and others.
For the conventional processors, all complex functions and most basic functions are implemented by software; only a small number of basic functions (e.g. basic algebraic functions and basic transcendental functions) are implemented by hardware, which are referred to as built-in functions. These built-in functions are realized by a combination of arithmetic operations and look-up tables (LUT). For example, U.S. Pat. No. 5,954,787 issued to Eun on Sep. 21, 1999 taught a method for generating sine/cosine functions using look-up tables; U.S. Pat. No. 9,207,910 issued to Azadet et al. on Dec. 8, 2015 taught a method for calculating a power function using LUTs.
Realization of built-in functions is further illustrated in FIG. 1A. A conventional processor 00X generally comprises a logic circuit 100X and a memory circuit 200X. The logic circuit 100X comprises an arithmetic logic unit (ALU) for performing arithmetic operations, whereas the memory circuit 200X stores an LUT for the built-in function. To obtain a desired precision, the built-in function is approximated to a polynomial of a sufficiently high order. The LUT 200X stores the coefficients of the polynomial; and the ALU 100X calculates the polynomial. Because the ALU 100X and the LUT 200X are formed side-by-side on a semiconductor substrate 00S, this type of horizontal integration is referred to as two-dimensional (2-D) integration.
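The conventional scheme described above (a table of polynomial coefficients plus an arithmetic unit that evaluates the polynomial) can be sketched as follows. This is only an illustration under stated assumptions: the choice of sin(x), the Taylor expansion about 0, the order 9, and the names `SIN_COEFFS` and `horner` are all hypothetical and are not taken from the patent or from any particular processor.

```python
import math

# Hypothetical coefficient table (the "LUT 200X" role): Taylor coefficients of
# sin(x) about 0 up to order 9. Index k holds the coefficient of x**k.
# The table is built offline; at run time only adds and multiplies are used.
SIN_COEFFS = [
    0.0 if k % 2 == 0 else (-1) ** (k // 2) / math.factorial(k)
    for k in range(10)
]


def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with additions and multiplications only
    (the "ALU 100X" role), using Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


approx = horner(SIN_COEFFS, 0.5)
error = abs(approx - math.sin(0.5))
```

With a small coefficient table, precision has to come from the polynomial order, which is the trade-off the text attributes to the conventional processor.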
The 2-D integration puts stringent requirements on the manufacturing process. As is well known in the art, the memory transistors in the LUT 200X are vastly different from the logic transistors in the ALU 100X. The memory transistors have stringent requirements on leakage current, while the logic transistors have stringent requirements on drive current. To form high-performance memory transistors and high-performance logic transistors at the same time is a challenge.
The 2-D integration also limits computational density and computational complexity. Computation has been developed towards higher computational density and greater computational complexity. The computational density, i.e. the computational power (e.g. the number of floating-point operations per second) per die area, is a figure of merit for parallel computation. The computational complexity, i.e. the total number of built-in functions supported by a processor, is a figure of merit for scientific computation. For the 2-D integration, inclusion of the LUT 200X increases the die size of the conventional processor 00X and lowers its computational density. This has an adverse effect on parallel computation. Moreover, because the ALU 100X, as the primary component of the conventional processor 00X, occupies a large die area, the LUT 200X is left with only a small die area and therefore supports few built-in functions. FIG. 1B lists all built-in transcendental functions supported by an Intel Itanium (IA-64) processor (referring to Harrison et al. “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4 1999, hereinafter Harrison). The IA-64 processor supports a total of 7 built-in transcendental functions, each using a relatively small LUT (from 0 to 24 kb) in conjunction with a relatively high-order Taylor series (from 5 to 22).
The LBC-based processor 00X suffers from a further drawback. Because different logic circuits are used to realize different built-in functions, the processor 00X is fully customized. In other words, once its design is complete, the processor 00X can only realize a fixed set of pre-defined built-in functions. Configurable computation is more desirable, where the same hardware can realize different mathematical functions under the control of a set of configuration signals.
In the past, configurable logic, i.e. a same hardware realizes different logics under the control of a set of configuration signals, was realized by a configurable gate array, which is also known as field-programmable gate array (FPGA), complex programmable logic device (CPLD), or other names. U.S. Pat. No. 4,870,302 issued to Freeman on Sep. 26, 1989 (hereinafter Freeman) discloses a configurable gate array. It comprises an array of configurable logic elements and a hierarchy of configurable interconnects that allow the configurable logic elements to be wired together. In the prior-art configurable gate arrays, only logic functions are configurable, but mathematical functions are not configurable. A small number of mathematical functions (i.e. built-in functions) are realized in fixed computing elements, which are part of hard blocks. Namely, the circuits realizing these built-in functions are fixedly connected and are not subject to change by programming. Apparently, fixed computing elements would limit further applications of the configurable gate array. To overcome this difficulty, the present invention expands the original concept of the configurable gate array by making the fixed computing elements configurable.
OBJECTS AND ADVANTAGES
It is a principal object of the present invention to realize configurable computation.
It is a further object of the present invention to realize field-configurable computation.
It is a further object of the present invention to realize re-configurable computation.
It is a further object of the present invention to realize configurable computation for complex functions.
It is a further object of the present invention to provide a configurable processor with a greater computational complexity.
It is a further object of the present invention to provide a configurable processor with a higher computational density.
It is a further object of the present invention to provide a configurable gate array with a greater computational flexibility.
In accordance with these and other objects of the present invention, the present invention discloses a configurable processor.
SUMMARY OF THE INVENTION
The present invention discloses a configurable processor with an in-package look-up table (IP-LUT), i.e. an IP-LUT configurable processor. The preferred IP-LUT configurable processor comprises a plurality of configurable computing elements. Each configurable computing element comprises at least a programmable memory array on a memory die and at least an arithmetic logic circuit (ALC) on a logic die. The programmable memory array stores at least a portion of a look-up table (LUT) for a mathematical function, which includes numerical values related to said mathematical function (e.g. functional values and/or derivative values thereof), while the ALC performs arithmetic operations on selected data from the LUT. In general, the logic die comprises the ALCs of a plurality of configurable computing elements, while the memory die comprises the programmable memory arrays of another plurality of configurable computing elements. The logic die and memory die are located in a configurable computing-array package and communicatively coupled by a plurality of inter-die connections. Located in the configurable computing-array package, the LUT is referred to as in-package LUT (IP-LUT).
The preferred IP-LUT configurable processor uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT used by the conventional processor, the IP-LUT used by the preferred IP-LUT configurable processor has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IP-LUT as a starting point for computation. For the MBC, the fraction of computation done by the IP-LUT is larger than that done by the ALC.
Each usage cycle of the IP-LUT configurable processor comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array. In the computation stage, selected values of the mathematical function are read out from the programmable memory array. The IP-LUT configurable processor can realize field-configurable computation and re-configurable computation. For the field-configurable computation, a mathematical function is realized by writing its LUT into the programmable memory array in the field of use. For re-configurable computation, the programmable memory array is re-programmable and different mathematical functions can be realized by writing different LUTs for different mathematical functions thereto during different usage cycles. For example, during a first usage cycle, a first LUT for a first mathematical function is written into the re-programmable memory array; during a second usage cycle, a second LUT for a second mathematical function is written into the re-programmable memory array.
Because the logic die and the memory die are located in a same package, this type of vertical integration is referred to as 2.5-D integration. The 2.5-D integration has a profound effect on the computational density and computational complexity. For the conventional 2-D integration, the footprint of a conventional processor 00X is roughly equal to the sum of those of the ALU 100X and the LUT 200X. On the other hand, because the 2.5-D integration moves the LUT from aside to above, the IP-LUT configurable processor becomes smaller and computationally more powerful. In addition, the total LUT capacity of the conventional processor 00X is less than 100 Kb, whereas the total IP-LUT capacity for the IP-LUT configurable processor could reach 100 Gb. Consequently, a single IP-LUT configurable processor could support as many as 10,000 built-in functions (including various types of complex functions), far more than the conventional processor 00X. Furthermore, because the logic die and the memory die are separate dice, the logic transistors in the logic die and the memory transistors in the memory die are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
To further improve configurability, the present invention further discloses a preferred IP-LUT configurable computing array for implementing complex functions. It is a special type of the IP-LUT configurable processor and comprises an array of configurable computing elements, an array of configurable logic elements and a plurality of configurable interconnects. Each configurable computing element comprises at least a programmable memory array for storing the LUT for a mathematical function and at least an ALC for performing arithmetic operations on selected data from the LUT. The configurable logic elements and configurable interconnects in the IP-LUT configurable computing array are similar to those in the conventional configurable gate array. During computation, a complex function is first decomposed into a combination of basic functions. Each basic function is then realized by an associated configurable computing element. Finally, the complex function is realized by programming the corresponding configurable logic elements and configurable interconnects.
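The decomposition step can be illustrated with the complex function of FIG. 6, e=a·sin(b)+c·cos(d). The sketch below is a software stand-in only: each `ce_*` function plays the role of one configurable computing element and each `le_add` call plays the role of a configurable logic element configured as an adder. The function names, the use of `math` calls in place of table look-ups, and the use of an exp element to undo the logarithms are assumptions made for illustration. The log-based route also assumes a, c, sin(b) and cos(d) are all positive.

```python
import math


# Stand-ins for configurable computing elements, one basic function each.
# In hardware each would be a LUT read plus a small arithmetic correction.
def ce_log(x):
    return math.log(x)


def ce_log_sin(x):
    return math.log(math.sin(x))


def ce_log_cos(x):
    return math.log(math.cos(x))


def ce_exp(x):
    return math.exp(x)


# Stand-in for a configurable logic element configured to realize addition.
def le_add(x, y):
    return x + y


def complex_function(a, b, c, d):
    """e = a*sin(b) + c*cos(d), built only from basic functions and additions.

    Products are turned into sums of logarithms, so no multiplier is needed
    in the logic elements. Valid only when a, c, sin(b), cos(d) > 0.
    """
    first = ce_exp(le_add(ce_log(a), ce_log_sin(b)))   # a * sin(b)
    second = ce_exp(le_add(ce_log(c), ce_log_cos(d)))  # c * cos(d)
    return le_add(first, second)


e = complex_function(2.0, 0.5, 3.0, 0.25)
```

The wiring between the calls corresponds to what the configurable interconnects would provide; changing the function means changing which tables are loaded and how the elements are connected, not the hardware itself.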
Accordingly, the present invention discloses a configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising: at least a programmable memory array on a memory level for storing at least a portion of a look-up table (LUT) for a mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said programmable memory array and said ALC; wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
The present invention further discloses another configurable processor for implementing a mathematical function, comprising: at least first and second programmable memory arrays on a memory level, wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; and, said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said first or second LUT, wherein said logic level is a different physical level than said memory level; and means for communicatively coupling said first or second programmable memory array with said ALC; wherein said mathematical function is a combination of at least said first and second mathematical functions; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
The present invention further discloses a configurable computing array for implementing a mathematical function, comprising: at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function in a logic library; at least an array of configurable computing elements comprising at least a first programmable memory array, a second programmable memory array and an arithmetic logic circuit (ALC), wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; and, said ALC performs at least an arithmetic operation on selected data from said first or second LUT; means for communicatively coupling said configurable logic elements and said configurable computing elements; whereby said configurable computing array realizes said mathematical function by programming said configurable logic elements and said configurable computing elements, wherein said mathematical function is a combination of at least said first and second mathematical functions; wherein each of said first and second mathematical functions includes more operations than arithmetic operations included in said logic library; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a schematic view of a conventional processor (prior art); FIG. 1B lists all transcendental functions supported by an Intel Itanium (IA-64) processor (prior art);
FIG. 2A is a block diagram of a preferred IP-LUT configurable processor; FIG. 2B is a block diagram of a preferred configurable computing element; FIG. 2C is a perspective view of the preferred configurable computing element;
FIGS. 3A-3C are the cross-sectional views of three preferred IP-LUT configurable processor packages;
FIG. 4A is a circuit block diagram of a preferred configurable computing element showing more details; FIG. 4B is a circuit block diagram of the preferred configurable computing element realizing a single-precision function; FIG. 4C lists preferred LUT sizes and Taylor series required to realize mathematical functions with different precisions;
FIG. 5 is a block diagram of a first preferred IP-LUT configurable computing array;
FIG. 6 shows an instantiation of the first preferred IP-LUT configurable computing array for implementing a complex function, i.e. e=a·sin(b)+c·cos(d);
FIG. 7 is a block diagram of a second preferred IP-LUT configurable computing array;
FIGS. 8A-8B show two instantiations of the second preferred IP-LUT configurable computing array.
It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been shown exaggerated or reduced in size for the sake of clarity and convenience in the drawings. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments.
Throughout this specification, the phrase “mathematical functions” refers to non-arithmetic functions only; the phrase “memory” is used in its broadest sense to mean any semiconductor-based holding place for information, either permanent or temporary; the phrase “permanent” is used in its broadest sense to mean any long-term storage; the phrase “communicatively coupled” is used in its broadest sense to mean any coupling whereby information may be passed from one element to another element; the term “LUT” (or, “IP-LUT”) could refer to the logic look-up table (LUT) stored in the programmable memory array(s), or the physical LUT circuit in the form of the programmable memory array(s), depending on the context; the symbol “/” means a relationship of “and” or “or”.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
Referring now to FIGS. 2A-2C, a preferred IP-LUT configurable processor 300 is disclosed. It comprises an array of configurable computing elements 300-1, 300-2 . . . 300-i . . . 300-N (FIG. 2A). Each configurable computing element 300-i could realize a same mathematical function or different mathematical functions. It has at least one input 150 and at least one output 190.
The configurable computing element 300-i comprises at least a programmable memory array 170 and an arithmetic logic circuit (ALC) 180, which are communicatively coupled by connections 160 (FIG. 2B). The programmable memory array 170 stores at least a portion of the LUT for a mathematical function. It may be a RAM array or a ROM array. The RAM could be SRAM or DRAM, while the ROM could be OTP, EPROM, EEPROM, flash memory (e.g. planar NOR memory, planar NAND memory, or 3D-NAND memory), or 3D-XPoint memory. The LUT includes numerical values related to said mathematical function. Examples of the numerical values include the functional values or the derivative values of said mathematical function. The ALC 180 performs at least an arithmetic operation on selected data from the LUT. It may comprise an adder, a multiplier, and/or a multiply-accumulator (MAC). The ALC 180 may operate on integer, fixed-point numbers, or floating-point numbers. The mathematical function implemented by the programmable memory array 170 is a non-arithmetic function, which includes more operations than the arithmetic operations performable by the ALC 180. As disclosed before, typical arithmetical operations performable by the ALC 180 consist of addition, subtraction and multiplication.
Each usage cycle of the IP-LUT configurable processor 300 comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired mathematical function is written into the programmable memory array 170. In the computation stage, selected values of the mathematical function are read out from the programmable memory array 170. The IP-LUT configurable processor 300 can be used to realize field-configurable computation and re-configurable computation. For the field-configurable computation, a mathematical function is realized by writing its LUT into the programmable memory array 170 in the field of use. For re-configurable computation, the programmable memory array 170 is re-programmable and different mathematical functions can be realized by writing different LUTs for different mathematical functions into the re-programmable memory array 170. For example, during a first usage cycle, a first LUT for a first mathematical function is written into the re-programmable memory array 170; during a second usage cycle, a second LUT for a second mathematical function is written into the re-programmable memory array 170.
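The two-stage usage cycle can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the patented circuit: the class name `ConfigurableComputingElement`, the 8-bit address width, the uniform sampling of the input range, and the use of a Python list as the programmable memory array are all hypothetical. Only the table read-out is modelled, with no arithmetic correction.

```python
import math


class ConfigurableComputingElement:
    """Toy model of one element: a writable table plus an address mapper."""

    def __init__(self, address_bits=8):
        self.address_bits = address_bits
        self.lut = None   # models the programmable memory array
        self.lo = 0.0
        self.step = 1.0

    def configure(self, func, lo, hi):
        """Configuration stage: write the LUT for `func`, sampled on [lo, hi)."""
        n = 1 << self.address_bits
        self.lo = lo
        self.step = (hi - lo) / n
        self.lut = [func(lo + i * self.step) for i in range(n)]

    def compute(self, x):
        """Computation stage: map x to an address and read the stored value."""
        if self.lut is None:
            raise RuntimeError("element has not been configured")
        addr = int((x - self.lo) / self.step)
        addr = min(max(addr, 0), len(self.lut) - 1)  # clamp to the table
        return self.lut[addr]


element = ConfigurableComputingElement()

# First usage cycle: the element realizes exp on [0, 1).
element.configure(math.exp, 0.0, 1.0)
first_result = element.compute(0.5)

# Second usage cycle: the same element is rewritten to realize sin on [0, 1).
element.configure(math.sin, 0.0, 1.0)
second_result = element.compute(0.5)
```

The same object returns values of two different functions in the two cycles, which is the sense of re-configurable computation used in the text.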
In the preferred configurable computing element 300-i, the ALC 180 is formed on a logic die 100, while the programmable memory array 170 is formed on the memory die 200 (FIG. 2C). The logic die 100 is formed on a first semiconductor substrate 100S, while the memory die 200 is formed on a second semiconductor substrate 200S. In general, a single logic die 100 comprises the ALCs 180 of a plurality of configurable computing elements, while a single memory die 200 comprises the programmable memory arrays 170 of another plurality of configurable computing elements. The logic die 100 and memory die 200 are disposed in a same package and communicatively coupled by a plurality of inter-die connections 160. On a different physical level (e.g. a different die) than the ALC 180, the programmable memory array 170 is represented by dotted line in all figures.
The IP-LUT configurable processor 300 uses memory-based computation (MBC), which realizes mathematical functions primarily with the LUT. Compared with the LUT 200X used by the conventional processor 00X, the IP-LUT 170 used by the IP-LUT configurable processor 300 has a much larger capacity. Although arithmetic operations are still performed, the MBC only needs to calculate a polynomial to a much lower order because it uses a much larger IP-LUT 170 as a starting point for computation. For the MBC, the fraction of computation done by the IP-LUT 170 is larger than that done by the ALC 180.
Referring now to FIGS. 3A-3C, the cross-sectional views of three preferred IP-LUT configurable processors 300 are shown. These preferred embodiments are located in multi-chip packages (MCP). Among them, the IP-LUT configurable processor package 300 in FIG. 3A comprises two separate dice: a logic die 100 and a memory die 200. The memory die 200 is stacked on top of the logic die 100, while the logic die 100 is stacked on the package substrate 110. The memory die 200 is flipped and bonded face-to-face with the logic die 100. Micro-bumps 116 act as the inter-die connections 160 and provide electrical coupling between the dice 100, 200. Both the memory die 200 and the logic die 100 are located in a same package 130. In this preferred embodiment, the memory die 200 comprises the programmable memory arrays 170 for a plurality of configurable computing elements, while the logic die 100 comprises the ALCs 180 for a plurality of configurable computing elements, as well as another plurality of configurable logic elements (as shown in FIGS. 5-6). Alternatively, the logic die 100 may be stacked on top of the memory die 200. Optionally, neither of the dice 100, 200 has to be flipped.
The IP-LUT configurable processor package 300 in FIG. 3B comprises a logic die 100, an interposer 120 and a memory die 200. The interposer 120 comprises a plurality of through-silicon vias (TSV) 118. The TSVs 118 provide electrical couplings between the logic die 100 and the memory die 200, offer more freedom in design and facilitate heat dissipation. In this preferred embodiment, the TSVs 118 and the micro-bumps 116 collectively form the inter-die connections 160.
The IP-LUT configurable processor package 300 in FIG. 3C comprises at least two logic dice 100A, 100B, and at least a memory die 200. These dice 100A, 100B, 200 are separate dice and they are located in a same package 130. The memory die 200 is stacked on top of the logic die 100B, and the logic die 100B is stacked on top of the logic die 100A. The dice 100A, 100B, 200 are electrically coupled with the TSVs 118 and the micro-bumps 116. In this preferred embodiment, the memory die 200 comprises the programmable memory arrays 170 for a plurality of configurable computing elements, the logic die 100B comprises the ALCs 180 for a plurality of configurable computing elements, while the logic die 100A comprises another plurality of configurable logic elements (as shown in FIGS. 5-6). Alternatively, the IP-LUT configurable processor 300 may comprise more than one memory die 200. In this case, the IP-LUT will have a larger capacity than that in FIG. 3A. Similarly, the TSVs 118 and the micro-bumps 116 in this figure collectively form the inter-die connections 160.
Because the logic die 100 and the memory die 200 are located in a same package, this type of vertical integration is referred to as 2.5-D integration. The 2.5-D integration has a profound effect on the computational density and computational complexity. For the conventional 2-D integration, the footprint of a conventional processor 00X is roughly equal to the sum of those of the ALU 100X and the LUT 200X. On the other hand, because the 2.5-D integration moves the LUT from aside to above, the IP-LUT configurable processor 300 becomes smaller and computationally more powerful. In addition, the total LUT capacity of the conventional processor 00X is less than 100 Kb, whereas the total IP-LUT capacity for the IP-LUT configurable processor 300 could reach 100 Gb. Consequently, a single IP-LUT configurable processor 300 could support as many as 10,000 built-in functions (including various types of complex functions), far more than the conventional processor 00X. Moreover, the 2.5-D integration can improve the communication throughput between the IP-LUT 170 and the ALC 180. Because they are physically close and coupled by a large number of inter-die connections 160, the IP-LUT 170 and the ALC 180 have a larger communication throughput than that between the LUT 200X and the ALU 100X in the conventional processor 00X. Lastly, the 2.5-D integration benefits the manufacturing process. Because the logic die 100 and the memory die 200 are separate dice, the logic transistors in the logic die 100 and the memory transistors in the memory die 200 are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
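As a rough consistency check on the capacity figures quoted above, the average table budget per function can be worked out directly. The sketch assumes decimal prefixes (1 Gb = 10^9 bits), which the text does not specify, and compares the budget with a hypothetical function stored as two tables with a 16-bit address and 32-bit data words, the layout described for FIG. 4B. The variable names are illustrative only.

```python
# Figures quoted in the text; decimal prefixes are an assumption.
TOTAL_IP_LUT_BITS = 100 * 10**9   # "could reach 100 Gb"
BUILT_IN_FUNCTIONS = 10_000       # "as many as 10,000 built-in functions"

# Average number of table bits available per built-in function.
bits_per_function = TOTAL_IP_LUT_BITS // BUILT_IN_FUNCTIONS

# Hypothetical single-precision function: two tables (value and first
# derivative), each with 2**16 entries of 32 bits.
single_precision_bits = 2 * (2**16) * 32

# Does such a function fit within the average budget?
fits = single_precision_bits <= bits_per_function
```

Under these assumptions the average budget is 10 Mb per function, while the two-table single-precision layout needs about 4.2 Mb, so the two quoted figures are mutually plausible.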
Referring now to FIGS. 4A-4C, more details on a preferred configurable computing element 300-i are disclosed. It comprises a pre-processing circuit 180R, a post-processing circuit 180T and at least a programmable memory array 170 for storing the LUT(s) for a mathematical function. The pre-processing circuit 180R converts the input variable (X) 150 into an address (A) 160A of the programmable memory array 170. After the data (D) 160D at the address (A) is read out from the programmable memory array 170, the post-processing circuit 180T converts it into the output value (Y) 190. A residue (R) of the input variable (X) is fed into the post-processing circuit 180T to improve the computational precision. In this example, the pre-processing circuit 180R and the post-processing circuit 180T are formed in the logic die 100. Alternatively, at least a portion of the pre-processing circuit 180R and the post-processing circuit 180T may be formed in the memory die 200.
FIG. 4B shows a preferred configurable computing element 400 realizing a single-precision mathematical function Y=f(X). The IP-LUT 170 includes two LUTs 170Q, 170R with 2 Mb capacity each (16-bit input and 32-bit output): the LUT 170Q includes the functional value of the mathematical function, i.e. D1=f(A), while the LUT 170R includes the first-order derivative value of the mathematical function, i.e. D2=f′(A). The ALC 180 comprises a pre-processing circuit 180R (mainly comprising an address buffer) and a post-processing circuit 180T (comprising an adder 180A and a multiplier 180M). The inter-die connections 160 transfer data between the ALC 180 and the IP-LUT 170. During computation, a 32-bit input variable X (x31 . . . x0) is sent to the IP-LUT configurable processor 300 as an input 150. The pre-processing circuit 180R extracts the higher 16 bits (x31 . . . x16) and sends them as a 16-bit address input A to the IP-LUT 170. The pre-processing circuit 180R further extracts the lower 16 bits (x15 . . . x0) and sends them as a 16-bit input residue R to the post-processing circuit 180T. The post-processing circuit 180T performs a polynomial interpolation to generate a 32-bit output value Y 190. In this case, the polynomial interpolation is a first-order Taylor series: Y(X)=D1+D2*R=f(A)+f′(A)*R. Apparently, a higher-order polynomial interpolation (e.g. higher-order Taylor series) can be used to improve the computational precision.
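The data path of FIG. 4B can be illustrated with a minimal software sketch. It is not the patented circuit: it assumes, purely for illustration, that the 32-bit input X is an unsigned fixed-point fraction in [0, 1), that the LUT entries are held as Python floats rather than 32-bit words, and that the function is sin. The names `build_luts`, `evaluate`, `SIN_Q` and `SIN_R` are hypothetical.

```python
import math

ADDR_BITS = 16                 # upper bits of X used as the LUT address A
RES_BITS = 16                  # lower bits of X used as the residue R
ENTRIES = 1 << ADDR_BITS       # 2^16 entries per LUT


def build_luts(f, df):
    """Fill the two tables: D1 = f(A) (LUT 170Q) and D2 = f'(A) (LUT 170R)."""
    lut_q = [f(a / ENTRIES) for a in range(ENTRIES)]
    lut_r = [df(a / ENTRIES) for a in range(ENTRIES)]
    return lut_q, lut_r


def evaluate(x32, lut_q, lut_r):
    """Return Y = D1 + D2 * R for a 32-bit input word (assumed fraction in [0, 1))."""
    a = (x32 >> RES_BITS) & (ENTRIES - 1)   # pre-processing: address A
    r = x32 & ((1 << RES_BITS) - 1)         # pre-processing: residue R
    d1, d2 = lut_q[a], lut_r[a]             # two LUT reads
    return d1 + d2 * (r / 2**32)            # post-processing: multiply, then add


# Example tables for sin, whose first-order derivative is cos.
SIN_Q, SIN_R = build_luts(math.sin, math.cos)
```

Under these assumptions, `evaluate(x32, SIN_Q, SIN_R)` approximates `math.sin(x32 / 2**32)`. The truncation error of the first-order series is bounded by roughly half the squared table step times the largest second derivative, which is why two small tables can stand in for one very large one.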
When realizing a mathematical function, combining the LUT with polynomial interpolation can achieve a high precision without using an excessively large LUT. For example, if only a LUT (without any polynomial interpolation) is used to realize a single-precision function (32-bit input and 32-bit output), it would have a capacity of 2^32*32=128 Gb. By including polynomial interpolation, significantly smaller LUTs can be used. In the above embodiment, a single-precision function can be realized using a total of 4 Mb LUT (2 Mb for the functional values, and 2 Mb for the first-order derivative values) in conjunction with a first-order Taylor series. This is significantly less than the LUT-only approach (4 Mb vs. 128 Gb).
FIG. 4C lists preferred LUT sizes and Taylor series required to realize mathematical functions with different precisions. It uses a range-reduction method taught by Harrison. For the half precision (16 bit), the required IP-LUT capacity is 2^16*16=1 Mb and no Taylor series is needed; for the single precision (32 bit), the required IP-LUT capacity is 2^16*32*2=4 Mb and a first-order Taylor series is needed; for the double precision (64 bit), the required IP-LUT capacity is 2^16*64*3=12 Mb and a second-order Taylor series is needed; for the extended double precision (80 bit), the required IP-LUT capacity is 2^16*80*4=20 Mb and a third-order Taylor series is needed. To those skilled in the art, other combinations of LUT size and Taylor series can be used to optimize the LUT usage and arithmetic operations.
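The capacity figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes binary prefixes (1 Mb = 2^20 bits, 1 Gb = 2^30 bits), which is the reading under which the quoted numbers come out exact, and it assumes one table per Taylor coefficient (order + 1 tables). The helper name `lut_bits` is hypothetical.

```python
MB = 2**20   # 1 Mb, assuming binary prefixes
GB = 2**30   # 1 Gb, assuming binary prefixes


def lut_bits(address_bits, word_bits, tables=1):
    """Capacity in bits of `tables` LUTs with 2^address_bits entries each."""
    return (2**address_bits) * word_bits * tables


# LUT-only single precision: 32-bit address, 32-bit word.
lut_only_single = lut_bits(32, 32)

# Capacities as listed for FIG. 4C: 16-bit address, one table per Taylor coefficient.
fig_4c = {
    "half (16 bit), no Taylor series":     lut_bits(16, 16, tables=1),
    "single (32 bit), first order":        lut_bits(16, 32, tables=2),
    "double (64 bit), second order":       lut_bits(16, 64, tables=3),
    "extended double (80 bit), third order": lut_bits(16, 80, tables=4),
}

if __name__ == "__main__":
    print(f"LUT only, single precision: {lut_only_single // GB} Gb")
    for name, bits in fig_4c.items():
        print(f"{name}: {bits // MB} Mb")
```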
Besides transcendental functions, the preferred embodiment of FIGS. 4A-4B can be used to implement special functions. Special functions can be defined by means of power series, generating functions, infinite products, repeated differentiation, integral representation, differential difference, integral, and functional equations, trigonometric series, or other series in orthogonal functions. Important examples of special functions are gamma function, beta function, hyper-geometric functions, confluent hyper-geometric functions, Bessel functions, Legendre functions, parabolic cylinder functions, integral sine, integral cosine, incomplete gamma function, incomplete beta function, probability integrals, various classes of orthogonal polynomials, elliptic functions, elliptic integrals, Lame functions, Mathieu functions, Riemann zeta function, automorphic functions, and others. The IP-LUT configurable processor will simplify the computation of special functions and promote their applications in scientific computation.
Referring now to FIGS. 5-6, a first preferred IP-LUT configurable computing array 700 is disclosed. It is a special type of the configurable processor 300 for implementing complex functions. The first preferred IP-LUT configurable computing array 700 comprises first and second configurable slices 700A, 700B. Each configurable slice (e.g. 700A) comprises a first array of configurable computing elements (e.g. 300AA-300AD) and a second array of configurable logic elements (e.g. 400AA-400AD). A configurable channel 620 is placed between the first array of configurable computing elements (e.g. 300AA-300AD) and the second array of configurable logic elements (e.g. 400AA-400AD). The configurable channels 610, 630, 650 are also placed between different configurable slices 700A, 700B. The configurable channels 610-650 comprise an array of configurable interconnects (represented by slashes at the cross-points in each configurable channel). For those skilled in the art, besides configurable channels, the sea-of-gates architecture may also be used.
The configurable computing elements 300AA-300BD are similar to those in the IP-LUT configurable processor 300 (FIG. 2B). Each configurable computing element 300-i comprises at least a programmable memory array 170 and an arithmetic logic circuit (ALC) 180. It can realize at least a basic function by loading the LUT for said basic function into the programmable memory array 170. The configurable logic elements 400AA-400BD and the configurable interconnects 610-650 are similar to those disclosed in Freeman (U.S. Pat. No. 4,870,302). Each configurable logic element can selectively realize any one of a plurality of logic operations in a logic library. A typical logic library includes a group of operations consisting of shift, logic NOT, logic AND, logic OR, logic NOR, logic NAND, logic XOR, addition “+”, and subtraction “−”. Each configurable interconnect can selectively couple or de-couple at least one interconnect line.
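As an illustration of a configurable logic element that selects one operation from a logic library, the following sketch models the library listed above as a table of functions. The 16-bit word width, the choice of a left shift by the second operand, and the names `LOGIC_LIBRARY` and `configure` are assumptions made for this example; they are not specified in the text.

```python
MASK = 0xFFFF  # assumed 16-bit word width

# One entry per operation named in the logic library above.
LOGIC_LIBRARY = {
    "shift": lambda x, y: (x << y) & MASK,   # assumed: left shift of x by y bits
    "not":   lambda x, y: ~x & MASK,         # second operand is ignored
    "and":   lambda x, y: x & y,
    "or":    lambda x, y: x | y,
    "nor":   lambda x, y: ~(x | y) & MASK,
    "nand":  lambda x, y: ~(x & y) & MASK,
    "xor":   lambda x, y: x ^ y,
    "add":   lambda x, y: (x + y) & MASK,    # wraps around on overflow
    "sub":   lambda x, y: (x - y) & MASK,    # wraps around on underflow
}


def configure(operation):
    """Return the two-input function a logic element realizes once configured."""
    return LOGIC_LIBRARY[operation]
```

For instance, `configure("add")` yields an element that adds its two inputs, which is the role a configurable logic element plays when it sums the outputs of two configurable computing elements.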
The first preferred IP-LUT configurable computing array 700 can realize a complex function by programming the configurable logic elements 400AA-400BD and the configurable computing elements 300AA-300BD. The complex function is a combination of basic functions, which can be implemented by selected configurable computing elements. The mathematical operations included in each basic function are not only more than the arithmetic operations included in the logic library of the configurable logic elements 400AA-400BD, but also more than the arithmetic operations performable by the ALC 180. In general, the arithmetic operations included in the logic library consist of addition and subtraction; and, the arithmetic operations performable by the ALC 180 consist of addition, subtraction and multiplication.
In one preferred IP-LUT configurable computing array 700, the programmable memory arrays 170 of the configurable computing elements 300AA-300BD are located on a different physical level than the configurable logic elements 400AA-400BD. For example, the programmable memory arrays 170 are located on a memory die 200, while the configurable logic elements 400 are located on a logic die. This logic die could be the same logic die 100 for the ALC 180, as in the case of FIG. 3A. Alternatively, it could be a different logic die, as in the case of FIG. 3C where a first logic die 100A is used for the configurable logic elements 400AA-400BD, and a second logic die 100B is used for the ALCs 180. The memory die and the logic die are vertically stacked and preferably at least partially overlap.
FIG. 6 discloses an instantiation of the first preferred IP-LUT configurable computing array 700 for implementing a complex function, i.e. e=a·sin(b)+c·cos(d). The configurable interconnects in the configurable channels 610-650 use the same convention as Freeman: the interconnect with a dot means that the interconnect is connected; the interconnect without a dot means that the interconnect is not connected; a broken interconnect means that two broken sections are un-coupled. In this preferred instantiation, the configurable computing element 300AA is configured to realize the function log( ), whose result log(a) is sent to a first input of the configurable logic element 400AA. The configurable computing element 300AB is configured to realize the function log[sin( )], whose result log[sin(b)] is sent to a second input of the configurable logic element 400AA. The configurable logic element 400AA is configured to realize addition, whose result log(a)+log[sin(b)] is sent to the configurable computing element 300BA. The configurable computing element 300BA is configured to realize the function exp( ), whose result exp{log(a)+log[sin(b)]}=a·sin(b) is sent to a first input of the configurable logic element 400BA. Similarly, through proper configurations, the results of the configurable computing elements 300AC, 300AD, the configurable logic element 400AC, and the configurable computing element 300BC can be sent to a second input of the configurable logic element 400BA. The configurable logic element 400BA is configured to realize addition, whose result a·sin(b)+c·cos(d) is sent to the output e. Apparently, by changing its configuration, the IP-LUT configurable computing array 700 can realize other complex functions.
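The dataflow of this instantiation can be traced in software. The sketch below is an illustration only: library math functions stand in for the LUT-based computing elements, and the assignment of log( ) to 300AC and log[cos( )] to 300AD is an assumption made by symmetry with 300AA and 300AB, since the text does not spell out those configurations. The sketch also assumes a, c, sin(b) and cos(d) are positive so that the logarithms are defined. The names `COMPUTING_ELEMENTS`, `LOGIC_ELEMENTS` and `complex_function` are hypothetical.

```python
import math
from operator import add

# Functions loaded into the configurable computing elements (stand-ins for LUTs).
COMPUTING_ELEMENTS = {
    "300AA": math.log,
    "300AB": lambda b: math.log(math.sin(b)),
    "300AC": math.log,                          # assumed by symmetry
    "300AD": lambda d: math.log(math.cos(d)),   # assumed by symmetry
    "300BA": math.exp,
    "300BC": math.exp,
}

# Operations selected in the configurable logic elements.
LOGIC_ELEMENTS = {"400AA": add, "400AC": add, "400BA": add}


def complex_function(a, b, c, d):
    """Evaluate e = a*sin(b) + c*cos(d) by following the configured routing."""
    ce, le = COMPUTING_ELEMENTS, LOGIC_ELEMENTS
    left = ce["300BA"](le["400AA"](ce["300AA"](a), ce["300AB"](b)))   # a*sin(b)
    right = ce["300BC"](le["400AC"](ce["300AC"](c), ce["300AD"](d)))  # c*cos(d)
    return le["400BA"](left, right)
```

The multiplication a·sin(b) never appears explicitly: it is obtained from one addition between a logarithm stage and an exponential stage, which is why the logic library only needs addition.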
The first preferred IP-LUT configurable computing array 700 is particularly suitable for realizing complex functions. If only a LUT is used to realize the above 4-variable function, i.e. e=a·sin(b)+c·cos(d), an enormous LUT is needed: 2^16*2^16*2^16*2^16*16=256 Eb even for half precision, which is impractical. Using the IP-LUT configurable computing array 700, only 8 Mb LUT (including 8 configurable computing elements, each with 1 Mb capacity) is needed to realize a 4-variable function. To those skilled in the art, the first preferred IP-LUT configurable computing array 700 can be used to realize other complex functions.
Referring now to FIGS. 7-8B, a second preferred IP-LUT configurable computing array 700 is shown. Besides configurable computing elements 300A, 300B and configurable logic element 400A, this preferred embodiment further comprises a multiplier 500. The configurable channels 660-680 comprise a plurality of configurable interconnects. With the addition of the multiplier 500, the second preferred IP-LUT configurable computing array 700 can realize more mathematical functions with more computational power.
FIGS. 8A-8B disclose two instantiations of the second preferred IP-LUT configurable computing array 700. In the instantiation of FIG. 8A, the configurable computing element 300A is configured to realize the function exp(f), while the configurable computing element 300B is configured to realize the function inv(g). The configurable channel 670 is configured in such a way that the outputs of 300A, 300B are fed into the multiplier 500. The final output is then h=exp(f)*inv(g). On the other hand, in the instantiation of FIG. 8B, the configurable computing element 300A is configured to realize the function sin(f), while the configurable computing element 300B is configured to realize the function cos(g). The configurable channel 670 is configured in such a way that the outputs of 300A, 300B are fed into the configurable logic element 400A, which is configured to realize arithmetic addition. The final output is then h=sin(f)+cos(g).
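The two instantiations differ only in which functions are loaded and where the configurable channel 670 routes the outputs. A minimal sketch of that idea follows; it assumes inv(g) denotes the reciprocal 1/g, uses library math functions in place of LUTs, and the names `make_array`, `fig_8a` and `fig_8b` are hypothetical.

```python
import math
from operator import add, mul


def make_array(func_a, func_b, route):
    """Configure elements 300A/300B and choose where their outputs are routed."""
    combine = {
        "multiplier_500": mul,   # channel 670 routes outputs to the multiplier 500
        "logic_400A_add": add,   # channel 670 routes outputs to 400A set to addition
    }[route]

    def h(f, g):
        return combine(func_a(f), func_b(g))

    return h


# FIG. 8A: h = exp(f) * inv(g), with inv(g) assumed to be 1/g.
fig_8a = make_array(math.exp, lambda g: 1.0 / g, "multiplier_500")

# FIG. 8B: h = sin(f) + cos(g).
fig_8b = make_array(math.sin, math.cos, "logic_400A_add")
```

The same hardware resources serve both cases; only the loaded tables and the routing selection change between them.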
While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth therein. For example, the IP-LUT configurable processor of the present invention could be a micro-controller, a controller, a central processing unit (CPU), a digital signal processor (DSP), a graphic processing unit (GPU), a network-security processor, an encryption/decryption processor, an encoding/decoding processor, a neural-network processor, or an artificial intelligence (AI) processor. These IP-LUT configurable processors can be found in consumer electronic devices (e.g. personal computers, video game machines, smart phones) as well as engineering and scientific workstations and server machines. The invention, therefore, is not to be limited except in the spirit of the appended claims.

Claims (20)

What is claimed is:
1. A configurable processor including a plurality of configurable computing elements, each of said configurable computing elements comprising:
at least a programmable memory array on a memory level for storing at least a portion of a look-up table (LUT) for a mathematical function;
at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said LUT, wherein said logic level is a different physical level than said memory level; and
means for communicatively coupling said programmable memory array and said ALC;
wherein said mathematical function includes more operations than arithmetic operations performable by said ALC.
2. The configurable processor according to claim 1, wherein said arithmetic operations performable by said ALC consist of addition, subtraction and multiplication.
3. The configurable processor according to claim 1, wherein said memory level is located on a memory die; and, said logic level is located on a logic die; said memory die and said logic die are vertically stacked.
4. The configurable processor according to claim 3, wherein said memory die and said logic die at least partially overlap.
5. The configurable processor according to claim 1, wherein said programmable memory array is a re-programmable memory array, whereby said configurable processor can be re-configured to realize different mathematical functions.
6. A configurable processor for implementing a mathematical function, comprising:
at least first and second programmable memory arrays on a memory level, wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; and, said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function;
at least an arithmetic logic circuit (ALC) on a logic level for performing at least an arithmetic operation on selected data from said first or second LUT, wherein said logic level is a different physical level than said memory level; and
means for communicatively coupling said first or second programmable memory array with said ALC;
wherein said mathematical function is a combination of at least said first and second mathematical functions; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
7. The configurable processor according to claim 6, wherein said arithmetic operations performable by said ALC consist of addition, subtraction and multiplication.
8. The configurable processor according to claim 6, wherein said memory level is located on a memory die; and, said logic level is located on a logic die; said memory die and said logic die are vertically stacked.
9. The configurable processor according to claim 8, wherein said memory die and said logic die at least partially overlap.
10. The configurable processor according to claim 6, wherein said first and second programmable memory arrays are re-programmable memory arrays, whereby said configurable processor can be re-configured to realize different mathematical functions.
11. A configurable computing array for implementing a mathematical function, comprising:
at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function in a logic library;
at least an array of configurable computing elements comprising at least a first programmable memory array, a second programmable memory array and an arithmetic logic circuit (ALC), wherein said first programmable memory array stores at least a first portion of a first look-up table (LUT) for a first mathematical function; said second programmable memory array stores at least a second portion of a second LUT for a second mathematical function; and, said ALC performs at least an arithmetic operation on selected data from said first or second LUT;
means for communicatively coupling said configurable logic elements and said configurable computing elements;
whereby said configurable computing array realizes said mathematical function by programming said configurable logic elements and said configurable computing elements, wherein said mathematical function is a combination of at least said first and second mathematical functions;
wherein each of said first and second mathematical functions includes more operations than arithmetic operations included in said logic library; and, each of said first and second mathematical functions includes more operations than arithmetic operations performable by said ALC.
12. The configurable computing array according to claim 11, wherein said arithmetic operations included in said logic library consist of addition and subtraction.
13. The configurable computing array according to claim 11, wherein said arithmetic operations performable by said ALC consist of addition, subtraction and multiplication.
14. The configurable computing array according to claim 11, further comprising at least a plurality of configurable interconnects including a configurable interconnect, wherein said configurable interconnect selectively realizes an interconnect from an interconnect library.
15. The configurable computing array according to claim 11, wherein said programmable memory array is a re-programmable memory array, whereby said configurable computing array can be re-configured to realize different mathematical functions.
16. The configurable computing array according to claim 11, wherein said first and second programmable memory array are located on a memory level; said configurable logic arrays are located on a logic level; and, said memory level and said logic level are different physical levels.
17. The configurable computing array according to claim 16, wherein said memory level is located on a memory die; said logic level is located on a logic die; and, said memory die and said logic die are vertically stacked.
18. The configurable computing array according to claim 17, wherein said memory die and said logic die at least partially overlap.
19. The configurable computing array according to claim 17, wherein said ALC is formed on said logic die.
20. The configurable computing array according to claim 17, wherein said ALC is formed on another logic die.
US16/203,599 2016-05-06 2018-11-28 Configurable processor with in-package look-up table Active US10445067B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/203,599 US10445067B2 (en) 2016-05-06 2018-11-28 Configurable processor with in-package look-up table

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN201610301645.8 2016-05-06
CN201610301645 2016-05-06
CN201610301645 2016-05-06
CN201710310865 2017-05-05
CN201710310865 2017-05-05
CN201710310865.1 2017-05-05
US15/588,642 US20170322771A1 (en) 2016-05-06 2017-05-06 Configurable Processor with In-Package Look-Up Table
US16/203,599 US10445067B2 (en) 2016-05-06 2018-11-28 Configurable processor with in-package look-up table

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/588,642 Continuation-In-Part US20170322771A1 (en) 2016-02-13 2017-05-06 Configurable Processor with In-Package Look-Up Table

Publications (2)

Publication Number Publication Date
US20190114138A1 US20190114138A1 (en) 2019-04-18
US10445067B2 true US10445067B2 (en) 2019-10-15

Family

ID=66097431

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/203,599 Active US10445067B2 (en) 2016-05-06 2018-11-28 Configurable processor with in-package look-up table

Country Status (1)

Country Link
US (1) US10445067B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445067B2 (en) * 2016-05-06 2019-10-15 HangZhou HaiCun Information Technology Co., Ltd. Configurable processor with in-package look-up table
US10782759B1 (en) 2019-04-23 2020-09-22 Arbor Company, Lllp Systems and methods for integrating batteries with stacked integrated circuit die elements
EP3959717A4 (en) * 2019-04-23 2023-05-31 Arbor Company LLLP Systems and methods for reconfiguring dual-function cell arrays
CN116097109B (en) 2020-06-29 2023-11-24 乔木有限责任合伙公司 Reconfigurable processor module using 3D die stacking and mobile IOT edge device for processor independent 5G modems

Citations (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4870302A (en) 1984-03-12 1989-09-26 Xilinx, Inc. Configurable electrical circuit having configurable logic elements and configurable interconnects
US5046038A (en) 1989-07-07 1991-09-03 Cyrix Corporation Method and apparatus for performing division using a rectangular aspect ratio multiplier
US5060182A (en) 1989-09-05 1991-10-22 Cyrix Corporation Method and apparatus for performing the square root function using a rectangular aspect ratio multiplier
US5604499A (en) 1993-12-28 1997-02-18 Matsushita Electric Industrial Co., Ltd. Variable-length decoding apparatus
US5835396A (en) 1996-10-17 1998-11-10 Zhang; Guobiao Three-dimensional read-only memory
US5901274A (en) 1994-04-30 1999-05-04 Samsung Electronics Co. Ltd. Method for enlargement/reduction of image data in digital image processing system and circuit adopting the same
US5954787A (en) 1996-12-26 1999-09-21 Daewoo Electronics Co., Ltd. Method of generating sine/cosine function and apparatus using the same for use in digital signal processor
US6181355B1 (en) 1998-07-17 2001-01-30 3Dlabs Inc. Ltd. Graphics processing with transcendental function generator
US6263470B1 (en) 1998-02-03 2001-07-17 Texas Instruments Incorporated Efficient look-up table methods for Reed-Solomon decoding
US20040044710A1 (en) 2002-08-28 2004-03-04 Harrison John R. Converting mathematical functions to power series
US7028247B2 (en) 2002-12-25 2006-04-11 Faraday Technology Corp. Error correction code circuit with reduced hardware complexity
US20060106905A1 (en) 2004-11-17 2006-05-18 Chren William A Jr Method for reducing memory size in logarithmic number system arithmetic units
US7206410B2 (en) 2001-10-10 2007-04-17 Stmicroelectronics S.R.L. Circuit for the inner or scalar product computation in Galois fields
US7366748B1 (en) 2000-06-30 2008-04-29 Intel Corporation Methods and apparatus for fast argument reduction in a computing system
US7472149B2 (en) 2004-01-21 2008-12-30 Kabushiki Kaisha Toshiba Arithmetic unit for approximating function
US7512647B2 (en) 2004-11-22 2009-03-31 Analog Devices, Inc. Condensed Galois field computing system
US7539927B2 (en) 2005-04-14 2009-05-26 Industrial Technology Research Institute High speed hardware implementation of modified Reed-Solomon decoder
US7558812B1 (en) * 2003-11-26 2009-07-07 Altera Corporation Structures for LUT-based arithmetic in PLDs
US7574468B1 (en) 2005-03-18 2009-08-11 Verisilicon Holdings (Cayman Islands) Co. Ltd. Digital signal processor having inverse discrete cosine transform engine for video decoding and partitioned distributed arithmetic multiply/accumulate unit therefor
US7634524B2 (en) 2003-12-12 2009-12-15 Fujitsu Limited Arithmetic method and function arithmetic circuit for a fast fourier transform
US20100289064A1 (en) * 2009-04-14 2010-11-18 NuPGA Corporation Method for fabrication of a semiconductor device and structure
US7962543B2 (en) 2007-06-01 2011-06-14 Advanced Micro Devices, Inc. Division with rectangular multiplier supporting multiple precisions and operand types
US20120129301A1 (en) * 2010-11-18 2012-05-24 Monolithic 3D Inc. System comprising a semiconductor device and structure
US8203564B2 (en) 2007-02-16 2012-06-19 Qualcomm Incorporated Efficient 2-D and 3-D graphics processing
US8487948B2 (en) 2007-05-01 2013-07-16 Vivante Corporation Apparatus and method for texture level of detail computation
US20140067889A1 (en) 2012-09-04 2014-03-06 Analog Devices A/S Datapath circuit for digital signal processors
US9015452B2 (en) 2009-02-18 2015-04-21 Texas Instruments Incorporated Vector math instruction execution by DSP processor approximating division and complex number magnitude
US9207910B2 (en) 2009-01-30 2015-12-08 Intel Corporation Digital signal processor having instruction set with an xK function using reduced look-up table
US9225501B2 (en) 2013-04-17 2015-12-29 Intel Corporation Non-linear modeling of a physical system using look-up table with polynomial interpolation
US9465580B2 (en) 2011-12-21 2016-10-11 Intel Corporation Math circuit for estimating a transcendental function
US9606796B2 (en) 2013-10-30 2017-03-28 Texas Instruments Incorporated Computer and methods for solving math functions
US20170237440A1 (en) * 2016-02-13 2017-08-17 HangZhou HaiCun Information Technology Co., Ltd. Processor Comprising Three-Dimensional Memory (3D-M) Array
US20170322771A1 (en) * 2016-05-06 2017-11-09 Chengdu Haicun Ip Technology Llc Configurable Processor with In-Package Look-Up Table
US20170322906A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Processor with In-Package Look-Up Table
US20170323041A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Simulation Processor with In-Package Look-Up Table
US20170322770A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Processor with Backside Look-Up Table
US20170323042A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Simulation Processor with Backside Look-Up Table
US20170322774A1 (en) * 2016-05-07 2017-11-09 Chengdu Haicun Ip Technology Llc Configurable Processor with Backside Look-Up Table
US20170329548A1 (en) * 2016-05-10 2017-11-16 Chengdu Haicun Ip Technology Llc Processor for Realizing at least Two Categories of Functions
US20190114170A1 (en) * 2016-02-13 2019-04-18 HangZhou HaiCun Information Technology Co., Ltd. Processor Using Memory-Based Computation
US20190114138A1 (en) * 2016-05-06 2019-04-18 HangZhou HaiCun Information Technology Co., Ltd. Configurable Processor with In-Package Look-Up Table

Patent Citations (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4870302A (en) 1984-03-12 1989-09-26 Xilinx, Inc. Configurable electrical circuit having configurable logic elements and configurable interconnects
US5046038A (en) 1989-07-07 1991-09-03 Cyrix Corporation Method and apparatus for performing division using a rectangular aspect ratio multiplier
US5060182A (en) 1989-09-05 1991-10-22 Cyrix Corporation Method and apparatus for performing the square root function using a rectangular aspect ratio multiplier
US5604499A (en) 1993-12-28 1997-02-18 Matsushita Electric Industrial Co., Ltd. Variable-length decoding apparatus
US5901274A (en) 1994-04-30 1999-05-04 Samsung Electronics Co. Ltd. Method for enlargement/reduction of image data in digital image processing system and circuit adopting the same
US5835396A (en) 1996-10-17 1998-11-10 Zhang; Guobiao Three-dimensional read-only memory
US5954787A (en) 1996-12-26 1999-09-21 Daewoo Electronics Co., Ltd. Method of generating sine/cosine function and apparatus using the same for use in digital signal processor
US6263470B1 (en) 1998-02-03 2001-07-17 Texas Instruments Incorporated Efficient look-up table methods for Reed-Solomon decoding
US6181355B1 (en) 1998-07-17 2001-01-30 3Dlabs Inc. Ltd. Graphics processing with transcendental function generator
US7366748B1 (en) 2000-06-30 2008-04-29 Intel Corporation Methods and apparatus for fast argument reduction in a computing system
US7206410B2 (en) 2001-10-10 2007-04-17 Stmicroelectronics S.R.L. Circuit for the inner or scalar product computation in Galois fields
US20040044710A1 (en) 2002-08-28 2004-03-04 Harrison John R. Converting mathematical functions to power series
US7028247B2 (en) 2002-12-25 2006-04-11 Faraday Technology Corp. Error correction code circuit with reduced hardware complexity
US7558812B1 (en) * 2003-11-26 2009-07-07 Altera Corporation Structures for LUT-based arithmetic in PLDs
US7634524B2 (en) 2003-12-12 2009-12-15 Fujitsu Limited Arithmetic method and function arithmetic circuit for a fast fourier transform
US7472149B2 (en) 2004-01-21 2008-12-30 Kabushiki Kaisha Toshiba Arithmetic unit for approximating function
US20060106905A1 (en) 2004-11-17 2006-05-18 Chren William A Jr Method for reducing memory size in logarithmic number system arithmetic units
US7512647B2 (en) 2004-11-22 2009-03-31 Analog Devices, Inc. Condensed Galois field computing system
US7574468B1 (en) 2005-03-18 2009-08-11 Verisilicon Holdings (Cayman Islands) Co. Ltd. Digital signal processor having inverse discrete cosine transform engine for video decoding and partitioned distributed arithmetic multiply/accumulate unit therefor
US7539927B2 (en) 2005-04-14 2009-05-26 Industrial Technology Research Institute High speed hardware implementation of modified Reed-Solomon decoder
US8203564B2 (en) 2007-02-16 2012-06-19 Qualcomm Incorporated Efficient 2-D and 3-D graphics processing
US8487948B2 (en) 2007-05-01 2013-07-16 Vivante Corporation Apparatus and method for texture level of detail computation
US7962543B2 (en) 2007-06-01 2011-06-14 Advanced Micro Devices, Inc. Division with rectangular multiplier supporting multiple precisions and operand types
US9207910B2 (en) 2009-01-30 2015-12-08 Intel Corporation Digital signal processor having instruction set with an xK function using reduced look-up table
US9015452B2 (en) 2009-02-18 2015-04-21 Texas Instruments Incorporated Vector math instruction execution by DSP processor approximating division and complex number magnitude
US20100289064A1 (en) * 2009-04-14 2010-11-18 NuPGA Corporation Method for fabrication of a semiconductor device and structure
US20120129301A1 (en) * 2010-11-18 2012-05-24 Monolithic 3D Inc. System comprising a semiconductor device and structure
US20120248595A1 (en) * 2010-11-18 2012-10-04 MonolithIC 3D Inc. System comprising a semiconductor device and structure
US9136153B2 (en) * 2010-11-18 2015-09-15 Monolithic 3D Inc. 3D semiconductor device and structure with back-bias
US8273610B2 (en) * 2010-11-18 2012-09-25 Monolithic 3D Inc. Method of constructing a semiconductor device and structure
US9465580B2 (en) 2011-12-21 2016-10-11 Intel Corporation Math circuit for estimating a transcendental function
US20140067889A1 (en) 2012-09-04 2014-03-06 Analog Devices A/S Datapath circuit for digital signal processors
US9225501B2 (en) 2013-04-17 2015-12-29 Intel Corporation Non-linear modeling of a physical system using look-up table with polynomial interpolation
US9606796B2 (en) 2013-10-30 2017-03-28 Texas Instruments Incorporated Computer and methods for solving math functions
US20170237440A1 (en) * 2016-02-13 2017-08-17 HangZhou HaiCun Information Technology Co., Ltd. Processor Comprising Three-Dimensional Memory (3D-M) Array
US20190114170A1 (en) * 2016-02-13 2019-04-18 HangZhou HaiCun Information Technology Co., Ltd. Processor Using Memory-Based Computation
US20170323042A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Simulation Processor with Backside Look-Up Table
US20170323041A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Simulation Processor with In-Package Look-Up Table
US20170322770A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Processor with Backside Look-Up Table
US20170322906A1 (en) * 2016-05-04 2017-11-09 Chengdu Haicun Ip Technology Llc Processor with In-Package Look-Up Table
US20170322771A1 (en) * 2016-05-06 2017-11-09 Chengdu Haicun Ip Technology Llc Configurable Processor with In-Package Look-Up Table
US20190114138A1 (en) * 2016-05-06 2019-04-18 HangZhou HaiCun Information Technology Co., Ltd. Configurable Processor with In-Package Look-Up Table
US20170322774A1 (en) * 2016-05-07 2017-11-09 Chengdu Haicun Ip Technology Llc Configurable Processor with Backside Look-Up Table
US20170329548A1 (en) * 2016-05-10 2017-11-16 Chengdu Haicun Ip Technology Llc Processor for Realizing at least Two Categories of Functions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Harrison et al., "The Computation of Transcendental Functions on the IA-64 Architecture", Intel Technical Journal, Q4, 1999.

Also Published As

Publication number Publication date
US20190114138A1 (en) 2019-04-18

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4