WO2020095347A1

WO2020095347A1 - Reconfigurable circuit

Info

Publication number: WO2020095347A1
Application number: PCT/JP2018/041043
Authority: WO
Inventors: Xu Bai; Toshitsugu Sakamoto; Makoto Miyamura; Ryusuke Nebashi; Ayuka Tada
Original assignee: Nec Corporation
Priority date: 2018-11-05
Filing date: 2018-11-05
Publication date: 2020-05-14

Abstract

A semiconductor apparatus with a reconfigurable circuit is disclosed. The reconfigurable circuit comprises: a carry skip circuit; and two or more logic elements, each of which includes: a first programmable circuit which can implement any 2-input logic function or a 3-input full-adder sum function; a second programmable circuit which can implement any 2-input logic function or a 3-input full-adder carry function; and a third programmable circuit which can implement any 2-input logic function, wherein the carry skip circuit receives propagation control signals, one output from each of the logic elements, to select, as a carry out signal, either a carry in signal supplied to a first logic element or a carry out signal output from a last logic element, based on the propagation control signals.

Description

RECONFIGURABLE CIRCUIT

The present invention relates to a semiconductor apparatus, and more particularly to a semiconductor apparatus with a reconfigurable circuit using non-volatile resistive switches.

A typical semiconductor integrated circuit (IC) is constructed by transistors built on a semiconductor substrate and upper layer wires used to connect the transistors. Patterns of transistors and wires are determined in a design stage of the IC. Interconnections between the transistors and wires cannot be changed after fabrication. In order to improve flexibility of IC, field-programmable gate arrays (FPGAs) have been proposed and developed. In an FPGA, configuration data including operation and interconnection information of the FPGA is stored in memories of the FPGA, so that different logic operations and interconnections can be realized by configuring memories after fabrication of the FPGA, according to requirements of end users. Interconnections within the FPGA can be altered by controlling ON-and-OFF of switches in a routing multiplexer (MUX) or routing fabrics arranged in the FPGA in accordance with interconnection information stored in the memories of the FPGA.

Relatively large energy consumption of FPGAs limits integration of commercial FPGAs into IoT (Internet of Things) devices. In most commercial FPGAs, SRAM (Static Random Access Memory) is used to store configuration data. Typically, each SRAM memory cell is composed of six transistors. FPGA chips equipped with more than 10M (ten million) SRAM memory cells are currently on the market. These FPGA chips incur an extremely large area overhead, with an accompanying increase in cost and energy consumption. Furthermore, since the SRAM cells in the FPGA chip are volatile, externally provided circuits are needed to store the configuration data permanently. This further increases the cost and area overhead of the FPGA.

Recently, FPGAs with non-volatile resistive switches (NVRSs) such as NanoBridge (a registered trademark of NEC) (abbreviated as "NB") integrated between wires upon a transistor layer have been proposed to overcome the problems of SRAM-based FPGAs and achieve small area overhead (e.g., reference may be made to NPL 1). An NVRS has ON and OFF states, and the ON/OFF resistance ratio is over 10⁴. There are mainly two kinds of NVRSs: one is ReRAM (Resistance Random Access Memory) using a transition metal oxide, and the other is NanoBridge using an ion conductor. NVRSs are used in routing blocks and LUT (Look Up Table) memories (e.g., reference may be made to PTL 1). Moreover, a CMOS (Complementary Metal Oxide Semiconductor)/NVRS hybrid circuit, where an NVRS is stacked on a CMOS logic circuit, has been introduced to remove the dedicated CMOS carry chain circuit of SRAM-FPGAs for further improvement in speed, power efficiency and area efficiency (e.g., reference may be made to PTL 2 and NPL 2).

PTL 3 discloses a sub-array arranged as a reconfigurable LUT (RLUT) which performs an arithmetic logic function such as an adder, a multiplier, and the like and also discloses an 8 bit adder with a carry_in which is realized through cascading or routing signals between two RLUTs, each of which is configured as a 4 bit adder. PTL 4 discloses a carry-lookahead adder (CLA) that calculates one or more carry bits before sum adder to improve speed by reducing an amount of time required to determine a carry bit(s).

PTL 1: WO2015/198573 A1
PTL 2: WO2016/117134 A1
PTL 3: JP Patent Kokai Publication No. JP2016-123092A
PTL 4: JP Patent Kokai Publication No. JP2003-330701A

NON-PATENT LITERATURE

NPL 1: Xu Bai et al., A low-power Cu atom switch programmable logic fabricated in a 40nm-node CMOS technology, Proc. IEEE Symp. VLSI Technol., 2017, pp. 28-29.
NPL 2: Xu Bai et al., Area-Efficient Nonvolatile Carry Chain based on Pass-Transistor/Atom-Switch Hybrid Logic, Japanese Journal of Applied Physics 55, pp. 04EF01, 2016.

Summary

In a semiconductor apparatus including a reconfigurable circuit, such as an FPGA, wherein series-connected CMOS/NVRS hybrid circuits implement an N-bit adder, the delay time is proportional to N. This may cause a serious problem when N is large. Accordingly, it is an object to provide a semiconductor apparatus capable of reducing the delay time of an N-bit adder.

In accordance with one aspect of the present invention, there is provided a semiconductor apparatus comprising a reconfigurable circuit that comprises:
a carry skip circuit;
first to M-th logic elements, where M is an integer of 2 or more, each of the logic elements comprising:
a first programmable circuit that is able to be programmed to implement a 2-input logic function or a full-adder sum function to output a sum of two 1-bit inputs and a carry in;
a second programmable circuit that is able to be programmed to implement a 2-input logic function or a full-adder carry function to output a carry out based on the two 1-bit inputs and the carry in; and
a third programmable circuit that is able to be programmed to implement 2-input logic function,
wherein the carry out output from i-th logic element is supplied as the carry in to (i+1)-th logic element, where i is an integer from 1 to M-1,
each of the first to M-th logic elements, based on the two 1-bit inputs supplied thereto, generates a propagation control signal, and
wherein the carry skip circuit receives first to M-th propagation control signals output in parallel from the first to M-th logic elements, respectively, to select, as a final carry out signal, one of a carry in signal supplied to the first logic element and a carry out signal output from the M-th logic element, based on a combination of values of the first to M-th propagation control signals.

According to the present invention, it is possible to provide a semiconductor device capable of reducing the delay time of an N-bit adder.

Figs.1A and 1B are diagrams illustrating a structure of a cell array and a cell of an FPGA using NVRSs according to a first example embodiment of the present invention.
Figs.2A and 2B are diagrams illustrating schematic views of a logic block without a carry skip circuit and an FA-type 4-LUT included therein, respectively, according to a first example embodiment of the present invention.
Fig.3 is a diagram illustrating a transistor-level view of the FA-type 4-LUT, according to a first example embodiment of the present invention.
Figs.4A and 4B are diagrams illustrating implementation of a 2-bit adder in a logic block without a carry skip circuit and a 1-bit adder included in the 2-bit adder, respectively, according to a first example embodiment of the present invention.
Fig.5 is a diagram illustrating a critical path in an 8-bit adder implemented by multiple cells without carry skip circuits according to a first example embodiment of the present invention.
Figs.6A and 6B are diagrams illustrating a logic block with a carry skip circuit and an FA-type 4-LUT included therein, respectively, according to a first example embodiment of the present invention.
Fig.7 is a diagram illustrating a critical path in an 8-bit adder implemented by multiple cells with carry skip circuits according to a first example embodiment of the present invention.
Figs.8A, 8B, and 8C are diagrams illustrating a logic block with a carry skip circuit, an FA-type 4-LUT included therein, and a transistor-level AND gate, respectively, according to a second example embodiment of the present invention.
Figs.9A and 9B are diagrams illustrating implementation of a 2-bit adder in a logic block with a carry skip circuit and a 1-bit adder included in the 2-bit adder, respectively, according to a second example embodiment of the present invention.
Fig.10 is a diagram illustrating a logic block with four LUTs using a carry skip circuit according to a second example embodiment of the present invention.
Fig.11 is a diagram illustrating a 16-bit adder implemented by multiple cells with carry skip circuits according to a second example embodiment of the present invention.
Fig.12 is a diagram illustrating delay time comparison between N-bit adders with carry skip circuits and without carry skip circuits according to a second example embodiment of the present invention.
Figs.13A, 13B, and 13C are diagrams illustrating a logic block with a carry skip circuit, an FA-type 4-LUT included therein, and a transistor-level NOR gate, respectively, according to a third example embodiment of the present invention.
Figs.14A and 14B are diagrams illustrating implementation of a 2-bit adder in a logic block with a carry skip circuit and a 1-bit adder included in the 2-bit adder, respectively, according to a third example embodiment of the present invention.

Example embodiments of the present invention will be described with reference to the accompanying drawings.

(First example embodiment)
Figs.1A and 1B each schematically illustrate an arrangement of an FPGA using NVRSs. NVRS-FPGA is constructed by a reconfigurable cell array 1 as illustrated in Fig.1A. Each cell 10 in the cell array 1 is composed of a routing block 102 and a logic block 101, as illustrated in Fig.1B. The routing block 102 adopts a crossbar switch structure wherein non-volatile-switch-cells (NVSCs) 103 are allocated at cross points for programmable data transfer control. The NVSC is constructed by one or more NVRSs. As described in PTL 2, the NVSC may have two kinds of structures: a 1-transistor 1 NVRS (1T1R) and a 1-transistor 2 NVRSs (1T2R). NVSCs are also used to implement memories in LUTs for function configuration. Though not particularly limited thereto, the NVRS may include a metal oxide resistance change element or a solid electrolyte resistance change element (solid-electrolyte switch) such as "NanoBridge" with a dual-damascene Cu interconnect using a highly reliable bilayer solid-electrolyte (e.g., TaSiO/Ta_2O_5) and a thin oxidation barrier, resulting in an excellent ON/OFF ratio at a low ON resistance.

In the logic block 101 of the cell 10, two or more look-up tables (LUTs)/adders (11₁,11₂) are provided to implement logic operation or addition. That is, in Fig.1B, two LUT/adders (11₁,11₂) are illustrated as an example, but the number of LUT/adders in the logic block 101 is not limited to two. Each of the LUT/adders (11₁,11₂) is configured as a CMOS/NVRS hybrid full-adder (FA)-type. D-flip-flops (D-FF) (12₁,12₂) are used to store a signal at data input terminal (D) (connected to an output of the LUT/adder) responsive to a rising edge of a clock signal (provided to a clock input terminal indicated as a triangle symbol) to output a signal from a data output terminal (Q). Each of multiplexers (MUXs) (13₁,13₂) receives an output from each of the D-FFs (12₁,12₂) and the output of the LUT/adder to select either one of the signals received to output the selected signal to the routing block 102.

Fig.2A illustrates a schematic view of a logic block 101 without a carry skip circuit. The logic block 101 is composed of FA (Full Adder)-type 4-LUTs (11₁,11₂), D-flip-flops (DFFs) (12₁,12₂) and 2:1 MUXs (13₁,13₂).

The DFFs (12₁,12₂) store the operation results from the FA-type 4-LUTs (11₁,11₂). The MUXs (13₁,13₂) select outputs from the FA-type LUTs (11₁,11₂) or the DFFs (12₁,12₂). Though not limited thereto, an FA-type 4-input LUT (4-LUT) is used as an example to explain the structure of the present embodiment. The FA-type 4-LUTs (11₁,11₂) may each also be termed a logic element.

Fig.2B schematically illustrates a configuration of the FA-type 4-LUT (11₁). The FA-type 4-LUT (11₂) has the same configuration as the FA-type 4-LUT (11₁), except that two 1-bit inputs (A1, B1) are supplied as an operand input bit pair, C₀ (carry out (Cout) from the FA-type 4-LUT (11₁)) is supplied as carry in (Cin), and S1 and C1 are output as a sum and a carry out signal.

The FA-type 4-LUT (11₁) consists of two 2-input LUTs (2-LUTs) (113,114), a sum-type 2-LUT (111), a carry-type 2-LUT (112) and a 4:1 MUX (115). Each of 2-input LUTs can implement any 2-input logic function. 2-LUTs (111-114) in the FA-type 4-LUT may each also be termed as a programmable circuit.

The sum-type 2-LUT (111) is able to be programmed to implement any 2-input logic function or a 3-input FA sum function. The carry-type 2-LUT (112) is able to be programmed to implement any 2-input logic function or a 3-input FA carry function. The 3-input FA sum function receives two 1-bit inputs (operand input bit pair) and a carry in to output a sum, while the 3-input FA carry function receives, in common with the 3-input FA sum function, the two 1-bit inputs (operand input bit pair) and the carry in to output a carry out.

For each i-th bit, the FA receives an operand input bit pair (A_i, B_i) and a carry in C_i, and outputs a sum S_i and a carry out C_i+1:

S_i = A_i xor B_i xor C_i
C_i+1 = (A_i and B_i) or ((A_i xor B_i) and C_i)

where xor is an exclusive or.

If (A_i xor B_i) = 0 then S_i = C_i, else S_i = ~C_i (~C_i denotes the inverse of C_i).
If (A_i xor B_i) = 0 then C_i+1 = A_i, else C_i+1 = C_i.
That is, if (A_i, B_i) = (1,1) or (0,0), then C_i+1 = 1 or 0 (= A_i), respectively, while if (A_i, B_i) = (1,0) or (0,1), then C_i+1 = C_i.
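The selection rules above can be checked against ordinary binary addition with a short Python sketch. The function name `full_adder` is hypothetical and the code models only the logic, not the circuit.

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (sum, carry_out) using the xor-based selection rules."""
    propagate = a ^ b                      # (A_i xor B_i)
    s = (1 - c_in) if propagate else c_in  # S_i = ~C_i or C_i
    c_out = c_in if propagate else a       # C_i+1 = C_i or A_i
    return s, c_out

# Compare with arithmetic addition for all eight input combinations.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
```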

In Fig.2B, Cin signal is applied to both the sum-type 2-LUT (111) and the carry-type 2-LUT (112). Reference may be made to PTL 2 or NPL 2.

The carry-type 2-LUT (112) outputs Cout to a next FA-type 4-LUT. As a result, the FA-type 4-LUT can implement any 4-input LUT or a 2-bit FA.

Fig.3 illustrates a transistor-level view of the FA-type 4-LUT based on the CMOS/NVRS hybrid logic technology introduced in PTL 2 or NPL 2. The FA-type 4-LUT is composed of five 4:1 MUXs (MUX0-MUX4) and an NVSC array.

1T1R NVSC array is used as an example to illustrate an arrangement of the FA-type 4-LUT. 1T1R NVSC array has input lines coupled to a power line: Vdd, a ground line: Gnd, carry in line: Cin and its inverse line: ~Cin, and has output lines coupled to every four input terminals (ports) V₁~V₁₆ of MUX0 - MUX3.

In the 1T1R-NVSC array illustrated in Fig.3, each of the 1T1R-NVSCs has 2 terminals, where a first terminal is connected to a first wire (input line) disposed in a first direction (vertical direction in Fig. 3) (one out of four first wires coupled respectively to a power line: Vdd, a ground line: Gnd, a carry in line: Cin and an inverse carry in line: ~Cin one-to-one), while a second terminal is connected to a second wire (output line) disposed in a second direction (horizontal direction in Fig. 3) (one out of second wires coupled to V₁ to V₁₆ one-to-one).

One terminal of the NVRS in the 1T1R-NVSC is connected to a source of a transistor, whose gate and drain are connected respectively to a corresponding control signal Ctrl_x and a corresponding first wire. Control signals Ctrl_x and Ctrl_y are used to determine an address of the NVSC to be configured, where Ctrl_y is connected to a gate of a transistor provided between one end of the first wire and a programming voltage line PV_y. The transistor in the 1T1R-NVSC works as a switch to access the NVSC selected by control signals Ctrl_x and Ctrl_y and to isolate unselected NVSCs. Only when the transistor is switched ON, the selected NVSC can be configured.

The 1T1R-NVSCs are arranged at the crosspoints: (C_in, V₁), (~C_in, V₂), (~C_in, V₃), (C_in, V₄), (C_in, V₆) and (C_in, V₇) to construct a MUX input switch block for implementation of the FA. 1T1R-NVSCs arranged at crosspoints: (Vdd, V₁) - (Vdd, V₁₆) and (Gnd, V₁) - (Gnd, V₁₆) construct memories M₁ ~ M₁₆ for implementation of any 4-variable function. Each memory M_i (i is an integer of 1 to 16) is a tri-state memory including two 1T1R-NVSCs, the second terminal of which is connected to the first wires to which Vdd and Gnd are applied, respectively.

When the transistor of the 1T1R-NVSC in the memory Mi having a drain connected to the first wire of Vdd is configured as "ON" and the transistor of the 1T1R-NVSC in the memory Mi having a drain connected to the first wire of Gnd is configured as "OFF", the memory Mi provides a Vdd state. When the transistor of the 1T1R-NVSC in the memory Mi having a drain connected to the first wire to which Vdd is supplied, is configured as "OFF" and the transistor of the 1T1R-NVSC having a drain connected to the first wire to which Gnd is supplied, is configured as "ON", the memory Mi provides a Gnd state. When the transistors of the two 1T1R-NVSCs in the memory Mi are configured as "OFF", the memory Mi is set in a high impedance state.
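The three memory states can be summarized in a small Python sketch. The function name and the string labels are hypothetical, and treating both cells "ON" as an error is an assumption, since the text does not describe that combination.

```python
def memory_state(vdd_cell_on, gnd_cell_on):
    """Return the state of a tri-state memory built from two 1T1R-NVSCs."""
    if vdd_cell_on and gnd_cell_on:
        # Assumption: this combination is not described in the text.
        raise ValueError("both cells ON is not a described configuration")
    if vdd_cell_on:
        return "Vdd"
    if gnd_cell_on:
        return "Gnd"
    return "Hi-Z"  # both cells OFF: high impedance
```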

Programming voltages PV_x and PV_y are used to configure NVSCs as "ON" or "OFF". A write enable signal WE is used to enable a configuration mode. In the configuration mode, for example, in order to program NVSC (1, 1) as "ON", the programming voltage lines PV_x and PV_y are set to Vset (set voltage for NVRS) and Gnd, respectively. WE, Ctrl_x1 and Ctrl_y1 are set to "1" (e.g., High level), and Ctrl_x0 and Ctrl_y0 are set to "0" (e.g., Low level). Vset and Gnd are thus applied to the two terminals of the NVSC (1, 1), which is thereby configured as "ON". On the other hand, if NVSC (1, 1) is to be programmed as "OFF", PV_x and PV_y are set to Gnd and Vreset (reset voltage for NVRS), respectively.

In an operation mode, WE, Ctrl_y0 and Ctrl_y1 are set to "0" to turn off PV_x and PV_y, and Ctrl_x0 and Ctrl_x1 are set to "1" to turn on a data transfer path, so that data inputs (Cin and ~Cin) can be switched according to "ON and OFF" of NVSCs. In Fig. 3, Cin is applied through the NVSCs configured as "ON" to the second wires which are coupled respectively to V₁, V₄, V₆, and V₇, while ~Cin is applied through the NVSCs configured as "ON" to the second wires which are coupled respectively to V₂ and V₃.

In order to implement the FA, all the 1T1R-NVSCs in the MUX input switch block are configured as "ON", the memory M5 is configured as the Vdd state, the memory M8 is configured as the Gnd state, and the other memories are configured as the high impedance state.

In Fig.3, MUX0, which corresponds to a MUX included in the sum-type 2-LUT (111), selects, as OUT_IM1 (SUM), input terminals V₁, V₂, V₃, and V₄, if (A, B) = (1, 1), (0, 1), (1, 0), and (0, 0), respectively. V₁ and V₄ are connected to Cin via NVSC (0, 0) and NVSC (3, 0), respectively, both of which are programmed as "ON", while V₂ and V₃ are connected to ~Cin via NVSC (1, 1) and NVSC (2, 1), respectively, both of which are programmed as "ON". In case (A, B) = (1, 1) or (0, 0), SUM = Cin, while in case (A, B) = (1, 0) or (0, 1), SUM = ~Cin.

MUX1, which corresponds to a MUX included in the carry-type 2-LUT (112), selects, as OUT_IM2 (Cout), input terminals V₅, V₆, V₇, and V₈, if (A, B) = (1, 1), (0, 1), (1, 0), and (0, 0), respectively, wherein V₅, which is connected to the memory M5 in the Vdd state, is set to Vdd, V₆ and V₇ are connected to Cin via NVSC (5, 0) and NVSC (6, 0), respectively, both of which are programmed as "ON", and V₈, which is connected to the memory M8 in the Gnd state, is set to Gnd. In case (A, B) = (1, 1), Cout = 1, in case (A, B) = (0, 0), Cout = 0, and in case (A, B) = (1, 0) or (0, 1), Cout = Cin.
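The MUX0 and MUX1 input assignments described for the full-adder configuration can be modelled in a few lines of Python. This is a logic-level sketch only: the names `mux_index` and `fa_type_lut` are hypothetical, and Vdd and Gnd are represented by 1 and 0.

```python
def mux_index(a, b):
    """Index of the selected input for (A, B) = (1,1), (0,1), (1,0), (0,0)."""
    return {(1, 1): 0, (0, 1): 1, (1, 0): 2, (0, 0): 3}[(a, b)]

def fa_type_lut(a, b, c_in):
    """Return (SUM, Cout) for the full-adder configuration of MUX0 and MUX1."""
    vdd, gnd = 1, 0
    not_c_in = 1 - c_in
    mux0_inputs = [c_in, not_c_in, not_c_in, c_in]  # V1..V4
    mux1_inputs = [vdd, c_in, c_in, gnd]            # V5..V8 (M5 = Vdd, M8 = Gnd)
    k = mux_index(a, b)
    return mux0_inputs[k], mux1_inputs[k]
```

For every combination of A, B and Cin, the returned pair agrees with the arithmetic sum A + B + Cin.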

MUX2 that corresponds to a MUX included in the 2-LUT (113) selects, as OUT_IM3, input terminals V₉, V₁₀, V₁₁, and V₁₂ that are all in a high impedance state, if (A, B) = (1, 1), (0, 1), (1, 0), and (0, 0), respectively. MUX3 that corresponds to a MUX included in the 2-LUT (114) selects, as OUT_IM4, input terminals V₁₃, V₁₄, V₁₅, and V₁₆ that are all in a high impedance state, if (A, B) = (1, 1), (0, 1), (1, 0), and (0, 0), respectively.

MUX4, which corresponds to the MUX 115, selects OUT_IM1 (SUM), OUT_IM2 (Cout), OUT_IM3, and OUT_IM4, in case (IN2, IN3) = (1, 1), (0, 1), (1, 0), and (0, 0), respectively.

Fig.4A illustrates an implementation example of a 2-bit adder in the logic block 101 without a carry skip circuit. The operand input bit pairs (A0, B0) and (A1, B1) are applied respectively to the two FA-type 4-LUTs (11₁, 11₂) in parallel. Each FA-type 4-LUT (11₁, 11₂) is programmed to implement a 1-bit adder (1-bit full adder). A carry in (Cin) signal from a preceding cell (not shown) is applied to the first FA-type 4-LUT (11₁) to implement the first 1-bit adder. In a case where no preceding cell exists, that is, the FA-type 4-LUT (11₁) is included in the first cell of an N-bit adder, "0" is applied as the carry in (Cin) signal. A carry out signal C0 of the first FA-type 4-LUT (11₁) is applied to the second FA-type 4-LUT (11₂) to implement the second 1-bit adder. The second bit carry out signal Cout is applied to a next cell (not shown). In a case where no next cell exists, that is, the FA-type 4-LUT (11₂) is included in the final cell of the N-bit adder, the second bit carry out signal Cout is output as a carry out signal of the N-bit adder.

Fig.4B schematically illustrates the circuit configuration of the FA-type 4-LUT (11₁). The FA-type 4-LUT (11₂) has the same configuration as the FA-type 4-LUT (11₁), except with two 1-bit inputs (A1, B1) supplied as an operand input bit pair, and C0 (carry out (Cout) from the FA-type 4-LUT (11₁)) supplied as carry in (Cin) and with S1 and C1 output as a sum and a carry out signal. As illustrated in Fig.4B, in each FA-type 4-LUT (1-bit adder) (11₁), the sum-type 2-LUT (111) and the carry-type 2-LUT (112) are programmed to implement FA (Full-Adder) sum and carry functions, respectively, while the other two 2-LUTs (113, 114) are not used in an adder mode.

In Fig.4B, with both selection signals IN2 and IN3 being set to "0", MUX 115 (which corresponds to MUX 4 in Fig.3) of the first FA-type 4-LUT (11₁) selects a sum result S0 (output of FA sum 111), as the output thereof. MUX 115 (which corresponds to MUX 4 in Fig.3) of the second FA-type 4-LUT (11₂) selects a sum result S1 (output of FA sum 111), as the output thereof.

For each 1-bit adder, the carry out: Cout is equal to "1" in case of (Ai, Bi) = (1, 1), for i=0,1, and equal to "0" in case of (Ai, Bi) = (0, 0), while Cout is equal to carry input Cin in case of (Ai, Bi) = (1, 0) or (0, 1).

Therefore, a worst delay case is that both (A0 xor B0) =1 and (A1 xor B1) = 1, and Cin is propagated to Cout.

A delay time of 1-bit carry calculation in each FA-type 4-LUT (11₁, 11₂) is denoted as D_CP, and a delay time of 1-bit sum calculation therein is denoted as D_SUM. The critical path delay becomes
D_CP + D_SUM.

Fig.5 illustrates an N-bit adder implementation using logic blocks, each without a carry skip circuit. N is set to 8 as an example. Four cells are connected in series to implement an 8-bit adder. The dotted line indicates the critical path, and its delay is equal to 7*D_CP + D_SUM.

If it is necessary to implement an N-bit adder, the critical path delay becomes
(N-1)*D_CP + D_SUM.

This critical path delay may cause a serious problem in the FPGA when N is large.
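The linear growth of this delay can be illustrated with a minimal Python sketch that walks the carry chain in the worst case, where every lower bit propagates its carry in. The function name and the numeric delay values are hypothetical and in arbitrary units.

```python
def ripple_worst_case_delay(n_bits, d_cp, d_sum):
    """Worst-case delay of an N-bit adder without carry skip circuits."""
    carry_arrival = 0              # Cin of bit 0 is available at time 0
    for _ in range(n_bits - 1):    # bits 0 .. N-2 each pass the carry on
        carry_arrival += d_cp
    return carry_arrival + d_sum   # the top bit still has to form its sum

D_CP, D_SUM = 10, 10               # assumed values, arbitrary units
for n in (2, 8, 16, 32):
    print(n, ripple_worst_case_delay(n, D_CP, D_SUM))
```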

Fig.6A illustrates a schematic view of a logic block 101A with a carry skip circuit according to the first example embodiment. As compared with the logic block 101 in Fig.2A, a carry skip circuit 14 is added to speed up carry signal propagation. The logic block 101 without a carry skip circuit as illustrated in Fig.2A and Fig.4A may be regarded as a related example embodiment.

The first and second FA-type 4-LUTs (11A₁, 11A₂) output respectively propagation control signals P0 and P1 that are supplied in parallel to the carry skip circuit 14. The carry in signal Cin, applied to the first FA-type 4-LUT (11A₁) from a preceding cell (not shown), and the carry out signal C1 of the second FA-type 4-LUT (11A₂) are also applied to the carry skip circuit 14.

Fig.6B schematically illustrates the circuit configuration of the FA-type 4-LUT (11A₁). The FA-type 4-LUT (11A₂) has the same configuration as the FA-type 4-LUT (11A₁), except that two 1-bit inputs (A1, B1) are supplied as an operand input bit pair, C₀ (carry out (Cout) from the FA-type 4-LUT (11A₁)) is supplied as carry in, and S1, C1 and P1 are output as a sum, a carry out signal and a propagation control signal.

One of the 2-LUTs (113, 114) in the first FA-type 4-LUT (11A₁) generates a propagation control signal P0. In the same manner, one of the 2-LUTs (113, 114) in the second FA-type 4-LUT (11A₂) generates a propagation control signal P1. In Fig.6B, the 2-LUT (113) is programmed to generate the propagation control signal P0. One of the 2-LUTs (113,114) in each of the first and second FA-type 4-LUTs (11A₁,11A₂) may be programmed to generate the propagation control signals P0 and P1, based on (A0 xor B0) and (A1 xor B1) for example, respectively, though not limited thereto.

The carry skip circuit 14 uses the propagation control signals P0 and P1 to select Cin or C1 as a carry out (also termed as a final carry out) Cout which is applied to a next cell (not shown). If both (A0 xor B0) = 1 and (A1 xor B1) = 1, the carry skip circuit 14 selects Cin as Cout, otherwise C1 as Cout.
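That selecting Cin when both propagation control signals are "1" does not change the logical result can be confirmed with a brief Python sketch. The function names are hypothetical and P0 and P1 are assumed to be (A0 xor B0) and (A1 xor B1).

```python
from itertools import product

def carry(a, b, c_in):
    """Carry out of a 1-bit full adder."""
    return c_in if (a ^ b) else a

def cell_carry_out(a0, b0, a1, b1, c_in):
    """Final carry out of a two-LUT cell with a carry skip circuit."""
    p0, p1 = a0 ^ b0, a1 ^ b1      # propagation control signals
    c0 = carry(a0, b0, c_in)
    c1 = carry(a1, b1, c0)
    return c_in if (p0 and p1) else c1

# The skipped carry equals the carry of a plain 2-bit addition.
for a0, b0, a1, b1, c_in in product((0, 1), repeat=5):
    total = (a0 + 2 * a1) + (b0 + 2 * b1) + c_in
    assert cell_carry_out(a0, b0, a1, b1, c_in) == total >> 2
```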

The delay (D_SK) of the carry skip circuit 14 is much smaller than D_CP and D_SUM.

Fig.7 illustrates an N-bit adder implementation using logic blocks with carry skip circuits as described with reference to Fig.6A. In Fig.7, the first FA-type 4-LUT (11A₁) and the second FA-type 4-LUT (11A₂) in Fig.6A, are denoted as LUT0 and LUT1, respectively, and N is also set to 8, as an example.

All the input operands (4 sets of two 1-bit operand input pairs):
(A0, B0) and (A1, B1),
(A2, B2) and (A3, B3),
(A4, B4) and (A5, B5), and
(A6, B6) and (A7, B7) are applied respectively to CELL0, CELL1, CELL2, and CELL3, in parallel, so that the propagation control signals:
(P00, P01), (P10, P11), (P20, P21), and (P30, P31) output respectively from LUT0 and LUT1 in CELL0, CELL1, CELL2, and CELL3 are generated simultaneously.

In Fig.7, a dotted line indicates a critical path when the following holds:
(A0 xor B0) =0 and (A1 xor B1)=1 in the CELL0,
(A2 xor B2)=1 and (A3 xor B3)=1 in the CELL1,
(A4 xor B4)=1 and (A5 xor B5)=1 in the CELL2, and
(A6 xor B6)=1, and (A7 xor B7)=1 in the CELL3.

In the CELL0, the first FA-type LUT0 generates a carry out C00 as "1" or "0", since (A0 xor B0)=0. The FA-type LUT1 propagates C00 to its carry out C01, since (A1 xor B1)=1. Then, the skip circuit 14 selects C01 as Cout0 which is applied to CELL1.

The critical delay in the CELL0 is 2*D_CP + D_SK.

In CELL1 or CELL2, a carry in signal is propagated to the next cell by the skip circuits 14, instead of FA-type LUTs (LUT0 and LUT1), so the delay is D_SK.

In CELL3, a delay to generate a final sum signal SUM33 is
D_CP + D_SUM.

As a result, the total critical path delay is
3*D_CP + 3*D_SK + D_SUM, which is much smaller than the critical path delay
7*D_CP + D_SUM of the 8-bit adder implementation using logic blocks 101 without carry skip circuits.

(Second example embodiment)
Next, a second example embodiment will be described. The present embodiment discloses a gate-level carry skip circuit.

Fig.8A illustrates a logic block with a carry skip circuit 14. Referring to Fig.8A, the carry skip circuit 14 includes an AND gate 141 and a MUX 142. The AND gate 141 receives as inputs the propagation control signals P0 and P1 to generate a select signal SEL that is supplied to the MUX 142 as a selection control signal. If the select signal is "1", then the MUX 142 selects Cin, else selects C1, as a carry out to a next cell (final carry out) Cout.

Fig.8B schematically illustrates the circuit configuration of the FA-type 4-LUT (11A₁) in Fig. 8A. The FA-type 4-LUTs (11A₁, 11A₂) in Fig.8B are the same as those described with reference to Fig. 6B and thus the description thereof is omitted for brevity.

Fig.8C illustrates a transistor-level CMOS AND gate constructed by six transistors, wherein P-channel MOS transistors PM1 and PM2 and N-channel MOS transistors NM1 and NM2 compose a CMOS NAND gate which is followed by a CMOS inverter including a P-channel MOS transistor PM3 and an N-channel MOS transistor NM3. OUT is "1" only when both IN0 and IN1 are "1", and is "0" otherwise. That is, when both IN0 and IN1 are "1" (at a high level), NM1 and NM2, with gates supplied with a high level, are turned on (while PM1 and PM2, with gates supplied with a high level, are turned off) to supply a low level to a gate of PM3, which is turned on (while NM3 is turned off), thus setting OUT to a high level.

Fig.9A illustrates implementation of a 2-bit adder in a logic block 101A with a carry skip circuit 14 according to the present embodiment. The operand input bit pairs (A0, B0) and (A1, B1) are applied respectively to the first and second FA-type 4-LUTs (11A₁, 11A₂) in parallel. Each FA-type 4-LUT (11A₁, 11A₂) is programmed to implement a 1-bit adder. A carry in (Cin) signal from the preceding cell is applied to the first FA-type 4-LUT (11A₁) to implement the first 1-bit adder, and its carry out signal C0 is applied to the second FA-type 4-LUT (11A₂) to implement the second 1-bit adder.

In each of the first and second FA-type 4-LUTs (11A₁, 11A₂), the sum-type 2-LUT (111) and carry type 2-LUT (112) are programmed to implement FA (Full adder) sum and carry functions, respectively. The 2-LUT (113) of the first FA-type 4-LUT (11A₁) is programmed to implement an exclusive OR (xor) function of the operand input bit pair (A0, B0) to have an output P0 (= (A0 xor B0)) connected to a first input terminal of the AND gate 141. The 2-LUT (113) of the second FA-type 4-LUT (11A₂) is programmed to implement an exclusive OR (xor) function of the operand input bit pair (A1, B1) to have an output P1 (= A1 xor B1) connected to a second input terminal of the AND gate 141.

When both IN2 and IN3 are set to "0", the MUXs 115 in the first and second FA-type 4-LUTs (11A₁, 11A₂) select sum results S0 and S1, as outputs, respectively. If the propagation control signals P0 and P1 are both "1", the AND gate 141 outputs "1" as the select signal SEL, and the MUX 142 selects Cin as Cout to a next cell (not shown). The exclusive OR (xor) function can be programmed in the 2-LUT (113) of the first and second FA-type 4-LUTs (11A₁, 11A₂) by setting the memories M9 and M12 to a Gnd state and setting the memories M10 and M11 to a Vdd state in Fig.3. An output signal (OUT_IM3) of MUX 2 in Fig.3 may then be taken out as the propagation control signals P0 and P1.
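The memory settings given for the exclusive OR function can be checked with a small Python sketch of the 2-LUT (113). The dictionary layout and function name are hypothetical; the MUX2 selection order follows the (A, B) = (1, 1), (0, 1), (1, 0), (0, 0) order given for V₉ to V₁₂, and Vdd and Gnd are represented by 1 and 0.

```python
VDD, GND = 1, 0

# Memory states for the xor configuration: M9 and M12 at Gnd, M10 and M11 at Vdd.
xor_memories = {"M9": GND, "M10": VDD, "M11": VDD, "M12": GND}

def lut2_output(memories, a, b):
    """Return OUT_IM3, the memory selected by MUX2 for the inputs (A, B)."""
    selected = {(1, 1): "M9", (0, 1): "M10", (1, 0): "M11", (0, 0): "M12"}
    return memories[selected[(a, b)]]

for a in (0, 1):
    for b in (0, 1):
        assert lut2_output(xor_memories, a, b) == a ^ b
```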

The sum-type 2-LUT (111) may include, with reference to Fig.3, a 4:1 multiplexer (MUX0) that selects, as SUM, one of four input terminals (V1-V4), based on the operand input bit pair (two 1-bit inputs) (A, B). Four first wires arranged in the first direction (vertical direction) are coupled respectively to a power line: Vdd, a ground line: Gnd, a carry in line: Cin and an inverse carry in line: ~Cin one-to-one. The four second wires arranged in the second direction (horizontal direction) are coupled to the four input terminals (V1-V4) of the 4:1 multiplexer (MUX) one to one. The first wires are connected through NVSCs to corresponding ones of the second wires. The NVSCs may include a transistor and at least one non-volatile resistive switch that has two variable states of "ON" and "OFF".

The carry-type 2-LUT (112) may include, with reference to Fig.3, a 4:1 multiplexer (MUX1) that selects, as Cout, one of four input terminals (V5-V8), based on the two 1-bit inputs (A, B). Three first wires arranged in the first direction (vertical direction) are coupled to the power line: Vdd, the ground line: Gnd and a carry in line: Cin one-to-one. Four second wires arranged in the second direction (horizontal direction) are coupled to the four input terminals (V5-V8) of the 4:1 multiplexer (MUX1) one to one. The first wires are connected through NVSCs to corresponding ones of the second wires.

The 2-LUT (113) may include, with reference to Fig.3, a 4:1 multiplexer (MUX2) that selects one of four input terminals (V9-V12), based on the two 1-bit inputs (A, B). Two first wires arranged in the first direction (vertical direction) are coupled one-to-one to a power line: Vdd and a ground line: Gnd. Four second wires arranged in the second direction (horizontal direction) are coupled one-to-one to the four input terminals (V9-V12) of the 4:1 multiplexer (MUX2). The first wires are connected through NVSCs to corresponding ones of the second wires.
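The three 2-LUT structures described above can be modeled as 4:1 multiplexers whose inputs are each tied, through an NVSC in the ON state, to one of the available lines. The sketch below is an illustration under assumptions: the ordering of the MUX inputs by (A, B) = 00, 01, 10, 11, the line name `nCin` for ~Cin, and all Python identifiers are hypothetical and not taken from Fig.3.

```python
# Line selected for each MUX input, indexed by (A, B) = 00, 01, 10, 11.
SUM_LUT = ("Cin", "nCin", "nCin", "Cin")    # V1..V4: full-adder sum
CARRY_LUT = ("Gnd", "Cin", "Cin", "Vdd")    # V5..V8: full-adder carry
XOR_LUT = ("Gnd", "Vdd", "Vdd", "Gnd")      # V9..V12: A xor B


def read_lut(config, a, b, cin=0):
    """Model a 4:1 MUX whose inputs are tied to rails or carry lines."""
    lines = {"Vdd": 1, "Gnd": 0, "Cin": cin, "nCin": 1 - cin}
    return lines[config[2 * a + b]]
```

With this assumed ordering, `XOR_LUT` corresponds to the setting in which M9 and M12 are in a Gnd state and M10 and M11 are in a Vdd state. The carry configuration needs only Vdd, Gnd and Cin, which is consistent with the carry-type 2-LUT having three first wires rather than four.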

Fig.10 illustrates a logic block 101A with first to fourth FA-type 4-LUTs (11A₁~11A₄) according to the present embodiment. Each FA-type 4-LUT (11A₁~11A₄) may have a configuration as described with reference to Fig. 9B, for example. A 4-bit adder can be implemented in the logic block 101A. The first to fourth FA-type 4-LUTs (11A₁~11A₄) generate first to fourth propagation control signals P0, P1, P2 and P3, respectively. These signals are applied to a 4-input AND gate 141 to generate a select signal SEL, which controls a 2:1 MUX 142. When the select signal SEL is "1", i.e., when the first to fourth propagation control signals P0, P1, P2 and P3 are all "1", the MUX 142 selects the carry in signal Cin; otherwise it selects the carry out signal C3 from the fourth FA-type 4-LUT (11A₄). The selected signal is output as the carry out signal Cout (final carry out signal) to a next cell (not shown).
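The 4-bit logic block can be modeled in the same behavioral style for any number of logic elements. This is a minimal sketch with hypothetical helper names; operand bits are passed least significant bit first.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """M chained full adders with an M-input AND gate and a 2:1 MUX.

    a_bits and b_bits are lists of 0/1 values, least significant bit first.
    Returns (sum_bits, cout).
    """
    carry = cin
    sum_bits = []
    sel = 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                          # propagation control signal Pi
        sum_bits.append(p ^ carry)         # sum Si
        carry = (a & b) | (carry & p)      # ripple carry Ci
        sel &= p                           # M-input AND gate 141
    cout = cin if sel else carry           # MUX 142
    return sum_bits, cout


def to_bits(value, width):
    """Split an integer into a list of bits, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Reassemble an integer from a least-significant-first bit list."""
    return sum(bit << k for k, bit in enumerate(bits))
```

For example, `carry_skip_block(to_bits(5, 4), to_bits(10, 4), 1)` has all four propagation control signals at "1", so the block forwards Cin directly as Cout.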

Fig.11 illustrates an N-bit adder implemented by multiple cells, each of which is provided with the four FA-type 4-LUTs described with reference to Fig.10. N is set to 16 as an example. The dotted line indicates the critical path.

The critical path delay is:
7*D_CP + 3*D_SK + D_SUM
which is much smaller than the critical path delay of the 16-bit adder implementation using logic blocks without carry skip circuits:
15*D_CP + D_SUM
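The two expressions can be compared by counting how many times each delay term appears on the critical path. The sketch below reproduces the 16-bit coefficients given above; its generalization to other widths is an assumption rather than something stated in this description, namely that the worst-case path ripples through the whole first block, passes one skip MUX in every block except the last, and then ripples through the last block up to its final sum. The function names are hypothetical.

```python
def ripple_terms(n_bits):
    """Coefficients (D_CP, D_SK, D_SUM) for an adder without carry skip."""
    return (n_bits - 1, 0, 1)


def carry_skip_terms(n_bits, block_bits=4):
    """Coefficients (D_CP, D_SK, D_SUM) under the assumed worst-case path."""
    if n_bits % block_bits or n_bits // block_bits < 2:
        raise ValueError("expects at least two full blocks")
    blocks = n_bits // block_bits
    # block_bits carries in the first block, block_bits - 1 in the last one
    return (2 * block_bits - 1, blocks - 1, 1)


def path_delay(terms, d_cp, d_sk, d_sum):
    """Evaluate a coefficient triple for given unit delays."""
    n_cp, n_sk, n_sum = terms
    return n_cp * d_cp + n_sk * d_sk + n_sum * d_sum
```

Under this assumption the number of D_CP terms stays fixed as N grows and only the number of D_SK terms increases, whereas the ripple-only path grows by one D_CP per added bit.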

Fig.12 illustrates a delay time comparison between N-bit adders implemented by cells with and without the carry skip circuit according to the present example embodiment. The delay time is simulated by HSPICE based on a 28 nm (nanometer) CMOS/NB hybrid process. When N is 32, the speed is improved by a factor of 2.28 by using the skip circuit 14.

(Third example embodiment)
Next, a third example embodiment will be described. The present embodiment discloses another gate-level carry skip circuit. Fig.13A illustrates a logic block 101B with a carry skip circuit 14 using a NOR gate 143 and a MUX 142 according to the third example embodiment.

The propagation control signals P0 and P1 from the FA-type 4-LUTs (11B₁, 11B₂) are applied to a NOR gate 143 to generate a select signal SEL, which controls a MUX 142 to select Cin or C1 as the carry out Cout.

Fig. 13B schematically illustrates the circuit configuration of the FA-type 4-LUT (11B₁). The FA-type 4-LUT (11B₂) has the same configuration as the FA-type 4-LUT (11B₁), except with two 1-bit inputs (A1, B1) supplied as an operand input bit pair, and C₀ (carry out (Cout) from the FA-type 4-LUT (11B₁)) supplied as carry in (Cin) and with S1, C1, and P1 output as a sum, carry out signal and propagation control signal.

One of the 2-LUTs (113, 114) in the first FA-type 4-LUT (11B₁) generates a propagation control signal P0. In the same manner, one of the 2-LUTs (113, 114) in the second FA-type 4-LUT (11B₂) generates a propagation control signal P1. In Fig.13B, the 2-LUT (113) is programmed to generate the propagation control signal P0. One of the 2-LUTs (113,114) in the first and second FA-type 4-LUT (11B₁, 11B₂) may be programmed to generate the propagation control signals P0 and P1, based on (A0 xnor B0) and (A1 xnor B1) for example, respectively, though not limited thereto.

Fig.13C illustrates a transistor-level CMOS NOR gate 143 constructed by four transistors (two p-channel MOS transistors PM1 and PM2 and two n-channel MOS transistors NM1 and NM2). Only when IN0 and IN1 are both "0" is OUT "1"; otherwise OUT is "0". That is, when both IN0 and IN1 are "0" (at a low level), PM1 and PM2, whose gates are supplied with the low level, are turned on, while NM1 and NM2, whose gates are likewise supplied with the low level, are turned off, so that OUT is set to a high level.
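The switching behavior of the four transistors can be illustrated with a switch-level model. The sketch assumes the usual static CMOS NOR topology (the two p-channel transistors in series between Vdd and OUT, the two n-channel transistors in parallel between OUT and Gnd) and assumes that PM1 and NM1 are driven by IN0 and PM2 and NM2 by IN1; the exact wiring of Fig.13C is not reproduced here.

```python
def cmos_nor(in0, in1):
    """Switch-level model of a two-input CMOS NOR gate."""
    pm1_on = in0 == 0                  # p-channel conducts when its gate is low
    pm2_on = in1 == 0
    nm1_on = in0 == 1                  # n-channel conducts when its gate is high
    nm2_on = in1 == 1
    pull_up = pm1_on and pm2_on        # series path from Vdd to OUT
    pull_down = nm1_on or nm2_on       # parallel paths from OUT to Gnd
    assert pull_up != pull_down        # exactly one network conducts
    return 1 if pull_up else 0
```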

Fig.14A illustrates an implementation of a 2-bit adder in a logic block 101B with a carry skip circuit according to the present embodiment. The operand input bit pairs (A0, B0) and (A1, B1) are applied to the two FA-type 4-LUTs (11B₁, 11B₂) in parallel. Each of the FA-type 4-LUTs (11B₁, 11B₂) is programmed to implement a 1-bit adder. A carry in (Cin) signal from a preceding cell (not shown) is applied to the first FA-type 4-LUT (11B₁) to implement the first 1-bit adder, and its carry out signal C0 is applied to the second FA-type 4-LUT (11B₂) to implement the second 1-bit adder.

As illustrated in Fig.14B, in each of the FA-type 4-LUTs (11B₁, 11B₂), the sum-type and carry-type 2-LUTs (111, 112) are programmed to implement FA sum and carry functions, respectively. The 2-LUTs (113) that output the propagation control signals P0 and P1 in the FA-type 4-LUTs (11B₁, 11B₂) are programmed to implement an exclusive NOR (XNOR) function of the operand input bit pairs (A0, B0) and (A1, B1), respectively. In the FA-type 4-LUTs (11B₁, 11B₂), both IN2 and IN3 are set to "0" to select the sum results S0 and S1 as the outputs of the first and second FA-type 4-LUTs (11B₁ and 11B₂). When the propagation control signals P0 and P1 are both "0", the NOR gate 143 outputs "1" as the select signal SEL to cause the MUX 142 to select Cin as the carry out Cout (final carry out) to a next cell.

The exclusive NOR (XNOR) function can be programmed in the 2-LUT (113) of each of the first and second FA-type 4-LUTs (11B₁, 11B₂) by setting the memories M9 and M12 to a Vdd state and the memories M10 and M11 to a Gnd state in Fig.3. An output signal (OUT_IM3) of MUX2 in Fig.3 is taken out as P0 and P1.
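By De Morgan's law, a NOR of XNOR-type propagation control signals gives the same select signal as an AND of XOR-type signals, so the carry skip behaves identically in the second and third example embodiments. The following sketch checks this exhaustively; the function names are hypothetical and each operand pair is given as a tuple (A, B).

```python
from itertools import product


def sel_and_xor(pairs):
    """Second example embodiment: AND of XOR propagation control signals."""
    sel = 1
    for a, b in pairs:
        sel &= a ^ b
    return sel


def sel_nor_xnor(pairs):
    """Third example embodiment: NOR of XNOR propagation control signals."""
    any_high = 0
    for a, b in pairs:
        any_high |= 1 - (a ^ b)        # XNOR of the operand bit pair
    return 1 - any_high                # NOR gate 143


def selects_agree(num_bits=2):
    """Exhaustively compare both select signals for num_bits operand pairs."""
    for bits in product((0, 1), repeat=2 * num_bits):
        pairs = list(zip(bits[0::2], bits[1::2]))
        if sel_and_xor(pairs) != sel_nor_xnor(pairs):
            return False
    return True
```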

The 2-LUT (113) may include, with reference to Fig.3, a 4:1 multiplexer (MUX2) that selects one of four input terminals (V9-V12), based on the operand input bit pair (A, B). Two first wires are coupled one-to-one to a power line: Vdd and a ground line: Gnd. Four second wires are coupled one-to-one to the four input terminals (V9-V12) of the 4:1 multiplexer (MUX2). The first wires are connected through NVSCs to corresponding ones of the second wires.

The reconfigurable circuits of the above example embodiments may be used in, for example, mobile phones, IoT (Internet of Things) devices, and so on.

The disclosures of the aforementioned Patent Literatures 1-4 and Non-Patent Literatures 1-2 are incorporated by reference herein. The particular example embodiments or examples may be modified or adjusted within the scope of the entire disclosure of the present invention, inclusive of claims, based on the fundamental technical concept of the invention. In addition, a variety of combinations or selections of elements disclosed herein may be used within the concept of the claims. That is, the present invention may encompass a wide variety of modifications or corrections that may occur to those skilled in the art in accordance with the entire disclosure of the present invention, inclusive of claims and the technical concept of the present invention.

1 Cell array
10 Cell
11₁, 11₂, 11A₁ - 11A₄, 11B₁ - 11B₄ FA-type 4-LUT
12₁,12₂ D-FF
13₁,13₂ MUX
14 Skip circuit
101,101A, 101B Logic block
102 Routing block
103 NVSC
111 Sum-type 2-LUT
112 Carry-type 2-LUT
113 2-LUT
114 2-LUT
115 MUX
141 AND gate
142 MUX
143 NOR gate

Claims

A semiconductor apparatus comprising
a reconfigurable circuit comprising:
a carry skip circuit;
first to M-th logic elements, where M is an integer of 2 or more, each of the logic elements comprising:
a first programmable circuit that is able to be programmed to implement a 2-input logic function or a full-adder sum function to output a sum of two 1-bit inputs and a carry in;
a second programmable circuit that is able to be programmed to implement a 2-input logic function or a full-adder carry function to output a carry out based on the two 1-bit inputs and the carry in; and
a third programmable circuit that is able to be programmed to implement a 2-input logic function,
wherein the carry out output from i-th logic element is supplied as the carry in to (i+1)-th logic element, where i is an integer from 1 to M-1,
each of the first to M-th logic elements, based on the two 1-bit inputs supplied thereto, generates a propagation control signal, and
wherein the carry skip circuit receives first to M-th propagation control signals output in parallel from the first to M-th logic elements, respectively, to select, as a final carry out signal, one of a carry in signal supplied to the first logic element and a carry out signal output from the M-th logic element, based on a combination of values of the first to M-th propagation control signals.
The semiconductor apparatus according to claim 1, wherein the carry skip circuit includes:
a logic gate that receives the first to M-th propagation control signals to output a selection signal of a first logic value, when the first to M-th propagation control signals take a predetermined combination of logic values and otherwise output the selection signal of a second logic value; and
a 2:1 multiplexor that receives, as inputs, the carry in signal supplied to the first logic element and the carry out signal output from the M-th logic element and selects the carry in signal supplied to the first logic element if the selection signal is of the first logic value, while selecting the carry out signal output from the M-th logic element if the selection signal is of the second logic value, to output the selected signal.
The semiconductor apparatus according to claim 2, wherein the first programmable circuit further comprises:
a 4:1 multiplexer that selects, as a sum, one of four signals applied one to one to four input terminals thereof, based on the two 1-bit inputs;
four first wires coupled respectively to a power line, a ground line, a carry in line and an inverse carry in line one-to-one;
four second wires coupled to the four input terminals of the 4:1 multiplexer one to one; and
a plurality of switch cells through which the first wires are connected to corresponding ones of the second wires, wherein each of the switch cells includes a transistor and at least one non-volatile resistive switch that has two variable states of "ON" and "OFF".
The semiconductor apparatus according to claim 2 or 3, wherein the second programmable circuit further comprises:
a 4:1 multiplexer that selects, as a carry out, one of four signals applied one to one to four input terminals thereof, based on the two 1-bit inputs;
three first wires coupled to a power line, a ground line and a carry in line one-to-one;
four second wires coupled to the four input terminals of the 4:1 multiplexer one to one; and
a plurality of switch cells through which the first wires are connected to corresponding ones of the second wires, wherein each of the switch cells includes a transistor and at least one non-volatile resistive switch that has two variable states of "ON" and "OFF".
The semiconductor apparatus according to any one of claims 2 to 4, wherein the third programmable circuit further comprises:
a 4:1 multiplexer that selects one of four signals applied one to one to four input terminals thereof, based on the two 1-bit inputs;
two first wires coupled to a power line and a ground line one-to-one;
four second wires coupled to the four inputs of the 4:1 multiplexer one to one; and
a plurality of switch cells through which the first wires are connected to corresponding ones of the second wires, wherein each of the switch cells includes a transistor and at least one non-volatile resistive switch that has two variable states of "ON" and "OFF".
The semiconductor apparatus according to any one of claims 2 to 5, wherein in the carry skip circuit, the logic gate is an AND gate that outputs a result of taking a logical AND of the first to M-th propagation control signals, as the selection signal, and wherein
the first programmable circuit is programmed to implement 1-bit full-adder sum function to output a sum of two 1-bit inputs and the carry in,
the second programmable circuit is programmed to implement 1-bit full-adder carry function to output a carry out generated based on the two 1-bit inputs and the carry in, and
the third programmable circuit is programmed to implement 2-input exclusive OR function of the two 1-bit inputs to output a result of the exclusive OR, as the propagation control signal.
The semiconductor apparatus according to any one of claims 2 to 5, wherein in the carry skip circuit, the logic gate is a NOR gate that outputs a result of taking a logical NOR of the first to M-th propagation control signals, as the selection signal, and wherein
the first programmable circuit is programmed to implement 1-bit full-adder sum function to output a sum of two 1-bit inputs and the carry in,
the second programmable circuit is programmed to implement 1-bit full-adder carry function to output a carry out generated based on the two 1-bit inputs and the carry in, and
the third programmable circuit is programmed to implement 2-input exclusive NOR function of the two 1-bit inputs to output a result of the exclusive NOR, as the propagation control signal.
The semiconductor apparatus according to any one of claims 3 to 5, wherein the non-volatile resistive switch comprises a metal oxide resistance change element or a solid electrolyte resistance change element.
A method to utilize the reconfigurable circuit included in the semiconductor apparatus according to claim 6, the method comprising:
configuring the first programmable circuit in each logic element to implement a full-adder sum function;
configuring the second programmable circuit in each logic element to implement a full-adder carry function; and
configuring the third programmable circuit in each logic element to implement a 2-input exclusive OR function.
A method to utilize the reconfigurable circuit included in the semiconductor apparatus according to claim 7, the method comprising:
configuring the first programmable circuit in each logic element to implement a full-adder sum function;
configuring the second programmable circuit in each logic element to implement a full-adder carry function; and
configuring the third programmable circuit in each logic element to implement a 2-input exclusive NOR function.