TWI793278B - Computing cell for performing xnor operation, neural network and method for performing digital xnor operation - Google Patents
- Publication number
- TWI793278B
- Authority
- TW
- Taiwan
- Prior art keywords
- field effect
- transistor
- ferroelectric field
- line
- effect transistor
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/607—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers number-of-ones counters, i.e. devices for counting the number of input lines set to ONE among a plurality of input lines, also called bit counters or parallel counters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
Description
The present invention relates generally to neural networks and, more specifically, to a ferroelectric field-effect transistor (FE-FET) based XNOR cell that can be used in neuromorphic computing.
Applications involving deep-learning neural networks (NNs) or neuromorphic computing, such as image recognition, natural language processing and, more generally, various pattern-matching or classification tasks, are rapidly becoming as important as general-purpose computing. The basic computational element of a NN, the neuron, multiplies a set of input signals by a set of weights and sums the products. A neuron therefore performs a vector-matrix product, or multiply-accumulate (MAC), operation. A NN typically includes a large number of interconnected neurons, each of which performs a MAC operation. NN operation is therefore computationally intensive.
The performance of a NN can be improved by improving the efficiency of the MAC operation. It is desirable to store the weights locally, to reduce the power and frequency of dynamic random access memory (DRAM) accesses. It is also desirable to perform the MAC operation digitally, to help reduce noise and process variability. Binarized neurons can meet these goals, and a binarized-weight XNOR network (XNORNet) has therefore been developed.
In a binarized XNOR cell, the weights w are mathematically 1 and -1 but are represented digitally as 1 and 0. Likewise, the signals x are mathematically 1 and -1 but are represented digitally as 1 and 0. The result of the multiplication p_i = w_i * x_i is positive only when x_i and w_i are both 1, or both mathematically -1 (both 0 in the Boolean representation). This is exactly the logical negation of the exclusive-OR operation, i.e. XNOR. The product of an individual weight and signal can therefore be expressed as p_i = XNOR(w_i, x_i). The complete MAC operation for a given neuron is expressed as sum = Σ p_i (i = 1 … n), or, in Boolean form, as sum = 2·Count(XNOR(w, x)) − n. The count operation counts the number of non-zero results of the XNOR expression, and n is the total number of inputs to the neuron. The result is then thresholded against a bias to obtain the high or low output of the neuron. The entire process is digital, so the information loss associated with analog processing is not incurred.
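The equivalence between the ±1 arithmetic and the Boolean sum = 2·Count − n form can be checked directly in software. The following Python sketch is illustrative only and is not part of the patent:

```python
def xnor(a, b):
    # Boolean XNOR on bits in {0, 1}: 1 when the bits agree.
    return 1 if a == b else 0

def mac_pm1(weights, signals):
    # Reference MAC in the mathematical {-1, +1} domain.
    return sum(w * x for w, x in zip(weights, signals))

def mac_xnor(weights, signals):
    # The same MAC computed digitally: map {-1, +1} -> {0, 1},
    # XNOR each weight/signal pair, then apply sum = 2*count - n.
    to_bit = lambda v: 1 if v == 1 else 0
    n = len(weights)
    count = sum(xnor(to_bit(w), to_bit(x))
                for w, x in zip(weights, signals))
    return 2 * count - n

w = [1, -1, -1, 1]
x = [1, 1, -1, -1]
assert mac_xnor(w, x) == mac_pm1(w, x)
```

Because the two routines agree on every input, a hardware popcount over XNOR outputs suffices to reproduce the analog dot product exactly.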
However, the use of a binarized weight representation can itself be a source of information loss. Binarized networks typically need substantially more neurons than analog (or multi-bit digital) networks to reach the same level of overall accuracy. A significant improvement can be achieved if the weights are ternarized rather than binarized. Ternarized weights take the mathematical values -1, 0 and 1. A weight of 0 produces the output -1 (logical 0) for any combination of inputs. The output of a ternarized XNOR gate (also known as a "gated XNOR") is therefore given by: output = XNOR(w, x) when w is non-zero, and logical 0 (mathematical -1) when w = 0.
When the XNOR operation is performed per the above equation, the non-zero weights and all signals are mapped from the {-1, 1} domain to the {0, 1} Boolean domain. The mapping is performed after branching on the mathematical value of the weight.
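The gated-XNOR behavior just described, with the branch on the weight's mathematical value taken before the Boolean mapping, can be sketched as follows (a hedged illustration of the truth table, not the circuit itself):

```python
def gated_xnor(w, x):
    # w in {-1, 0, 1}, x in {-1, 1}; returns the mathematical output.
    if w == 0:
        return -1  # a zero weight yields -1 (logical 0) for any input
    # Non-zero weight: map both operands from {-1, 1} to {0, 1}, then XNOR.
    wb = 1 if w == 1 else 0
    xb = 1 if x == 1 else 0
    return 1 if wb == xb else -1

# Exhaustive check of the gated-XNOR truth table.
assert gated_xnor(0, 1) == -1 and gated_xnor(0, -1) == -1
assert gated_xnor(1, 1) == 1 and gated_xnor(-1, -1) == 1
assert gated_xnor(1, -1) == -1 and gated_xnor(-1, 1) == -1
```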
When the same number of neurons is used, a ternarized network can provide improved accuracy relative to a binarized network. Alternatively, a ternarized network can achieve the same level of accuracy as a binarized network with a smaller number of neurons, yielding savings in area and power and gains in inference throughput and latency. Both binarized and ternarized digital XNOR networks are therefore useful in applications such as NNs. An improved XNOR logic cell is needed to enhance digital binarized and/or ternarized NN operation, or other logic operations.
According to some embodiments, a computing cell for performing an XNOR operation on an input signal and a weight includes: at least one pair of ferroelectric field-effect transistors (FE-FETs) coupled to a plurality of input lines and storing the weight, each pair of the at least one pair of FE-FETs including a first FE-FET that receives the input signal and stores a first weight and a second FE-FET that receives the complement of the input signal and stores a second weight; and a plurality of select transistors coupled to the pair of FE-FETs.
The exemplary embodiments relate to digital computing cells that perform XNOR operations and can be used in a variety of fields, including but not limited to machine learning, artificial intelligence, neuromorphic computing and neural networks. The method and system can be extended to other applications in which logic devices are used. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the exemplary embodiments and to the general principles and features described herein will be readily apparent. The exemplary embodiments are described mainly in terms of particular methods and systems provided in particular implementations; however, the methods and systems will operate effectively in other implementations.
Phrases such as "exemplary embodiment", "one embodiment" and "another embodiment" may refer to the same embodiment, to different embodiments, or to multiple embodiments. The embodiments are described with reference to systems and/or devices having certain components; however, the systems and/or devices may include more or fewer components than shown, and variations in the arrangement and type of the components may be made without departing from the scope of the invention. The exemplary embodiments are also described in the context of particular methods having certain steps; however, the method and system operate effectively for other methods having different and/or additional steps, and steps in different orders, that are not inconsistent with the exemplary embodiments. The present invention is therefore not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features described herein.
Unless otherwise indicated herein or clearly contradicted by context, the terms "a", "an" and "the" and similar referents used in describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural. Unless otherwise noted, the terms "comprising", "including", "having" and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to").
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should be noted that, unless otherwise specified, the use of any and all examples or exemplary terms provided herein is intended merely to better illuminate the invention and does not limit its scope. Furthermore, unless otherwise defined, terms defined in commonly used dictionaries should not be over-interpreted.
The present invention describes a computing cell and method for performing a digital XNOR of an input signal and a weight. The computing cell includes at least one pair of FE-FETs and a plurality of select transistors. The pair(s) of FE-FETs are coupled to a plurality of input lines and store the weights. Each pair of the at least one pair of FE-FETs includes a first FE-FET that receives the input signal and stores a first weight, and a second FE-FET that receives the complement of the input signal and stores a second weight. The select transistors are coupled to the pair of FE-FETs.
FIG. 1 is a block diagram illustrating an exemplary embodiment of a digital XNOR computing cell 100. For simplicity, only a portion of the XNOR cell 100 is shown. The computing cell 100 digitally performs an XNOR operation on an input signal and a weight, and can therefore be viewed as a neuromorphic computing cell. In addition, the computing cell 100 can perform either a binarized or a ternarized XNOR operation.
The computing cell 100 includes at least two ferroelectric field-effect transistors (FE-FETs) 110 and 120, a select transistor 130 and an optional reset transistor 140. Also shown are input lines 102 and 104, an output line 106 and a select line 108. The input lines 102 and 104 receive the input signal and its complement, respectively, for inference operations. The output line 106 provides the result of the XNOR operation. If, for example, the computing cell 100 is part of a neural network (NN), the select line 108 can be used to select the computing cell 100 for operation.
The computing cell 100 includes at least two FE-FETs 110 and 120. In other embodiments, more than two FE-FETs may be used, at the cost of cell density. In still other embodiments, each computing cell includes only two FE-FETs 110 and 120. Each of the FE-FETs 110 and 120 includes a transistor (e.g. a FET) and a ferroelectric layer (not explicitly shown in FIG. 1) that typically resides between two metal layers, forming a ferroelectric capacitor. In alternative embodiments, the ferroelectric layer may replace the gate oxide. The ferroelectric layer stores the weight through its polarization state. For example, the ferroelectric layer may include at least one of lead zirconate titanate (PbZrTi), hafnium zirconium oxide (HfZrO), barium titanate (BaTiO3), bismuth titanate (Bi12TiO20), germanium telluride (GeTe) and BaxEu1-xTiO3, where x is greater than 0 and not greater than 1. In other embodiments, another and/or an additional ferroelectric material may be used.
In operation, reset-evaluate logic may be used. The storage node 150 may therefore be reset at the start of an inference operation (i.e., an XNOR operation using previously programmed weights). To perform an inference operation, the input x and its complement x_bar are provided to the FE-FETs 110 and 120 via the input lines 102 and 104, respectively. The polarization of the ferroelectric layers within the FE-FETs 110 and 120 varies according to the weights programmed into the FE-FETs 110 and 120. During the inference operation, a selective pull-up is performed on the dynamic storage node 150. The dynamic storage-node voltage can then be output via the output line 106 to evaluate, or provide, the result of the XNOR operation. The FE-FETs 110 and 120 are thus connected such that the output line 106 provides the XNOR of the input signal x and the weight w stored by the FE-FETs 110 and 120. The select transistor 130 selects the XNOR cell 100 for operation. The optional reset transistor 140 can be used to explicitly reset the computing cell 100, for example for ternarized operation; in other embodiments, the reset may be performed in another manner. The computing cell 100 can therefore be used in a binarized or ternarized mode.
The computing cell 100 can implement the XNOR operation efficiently and can be realized in a relatively compact manner. Because the operation is digital, problems associated with analog XNOR operation can be reduced or eliminated. For example, the use of digital weights gives programming robustness to the FE-FETs 110 and 120, and digital operation also induces less noise on the output 106. The use of an analog-to-digital converter (ADC) can be avoided, which also saves power and area. The weights are stored locally in the FE-FETs 110 and 120, which serve as non-volatile memory, making inference operations more efficient and faster. As discussed below, the computing cell 100 can provide a binarized or ternarized XNOR. The computing cell 100 can therefore perform XNOR operations digitally, efficiently and reliably.
FIG. 2 is a block diagram illustrating an exemplary embodiment of a portion 180 of a digital neural network. The portion 180 can be viewed as a neuron. The neuron 180 performs a multiply-accumulate (MAC) operation. The neuron 180 illustrates a possible use of the computing cell 100 and is not intended to be limiting.
The neuron 180 includes a plurality of computing cells 100-1, 100-2, 100-3 and 100-4 (collectively, computing cells 100) and a bit count and sign block 190. In this embodiment, four inputs x1/x1_bar, x2/x2_bar, x3/x3_bar and x4/x4_bar are to be combined with four weights. Four computing cells 100 are therefore used to perform four XNOR operations. In alternative embodiments, another number of computing cells 100 may be used. Each of the computing cells 100-1, 100-2, 100-3 and 100-4 of FIG. 2 operates in a manner similar to the computing cell 100 of FIG. 1. The bit count and sign block 190 counts the number of non-zero results from the four XNOR cells 100 and subtracts 4 (the number of input signals of the neuron 180). The result is then thresholded against a bias to obtain the high or low output of the neuron 180.
Thus, the neuron 180, using the computing cells 100, can perform a MAC operation. Because the neuron 180 uses cells implemented in hardware, it operates efficiently. The MAC operation can be performed digitally, which avoids the problems associated with analog XNOR operation. As described with reference to FIG. 1, the XNOR operations performed by the computing cells 100 can also be compact and efficient and can run in a binarized or ternarized mode. The performance of the neuron 180 can therefore be improved.
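Under the assumption that block 190 realizes the sum = 2·Count − n relation given earlier in the description, the four-input neuron 180 can be modeled as below. The bias value and function names are placeholders for illustration, not taken from the patent:

```python
def neuron(weight_bits, signal_bits, bias):
    # weight_bits, signal_bits: Boolean {0, 1} encodings of the +/-1 values,
    # one per XNOR computing cell (four cells in the FIG. 2 embodiment).
    n = len(signal_bits)
    # XNOR count: number of weight/signal pairs that agree.
    count = sum(1 if w == x else 0 for w, x in zip(weight_bits, signal_bits))
    total = 2 * count - n              # bit count and sign block
    return 1 if total >= bias else 0   # threshold against the bias

# Four XNOR cells combining four inputs with four weights.
out = neuron([1, 0, 1, 0], [1, 1, 0, 0], bias=0)
```

Here two of the four pairs agree, so the signed sum is 2·2 − 4 = 0 and the neuron output depends only on the chosen bias.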
FIG. 3 is a schematic diagram of an exemplary embodiment of a computing cell 100A for performing a digital XNOR operation. The computing cell 100A is similar to the XNOR cell 100 and can be used in the neuron 180 or in other applications. Portions of the computing cell 100A analogous to components of the XNOR cell 100 are labeled similarly. The XNOR cell 100A thus includes input lines 102 and 104, FE-FETs 110A and 120A, select transistors 132 and 134, an output line 106 and a select line 108 that are analogous to the input lines 102 and 104, the FE-FETs 110 and 120, the select transistor 130, the output line 106 and the select line 108, respectively. Also shown are a dynamic output node 150 and program lines 152 and 154. In the embodiment shown, the select transistors 132 and 134 are n-FETs. The FE-FETs 110A and 120A are shown as including FETs 112 and 122, respectively, and ferroelectric capacitors 114 and 124, respectively, each having a ferroelectric layer. For example, FIG. 4 illustrates an exemplary embodiment of a portion of an FE-FET 110A/120A usable in a computing cell that performs a digital XNOR operation. The FE-FET 110A/120A includes a FET 112/122 and a capacitor 114/124. The ferroelectric layer 116/126 may include one or more of PbZrTi, HfZrO, BaTiO3, Bi12TiO20, GeTe and BaxEu1-xTiO3, where x is greater than 0 and not greater than 1. In some embodiments, the ferroelectric layer 116/126 is incorporated into the first metal (M1) layer. In other embodiments, the ferroelectric layer 116/126 may be incorporated into other layers.
The input lines 102 and 104 carry the input signal x and its complement x_bar, respectively, and are connected to the sources of the FE-FETs 110A and 120A, respectively. The gates of the FE-FETs 110A and 120A are connected, via the select transistors 132 and 134, to the program lines 152 and 154, respectively. The program lines 152 and 154 provide the program signal P and its complement P_bar, respectively. The drains of the FE-FETs are coupled together to form the dynamic output node 150. The sources of the select transistors 132 and 134 are connected to the FE-FETs 110A and 120A, respectively; the drains of the select transistors 132 and 134 are connected to the program lines 152 and 154, respectively; and the gates of the select transistors 132 and 134 are connected to the select line 108.
As discussed above, the weights stored in the FE-FETs 110A and 120A are determined by the polarization of the ferroelectric layers 114 and 124. The weights may be trained off-chip. For example, if the intended application of the computing cell 100A is inference only (off-chip training), erase and program operations are performed infrequently: the FE-FETs 110A and 120A may be programmed only when it is desired to change the weights. In some embodiments, for example to account for improvements from off-chip training, such programming may occur only a few times per year. In alternative embodiments, the FE-FETs 110A and 120A may be programmed more or less frequently.
The weights programmed into the FE-FETs 110A and 120A depend on whether the computing cell 100A is intended for use in binarized or ternarized mode. The states stored in the two FE-FETs 110A and 120A may be complements of each other for a non-zero weight, or may be equal for a zero weight (e.g., a high-Vt state set for both). The use of a zero weight arises in ternarized operation.
To program the weights, the computing cell 100A is first erased and then programmed. If the cells are arranged in an array, all computing cells 100 in the entire array may first be erased globally, and the individual non-zero bits may then be programmed. To erase the cell 100A (and all cells in the array), the signals P and P_bar on the program lines 152 and 154 are set low (e.g., to ground) and the inputs x and x_bar on the input lines 102 and 104 are set high. The output line 106 of the computing cell 100A is allowed to float. As a result, throughout the cell 100A and the array, a negative voltage exists across the ferroelectric capacitor 114 in each FE-FET 110A and the ferroelectric capacitor 124 in each FE-FET 120A. At the end of the erase, each FE-FET 110A and 120A has a small, zero or slightly negative voltage on the gate node of its underlying FET 112 or 122, which places all FE-FETs 110A and 120A in a low-conductivity state.
FIG. 5 is a timing diagram 200 illustrating programming of an exemplary embodiment of a computing cell that performs a digital XNOR operation. In the embodiment shown in FIG. 5, the program lines 152 and 154 are pulsed to a moderately high voltage, for example 2.5 to 3 volts (V). Referring to FIGS. 3 and 5, the solid line 202 represents the voltage applied to the program lines 152 and 154, and the dotted line 204 represents the input voltage x or x_bar on the input lines 102 and 104. The dashed line 206 is the voltage at the gates of the FETs 112 and 122 of the FE-FETs 110A and 120A during the erase. The voltages applied to the left of the vertical line 209 are those that erase the FE-FETs 110A and 120A; the erase is complete at line 209. Just to the right of line 209, therefore, the voltage at the gate nodes of the FE-FETs 110A and 120A is small: both FE-FETs 110A and 120A have been erased to a low gate voltage.
After the erase is complete, the FE-FETs 110A and 120A can be programmed. A programming event sets the individual bits that represent the mathematical weights stored by the FE-FETs 110A and 120A. If ternarized operation is desired, only the non-zero weights are programmed. Programming is accomplished by grounding the signal input lines 102 and 104 (x and x_bar low) and applying a high voltage to the program line 152 (P high) or the program line 154 (P_bar high). The select line 108 is turned on for each computing cell 100A being programmed. The high voltage on the program line 152 or 154 places a positive voltage across the ferroelectric capacitor 114 in each FE-FET 110A or the ferroelectric capacitor 124 in each FE-FET 120A, respectively, causing a change in polarization state. The gate node of each programmed FE-FET is now at a high positive voltage, setting the underlying FET to a conducting state. In FIG. 5, the gate voltage of the final programmed state of the FE-FET 110A (which may store the weight) is shown by the dashed line 207, and that of the FE-FET 120A (which may store the weight complement) is shown by the dashed line 208. One or both of the FE-FETs 110A and 120A may thus be programmed; in FIG. 5, the gate voltages 207 and 208 differ. In some embodiments, the final voltage difference between the on-state and off-state FETs is roughly 500 millivolts (mV), which may correspond to a typical 7-nanometer (nm) node FET. The voltage difference can be increased significantly by improving the ratio of ferroelectric to non-ferroelectric polarization of the ferroelectric capacitors 114 and 124, for example by using a thicker ferroelectric capacitor/ferroelectric layer 116/126 with stronger ferroelectric polarization. The weights can thus be programmed into the FE-FETs 110A and 120A using erase-then-program.
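The erase-then-program sequence and the subsequent inference can be captured in a small behavioral model. The class, state variables and method names below are illustrative abstractions of cell 100A, not an electrical simulation:

```python
class FeFetCellModel:
    """Behavioral sketch of an XNOR cell: two FE-FETs storing w and its complement."""

    def __init__(self):
        self.g1 = 0  # gate-node state of FE-FET 110A (0 = erased, low conductivity)
        self.g2 = 0  # gate-node state of FE-FET 120A

    def erase(self):
        # P and P_bar low, x and x_bar high: both ferroelectric capacitors
        # see a negative voltage, leaving both FE-FETs low-conductivity.
        self.g1, self.g2 = 0, 0

    def program(self, weight_bit):
        # Ground x and x_bar, then pulse P (or P_bar) high so that exactly
        # one FE-FET of the pair is set to the conducting state.
        self.erase()
        if weight_bit == 1:
            self.g1 = 1
        else:
            self.g2 = 1

    def infer(self, x_bit):
        # Select line low, gates float; the dynamic node follows whichever
        # FE-FET is driven by a high input. Result is XNOR(x, w).
        return 1 if (x_bit == 1) == (self.g1 == 1) else 0

cell = FeFetCellModel()
cell.program(1)
assert cell.infer(1) == 1 and cell.infer(0) == 0
```

Running the model for both stored weights reproduces the XNOR truth table, which is the check the timing diagrams of FIGS. 5 and 6 make at the circuit level.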
To perform an inference operation, the input x and its complement x_bar are provided on the input lines 102 and 104, respectively, and the select line 108 is driven low. The gates of the FE-FETs 110A and 120A are thus allowed to float, which permits the gate voltages to exceed the supply voltage during the inference operation and provides full output swing. It is desirable to minimize or reduce the ferroelectric voltage difference in order to suppress read disturbance. The output of the computing cell 100A (the XNOR of the input x and the weight) can be developed on the storage node 150, and the inference operation can thus be performed. The time used to perform the inference operation can also be kept small. For example, FIG. 6 is a diagram 210 illustrating the timing of an inference operation of an exemplary embodiment of a computing cell performing a digital XNOR operation. The dashed line 212 indicates the input x on the input line 102 transitioning from low to high. The dashed line 214 and the dotted line 216 indicate the voltages developed on the gates of the FE-FETs 110A and 120A (i.e., the gates of the transistors 112 and 122) during the inference operation. As can be seen in FIG. 6, settling to the final voltage can occur in less than 0.1 nanoseconds (ns).
The computing cell 100A can therefore have improved performance. The computing cell 100A can use a combination of only two FE-FETs 110A and 120A and two select n-type FETs (nFETs) 132 and 134, and can therefore be compact. Because the FE-FETs 110A and 120A can be programmed digitally, programming can be robust. The weights are stored locally through the polarization of the ferroelectric layers 116/126; because the weights need not be fetched from off-chip DRAM, time and power are saved. Because the computing cell 100A can perform the XNOR (inference) operation digitally, the output developed on the storage node 150 can exhibit reduced noise compared with an analog implementation. Furthermore, the inference operation is performed quickly and efficiently. The computing cell 100A can also be robust against read disturbance. The gate node of each FE-FET (the top node of the ferroelectric capacitors 114 and 124) floats during inference, so an inference event asserts very little voltage across the ferroelectric capacitor 114/124 itself. Moreover, this small voltage increment occurs on a time scale much shorter than the ferroelectric polarization response of standard ferroelectric materials. As discussed above with reference to FIG. 6, the inference time can reasonably be kept below 0.1 ns, which can be far shorter than the ferroelectric response time of standard ferroelectric materials. This is desirable because the ferroelectric polarization should not change during the inference operation. This inference time is roughly two orders of magnitude faster than the PbZrTi response and at least several orders of magnitude faster than the HfZrO response. The polarization of the ferroelectric layers 116/126 is therefore not expected to change, and repeated inference events can have essentially no effect on the gate-node voltages in the FE-FETs 110A and 120A, indicating that the polarization state of the ferroelectric layers 116/126 remains unchanged. Inference/read operations thus need not disturb the programmed state of the FE-FETs 110A and 120A.
The computing cell 100A can also be used in ternarized operation, which uses the full weight set {1, 0, -1}. For a zero weight, the computing cell 100A is simply not programmed after the erase described above; in other words, the erase-then-program operation is completed simply by erasing the computing cell 100A. There is, however, a possibility of charge accumulation at the storage node 150 due to repeated inference. This can occur because, when both FE-FETs 110A and 120A are off (which is the case for a zero weight), the natural discharge rate of the dynamic storage node 150 is low compared with the inference rate. To prevent such charge accumulation, an explicit reset is performed before each inference in ternarized operation. In the XNOR-network case, the initial grounded state of x and x_bar on the input lines 102 and 104 is sufficient to discharge the storage node 150 through the FE-FET 110A and/or 120A.
In one embodiment, the computing cell 100A can be used without any additional transistors or interconnects. In such an embodiment, the storage node 150 is discharged through the FE-FETs 110A and 120A. However, the conductivity of the FE-FETs 110A and 120A is raised by applying a high voltage to the program lines 152 and 154, respectively, while the select transistors 132 and 134 are on. The added gate voltage makes the normally-off FE-FETs 110A and 120A temporarily more conductive, enabling the storage node 150 to be discharged quickly. Although this approach works, a high-voltage pulse is applied to the program lines 152 and 154 on every inference, creating increased supply and voltage stress on the select transistors 132 and 134, which are otherwise stressed only very infrequently. Alternatively, a different embodiment of the computing cell may be used.
FIG. 7 illustrates another exemplary embodiment of a computing cell 100B for performing a digital XNOR operation with an explicit reset operation. The computing cell 100B is similar to the XNOR cell 100 and the computing cell 100A, and can therefore be used in the neuron 180 or in other applications. Portions of the computing cell 100B analogous to components of the cells 100/100A are labeled similarly. The computing cell 100B thus includes input lines 102 and 104, FE-FETs 110B and 120B, select transistors 132 and 134, an output line 106 and a select line 108 that are analogous to the input lines 102 and 104, the FE-FETs 110/110A and 120/120A, the select transistors 130/132 and 134, the output line 106 and the select line 108, respectively. Also shown are a dynamic output node 150 and program lines 152 and 154 analogous to those of FIG. 3. The select transistors 132 and 134 are n-FETs. The FE-FETs 110B and 120B include FETs 112 and 122, respectively, and ferroelectric capacitors 114 and 124, respectively, each having a ferroelectric layer. The FETs 112 and 122 and the ferroelectric capacitors 114 and 124 are similar to those of FIG. 3. The ferroelectric layer (not labeled in FIG. 7) may include one or more of PbZrTi, HfZrO, BaTiO3, Bi12TiO20, GeTe and BaxEu1-xTiO3, where x is greater than 0 and not greater than 1. The structure and function of the components 102, 104, 106, 108, 110B, 112, 114, 120B, 122, 124, 132, 134, 150, 152 and 154 are similar to those of the identically numbered components of FIGS. 2 through 4.
The computing cell 100B also includes a reset transistor 140 (which may be an n-FET) and a reset line 142. The gate of the reset transistor 140 is coupled to the reset line 142, and its source is coupled to ground. To erase the FE-FETs 110B and 120B, the reset line 142 is set low, the program lines 152 and 154 are pulsed low, and the input lines 102 and 104 are set high. For an inference/XNOR operation, the reset FET 140 is turned on by energizing the reset line 142 before the inputs x and x_bar are applied on the input lines 102 and 104, respectively. The reset transistor 140 thus discharges the storage node 150. The inputs x and x_bar can then be applied and the inference operation performed. The high voltages described above can therefore be avoided when the computing cell 100B is used in ternarized mode. The choice between applying high voltages to the program lines 152 and 154 with the smaller computing cell 100A, and using the larger computing cell 100B with the reset FET 140 but without high voltages, depends on the goals and technology constraints.
FIG. 8 is a flow chart illustrating an exemplary embodiment of a method 300 for performing an XNOR operation using an exemplary embodiment of a hardware cell. For simplicity, some steps may be omitted, performed in another order and/or combined. The method 300 is also described in the context of the XNOR cells 100/100A/100B; however, the method 300 may be used with another XNOR computing cell.
In step 302, the weights are programmed into the FE-FETs 110/110A/110B and 120/120A/120B. Step 302 may thus be performed as described above; for example, step 302 may include erasing the computing cell 100/100A/100B, followed by a programming step. Although shown as part of the flow 300, step 302 may be carried out long before the remaining steps of the method 300 and may be decoupled from them.
In step 304, the reset line 142 is driven high and then low, as needed, to enable the reset transistor 140. Step 304 is performed for the computing cell 100B; alternatively, the FE-FETs 110A and 120A may be reset by the applied voltages. In step 306, the signal and its complement are received. Step 306 may include receiving x_value and x_value_bar on the input lines 102 and 104, respectively. The inference operation is performed as described above. Then, in step 308, the result of the XNOR operation may be forwarded.
Thus, using the method 300, the XNOR cells 100, 100A, 100B and/or analogous devices may be used, and one or more of the advantages of the XNOR cells 100, 100A, 100B and/or analogous devices may be achieved. A method and system for performing digital XNOR operations using the compact FE-FET computing cell 100/100A/100B in binarized or ternarized mode have been described. The method and system have been described in accordance with the exemplary embodiments shown, and one of ordinary skill in the art will readily recognize that there could be variations to the embodiments, and any variations would be within the spirit and scope of the method and system. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
100: digital XNOR computing cell / XNOR cell / computing cell 100-1, 100-2, 100-3, 100-4: computing cell 100A: computing cell / XNOR cell / cell / compact FE-FET computing cell 100B: computing cell / XNOR cell / compact FE-FET computing cell 102, 104: input line / signal input line / component 106: output line / output / component 108: select line / component 110, 110A, 120, 120A: FE-FET 110B, 120B: FE-FET / component 112, 122: FET / transistor / component 114, 124: ferroelectric capacitor / capacitor / ferroelectric layer / component 116, 126: dielectric layer / ferroelectric layer 130: select transistor 132, 134: select transistor / select nFET / component 140: reset transistor / reset FET 142: reset line 150: storage node / dynamic storage node / dynamic output node / component 152, 154: program line / component 180: portion of a digital neural network / neuron 190: bit count and sign block 200, 210: timing diagram 202: solid line 204, 216: dotted line 206, 207, 208, 212, 214: dashed line 209: vertical line 300: method / flow 302, 304, 306, 308: step P: program signal / signal P_bar: complement of program signal / signal x: input / input voltage x1, x1_bar, x2, x2_bar, x3, x3_bar, x4, x4_bar: input x_bar: complement of input / input voltage / input
FIG. 1 is a block diagram illustrating an exemplary embodiment of a digital XNOR computing cell. FIG. 2 is a block diagram illustrating an exemplary embodiment of a portion of a neural network that includes a plurality of XNOR computing cells and performs a multiply-accumulate operation. FIG. 3 illustrates an exemplary embodiment of a computing cell for performing a digital XNOR operation. FIG. 4 illustrates an exemplary embodiment of a portion of an FE-FET usable in a computing cell that performs a digital XNOR operation. FIG. 5 is a timing diagram illustrating programming of an exemplary embodiment of a computing cell that performs a digital XNOR operation. FIG. 6 is a diagram illustrating the timing of an inference operation of an exemplary embodiment of a computing cell that performs a digital XNOR operation. FIG. 7 illustrates another exemplary embodiment of a computing cell for performing a digital XNOR operation. FIG. 8 is a flow chart illustrating an exemplary embodiment of a method for performing an XNOR operation using an exemplary embodiment of a computing cell.
Claims (19)
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862640076P | 2018-03-08 | 2018-03-08 | |
US62/640,076 | 2018-03-08 | ||
US201862664102P | 2018-04-28 | 2018-04-28 | |
US62/664,102 | 2018-04-28 | ||
US16/137,227 US10461751B2 (en) | 2018-03-08 | 2018-09-20 | FE-FET-based XNOR cell usable in neuromorphic computing |
US16/137,227 | 2018-09-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202036389A TW202036389A (en) | 2020-10-01 |
TWI793278B true TWI793278B (en) | 2023-02-21 |
Family
ID=67882975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW108107498A TWI793278B (en) | 2018-03-08 | 2019-03-06 | Computing cell for performing xnor operation, neural network and method for performing digital xnor operation |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110245749B (en) |
TW (1) | TWI793278B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11120864B2 (en) | 2019-12-09 | 2021-09-14 | International Business Machines Corporation | Capacitive processing unit |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4541067A (en) * | 1982-05-10 | 1985-09-10 | American Microsystems, Inc. | Combinational logic structure using PASS transistors |
TW201603281A (en) * | 2014-04-24 | 2016-01-16 | 美光科技公司 | Ferroelectric field effect transistors, pluralities of ferroelectric field effect transistors arrayed in row lines and column lines, and methods of forming a plurality of ferroelectric field effect transistors |
TW201703430A (en) * | 2015-04-01 | 2017-01-16 | Japan Science & Tech Agency | Electronic circuit |
CN106463513A (en) * | 2014-05-20 | 2017-02-22 | 美光科技公司 | Polar, chiral, and non-centro-symmetric ferroelectric materials, memory cells including such materials, and related devices and methods |
US20170256552A1 (en) * | 2016-03-01 | 2017-09-07 | Namlab Ggmbh | Application of Antiferroelectric Like Materials in Non-Volatile Memory Devices |
US20180039886A1 (en) * | 2016-08-05 | 2018-02-08 | Xilinx, Inc. | Binary neural networks on progammable integrated circuits |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4713792A (en) * | 1985-06-06 | 1987-12-15 | Altera Corporation | Programmable macrocell using eprom or eeprom transistors for architecture control in programmable logic circuits |
US6356112B1 (en) * | 2000-03-28 | 2002-03-12 | Translogic Technology, Inc. | Exclusive or/nor circuit |
KR100482996B1 (en) * | 2002-08-30 | 2005-04-15 | 주식회사 하이닉스반도체 | Nonvolatile Ferroelectric Memory Device |
-
2019
- 2019-02-28 CN CN201910148042.2A patent/CN110245749B/en active Active
- 2019-03-06 TW TW108107498A patent/TWI793278B/en active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4541067A (en) * | 1982-05-10 | 1985-09-10 | American Microsystems, Inc. | Combinational logic structure using PASS transistors |
TW201603281A (en) * | 2014-04-24 | 2016-01-16 | 美光科技公司 | Ferroelectric field effect transistors, pluralities of ferroelectric field effect transistors arrayed in row lines and column lines, and methods of forming a plurality of ferroelectric field effect transistors |
CN106463513A (en) * | 2014-05-20 | 2017-02-22 | 美光科技公司 | Polar, chiral, and non-centro-symmetric ferroelectric materials, memory cells including such materials, and related devices and methods |
TW201703430A (en) * | 2015-04-01 | 2017-01-16 | Japan Science & Tech Agency | Electronic circuit |
US20170256552A1 (en) * | 2016-03-01 | 2017-09-07 | Namlab Ggmbh | Application of Antiferroelectric Like Materials in Non-Volatile Memory Devices |
US20180039886A1 (en) * | 2016-08-05 | 2018-02-08 | Xilinx, Inc. | Binary neural networks on progammable integrated circuits |
Non-Patent Citations (1)
Title |
---|
Online publication: Borna Obradovic, "A Multi-Bit Neuromorphic Weight Cell using Ferroelectric FETs, suitable for SoC Integration", 2017/10/22, https://arxiv.org/abs/1710.08034.pdf *
Also Published As
Publication number | Publication date |
---|---|
CN110245749A (en) | 2019-09-17 |
CN110245749B (en) | 2024-06-14 |
TW202036389A (en) | 2020-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10461751B2 (en) | FE-FET-based XNOR cell usable in neuromorphic computing | |
US11842770B2 (en) | Circuit methodology for highly linear and symmetric resistive processing unit | |
Sun et al. | Exploiting hybrid precision for training and inference: A 2T-1FeFET based analog synaptic weight cell | |
US11657259B2 (en) | Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine | |
US11290110B2 (en) | Method and system for providing a variation resistant magnetic junction-based XNOR cell usable in neuromorphic computing | |
JPWO2019049741A1 (en) | Neural network arithmetic circuit using non-volatile semiconductor memory device | |
CN113593623B (en) | Analog content addressable memory using three-terminal memory device | |
TWI699711B (en) | Memory devices and manufacturing method thereof | |
CN112447229B (en) | Nonvolatile memory device performing multiply-accumulate operation | |
KR20190133532A (en) | Transposable synaptic weight cell and array thereof | |
Wang et al. | Investigating ferroelectric minor loop dynamics and history effect—Part II: Physical modeling and impact on neural network training | |
US11011216B1 (en) | Compute-in-memory dynamic random access memory | |
CN114974337B (en) | Time domain memory internal computing circuit based on spin magnetic random access memory | |
TWI793278B (en) | Computing cell for performing xnor operation, neural network and method for performing digital xnor operation | |
WO2022134841A1 (en) | USING FERROELECTRIC FIELD-EFFECT TRANSISTORS (FeFETs) AS CAPACITIVE PROCESSING UNITS FOR IN-MEMORY COMPUTING | |
Thunder et al. | Ultra low power 3D-embedded convolutional neural network cube based on α-IGZO nanosheet and bi-layer resistive memory | |
Reis et al. | In-memory computing accelerators for emerging learning paradigms | |
Eslami et al. | A flexible and reliable RRAM-based in-memory computing architecture for data-intensive applications | |
KR20230025401A (en) | Charge-pump-based current-mode neurons for machine learning | |
Gupta et al. | On-chip unsupervised learning using STDP in a spiking neural network | |
Dubreuil et al. | A novel 3D 1T1R RRAM architecture for memory-centric Hyperdimensional Computing | |
CN116670763A (en) | In-memory computation bit cell with capacitively coupled write operation | |
Tseng et al. | An Analog In-Memory-Search Solution based on 3D-NAND Flash Memory for Brain-Inspired Computing | |
Zanotti et al. | Circuit reliability analysis of in-memory inference in binarized neural networks | |
Ma et al. | A binary-activation, multi-level weight RNN and training algorithm for ADC-/DAC-free and noise-resilient processing-in-memory inference with eNVM |