TWI750913B - Computing device and method - Google Patents
- Publication number
- TWI750913B, TW109140899A
- Authority
- TW
- Taiwan
- Prior art keywords
- output
- memory cells
- read
- capacitors
- input
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/41—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
- G11C11/412—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/408—Address circuits
- G11C11/4085—Word line control circuits, e.g. word line drivers, - boosters, - pull-up, - pull-down, - precharge
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M1/00—Analogue/digital conversion; Digital/analogue conversion
- H03M1/12—Analogue/digital converters
- H03M1/34—Analogue value compared with reference values
- H03M1/36—Analogue value compared with reference values simultaneously only, i.e. parallel type
- H03M1/361—Analogue value compared with reference values simultaneously only, i.e. parallel type having a separate comparator and reference value for each quantisation level, i.e. full flash converter type
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4093—Input/output [I/O] data interface arrangements, e.g. data buffers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4094—Bit-line management or control circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/41—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
- G11C11/413—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
- G11C11/417—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
- G11C11/419—Read-write [R-W] circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M1/00—Analogue/digital conversion; Digital/analogue conversion
- H03M1/66—Digital/analogue converters
- H03M1/74—Simultaneous conversion
- H03M1/80—Simultaneous conversion using weighted impedances
- H03M1/802—Simultaneous conversion using weighted impedances using capacitors, e.g. neuron-mos transistors, charge coupled devices
Landscapes
- Engineering & Computer Science (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Power Engineering (AREA)
- Dram (AREA)
- Semiconductor Memories (AREA)
- Static Random-Access Memory (AREA)
Abstract
Description
Embodiments of the present invention relate to memory, and more particularly to a computing device and method.
Embodiments of the present invention relate generally to memory arrays used for data processing, such as multiply-accumulate operations. Compute-in-memory, or in-memory computing, systems store information in the main random-access memory (RAM) of a computer and perform computations at the memory cell level, rather than moving large amounts of data between the main RAM and the data store for each computation step. Because stored data can be accessed faster when it is held in RAM, in-memory computing allows data to be analyzed in real time, enabling faster reporting and decision-making in enterprise and machine learning applications. Efforts are ongoing to improve the performance of in-memory computing systems.
Embodiments of the present invention provide a computing device comprising: a memory array including a plurality of memory cells grouped in rows and columns of memory cells, each of the plurality of memory cells including a memory unit configured to store data and a read port having a read-enable input and an output; a plurality of read-enable lines, each connected to the read-enable inputs of the read ports of a respective row of memory cells and configured to transmit an input signal to those read-enable inputs; a plurality of data-output lines, each connected to the outputs of the read ports of a respective column of memory cells; and an output interface including a computation module, the computation module including a plurality of capacitors, each capacitor being connectable to a respective one of the plurality of data-output lines and having a capacitance, at least two of the plurality of capacitors having capacitances different from each other, the output interface being configured to permit the plurality of capacitors to share the charge stored on them.
Embodiments of the present invention provide a computing method comprising: storing a plurality of multi-bit weights in a memory array having a plurality of memory cells, the memory cells being organized in rows and columns and each having a memory unit configured to store a signal at a node and a read port having a read-enable input and an output, the read port being configured to generate at the output, following an activation signal at the read-enable input, a signal indicative of the signal stored at the node in the memory unit while isolating the output from the node, the memory array further having a plurality of read-enable lines, each read-enable line being connected to the read-enable inputs of a row of the memory cells, wherein storing each of the plurality of multi-bit weights includes storing the multi-bit weight in a row of memory cells sharing a respective one of the plurality of read-enable lines, the memory array further having a plurality of data-output lines, each data-output line being connected to the outputs of the read ports of a column of the memory cells; applying a plurality of sequences of pulse signals to the respective ones of the plurality of read-enable lines to generate an output signal on each of the outputs of the read ports of the respective rows of memory cells; combining the output signals on the outputs of the read ports of each of the columns of memory cells, and weighting the combined output signals by significance factors, at least two of the significance factors being different from each other; combining the weighted output signals to generate an analog output; and converting the analog output to a digital output.
Embodiments of the present invention provide a computing method comprising: storing a plurality of multi-bit weights in a memory array having a plurality of memory cells, the memory cells being organized in rows and columns and each having a memory unit configured to store a signal at a node and a read port having a read-enable input and an output, the read port being configured to generate at the output, following an activation signal at the read-enable input, a signal indicative of the signal stored at the node in the memory unit while isolating the output from the node; simultaneously multiplying an input signal by each bit of each of the plurality of multi-bit weights to generate a plurality of output signals at the outputs of the read ports; summing the output signals at the outputs of the read ports of the memory cells in each column; weighting each of the sums of the output signals at the outputs of the read ports of the memory cells in each column by a significance factor to generate a respective weighted sum, the significance factors being different from each other; and combining the weighted sums to generate an analog output signal.
100: system/memory array
110: 8-transistor static random-access memory cell
120: 6T memory cell
122: p-type metal-oxide-semiconductor field-effect transistor
124: n-type MOS field-effect transistor
126: first inverter
132: PMOS
134: NMOS
136: second inverter
142, 144: write access transistors
150: read port
152: read transistor
154: read access transistor
156, 156[0], 156[1], 156[62], 156[63], 156[i]: read word lines
160: write word line
170, 180: write bit lines
190, 190[0], 190[1], 190[2], 190[3], 190[j]: read bit lines
200: CIM system
210: input interface
212: digital counter
214: driver
216: read/write interface
220: output interface
222: compensation module
224: computation module
226: sense amplifier
228: analog output
230: four-bit-wide segment
260, 260[i]: subsets
250: FinFET structure
252: p-doped fin
254: n-doped fin
256: polysilicon gate
270: analog-to-digital converter
272[0], 272[7], 272[l], SA0, SA1, SA2, SA3, SA4, SA5, SA6, SA7, SA8, SA9, SA10, SA11, SA12, SA13, SA14: comparators
310: precharge period
320: RBL sampling period
330: charge-sharing period
340: ADC evaluation period
500: computation
510, 520, 530, 540, 550: steps
Cm[0], Cm[1], Cm[2], Cm[3], Cm[j]: compute capacitors
Cn[1], Cn[2], Cn[3], Cn[j]: compensation capacitors
Cu: unit capacitor
i: row
I_cell: cell current
j: index
N0, N1, N2, N3, Q: nodes
PCH: precharge signal
QB: output
S0A, S0B, S1, SH: switching elements
VDD: reference voltage
Aspects of embodiments of the present invention are best understood from the following detailed description when read with the accompanying drawings. It is noted that, in accordance with standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
FIG. 1 is a schematic diagram showing an example of memory cells having a read bit line (RBL) isolated from the write bit lines (WBLs), in a portion of a compute-in-memory system, according to some embodiments.
FIG. 2A shows a block diagram of a compute-in-memory system and a more detailed block diagram of a portion of the system, illustrating a four-bit-precision weight computation subsystem, according to some embodiments.
FIG. 2B is a schematic diagram of an eight-transistor (8T) RAM cell, labeled "B" in the system shown in FIG. 2A, according to some embodiments.
FIG. 2C is a schematic layout of the eight-transistor (8T) RAM cell of FIG. 2B, labeled "C" in the system shown in FIG. 2A, according to some embodiments.
FIG. 3A shows a portion of a compute-in-memory system in the RBL sampling state, according to some embodiments.
FIG. 3B shows the portion of the compute-in-memory system of FIG. 3A in the charge-sharing state, according to some embodiments.
FIG. 3C shows the time evolution of the levels of various signals in the portion of the compute-in-memory system shown in FIGS. 3A and 3B, according to some embodiments.
FIG. 4 shows an analog-to-digital conversion scheme for processing the voltages on the RBLs, according to some embodiments.
FIG. 5 is a flowchart outlining a computing method, according to some embodiments.
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components and arrangements are described below to simplify the embodiments of the present invention. These are, of course, merely examples and are not intended to be limiting. For example, in the description that follows, the formation of a first feature over or on a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features, such that the first and second features are not in direct contact. In addition, the embodiments of the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
The specific examples shown in the embodiments of the present invention relate to compute-in-memory. An example application of compute-in-memory is the multiply-accumulate operation, in which an input array of numbers is multiplied (weighted) by the respective elements of another array (e.g., a column) of numbers (weights), and the products are added together (accumulated) to produce an output sum. This is mathematically similar to the dot product (or scalar product) of two vectors, in which the components of the two vectors are multiplied pairwise and the products of the component pairs are summed. In certain artificial intelligence (AI) systems, such as artificial neural networks, an array of numbers can be weighted by multiple columns of weights. The weighting by each column produces a respective output sum. An output array of sums is thus produced from the input array of numbers by the weights in a multi-column matrix.
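The multiply-accumulate operation described above can be sketched in a few lines of Python. This is only an arithmetic illustration; the function names and the sample numbers are hypothetical and do not come from the patent.

```python
def multiply_accumulate(inputs, weights):
    """Return the sum of the pairwise products (a dot product)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def weight_by_columns(inputs, weight_matrix):
    """Apply every column of weight_matrix to the same input array.

    weight_matrix[i][k] is the weight applied to inputs[i] in column k,
    so each column yields one output sum.
    """
    n_columns = len(weight_matrix[0])
    return [
        multiply_accumulate(inputs, [row[k] for row in weight_matrix])
        for k in range(n_columns)
    ]


inputs = [1, 2, 3]
weight_matrix = [
    [4, 1],
    [5, 0],
    [6, 2],
]
print(multiply_accumulate(inputs, [4, 5, 6]))    # 1*4 + 2*5 + 3*6
print(weight_by_columns(inputs, weight_matrix))  # one sum per column
```

Each entry of the returned list corresponds to one column of weights, matching the statement that an output array of sums is produced from one input array.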
A common type of integrated circuit memory is the static random-access memory (SRAM) device. A typical SRAM device has an array of memory cells. In some examples, each memory cell uses six transistors (6T) connected between a higher reference potential and a lower reference potential (e.g., ground), such that one of two storage nodes holds the information to be stored while the complementary information is held at the other storage node. Each bit in the SRAM cell is stored on four of the transistors, which form two cross-coupled inverters. The other two transistors are connected to the memory cell word line (WL) and control access to the memory cell during read and write operations by selectively connecting the cell to its bit lines (BLs). When the word line is enabled, a sense amplifier connected to the bit lines senses and outputs the stored information. Input/output (I/O) circuitry connected to the bit lines is typically used when processing memory cell data. When multiple WLs are activated and the bit cells initially store opposite values, the two bit lines may be driven lower or higher.
In multi-bit applications such as compute-in-memory, the stability of 6T bit cells degrades when multiple word lines are activated simultaneously. In that case, both bit line voltages are pulled low, which can disturb the stability of a bit cell and flip its state. In addition, using logic-rule-based SRAM bit cells carries a significant area overhead, due among other things to the storage required for intermediate computations. Furthermore, the binary inputs, weights, and outputs of known memory configurations may be too simple for general compute-in-memory use, because many problems to be solved by algorithms used in compute-in-memory require multi-bit computation steps. Certain embodiments disclosed herein provide multi-bit compute-in-memory with direct results, without intermediate storage and without disturbing the stability of each cell.
According to some aspects of the embodiments of the present invention, a compute-in-memory (CIM) system includes a memory array in which each memory cell has a read bit line (RBL), through which stored information can be read, and a write bit line (WBL), through which information can be written to the cell, the RBL and WBL being isolated from each other. One example is the 8T SRAM cell, which adds to a 6T SRAM cell a 2T read port connected to a read word line (RWL) and an RBL. Because the RBL of an 8T bit cell is decoupled from the 6T memory cell, turning on multiple RWLs simultaneously does not disturb the storage node voltages. Some disclosed embodiments provide a CIM system having an array of 8T SRAM cells with multiple RWLs and RBLs.
According to certain aspects of the embodiments of the present invention, a CIM system with multi-bit inputs can be realized with multiple RWL pulses. For example, in some embodiments, an input signal in a multiply-accumulate operation can be realized by a number of RWL pulses proportional to the input value. In some examples a 4-bit input may be used, although other bit widths are within the scope of the embodiments of the present invention. For example, an input of 0 (binary 0000) is represented by 0 RWL pulses, an input of 3 (binary 0011) by 3 RWL pulses, an input of 15 (binary 1111) by 15 RWL pulses, and so on.
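The pulse-count encoding of a multi-bit input can be illustrated as follows. This is a minimal sketch: the fixed window of 2^bits - 1 slots and the choice to place all pulses at the start of the window are assumptions made for illustration, and `rwl_pulse_train` is a hypothetical name.

```python
def rwl_pulse_train(value, bits=4):
    """Encode an unsigned input as pulse slots (1 = RWL pulse, 0 = idle).

    Assumption: the window has 2**bits - 1 slots and the pulses are
    packed at the front; only the pulse count carries the input value.
    """
    max_value = (1 << bits) - 1
    if not 0 <= value <= max_value:
        raise ValueError(f"value must be in 0..{max_value}")
    return [1] * value + [0] * (max_value - value)


for value in (0, 3, 15):
    train = rwl_pulse_train(value)
    print(f"input {value:2d} ({value:04b}): {sum(train)} pulses")
```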
In some embodiments, the input signals can be multiplied by multi-bit (e.g., four-bit) weights (i.e., weight values) arranged in columns. Accumulation of the multi-bit-weighted inputs can be realized by charging a common RBL from all cells in the column corresponding to each bit of the multi-bit weights. The voltage on each RBL is thus indicative of the sum of the currents from the cells connected to that RBL, and hence of the sum of the inputs, each weighted by the binary weight bit associated with the column. A multiply-accumulate function is thereby performed on the RBL, and the RBL voltage is proportional to the bit-by-bit product of the weight bits with the multi-bit inputs. Charge sharing among the RBLs is then performed for each column of multi-bit weights using binary-weighted capacitors, that is, capacitors sized according to the respective significance positions in the multi-bit weight. The most significant bit (MSB) of the weight therefore contributes more to the final output than the least significant bit (LSB). Charge sharing thus produces an analog voltage that reflects the correct significance of each RBL. For example, for a column of four-bit weights, the contribution of the MSB to the final voltage is eight (2^3) times that of the LSB, the contribution of the second MSB is four (2^2) times that of the LSB, and the contribution of the third MSB (or second LSB) is two (2^1) times that of the LSB.
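The significance weighting described above can be checked with idealized integer arithmetic: if each RBL accumulates one unit per pulse for every cell whose stored bit is 1, then scaling the per-RBL totals by 1, 2, 4, and 8 reproduces the full multi-bit multiply-accumulate result. The sketch below assumes ideal, linear accumulation; the function names and the sample values are hypothetical.

```python
def rbl_counts(inputs, weights, bits=4):
    """counts[j] = sum over rows of (input pulses) * (bit j of that row's weight).

    Index j = 0 is the LSB column, j = bits - 1 the MSB column.
    """
    return [
        sum(x * ((w >> j) & 1) for x, w in zip(inputs, weights))
        for j in range(bits)
    ]


def binary_weighted_sum(counts):
    """Scale each per-RBL total by its place value 2**j and add them up."""
    return sum(count << j for j, count in enumerate(counts))


inputs = [3, 15, 0, 7]     # pulse counts applied to four rows
weights = [5, 10, 15, 1]   # four-bit weight stored in each row

counts = rbl_counts(inputs, weights)
direct = sum(x * w for x, w in zip(inputs, weights))
print(counts, binary_weighted_sum(counts), direct)
```

The last two printed values agree, which is the property the binary-weighted capacitors are meant to realize in the analog domain.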
In certain other embodiments, an analog-to-digital converter (ADC), such as a flash ADC in some examples, is used to convert the voltage on the RBLs (after binary-weighted charge sharing such as that described above) into a multi-bit digital output. In some embodiments, for an n-bit output, 2^n - 1 comparators are used in the ADC implementation. For example, for a 4-bit output, 15 comparators are used in a flash ADC implementation. In some embodiments, each comparator has its own input capacitor. These input capacitors can serve as the binary-weighted capacitors used for charge sharing described above. In certain embodiments, the number of input capacitors to which each RBL is connected is related (e.g., proportional) to the place value of the output bit associated with that RBL. For example, for a 4-bit output, the RBL for the MSB is connected to 8 (2^3) input capacitors, and the RBL for the LSB is connected to 1 (2^0) input capacitor. The total capacitance connected to each RBL is thus proportional to the place value corresponding to that RBL. Other output bit widths are within the scope of the embodiments of the present invention.
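A behavioral sketch of an n-bit flash conversion with 2^n - 1 comparators follows. The uniformly spaced thresholds, the `v_ref` parameter, and the function names are assumptions for illustration only; the source does not specify the reference levels. The second function shows that connecting 2^j comparator input capacitors to the RBL of place value 2^j uses exactly 2^n - 1 capacitors in total, one per comparator.

```python
def flash_adc(voltage, v_ref=1.0, bits=4):
    """Return the output code of an idealized flash ADC.

    Assumption: comparator k (k = 0 .. 2**bits - 2) trips when the input
    is at or above (k + 1) * v_ref / 2**bits.
    """
    n_comparators = (1 << bits) - 1
    thresholds = [(k + 1) * v_ref / (1 << bits) for k in range(n_comparators)]
    thermometer = [1 if voltage >= t else 0 for t in thresholds]
    return sum(thermometer)  # number of tripped comparators = output code


def input_capacitors_per_rbl(bits=4):
    """Number of comparator input capacitors tied to each RBL, LSB first."""
    return [1 << j for j in range(bits)]


print(flash_adc(0.0), flash_adc(0.5), flash_adc(1.0))
print(input_capacitors_per_rbl(), sum(input_capacitors_per_rbl()))
```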
Referring to FIGS. 1 through 4, an overview of some example embodiments is provided before the detailed aspects of these embodiments are explained further below. In certain applications, such as artificial intelligence, a model system is proposed. A set of inputs (e.g., numbers) is supplied to the model system, which processes the inputs and generates outputs. The outputs are compared with the desired outputs, and if they are not close enough, the model system is adjusted and the process is repeated until the outputs of the model system are sufficiently close to the desired outputs. For example, to build a machine that can read, the model system can be given a set of fragments of a letter. The system takes the fragments (the inputs), processes them according to an algorithm, and outputs the letter it determines it has received. If the output letter differs from the input letter, the system can be adjusted and tested again until the output matches the input a sufficiently high percentage of the time.
For some applications, the model system can be a multiply-accumulate system, which processes a set of inputs by multiplying each input by a value (sometimes called a "weight") and summing (accumulating) the products. The system can include a two-dimensional array of elements arranged in rows and columns, each element storing a weight and being capable of receiving an input and producing an output that is the arithmetic product of the input and the stored weight. The model system can supply each input to an entire row of elements and add together the outputs of each column of elements.
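For example, the system (100) depicted in FIG. 1 has a column of eight-transistor (8T) static random-access memory (SRAM) cells (110), of which only two are shown in FIG. 1. Each cell (110) is connected to an input line RWL (156), and both cells are connected to the same output line RBL (190). Each cell also has a node Q, which the SRAM cell maintains at a voltage indicating the value (weight) stored in the cell. As can readily be understood from FIG. 1, for a binary "1" at the input RWL, a cell (110) draws current from the RBL if Q is "1" and draws no current if Q is "0"; for a binary "0" at RWL, the cell (110) draws no current regardless of the value at Q. If an amount of current drawn in a given time period above a threshold (i.e., a certain amount of charge drawn) is regarded as an output of "1", the output of a single cell (110) is given by Table 1 below:

Table 1
Input (RWL) | Weight (Q) | Output
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1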
As is evident from this table, the output is the product of the input and the weight.
Furthermore, because the cells (110) in the same column share the same RBL, the current on the RBL is the sum of the currents of the cells (110) connected to it. The signal on each RBL therefore represents the sum of the binary products of the inputs (RWL) and the respective stored weights.
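By way of non-limiting illustration, the single-cell behavior and the summation on a shared RBL described above can be sketched behaviorally in Python. The function names `cell_output` and `rbl_signal` are hypothetical and chosen only for this sketch. The model is purely logical: it treats a cell's output as 0 or 1 and ignores analog effects.

```python
def cell_output(rwl: int, q: int) -> int:
    """Output of one cell: 1 only when input RWL and stored weight Q are both 1."""
    return rwl & q


def rbl_signal(inputs, weights):
    """Sum of the binary products of the cells sharing one RBL."""
    return sum(cell_output(x, q) for x, q in zip(inputs, weights))


# Print the single-cell output for all four input/weight combinations.
for rwl in (0, 1):
    for q in (0, 1):
        print(rwl, q, cell_output(rwl, q))

# Four cells on one RBL: the binary products are 1, 0, 0, 1.
print(rbl_signal([1, 0, 1, 1], [1, 1, 0, 1]))
```

In this sketch the per-cell products are added as integers, which mirrors the way cell currents add on the shared RBL.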
Referring to FIG. 2A, in a system that uses multi-bit (in this example, four-bit) weights for multiply-accumulate operations, each input (RWL) is provided to multiple (e.g., four) cells (110), each of which stores one bit of a multi-bit weight. Each RBL is connected to a column of cells (110) having the same place value (i.e., $2^0$, $2^1$, $2^2$, $2^3$, and so on). Referring further to FIG. 3A, each RBL is connected to a pair of capacitors, namely a compute capacitor $C_m[j]$ and a compensation capacitor $C_n[j]$ for RBL[j]; the two capacitors of each pair are connected in parallel with each other and are connected to the respective RBL while the signal on that RBL is acquired. The total capacitance of the parallel combination is the same for every RBL, $9C_u$ in this example, where $C_u$ is a unit capacitance. For reasons explained below, the compute capacitors for RBL[0], RBL[1], RBL[2], and RBL[3] have capacitances of 1, 2, 4, and 8 times $C_u$, respectively, and the compensation capacitors for RBL[0], RBL[1], RBL[2], and RBL[3] have capacitances of 8, 7, 5, and 1 times $C_u$, respectively, so that the total capacitance of each parallel combination is $9C_u$.
Given the capacitance on each RBL ($9C_u$ in this example), the voltage drop at each of nodes N0, N1, N2, and N3 (FIG. 3A) caused by the current flowing to the cells (110) in the corresponding column (assuming the capacitors have been precharged) is proportional to the sum of the binary products of the inputs (RWL) and the respective binary weights stored in that column. Because the total capacitance on each RBL is the same, the proportionality constant is the same for every RBL. At the same time, because the capacitance of each compute capacitor $C_m$ is proportional to the place value of its RBL, the charge lost from each compute capacitor $C_m$ is also proportional to the place value of the RBL.
Next, referring further to FIG. 3B, the compute capacitors $C_m$ are disconnected from the compensation capacitors $C_n$ and connected in parallel with one another; that is, nodes N0, N1, N2, and N3 are connected together. The charge stored on the compute capacitors is thereby shared, and the voltage at N0, N1, N2, and N3 settles to a level $V = Q_{\text{total}}/C_{\text{total}}$, where $Q_{\text{total}}$ is the sum of the charges on all compute capacitors and $C_{\text{total}}$ is the total capacitance of the parallel combination, $15C_u$ in this example. Because each compute capacitor has a capacitance proportional to the place value of its RBL, $Q_{\text{total}}$, and hence $V$ and the voltage drop $\Delta V$ due to the discharge, has a contribution from each RBL that is proportional to the place value of that RBL. That is, up to a constant factor, $\Delta V = \sum_j 2^j I_j$, where $I_j$ is the current of the $j$-th RBL and is proportional to the sum of the binary products of the inputs (RWL) and the respective weight bits stored in the column of the $j$-th RBL. $\Delta V$ is therefore proportional to the sum of the products of the inputs and the respective multi-bit weights stored in the cells (110).
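The proportionality stated above can be checked with a short worked derivation. The following is a sketch under assumed notation that does not appear in the source: $V_{PCH}$ denotes the common precharge level, $X_i$ the $i$-th input, $W_{i[j]}$ the $j$-th bit of the $i$-th weight, $n_j$ the sum of binary products on RBL[j], and $k$ the (assumed identical and linear) voltage drop per unit of binary product on any RBL node.

```latex
% Assumed notation: k = voltage drop per unit binary product (same for every
% RBL because every RBL sees 9 C_u); n_j = sum of binary products on RBL[j].
\begin{align*}
V_j &= V_{PCH} - k\,n_j, \qquad n_j = \sum_i X_i\,W_{i[j]} \\
Q_{\text{total}} &= \sum_{j=0}^{3} 2^j C_u\,V_j \\
V &= \frac{Q_{\text{total}}}{C_{\text{total}}}
   = \frac{\sum_{j=0}^{3} 2^j C_u\,(V_{PCH} - k\,n_j)}{15\,C_u}
   = V_{PCH} - \frac{k}{15}\sum_{j=0}^{3} 2^j n_j \\
\Delta V &= V_{PCH} - V
   = \frac{k}{15}\sum_{j=0}^{3} 2^j \sum_i X_i\,W_{i[j]}
   = \frac{k}{15}\sum_i X_i\,W_i
\end{align*}
```

The last step uses $W_i = \sum_j 2^j W_{i[j]}$, so under these assumptions $\Delta V$ is proportional to the multiply-accumulate result.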
Finally, referring additionally to FIG. 4, the voltage at nodes N0, N1, N2, and N3 is converted to a digital output by an analog-to-digital converter (ADC), yielding a digital output that corresponds to the sum of the products of the inputs and the respective multi-bit weights stored in the cells (110).
To explain the system outlined above and its operation in more detail, in some embodiments a compute-in-memory (CIM) system includes a memory array (100), which includes rows and columns of memory cells (110) (the rows and columns may be physical or logical), and other components, such as a digital input interface (not shown in FIG. 1) for converting digital inputs into counter pulse trains (e.g., using binary counters) and an output interface (not shown in FIG. 1) for accumulating the weighted inputs and outputting a digital representation of the accumulated weighted inputs, as explained below.
In this example, each memory cell (110) includes a 6T memory cell (120) and a read port (150). The 6T memory cell (120) includes: a first inverter (126) made of a p-type metal-oxide-semiconductor (MOS) field-effect transistor (PMOS) (122) and an n-type MOS field-effect transistor (NMOS) (124) connected in series (i.e., with their source-drain current paths in series) between a high reference voltage (such as VDD) and a low reference voltage (such as ground); a second inverter (136) made of a PMOS (132) and an NMOS (134) connected in series between the high reference voltage (such as VDD) and the low reference voltage (such as ground); and two write access transistors (142, 144), which are NMOS in this example. The inverters (126, 136) are cross-coupled, i.e., the output (Q, QB) of each (i.e., the junction between its source/drain current paths) is coupled to the input (i.e., the gates) (QB, Q) of the other. Each write access transistor (142, 144) has its source/drain current path connected between a respective junction of the cross-coupled inverters (126, 136) and a respective write bit line (WBL (170), WBLB (180)), and has its gate connected to a write word line (WWL) (160).
In this example, each read port (150) includes a read transistor (152) and a read access transistor (154) connected in series with each other between the low reference voltage and a data output line (sometimes called a read bit line (RBL)). In this example, the read transistor (152) is an NMOS whose gate is connected to the inverting output (QB) of the 6T memory cell (120), and the read access transistor (154) is an NMOS whose gate is connected to a read word line (RWL). Other types of transistors and connections can be used. For example, a PMOS can be used for either or both of the read transistor (152) and the read access transistor (154), and the gate of the read transistor (152) can be connected to the non-inverting output (Q) of the 6T memory cell (120).
In operation, to write a bit to a memory cell (110), the data bit (1 or 0) (e.g., a voltage corresponding to 1 or 0) is applied to WBL and its inverse is applied to WBLB. A write signal (e.g., 1) is applied to the write access transistors (142, 144) to turn them on, thereby storing the data bit at the output (Q) of the 6T memory cell (120) and the inverse of the data bit at QB. The write access transistors (142, 144) can then be turned off, and the value at Q and the inverse value at QB are maintained. To read the stored data bit, the write access transistors (142, 144) are turned off (WL = 0) and the read access transistor (154) is turned on (made conducting) by a read signal applied to RWL. A cell current ($I_{\text{cell}}$) corresponding to the voltage at QB (or Q), which in turn represents the value (1 or 0) stored in the 6T memory cell (120), is thereby produced in the RBL and sensed by circuitry in the output interface (not shown in FIG. 1).
Because the RBL is isolated from the output Q or QB of the inverters (126, 136) by the read transistor (152) in each read port (150) (i.e., the voltage and current on the RBL have substantially no effect at Q or QB), and/or because the write access transistors (142, 144) are off (WL = 0), multiple RWLs can be activated simultaneously (i.e., multiple read access transistors (154) can be turned on) without disturbing the voltage at Q or QB.
According to some embodiments, as depicted in FIG. 2A, a CIM system (200) includes a memory array, such as the memory array (100) described above, that includes 8T memory cells (110). In this example, the memory array (100) is a 64×64 array of 8T memory cells (110), i.e., memory cells (110) arranged in 64 rows (row i, i = 0 to 63) by 64 columns (column j, j = 0 to 63), but any other array, including two- and three-dimensional arrays of various sizes, may be used. Each 8T memory cell (110) may have the structure described above with reference to FIG. 1 and further shown in FIG. 2B. Each 8T memory cell (110) may have any suitable physical structure. For example, the transistors of each 8T memory cell (110) may be field-effect transistors (FETs). In one example, as shown in FIG. 2C, the FETs may be formed in a so-called FinFET structure, in which doped semiconductor forms ridges, or "fins", that serve as the active regions of the FETs and along which source and drain regions can be formed. A conductive material, such as doped polysilicon ("poly"), wraps around the top portion of the fins and serves as the gate. For example, as depicted in FIG. 2C, the transistors of a memory cell (110) may be formed in a FinFET structure (250) along p-doped fins (252) (for PMOS) and n-doped fins (254) (for NMOS), with poly gates (256) formed across the fins (252, 254).
In some embodiments, the memory cells (110) in the array (100) have the same construction. In other embodiments, the memory cells (110) in the array (100) may differ from one another. For example, the size ratio between the transistors (152, 154) in the respective read ports (150) may vary from memory cell to memory cell, so that the currents produced by the same RWL signal differ.
In this example, the CIM system (200) further includes an input interface (210), which in this example includes an array of digital counters (212) and a corresponding array of drivers (214). In this example, there are 64 four-bit counters (212), one for each row of the 64×64 memory cell array (100). In each counting cycle, each counter (212) outputs a number of pulses corresponding to the number (here a four-bit binary number) at the counter's input. For example, an input of 0 ($0000_2$) produces 0 pulses, an input of $3_{10}$ ($0011_2$) produces 3 RWL pulses, an input of $15_{10}$ ($1111_2$) produces 15 RWL pulses, and so on. The driver (214) corresponding to each counter (212) drives the corresponding RWL (156[i], i = 0 to 63) according to the output pulses from the counter. A train of RWL pulses is thereby applied to each RWL (156[i]), with the number of RWL pulses per counting cycle indicating the number at the input of the respective counter (212).
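As a non-limiting illustration of the input encoding described above, the following Python sketch converts a four-bit input into a pulse train for one counting cycle. The function name `pulse_train`, the 15-slot cycle length, and the placement of the pulses at the start of the cycle are assumptions made for this sketch; the source specifies only the number of pulses per counting cycle.

```python
def pulse_train(value: int, bits: int = 4) -> list:
    """Return one counting cycle as a list of 0/1 slots holding `value` pulses.

    Assumption: a cycle has 2**bits - 1 slots and the pulses occupy the
    first `value` slots. Only the pulse count is taken from the source.
    """
    slots = 2**bits - 1
    if not 0 <= value <= slots:
        raise ValueError("input does not fit in the given number of bits")
    return [1] * value + [0] * (slots - value)


print(sum(pulse_train(0)))   # input 0000 (binary) gives 0 pulses
print(sum(pulse_train(3)))   # input 0011 (binary) gives 3 pulses
print(sum(pulse_train(15)))  # input 1111 (binary) gives 15 pulses
```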
In some embodiments, the CIM system (200) further includes a read/write (RW) interface (216) connected to the memory array (100) for the conventional read and write operations associated with a memory array.
In some embodiments, the CIM system (200) also includes an output interface (220), which in some examples includes a compensation module (222) connected to the memory array (100) and a compute module (224) connected to the compensation module (222). As described in more detail below, the compensation module, together with the compute module (224), provides a uniform environment, i.e., the same total capacitance, for precharging and sampling the RBLs; the compute module (224) computes a quantity indicative of the signal values on the RBLs or on some combination of them. For example, as described in more detail below, in some embodiments the compute module (224) computes a weighted sum or weighted average of several RBLs. In some embodiments, this is done through charge sharing among capacitors that are sized according to the relative significance (most significant bit (MSB) to least significant bit (LSB)) of the respective RBLs in a binary number. For example, in some embodiments the relative sizes of the capacitors are $2^3 : 2^2 : 2^1 : 2^0$ from MSB to LSB. The amplitude of the resulting signal therefore corresponds to the sum of the multi-bit input signals weighted by the multi-bit weights.
In some embodiments, as depicted in FIG. 2A and in more detail in FIGS. 3A and 3B for a four-bit-wide section (230) of the CIM system (200), the compensation module (222) includes a set of compensation capacitors $C_n[j]$ (j = 0 to 3 for each four-bit-wide section (230)), each associated with a respective RBL (190[j], j = 0 to 3). As described in more detail below, in some embodiments the compensation capacitors are connected in parallel, pair by pair, with the compute capacitors (described below) during certain stages of the computation process. The compensation capacitors are sized so that the total capacitance presented to every RBL is the same, so that the same current in any given RBL produces the same voltage on that RBL. A pair of switching elements (S0A and S0B), such as any switching transistors, is associated with each compensation capacitor $C_n[j]$: S0A connects the compensation capacitor $C_n[j]$ to the respective RBL (190[j]), and S0B connects the RBL (190[j]) to the compute module (224). Each compensation capacitor $C_n[j]$ is connected at one end to the respective RBL (190[j]) through S0A and at the other end to a voltage reference, such as ground.
The compute module (224) includes a set of integrators for integrating the current on each RBL. In some embodiments, the integrators include compute capacitors $C_m[j]$ (j = 0 to 3 for each four-bit-wide section (230)), each associated with a respective RBL (190[j], j = 0 to 3) and the corresponding compensation capacitor $C_n[j]$. In some embodiments, the compute capacitors are used in combination with the compensation capacitors to provide capacitance to the respective RBLs, in order to establish a voltage indicative of the sum of the weighted inputs on each RBL. As briefly explained above, in some embodiments each compute capacitor is paired with a respective compensation capacitor to present the same capacitance to every RBL during certain steps of the computation process. As explained further below, in some embodiments the compute capacitors are sized relative to one another to assign significance to the respective RBLs. A pair of switching elements (SH and S1), such as any switching transistors, is associated with each compute capacitor $C_m[j]$: SH connects $C_m[j]$ to the respective RBL (190[j]) through S0B, and S1 connects $C_m[j]$ to an analog output (228) through SH. Each compute capacitor $C_m[j]$ is connected at one end to the analog output (228) through the respective SH and S1 and at the other end to a voltage reference, such as ground.
In some embodiments, as depicted in FIGS. 3A and 3B, a subset of the memory cells (110) in each row i (e.g., 240 in row [62], or 260[i] in row [i], i = 0 to 63), each subset corresponding to RWL[i], can store a multi-bit weight (e.g., by being written to via the WWL). For example, in an embodiment using four-bit weights, each subset (260[i]) of four memory cells (110) can store a four-bit weight $W_i = (W_{i[3]}\,W_{i[2]}\,W_{i[1]}\,W_{i[0]})_2$, where $W_{i[j]}$ denotes the binary digit stored in the memory cell (110) that is connected to the $i$-th RWL (156[i]) and read through the $j$-th RBL (190[j]). For example, for a weight $W_0 = 0101_2$ ($= 5_{10}$) in row [0] (260[0]), the bits stored in the subset (260[0]) are $W_{0[3]} = 0$, $W_{0[2]} = 1$, $W_{0[1]} = 0$, and $W_{0[0]} = 1$. Similarly, for a weight $W_1 = 1011_2$ ($= 11_{10}$) in row [1] (260[1]), the bits stored in the subset (260[1]) are $W_{1[3]} = 1$, $W_{1[2]} = 0$, $W_{1[1]} = 1$, and $W_{1[0]} = 1$.
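As a non-limiting illustration, the way a four-bit weight is split across the four cells of a subset can be sketched as follows. The function name `weight_bits` is hypothetical; the list index corresponds to the RBL index j, so the least significant bit comes first.

```python
def weight_bits(weight: int, n_bits: int = 4) -> list:
    """Split a weight into bits; element j is the bit read through RBL[j]."""
    if not 0 <= weight < 2**n_bits:
        raise ValueError("weight does not fit in the given number of bits")
    return [(weight >> j) & 1 for j in range(n_bits)]


# The two example weights from the text, listed from bit [0] to bit [3].
print(weight_bits(0b0101))  # W_0 = 5
print(weight_bits(0b1011))  # W_1 = 11
```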
Thus, the array (100) of memory cells (110) in the example CIM system (200) depicted in FIG. 2 can store 1024 (64×16) four-bit weights arranged in 64 rows and 16 columns.
In some embodiments, the capacitance of each compute capacitor $C_m[j]$ is chosen according to its relative position (i.e., index j) in the compute module. For example, in the embodiment depicted in FIGS. 3A and 3B, the capacitance of the $j$-th compute capacitor $C_m[j]$, corresponding to RBL[j], is $2^j C_u$, where $C_u$ is a unit capacitance that can have any value suitable for the particular application. Thus, $C_m[0]$ has a capacitance of $1C_u$, $C_m[1]$ has a capacitance of $2C_u$, $C_m[2]$ has a capacitance of $4C_u$, and $C_m[3]$ has a capacitance of $8C_u$.
In some embodiments, the capacitance of each compensation capacitor $C_n[j]$ is chosen according to its relative position (i.e., index j) in the compute module. In some embodiments, the capacitance of $C_n[j]$ is chosen such that $C_n[j] + C_m[j]$ is constant; that is, when each pair of compensation and compute capacitors is connected in parallel, every RBL sees the same (constant) capacitance, and $C_n[j]$ is the difference between the fixed total capacitance and the respective compute capacitance $C_m[j]$. For example, in the embodiment depicted in FIGS. 3A and 3B, in which the total capacitance presented to each RBL is chosen to be $9C_u$, the capacitance of the $j$-th compensation capacitor $C_n[j]$, corresponding to RBL[j], is $9C_u - C_m[j]$. Thus, $C_n[0]$ has a capacitance of $8C_u$, $C_n[1]$ has a capacitance of $7C_u$, $C_n[2]$ has a capacitance of $5C_u$, and $C_n[3]$ has a capacitance of $1C_u$.
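As a non-limiting illustration, the capacitor sizing rule described above can be reproduced in a few lines of Python. Capacitances are expressed in multiples of the unit capacitance; the names `N_BITS`, `TOTAL_UNITS`, `cm`, and `cn` are chosen only for this sketch.

```python
N_BITS = 4        # four-bit weights, so four RBLs per section
TOTAL_UNITS = 9   # total capacitance presented to each RBL, in units of C_u

# Compute capacitors: binary weighted, Cm[j] = 2**j units.
cm = [2**j for j in range(N_BITS)]

# Compensation capacitors: whatever is left to reach the fixed total.
cn = [TOTAL_UNITS - c for c in cm]

for j in range(N_BITS):
    print(f"RBL[{j}]: Cm = {cm[j]} Cu, Cn = {cn[j]} Cu, total = {cm[j] + cn[j]} Cu")

# Total capacitance of the compute capacitors when connected in parallel.
print(sum(cm))
```

The fixed total must be at least as large as the largest compute capacitor (8 units here) for every compensation capacitor to be non-negative.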
In some embodiments, the output interface (220) further includes a sense amplifier (226) for each RBL (190) for boosting the analog signal from the RBL (190). In some embodiments, the output interface (220) further includes an analog-to-digital converter (ADC) (270) for each subset of RBLs associated with a column of multi-bit weights stored in the respective subsets (260) of memory cells (110). In some embodiments, as depicted in FIG. 4, flash ADCs may be used, each having a number of voltage comparators (272[l], l = 0 to $2^n - 2$, where n is the number of binary digits in the multi-bit weight), with $2^j$ of the comparators used for the $j$-th RBL (190[j]). For example, in an application using four-bit weights, as depicted in FIG. 4, a 15-comparator flash ADC (272[l], l = 0 to 14) may be used. In this example, the signal from RBL[0] is connected to one comparator, SA7 (272[7]); the signal from RBL[1] is connected to two comparators, SA6 and SA8 (272[6] and 272[8]); the signal from RBL[2] is connected to four comparators, SA5, SA9, SA4, and SA10 (272[5], 272[9], 272[4], and 272[10]); and the signal from RBL[3] is connected to eight comparators, SA3, SA11, SA2, SA12, SA1, SA13, SA0, and SA14 (272[3], 272[11], 272[2], 272[12], 272[1], 272[13], 272[0], and 272[14]).
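As a non-limiting illustration, the comparator assignment listed above can be written as a lookup table and checked for consistency. The dictionary name `RBL_TO_COMPARATORS` and the helper `comparators_per_rbl` are hypothetical; the comparator indices are those given in the text for the four-bit example.

```python
# Comparator indices l of 272[l] connected to each RBL[j], as listed in the text.
RBL_TO_COMPARATORS = {
    0: [7],
    1: [6, 8],
    2: [5, 9, 4, 10],
    3: [3, 11, 2, 12, 1, 13, 0, 14],
}


def comparators_per_rbl(mapping):
    """Number of comparators connected to each RBL."""
    return {j: len(comps) for j, comps in mapping.items()}


counts = comparators_per_rbl(RBL_TO_COMPARATORS)
print(counts)

# Every one of the 15 comparators should be used exactly once.
used = sorted(l for comps in RBL_TO_COMPARATORS.values() for l in comps)
print(used == list(range(15)))
```

In this mapping the comparators of each RBL sit symmetrically around the middle comparator SA7, which is one way of spreading each subset across the comparator array.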
In some embodiments, the comparators (272) each include an input capacitor, and those input capacitors can serve as the compute capacitors $C_m$. For example, in the embodiment depicted in FIG. 4, assuming the input capacitor of each comparator (272) has the unit capacitance $C_u$, the input capacitor of comparator SA7 can serve as compute capacitor $C_m[0]$; the input capacitors of SA6 and SA8 can serve (e.g., connected in parallel) as $C_m[1]$; the input capacitors of SA5, SA9, SA4, and SA10 can serve (e.g., connected in parallel) as $C_m[2]$; and the input capacitors of SA3, SA11, SA2, SA12, SA1, SA13, SA0, and SA14 can serve (e.g., connected in parallel) as $C_m[3]$. As described above, the total capacitance connected to each RBL is thereby proportional to the place value corresponding to that RBL. A distributed pattern of connecting each RBL (190) to the comparators (272), such as that depicted in FIG. 4, in which the comparators in the subset connected to each RBL (190[j]) are located away from one another, minimizes the voltage dependence of the input capacitors used as unit capacitors ($C_u$).
In some embodiments, compute-in-memory operations, such as weighted sums, can be performed using the CIM system disclosed in the embodiments of the present invention. More specifically, a set of inputs (e.g., 64 inputs) $X_i$ can each be weighted by a multi-bit (e.g., four-bit) weight $(W_i)_k$, and the weighted inputs $X_i (W_i)_k$ can be summed to produce an output $S_k$, which is the weighted sum for the $k$-th column of multi-bit weights. That is, $S_k = \sum_i X_i (W_i)_k$.
As described above, in some embodiments the digital inputs to the CIM system (200) can be represented by, or converted to, pulse trains, in which the number of pulses per counting cycle at each RWL indicates the amplitude of the input. Moreover, because the RBLs are decoupled from the 6T memory cells (120), the RWLs can be activated simultaneously. As further described above, according to some embodiments, as depicted in FIGS. 3A and 3B, a multi-bit weight such as a four-bit weight $W_i = (W_{i[3]}\,W_{i[2]}\,W_{i[1]}\,W_{i[0]})_2$ can be stored in a subset (260[i]) of the memory cells (110) in row i. In some embodiments, as outlined in FIG. 5, a computation (500) that involves applying multi-bit weights to respective inputs can be performed as follows:
First (510), a set of multi-bit weights (e.g., $W_i = (W_{i[3]}\,W_{i[2]}\,W_{i[1]}\,W_{i[0]})_2$) is stored in an array of memory cells, each memory cell having a memory unit (such as a 6T SRAM cell) for storing a signal at a node, and a read port having a read-enable input line (such as RWL) and an output line (such as RBL). The read port produces at the output line, upon an activation signal at the read-enable input, a signal indicative of the signal stored at the node in the memory unit, and isolates the output line from the node.
Then (520), a set of pulse signals, each indicating a respective input number, is applied to the read-enable inputs of the sets of memory cells storing the respective multi-bit weights, to produce a set of signals on the output lines of the respective memory cells; the set of output signals indicates an operation (e.g., multiplication) performed on the pulse signals by the stored multi-bit weights.
Then (530), the combined read-port output (e.g., combined current) of the memory cells sharing each output line is measured (e.g., by RBL sampling, described in detail below) and given a significance factor corresponding to the significance (i.e., place value) of the weight bit (e.g., the j of $W_{i[j]}$) associated with that output line. For example, the MSB of a four-bit weight has a place value of 8 (i.e., $2^3$); the significance factor can be the place value itself or some multiple of it. As described above, each RBL can be given its significance factor through the relative size of the corresponding compute capacitor.
Then (540), the combined read-port outputs from the respective output lines are combined in proportion to their respective significance factors (e.g., by charge sharing, as described in detail below) to produce a computation output signal.
Then (550), the computation output signal is converted to a digital output (e.g., by a 15-comparator analog-to-digital converter (ADC)).
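As a non-limiting illustration, steps (520) through (540) can be modeled with ordinary integer arithmetic to show that summing per column and then applying place-value significance factors gives the same result as multiplying each input by its full multi-bit weight. The function names `mac_bitwise` and `mac_direct` and the example values are hypothetical. The sketch ignores all analog effects and omits the final conversion of step (550).

```python
def mac_bitwise(inputs, weights, n_bits=4):
    """Column-wise accumulation followed by place-value combination."""
    # Steps 520/530: on each output line j, accumulate input * weight bit j.
    column_sums = [
        sum(x * ((w >> j) & 1) for x, w in zip(inputs, weights))
        for j in range(n_bits)
    ]
    # Step 540: combine the columns in proportion to their significance 2**j.
    return sum((2**j) * s for j, s in enumerate(column_sums))


def mac_direct(inputs, weights):
    """Reference result: plain sum of input * weight products."""
    return sum(x * w for x, w in zip(inputs, weights))


inputs = [3, 15, 0, 7]    # four-bit inputs (pulse counts)
weights = [5, 11, 9, 2]   # four-bit stored weights
print(mac_bitwise(inputs, weights), mac_direct(inputs, weights))
```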
As depicted in FIGS. 3A, 3B, and 3C, in some embodiments the analog signals on the RBLs can be measured and used to compute the weighted sum of the inputs as follows:
First, during a precharge period (310), a precharge signal PCH is applied to each RBL[j] and to the parallel combination of compute capacitor $C_m[j]$ and compensation capacitor $C_n[j]$, i.e., with S0A, S0B, and SH conducting (on) and S1 not conducting (off). Because every combination has the same capacitance (i.e., $9C_u$), all combinations are charged to the same total charge, and the voltages at all four nodes N3, N2, N1, and N0 rise to the same level $V_{PCH}$. Next, during an RBL sampling period (320) (see FIGS. 3A and 3C), PCH is turned off and the input pulse trains are applied to the respective RWLs. For each memory cell (110) that stores a "1", each pulse on its RWL causes a cell current $I_{\text{cell}}$, which discharges the RBL by a fixed amount determined by the amplitude of $I_{\text{cell}}$ and the duration of each input pulse. All memory cells sharing the same RBL contribute cell current, and hence contribute to the total discharge, to the extent that they store "1"s. The voltage at each of nodes N0, N1, N2, and N3 thereby drops by an amount that is determined by the total discharge of the respective RBL and is proportional to the sum of the bitwise products of the weight bits and the multi-bit inputs.
Subsequently, during the charge-sharing period (330) (see FIGS. 3B and 3C), S0A and S0B are turned off, SH remains on, and S1 is turned on. The compensation capacitors $C_n[j]$ ($j$ = 0 to 3) are thereby disconnected, and the computation capacitors $C_m[j]$ ($j$ = 0 to 3) are connected in parallel between ground and the output (228). Nodes N0, N1, N2, and N3 are also connected together and to the output (228). The voltage $V_{out}$ at the output (228) (that is, the voltage across the computation capacitors $C_m[j]$) is therefore the total charge stored in the computation capacitors $C_m[j]$ divided by the sum of their capacitances.
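The charge-sharing relation stated above can be written out as a short worked equation. The symbol $V_j$, denoting the voltage at node N$j$ immediately before charge sharing, is introduced here only for illustration, and ideal, lossless charge conservation is assumed:

$$
V_{out} \;=\; \frac{\sum_{j=0}^{3} Q[j]}{\sum_{j=0}^{3} C_m[j]} \;=\; \frac{\sum_{j=0}^{3} C_m[j]\,V_j}{\sum_{j=0}^{3} C_m[j]}
$$

With binary-weighted capacitances $C_m[j] = 2^j C_u$, the denominator is $15\,C_u$, so that $V_{out} = \tfrac{1}{15}\sum_{j=0}^{3} 2^j V_j$.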
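Because the capacitance of $C_m[j]$ is $2^j C_u$, that is, $2^j/9$ of the total capacitance ($9C_u$) on the $j$-th RBL, each computation capacitor $C_m[j]$ takes $2^j/9$ of the total charge stored on its RBL by the precharge step. At the end of the precharge period, $C_m[3]$ therefore holds eight times the charge of $C_m[0]$, four times that of $C_m[1]$, and twice that of $C_m[2]$. For the same reason, during the RBL sampling period $C_m[3]$ loses, for the same inputs and the same weight-bit values, eight times the charge lost by $C_m[0]$, four times that lost by $C_m[1]$, and twice that lost by $C_m[2]$. Consequently, at the end of the charge-sharing period the contribution of $C_m[3]$ to the total charge $\sum_j Q[j]$ is eight times that of $C_m[0]$, four times that of $C_m[1]$, and twice that of $C_m[2]$. The voltage $V_{out}$, or the voltage drop $\Delta V = V_{PCH} - V_{out}$, thus represents a binary-weighted sum in which each RBL is assigned a weight proportional to the place value of its bit in the binary weight $W_i = (W_i[3]\,W_i[2]\,W_i[1]\,W_i[0])_2$ or, more generally, $W_i = (\ldots W_i[j] \ldots W_i[2]\,W_i[1]\,W_i[0])_2$. For example, in the embodiment shown in FIGS. 3A and 3B, the voltage at the analog output (228), and hence at nodes N3, N2, N1, and N0 after charge sharing, is the sum of 8/15 of the voltage of RBL[3] (most significant bit, MSB), 4/15 of that of RBL[2], 2/15 of that of RBL[1], and 1/15 of that of RBL[0] (least significant bit, LSB).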
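The 8/15, 4/15, 2/15, and 1/15 weights follow directly from sharing charge among capacitors of 8, 4, 2, and 1 unit capacitances. The following is a minimal sketch assuming ideal capacitors and lossless charge sharing; the function name and the node voltages are hypothetical.

```python
# Idealized charge sharing among binary-weighted computation capacitors.
# Capacitances are in units of Cu; node voltages are illustrative assumptions.

def charge_share(node_voltages):
    """Return V_out after nodes N0..N3 are connected together.

    node_voltages[j] is the voltage on Cm[j] (capacitance 2**j * Cu)
    just before the charge-sharing period.
    """
    caps = [2 ** j for j in range(len(node_voltages))]   # Cm[j] / Cu
    total_charge = sum(c * v for c, v in zip(caps, node_voltages))
    return total_charge / sum(caps)


# Column j contributes with weight 2**j / 15, i.e. 1/15, 2/15, 4/15, 8/15.
v_out = charge_share([0.9, 1.0, 0.8, 0.7])
```

Because the weights sum to one, equal node voltages pass through unchanged, and a drop on the MSB column moves the output eight times as far as the same drop on the LSB column.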
Then, during the ADC evaluation period (340), the SAE signal is turned on so that the ADC (270) converts the voltage on the RBLs (190) (that is, at N3, N2, N1, and N0) after the charge sharing described above into a digital output signal, in this example a four-digit binary number ($0000_2$ to $1111_2$) corresponding to the voltage. The in-memory computation, which involves multiplication and accumulation with multi-bit inputs and multi-bit weights, is thereby completed.
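A 15-comparator converter yields a four-bit code because the number of comparators that trip ranges from 0 to 15. The sketch below assumes a flash-style ADC with a uniform reference ladder between 0 and a reference voltage; the ladder, the function name, and the choice of which quantity is digitized ($V_{out}$ or $\Delta V$) are assumptions, since the text does not specify them.

```python
# Sketch of a 15-comparator flash ADC producing a 4-bit code.
# The reference ladder (uniform thresholds between 0 and v_ref) is an assumption.

def flash_adc(v_in, v_ref=1.0, n_comparators=15):
    """Count how many comparator thresholds v_in meets or exceeds."""
    thresholds = [(k + 1) * v_ref / (n_comparators + 1) for k in range(n_comparators)]
    # The comparator outputs form a thermometer code; their count is the result.
    return sum(1 for t in thresholds if v_in >= t)


code = flash_adc(0.5)
bits = format(code, "04b")   # four-digit binary string
```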
Because the RBLs are decoupled from the respective 6T memory cells, multiple RWLs can be activated simultaneously to apply the weights stored in the memory cells (110) without disturbing the stored state of any memory cell. Computation speed is thereby increased compared with the case in which the RWLs must be activated one at a time.
Thus, according to some disclosed embodiments, a computing device includes: a memory array having a set of memory cells grouped in rows and columns of memory cells, each of the memory cells having a memory unit adapted to store data and a read port having a read-enable input and an output; read-enable lines, each connected to the read-enable inputs of the read ports of a respective row of memory cells and adapted to transmit an input signal to those read-enable inputs; data-output lines, each connected to the outputs of the read ports of a respective column of memory cells; and an output interface having a computation module that includes a set of capacitors, each capacitor being connectable to a respective one of the data-output lines and having a capacitance, at least two of the capacitors having capacitances different from each other, the output interface being configured to permit the capacitors to share the charge stored on them.
In a related embodiment, the computing device further includes an input interface connected to the plurality of read-enable lines and configured to generate a plurality of pulses on each of at least a subset of the plurality of read-enable lines.
In a related embodiment, the input interface includes a plurality of counters, each counter having a binary data input adapted to receive digital input data and an output connected to a respective one of the plurality of read-enable lines, the counter being configured to generate a number of pulses indicative of the value of the digital input data.
In a related embodiment, the output interface further includes a compensation module that includes a plurality of capacitors, each capacitor being connectable to a respective one of the plurality of data-output lines and having a capacitance. The output interface is configurable to connect, for each of the plurality of data-output lines, the respective capacitor in the computation module to the respective capacitor in the compensation module to form a capacitive combination having a total capacitance, the total capacitance of the capacitive combinations being the same for at least a subset of the plurality of data-output lines.
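The role of the compensation capacitors can be made concrete with the values used in the described example, where every data-output line sees $9C_u$ and the computation capacitors are $2^j C_u$. The helper below is a hypothetical sketch that derives the compensation values needed to equalize the per-line total.

```python
# Compensation capacitors chosen so every data-output line sees the same total.
# Values are in units of Cu; total_units=9 follows the 9*Cu example in the text.

def compensation_caps(n_columns=4, total_units=9):
    """Return (cm, cn) with cm[j] = 2**j and cm[j] + cn[j] == total_units."""
    cm = [2 ** j for j in range(n_columns)]
    if max(cm) > total_units:
        raise ValueError("total capacitance too small for the largest Cm[j]")
    cn = [total_units - c for c in cm]
    return cm, cn


cm, cn = compensation_caps()
```

With equal totals, the same discharge produces the same voltage drop on every line, so only the later charge-sharing step applies the binary weighting.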
In a related embodiment, the computing device further includes a digital read/write (RW) interface connected to the memory array and adapted to read data from, and write data to, the plurality of memory cells.
In a related embodiment, each of the plurality of memory cells is an eight-transistor static random-access memory (SRAM) cell having a six-transistor SRAM memory unit. The six-transistor SRAM memory unit has two inverters cross-coupled to each other and two access transistors, each access transistor switchably connecting a respective junction between the two inverters to a respective data line through which data to be written to the six-transistor SRAM memory unit is transmitted. The read port has a first transistor and a second transistor, each having a control electrode and a main current path, the control electrode being adapted to control the current flowing through the main current path. The main current paths are connected in series between the data-output line and a voltage reference point, the control electrode of one of the first and second transistors is connected to the read-enable line for the memory cell, and the control electrode of the other is connected to a junction between the two inverters.
In a related embodiment, the output interface further includes an analog-to-digital converter (ADC) having a plurality of analog inputs and an input capacitor for each of the analog inputs, wherein each of the plurality of capacitors in the computation module at least partially comprises a respective one of the input capacitors or a respective subset of the input capacitors, each such input capacitor or subset of input capacitors being connectable to a respective data-output line.
In a related embodiment, the plurality of input capacitors of the ADC are arranged in a linear array, in which at least a subset of the plurality of input capacitors that are connectable to the plurality of data-output lines includes at least a first subset of input capacitors and a second subset of input capacitors, each subset being connectable to a respective one of the plurality of data-output lines, and at least two input capacitors of the first subset are separated by at least one input capacitor of the second subset.
In a related embodiment, the output interface is configured to: during a first period, connect each data-output line to a parallel combination of one of the plurality of capacitors in the compensation module and a corresponding one of the plurality of capacitors in the computation module; and during a second period following the first period, disconnect each capacitor in the computation module from the respective capacitor in the compensation module and from the respective data-output line, and connect the plurality of capacitors in the computation module in parallel.
In a related embodiment, the computing device further includes an input interface connected to the plurality of read-enable lines and configured to generate, during the first period, a plurality of pulses on each of at least a subset of the plurality of read-enable lines.
In a related embodiment, the plurality of memory cells are identical to one another, and at least two of the plurality of capacitors in the computation module have capacitances that differ from each other by a factor of $2^n$, where $n$ is an integer.
According to some disclosed embodiments, a computing method includes: storing a plurality of multi-bit weights in a memory array having a plurality of memory cells organized in rows and columns, each memory cell having a memory unit adapted to store a signal at a node and a read port having a read-enable input and an output, the read port being adapted to generate at the output, upon an activation signal at the read-enable input, a signal indicative of the signal stored at the node in the memory unit, and to isolate the output from the node, the memory array further having a plurality of read-enable lines, each connected to the read-enable inputs of a row of memory cells, wherein storing each of the plurality of multi-bit weights includes storing the multi-bit weight in a row of memory cells sharing a respective one of the read-enable lines, the memory array further having data-output lines, each connected to the outputs of the read ports of a column of memory cells; applying a sequence of pulse signals to the respective read-enable lines to generate an output signal at each of the outputs of the read ports of the respective rows of memory cells; combining the output signals at the outputs of the read ports of each of the columns of memory cells, and weighting the combined output signals by significance factors, at least two of which differ from each other; combining the weighted output signals to generate an analog output; and converting the analog output to a digital output.
In a related embodiment, storing the plurality of multi-bit weights in the memory array includes storing all bits of the same significance among the plurality of multi-bit weights in one column of the plurality of memory cells; and weighting the combined output signals by the significance factors includes weighting each of the combined output signals according to the significance of the bits stored in the column of memory cells connected to the respective data-output line.
In a related embodiment, weighting the combined output signals by the significance factors includes weighting the combined output signals by a factor of $2^j$, where $j$ denotes the significance position of the weight bits stored in the respective column, with $j = 0$ denoting the least significant bit.
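The per-column weighting by $2^j$ reproduces an ordinary multiply-accumulate, which can be checked in purely digital form. The sketch below is illustrative: the function names and sample values are hypothetical, and it models only the arithmetic, not the analog circuit.

```python
# Digital check that summing per bit column and weighting by 2**j
# equals the plain dot product of inputs and multi-bit weights.

def mac_reference(inputs, weights):
    """Plain multiply-accumulate: sum of x_i * W_i."""
    return sum(x * w for x, w in zip(inputs, weights))


def mac_by_columns(inputs, weights, n_bits=4):
    """Sum each bit column first, then weight column j by 2**j."""
    total = 0
    for j in range(n_bits):
        # Column j holds bit j of every weight (j = 0 is the LSB).
        column_sum = sum(x * ((w >> j) & 1) for x, w in zip(inputs, weights))
        total += (2 ** j) * column_sum
    return total


inputs = [3, 5, 2]
weights = [0b1010, 0b0111, 0b1111]
result = mac_by_columns(inputs, weights)
```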
In a related embodiment, weighting the combined output signals by the significance factors includes charging or discharging a capacitor from a respective one of the plurality of data-output lines, the capacitor having a capacitance that depends on the significance of the bits stored in the column of memory cells connected to the respective data-output line.
In a related embodiment, weighting the combined output signals by the significance factors includes charging or discharging a capacitor having a capacitance of $2^j C_u$, where $j$ denotes the significance position of the weight bits stored in the respective column, with $j = 0$ denoting the least significant bit, and $C_u$ is a unit capacitance.
In a related embodiment, combining the weighted output signals to generate the analog output includes sharing charge among the capacitors.
In a related embodiment, charging or discharging the capacitor from the respective one of the plurality of data-output lines includes charging or discharging the capacitor while also charging or discharging an additional capacitor from that data-output line, the capacitors charged or discharged by each of the plurality of data-output lines having a total capacitance that is the same for all of the plurality of data-output lines.
According to some disclosed embodiments, a computing method includes: storing multi-bit weights in a memory array having a plurality of memory cells organized in rows and columns, each memory cell having a memory unit adapted to store a signal at a node and a read port having a read-enable input and an output, the read port being adapted to generate at the output, upon an activation signal at the read-enable input, a signal indicative of the signal stored at the node in the memory unit, and to isolate the output from the node; simultaneously multiplying an input signal by each bit of each of the multi-bit weights to generate an output signal at the output of each of the read ports; summing the output signals at the outputs of the read ports of the memory cells in each column; weighting each of the sums of the output signals at the outputs of the read ports of the memory cells in each column by a different significance factor to produce a respective weighted sum; and combining the weighted sums to generate an analog output signal.
In a related embodiment, simultaneously multiplying the input signal by each bit of each of the plurality of multi-bit weights includes applying the input signal simultaneously to each row of the plurality of memory cells via a read-enable line connected to the read-enable inputs of that row; summing the plurality of output signals at the outputs of the read ports of the memory cells in each column includes combining, in the respective data-output line, the currents at the outputs of the read ports of the memory cells; weighting each of the sums of the plurality of output signals at the outputs of the read ports of the memory cells in each column by a significance factor includes charging or discharging, from the respective data-output line, a capacitor having a capacitance, the capacitances of the capacitors being different from one another; and combining the weighted sums includes sharing charge among the capacitors.
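The steps of this method (simultaneous multiplication, per-column summation, capacitor weighting, and charge sharing) can be chained into one idealized model. This is a sketch under stated assumptions: the discharge is taken to be perfectly linear, the constants `V_PCH` and `DV_PER_PULSE` are hypothetical, and the function name is not from the source.

```python
# End-to-end idealized model: pulse-count inputs, per-column discharge,
# binary-weighted charge sharing. Electrical constants are assumptions.

V_PCH = 1.0            # precharge voltage (V), assumed
DV_PER_PULSE = 0.001   # RBL voltage drop per pulse per conducting cell (V), assumed


def cim_output_voltage(inputs, weights, n_bits=4):
    """Return V_out for pulse counts `inputs` and n_bits-wide `weights`."""
    caps = [2 ** j for j in range(n_bits)]   # Cm[j] in units of Cu
    node_v = []
    for j in range(n_bits):
        # Column j holds bit j of every weight; conducting cells add their discharge.
        pulses = sum(x * ((w >> j) & 1) for x, w in zip(inputs, weights))
        node_v.append(V_PCH - DV_PER_PULSE * pulses)
    # Charge sharing: total charge divided by total capacitance.
    return sum(c * v for c, v in zip(caps, node_v)) / sum(caps)


inputs, weights = [3, 5, 2], [0b1010, 0b0111, 0b1111]
v_out = cim_output_voltage(inputs, weights)
# The drop V_PCH - v_out is proportional to sum(x * w), so the dot product
# can be recovered from it in this ideal model.
recovered = round((V_PCH - v_out) * 15 / DV_PER_PULSE)
```

In a real circuit the discharge is not perfectly linear and the result is quantized by the ADC, so this recovery holds only for the idealized model.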
The foregoing outlines features of several embodiments so that those of ordinary skill in the art may better understand aspects of the embodiments of the present invention. Those of ordinary skill in the art should appreciate that they may readily use the embodiments of the present invention as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those of ordinary skill in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the embodiments of the present invention, and that they may make various changes, substitutions, and alterations herein without departing from that spirit and scope.
500: computation
510, 520, 530, 540, 550: steps
Claims (10)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962941330P | 2019-11-27 | 2019-11-27 | |
US62/941,330 | 2019-11-27 | ||
US17/034,701 US11322195B2 (en) | 2019-11-27 | 2020-09-28 | Compute in memory system |
US17/034,701 | 2020-09-28 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202127325A TW202127325A (en) | 2021-07-16 |
TWI750913B true TWI750913B (en) | 2021-12-21 |
Family
ID=75974417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109140899A TWI750913B (en) | 2019-11-27 | 2020-11-23 | Computing device and method |
Country Status (3)
Country | Link |
---|---|
US (2) | US11322195B2 (en) |
CN (1) | CN112951294B (en) |
TW (1) | TWI750913B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020139895A1 (en) * | 2018-12-24 | 2020-07-02 | The Trustees Of Columbia University In The City Of New York | Circuits and methods for in-memory computing |
US11694745B1 (en) * | 2019-10-18 | 2023-07-04 | Gigajot Technology, Inc. | SRAM with small-footprint low bit-error-rate readout |
US11322195B2 (en) * | 2019-11-27 | 2022-05-03 | Taiwan Semiconductor Manufacturing Company, Ltd. | Compute in memory system |
US11714570B2 (en) * | 2020-02-26 | 2023-08-01 | Taiwan Semiconductor Manufacturing Company, Ltd. | Computing-in-memory device and method |
US11586896B2 (en) * | 2020-03-02 | 2023-02-21 | Infineon Technologies LLC | In-memory computing architecture and methods for performing MAC operations |
US11372622B2 (en) * | 2020-03-06 | 2022-06-28 | Qualcomm Incorporated | Time-shared compute-in-memory bitcell |
US11693560B2 (en) * | 2021-01-22 | 2023-07-04 | Taiwan Semiconductor Manufacturing Company, Ltd. | SRAM-based cell for in-memory computing and hybrid computations/storage memory architecture |
TWI752823B (en) * | 2021-02-17 | 2022-01-11 | 國立成功大學 | Memory system |
CN113488092A (en) * | 2021-07-02 | 2021-10-08 | 上海新氦类脑智能科技有限公司 | Circuit for realizing multi-bit weight storage and calculation based on SRAM (static random Access memory) and storage and analog calculation system |
TWI788964B (en) * | 2021-08-20 | 2023-01-01 | 大陸商深圳市九天睿芯科技有限公司 | Subunit, MAC array, bit width reconfigurable modulus hybrid in-memory computing module |
WO2023224596A1 (en) * | 2022-05-16 | 2023-11-23 | The Trustees Of Princeton University | Shared column adcs for in-memory-computing macros |
CN115458010B (en) * | 2022-08-19 | 2023-12-01 | 南方科技大学 | Operation unit suitable for nonvolatile memory storage and calculation integrated array |
CN115935878B (en) * | 2023-01-06 | 2023-05-05 | 上海后摩智能科技有限公司 | Multi-bit data calculating circuit, chip and calculating device based on analog signals |
TWI847837B (en) * | 2023-06-26 | 2024-07-01 | 立鴻半導體股份有限公司 | Processing system |
CN117316237B (en) * | 2023-12-01 | 2024-02-06 | 安徽大学 | Time domain 8T1C-SRAM memory cell and memory circuit for timing tracking quantization |
CN117608519B (en) * | 2024-01-24 | 2024-04-05 | 安徽大学 | Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190042160A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | Compute in memory circuits with time-to-digital computation |
US20190043560A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | In-memory multiply and accumulate with global charge-sharing |
TW201944423A (en) * | 2018-04-09 | 2019-11-16 | 美商安納富來希股份有限公司 | Logic compatible embedded flash memory |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3231842B2 (en) * | 1992-06-23 | 2001-11-26 | 株式会社 沖マイクロデザイン | Serial access memory |
US7286395B2 (en) | 2005-10-27 | 2007-10-23 | Grandis, Inc. | Current driven switched magnetic storage cells having improved read and write margins and magnetic memories using such cells |
JP4498374B2 (en) * | 2007-03-22 | 2010-07-07 | 株式会社東芝 | Semiconductor memory device |
US7835173B2 (en) | 2008-10-31 | 2010-11-16 | Micron Technology, Inc. | Resistive memory |
JP5359804B2 (en) * | 2009-11-16 | 2013-12-04 | ソニー株式会社 | Nonvolatile semiconductor memory device |
US9007818B2 (en) | 2012-03-22 | 2015-04-14 | Micron Technology, Inc. | Memory cells, semiconductor device structures, systems including such cells, and methods of fabrication |
KR20140092537A (en) * | 2013-01-16 | 2014-07-24 | 삼성전자주식회사 | Memory cell and memory device having the same |
US9697877B2 (en) * | 2015-02-05 | 2017-07-04 | The Board Of Trustees Of The University Of Illinois | Compute memory |
US9852783B1 (en) | 2016-09-23 | 2017-12-26 | Qualcomm Technologies, Inc. | Metal-oxide semiconductor (MOS) transistor offset-cancelling (OC), zero-sensing (ZS) dead zone, current-latched sense amplifiers (SAs) (CLSAs) (OCZS-SAs) for sensing differential voltages |
US10725777B2 (en) * | 2016-12-06 | 2020-07-28 | Gsi Technology, Inc. | Computational memory cell and processing array device using memory cells |
US20190228825A1 (en) * | 2018-01-24 | 2019-07-25 | Microsemi Soc Corp. | Vertical resistor based sram cells |
CN110364203B (en) * | 2019-06-20 | 2021-01-05 | 中山大学 | Storage system supporting internal calculation of storage and calculation method |
US20200410334A1 (en) * | 2019-06-25 | 2020-12-31 | Sandisk Technologies Llc | Binary weighted voltage encoding scheme for supporting multi-bit input precision |
US11322195B2 (en) * | 2019-11-27 | 2022-05-03 | Taiwan Semiconductor Manufacturing Company, Ltd. | Compute in memory system |
-
2020
- 2020-09-28 US US17/034,701 patent/US11322195B2/en active Active
- 2020-11-23 TW TW109140899A patent/TWI750913B/en active
- 2020-11-26 CN CN202011352312.0A patent/CN112951294B/en active Active
-
2022
- 2022-05-02 US US17/734,701 patent/US12073869B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201944423A (en) * | 2018-04-09 | 2019-11-16 | 美商安納富來希股份有限公司 | Logic compatible embedded flash memory |
US20190042160A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | Compute in memory circuits with time-to-digital computation |
US20190043560A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | In-memory multiply and accumulate with global charge-sharing |
Also Published As
Publication number | Publication date |
---|---|
TW202127325A (en) | 2021-07-16 |
US11322195B2 (en) | 2022-05-03 |
US20210158854A1 (en) | 2021-05-27 |
US12073869B2 (en) | 2024-08-27 |
CN112951294B (en) | 2024-05-14 |
US20220262424A1 (en) | 2022-08-18 |
CN112951294A (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI750913B (en) | Computing device and method | |
US10825510B2 (en) | Multi-bit dot product engine | |
TWI750038B (en) | Memory device, computing device and computing method | |
US12002539B2 (en) | Memory device and memory array structure using charge sharing for multi-bit convolutional neural network based computing-in-memory applications, and computing method thereof | |
CN109979503B (en) | Static random access memory circuit structure for realizing Hamming distance calculation in memory | |
US11693560B2 (en) | SRAM-based cell for in-memory computing and hybrid computations/storage memory architecture | |
CN114186676A (en) | Memory pulse neural network based on current integration | |
CN113467751A (en) | Analog domain in-memory computing array structure based on magnetic random access memory | |
CN114743580B (en) | Charge sharing memory computing device | |
CN115080501A (en) | SRAM (static random Access memory) storage integrated chip based on local capacitance charge sharing | |
CN114038492B (en) | Multiphase sampling memory internal computing circuit | |
US20230045840A1 (en) | Computing device, memory controller, and method for performing an in-memory computation | |
CN114895869A (en) | Multi-bit memory computing device with symbols | |
Zhou et al. | A 28 nm 81 Kb 59–95.3 TOPS/W 4T2R ReRAM computing-in-memory accelerator with voltage-to-time-to-digital based output | |
CN116451758B (en) | Weighted summation in-memory computing circuit and memory | |
CN114974351B (en) | Multi-bit memory computing unit and memory computing device | |
Jiang et al. | A 16nm 128kB high-density fully digital In Memory Compute macro with reverse SRAM pre-charge achieving 0.36 TOPs/mm 2, 256kB/mm 2 and 23. 8TOPs/W | |
US20230410862A1 (en) | In-memory computation circuit using static random access memory (sram) array segmentation | |
CN118312468B (en) | In-memory operation circuit with symbol multiplication and CIM chip | |
Kang et al. | An Analog Neuromorphic On-Chip Training System with IGZO TFT-Based 6T1C 367-State Synaptic Memory Achieving 0.99-R 2 Linearity and 104-Times Enhanced Retention Time | |
Lee et al. | Intrinsic Capacitance based Multi bit Computing in Memory | |
Zhang et al. | An 8T SRAM Array with Configurable Word Lines for In-Memory Computing Operation. Electronics 2021, 10, 300 | |
Li | An energy efficient compute-in-memory SRAM for low power CNN based machine learning application | |
JPH11110974A (en) | Dynamic semiconductor storage |