TWI771014B - Memory circuit and operating method thereof - Google Patents


Publication number
TWI771014B
Authority
TW
Taiwan
Prior art keywords
data element
data elements
memory
bits
adder
Prior art date
Application number
TW110118621A
Other languages
Chinese (zh)
Other versions
TW202203053A (en)
Inventor
池育德
藤原英弘
史毅駿
李伯浩
陳炎輝
李嘉富
琮永 張
Original Assignee
台灣積體電路製造股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 台灣積體電路製造股份有限公司
Publication of TW202203053A
Application granted
Publication of TWI771014B


Classifications

    • G11C8/12 Group selection circuits, e.g. for memory block selection, chip selection, array selection
    • G06F7/5443 Sum of products
    • G06F7/501 Half or full adders, i.e. basic adder cells for one denomination
    • G06F7/523 Multiplying only
    • G06F7/5318 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel, with column wise addition of partial products, e.g. using Wallace tree, Dadda counters
    • G11C11/4074 Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
    • G11C7/1006 Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
    • G11C7/1012 Data reordering during input/output, e.g. crossbars, layers of multiplexers, shifting or rotating
    • G11C7/1036 Read-write modes for single port memories, i.e. having either a random port or a serial port, using data shift registers

Abstract

A memory circuit includes a selection circuit, a column of memory cells, and an adder tree. The selection circuit is configured to receive input data elements, each input data element including a number of bits equal to H, and output a selected set of kth bits of the H bits of the input data elements. Each memory cell of the column of memory cells includes a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits. The adder tree is configured to generate a summation data element based on each of the first product data elements. A method of operating a memory circuit is also disclosed herein.

Description

Memory circuit and operating method thereof

The present disclosure relates to a memory circuit and a method of operating the memory circuit.

Memory arrays are often used to store and access data used for various types of computations, e.g., logic or mathematical operations. To perform these operations, data bits are moved between the memory array and the circuits used to perform the computations. In some cases, a computation includes multiple layers of operations, and the results of a first operation are used as input data in a second operation.

The present disclosure includes a memory circuit. The memory circuit includes a selection circuit configured to receive a plurality of input data elements, each input data element including a number of bits equal to H, and to output a selected set of kth bits of the H bits of the input data elements; a column of memory cells, each memory cell of the column including a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits; and an adder tree configured to generate a summation data element based on each of the first product data elements.

The present disclosure includes a method of operating a memory circuit. The method includes receiving, at a column of memory cells, a set of kth bits of a number H of bits of each input data element of a plurality of input data elements; using each memory cell of the column, multiplying the kth bit of a corresponding input data element by a first weight data element stored in the memory cell, thereby generating a corresponding first product data element; and using an adder tree to generate a summation data element based on each of the first product data elements.

The present disclosure includes a memory circuit. The memory circuit includes a selection circuit configured to, for a plurality of input data elements each including H bits, sequentially output selected sets of kth bits to corresponding memory cells of each column of a plurality of columns of memory cells; a plurality of adder trees, each adder tree coupled to a corresponding column of memory cells; and a plurality of accumulators, each accumulator coupled to a corresponding adder tree. Each memory cell of each column includes a multiplier configured to generate a product data element based on a corresponding kth bit of the selected set of kth bits and a weight data element stored in the memory cell. Each adder tree is configured to, for each sequentially output set of kth bits, generate a summation data element based on each of the product data elements of the corresponding column. Each accumulator is configured to generate a partial sum based on the summation data elements generated by the corresponding adder tree.

100A: memory circuit
100B: memory circuit
110: selection circuit
120A: memory array
120B: memory array
122: adder tree
130: I/O circuit
140: accumulator
150: control circuit
152: processor
154: computer-readable storage medium
200: selection circuit
200R: data register
300A: memory cell
300B: memory cell
400: adder tree
500: accumulator
900: method
910: operation
920: operation
930: operation
940: operation
950: operation
1000: method
1010: operation
1020: operation
1030: operation
1040: operation
1050: operation
1060: operation
1070: operation

Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.

FIGS. 1A and 1B are diagrams of memory circuits, in accordance with some embodiments.

FIG. 2 is a diagram of a selection circuit, in accordance with some embodiments.

FIGS. 3A and 3B are diagrams of memory cells, in accordance with some embodiments.

FIG. 4 is a diagram of an adder tree, in accordance with some embodiments.

FIG. 5 is a diagram of an accumulator, in accordance with some embodiments.

FIG. 6 is a diagram of a portion of a memory array, in accordance with some embodiments.

FIGS. 7A and 7B are diagrams of portions of memory circuits, in accordance with some embodiments.

FIG. 8 is a diagram of memory circuit operating voltages, in accordance with some embodiments.

FIG. 9 is a flowchart of a method of operating a memory circuit, in accordance with some embodiments.

FIG. 10 is a flowchart of a method of operating a memory circuit, in accordance with some embodiments.

The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components, values, operations, materials, arrangements, or the like are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Other components, values, operations, materials, arrangements, or the like are contemplated. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

Further, spatially relative terms, such as "beneath," "below," "lower," "above," "upper," and the like, may be used herein for ease of description to describe the relationship of one element or feature to another element(s) or feature(s) as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein may likewise be interpreted accordingly.

In various embodiments, a memory array of a memory circuit includes both memory storage units and mathematical operation units, and is thereby configured to perform an in-memory computation in which partial sums are generated based on input data elements and stored weight data elements. Compared to approaches in which a memory array does not include elements configured to perform in-memory computations, such memory circuits are capable of generating partial sums using a smaller area and lower power levels. In various applications, e.g., convolutional neural network (CNN) applications, the memory circuit enables an array of stored weight data elements to be efficiently applied in multiply and accumulate (MAC) operations on one or more sets of input data elements.

FIGS. 1A and 1B are diagrams of respective memory circuits 100A and 100B, in accordance with some embodiments. Each of memory circuits 100A and 100B includes a selection circuit 110 coupled to an input data bus IDB and to a corresponding memory array 120A or 120B, an input/output (I/O) circuit 130 and a number M of accumulators 140 coupled to the corresponding memory array 120A or 120B, and a control circuit 150 coupled through a control signal bus CTRLB to selection circuit 110, the corresponding memory array 120A or 120B, I/O circuit 130, and each accumulator 140.

Each of memory arrays 120A and 120B includes M columns C1 to CM corresponding to the M accumulators 140. Memory array 120A includes a number N of rows of memory cells BCX, each memory cell BCX including a single input terminal (not labeled) and a single output terminal (not labeled), each input terminal thereby corresponding to one of N data rows of memory array 120A. Memory array 120B includes N/2 rows of memory cells BX2, each memory cell BX2 including two input terminals (not labeled) and a single output terminal (not labeled), each input terminal thereby corresponding to one of N data rows of memory array 120B. As discussed below, each of memory circuits 100A and 100B is thereby configured to receive a plurality of N input data elements A1 to AN on input data bus IDB, each input data element A1 to AN including a number of bits equal to H.

Table 1 depicts the data structure of input data elements A1 to AN, each of the N input data elements A1 to AN including H data bits.

[Table 1 (image not reproduced): data structure of input data elements A1 to AN, each including H data bits.]

As discussed below, memory circuits 100A and 100B are configured such that, in operation, each column C1 to CM of each memory array 120A and 120B simultaneously receives from selection circuit 110 the same-numbered bit (the kth bit) of each input data element A1 to AN, i.e., a set of bits A1k to ANk. Each column performs a mathematical operation based on the received set of bits A1k to ANk and the weight data elements stored in the corresponding memory cells BCX or BX2, thereby generating a number M of summation data elements SD1 to SDM corresponding to columns C1 to CM.

A counter k cycles through each of the H bits, e.g., from 1 to H, such that selection circuit 110 outputs the sets of bits A1k to ANk in a sequentially selected manner, and each column repeats the mathematical operation on the selected set of bits A1k to ANk for each value of counter k, thereby generating a sequence of H summation data elements for each of SD1 to SDM. Accumulators 140 are configured to generate corresponding partial sums PS1 to PSM based on the sequences of summation data elements SD1 to SDM, and to output partial sums PS1 to PSM on corresponding output ports O1 to OM.
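The bit-serial flow described above can be illustrated with a minimal Python sketch. It is not the patented circuit: the function name `bit_serial_mac`, the 0-based counter, the unsigned inputs, and the accumulator weighting of each summation data element by 2**k are all assumptions made for illustration, since this passage states only that the partial sums are based on the sequence of summation data elements.

```python
def bit_serial_mac(inputs, weights, h_bits):
    """Return one partial sum per column for unsigned H-bit inputs.

    inputs  -- N unsigned integers (input data elements A1..AN)
    weights -- N x M nested list; weights[n][m] sits in row n, column m
    h_bits  -- number of bits H in each input data element
    """
    n_rows = len(inputs)
    m_cols = len(weights[0])
    partial_sums = [0] * m_cols
    for k in range(h_bits):  # counter k; k = 0 is the LSB in this sketch
        # Selected set of kth bits, one bit per input data element.
        bits = [(a >> k) & 1 for a in inputs]
        for m in range(m_cols):
            # Each cell multiplies its weight by one bit;
            # the adder tree sums the products of the column.
            summation = sum(weights[n][m] * bits[n] for n in range(n_rows))
            # Assumed accumulator behavior: scale the sum by 2**k and add.
            partial_sums[m] += summation << k
    return partial_sums


if __name__ == "__main__":
    print(bit_serial_mac([3, 5, 6], [[1, 2], [3, 4], [5, 6]], h_bits=3))
```

Under these assumptions, each partial sum equals the ordinary dot product of the input data elements with the weights of one column.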

In the embodiment depicted in FIG. 1A, memory array 120A includes memory cells BCX, each configured to receive one bit of the sequentially selected set of kth bits of input data elements A1 to AN, and in the embodiment depicted in FIG. 1B, memory array 120B includes memory cells BX2, each configured to receive two bits, from adjacent data elements, of the sequentially selected set of kth bits of input data elements A1 to AN. Each of memory circuits 100A and 100B is thereby capable of performing some or all of a method, e.g., one or both of methods 900 or 1000 discussed below with respect to FIGS. 9 and 10, by which an in-memory computation is performed.

In various embodiments, memory circuit 100A or 100B is included in a neural network, e.g., a CNN, a sensor, e.g., a magnetic, image, vibration, or gyroscopic sensor, a radio-frequency (RF) device, or another integrated circuit (IC) device.

Each of memory circuits 100A and 100B is simplified for the purpose of illustration. In various embodiments, one or both of memory circuits 100A or 100B includes various elements in addition to those depicted in FIGS. 1A and 1B, or is otherwise arranged so as to perform the operations discussed below.

Two or more circuit elements are considered to be coupled based on one or more direct signal connections and/or one or more indirect signal connections that include one or more logic devices, e.g., an inverter or logic gate, between the two or more circuit elements. In some embodiments, signal communications between the two or more coupled circuit elements are capable of being modified, e.g., inverted or made conditional, by the one or more logic devices.

Selection circuit 110 is an electronic circuit including one or more data registers (not shown in FIGS. 1A and 1B) coupled to input data bus IDB, and one or more multiplexers or similar circuits (not shown in FIGS. 1A and 1B) coupled to the one or more data registers and to control signal bus CTRLB.

A data register, also referred to as a buffer in some embodiments, is an electronic circuit configured to temporarily store some or all of one or more data elements, e.g., the H bits of each input data element A1 to AN. In various embodiments, a data register includes a single set of terminals configured to both input and output data bits, or separate sets of terminals configured to input and output data bits.

A multiplexer is an electronic circuit including a first set of terminals configured to receive a plurality of signals, e.g., the H bits of one of input data elements A1 to AN, one or more switching devices, e.g., transistors, configured to receive one or more control signals, e.g., control signals CTRL, and at least one terminal configured to output a selected one of the received signals responsive to the one or more control signals.

Selection circuit 110 is thereby configured to store the H bits of each data element A1 to AN received on input data bus IDB and, responsive to one or more control signals CTRL received on control signal bus CTRLB, to output the selected set of kth bits A1k to ANk to the corresponding one of memory arrays 120A or 120B. For each input data element A1 to AN, the corresponding selected kth bit A1k to ANk is the same kth bit of the total of H bits. In some embodiments, selection circuit 110 includes selection circuit 200 discussed below with respect to FIG. 2.
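As a behavioral illustration of the store-then-select function just described, the following Python sketch models a register that holds the input data elements and a selector that returns the same kth bit of every element. The class name `SelectionCircuit`, its methods, and the convention that k = 1 denotes the least significant bit are hypothetical choices for this sketch and are not taken from the disclosure.

```python
class SelectionCircuit:
    """Behavioral model: a data register plus a per-element bit selector."""

    def __init__(self, h_bits):
        self.h_bits = h_bits   # number of bits H per input data element
        self.register = []     # stands in for the data register(s)

    def load(self, data_elements):
        """Store unsigned input data elements received on the input bus."""
        for a in data_elements:
            if not 0 <= a < (1 << self.h_bits):
                raise ValueError(f"{a} does not fit in {self.h_bits} bits")
        self.register = list(data_elements)

    def select(self, k):
        """Return the kth bit of every stored element (k = 1 is the LSB here)."""
        if not 1 <= k <= self.h_bits:
            raise ValueError("k must be in the range 1..H")
        return [(a >> (k - 1)) & 1 for a in self.register]


if __name__ == "__main__":
    sel = SelectionCircuit(h_bits=4)
    sel.load([0b1010, 0b0111, 0b0001])
    for k in range(1, 5):
        print(k, sel.select(k))
```

Here the argument `k` plays the role of the control signals: every call returns one bit from each stored element, all taken from the same bit position.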

In some embodiments, selection circuit 110 is configured to receive a number N of input data elements A1 to AN ranging from 4 to 512. In some embodiments, selection circuit 110 is configured to receive a number N of input data elements A1 to AN ranging from 32 to 128.

In some embodiments, selection circuit 110 is configured to receive a number H of bits of each input data element A1 to AN ranging from 1 to 16. In some embodiments, selection circuit 110 is configured to receive a number H of bits of each input data element A1 to AN ranging from 4 to 8.

In various embodiments, the one or more control signals CTRL are configured to, in operation, cause selection circuit 110 to sequentially output the sets of kth bits A1k to ANk from a least significant bit (LSB) to a most significant bit (MSB) or from the MSB to the LSB. In various embodiments, the one or more control signals CTRL are configured to cause selection circuit 110 to sequentially output all of the H sets of bits or a subset of the H sets of bits. In some embodiments, each input data element A1 to AN includes fewer than H bits, and the one or more control signals CTRL are configured to cause selection circuit 110 to sequentially output all or a subset of a number of sets of bits equal to the number of bits received.
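The two output orders can be compared with a short Python sketch. How the accumulators combine the sequence of summation data elements is not specified in this passage, so both functions below are assumptions: the LSB-first version weights each value by its bit position, and the MSB-first version doubles a running total before adding the next value. The function names are hypothetical.

```python
def accumulate_lsb_first(sums):
    """sums[0] belongs to the LSB bit set; weight each sum by 2**position."""
    return sum(s << k for k, s in enumerate(sums))


def accumulate_msb_first(sums):
    """sums[0] belongs to the MSB bit set; double the running total each step."""
    total = 0
    for s in sums:
        total = (total << 1) + s
    return total


if __name__ == "__main__":
    lsb_order = [1, 0, 2]                    # sums for bit positions 0, 1, 2
    msb_order = list(reversed(lsb_order))    # same sums, MSB first
    print(accumulate_lsb_first(lsb_order), accumulate_msb_first(msb_order))
```

Under these assumptions both orders yield the same total, which is one reason either order can be used without changing the result.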

In various embodiments, the one or more control signals CTRL are configured to cause selection circuit 110 to output, for each value of counter k, all or a subset of the corresponding selected set of kth bits A1k to ANk. In some embodiments, the plurality of input data elements includes fewer than N data elements, and the one or more control signals CTRL are configured to cause selection circuit 110 to output, for each value of counter k, all or a subset of the corresponding set of kth bits for the number of data elements received.

Each of memory arrays 120A and 120B is an electronic circuit including M columns C1 to CM, each column C1 to CM including an adder tree 122, discussed below, and corresponding memory cells BCX or BX2 coupled to adder tree 122. The memory cells BCX or BX2 of each column C1 to CM are further coupled to selection circuit 110 and are thereby configured such that, in operation, each column C1 to CM simultaneously receives the selected set of kth bits A1k to ANk output from selection circuit 110 based on counter k.

Because each memory cell BCX is configured to receive a bit of a single data element A1 to AN, memory array 120A includes a total of N rows R1 to RN of memory cells BCX, such that each row R1 to RN corresponds to one data row of memory array 120A. Because each memory cell BX2 is configured to receive bits of two data elements A1 to AN, memory array 120B includes a total of L rows R1 to RL of memory cells BX2, the number L being equal to N/2, such that each row R1 to RL corresponds to two data rows of memory array 120B. In the embodiments depicted in FIGS. 1A and 1B, each instance of memory cell BCX or BX2 includes a position indicator, e.g., 21, identifying the column and row in which the given instance is located.
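The difference in row count between the two arrays can be illustrated by how one selected set of kth bits is distributed to the rows. The Python sketch below shows only this routing; what a memory cell BX2 does with its two bits is not modeled. The function names are hypothetical, and an even number N of data elements is assumed for the BX2 case.

```python
def route_to_bcx_rows(bits):
    """Memory array 120A style: one bit per row, so N bits need N rows."""
    return [(b,) for b in bits]


def route_to_bx2_rows(bits):
    """Memory array 120B style: bits of two adjacent data elements per row."""
    if len(bits) % 2:
        raise ValueError("this sketch assumes an even number of data elements")
    return [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]


if __name__ == "__main__":
    kth_bits = [1, 0, 1, 1]                 # A1k..A4k for some counter value k
    print(route_to_bcx_rows(kth_bits))      # four rows, one bit each
    print(route_to_bx2_rows(kth_bits))      # two rows, two bits each
```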

In some embodiments, memory array 120A or 120B includes a number M of columns C1 to CM ranging from 2 to 512. In some embodiments, memory array 120A or 120B includes a number M of columns C1 to CM ranging from 16 to 128.

In the embodiments depicted in FIGS. 1A and 1B, each of memory arrays 120A and 120B includes a single array layer of rows R1 to RN or rows R1 to RL and columns C1 to CM. In some embodiments, one or both of memory arrays 120A or 120B includes one or more array layers (not shown) in addition to the single layer depicted in FIGS. 1A and 1B, and thereby includes rows and columns in addition to those of the single layer.

Memory cell BCX includes a storage element coupled to a multiplier (not shown in FIGS. 1A and 1B). A storage element is an electrical, electromechanical, electromagnetic, or other device configured to store one or more data bits represented by logic states. In some embodiments, a logic state corresponds to a voltage level of an electrical charge stored in some or all of the storage element. In some embodiments, a logic state corresponds to a physical property, e.g., a resistance or magnetic orientation, of some or all of the storage element.

In some embodiments, the storage element includes one or more static random-access memory (SRAM) cells. In various embodiments, an SRAM cell, e.g., a five-transistor (5T), six-transistor (6T), eight-transistor (8T), or nine-transistor (9T) SRAM cell, includes a number of transistors ranging from two to twelve. In some embodiments, an SRAM cell includes a multi-track SRAM cell. In some embodiments, an SRAM cell has a length at least twice its width.

In some embodiments, the storage element includes one or more dynamic random-access memory (DRAM) cells, resistive random-access memory (RRAM) cells, magnetoresistive random-access memory (MRAM) cells, ferroelectric random-access memory (FeRAM) cells, NOR flash cells, NAND flash cells, conductive-bridging random-access memory (CBRAM) cells, data registers, non-volatile memory (NVM) cells, 3D NVM cells, or other memory cell types capable of storing bit data.

In some embodiments, the storage element is configured to store a number of data bits ranging from 1 to 16. In some embodiments, the storage element is configured to store a number of data bits ranging from 4 to 8.

The storage element includes one or more I/O connections (not shown) through which the logical states are programmed in write operations and accessed in read operations, e.g., multiplication operations.

A multiplier is an electronic circuit including one or more logic gates configured to perform a mathematical operation, e.g., multiplication, based on a received data bit, e.g., one of the selected kth bits A1k to ANk, and a received data element, e.g., a multi-bit weight data element stored in the storage element, thereby generating a product data element equal to the product of the received data bit and the received data element. In some embodiments, the multiplier is configured to generate the product data element including a number of bits equal to the number of bits of the received data element. In various embodiments, the multiplier includes one or more AND gates or NOR gates or other circuits suitable for performing some or all of a multiplication operation.
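
The single-bit multiplication described above can be illustrated with a short behavioral sketch. It is not taken from the patent: the function name `multiply_bit_by_weight` is hypothetical, unsigned weights are assumed, and only the AND-gate option is modeled (one AND per weight bit), so the product has as many bits as the weight.

```python
def multiply_bit_by_weight(bit: int, weight: int, width: int) -> int:
    """Product of a 1-bit input and an unsigned `width`-bit weight.

    Behavioral model only (hypothetical helper, not the patent's circuit):
    each weight bit is ANDed with the input bit, so the product data
    element has the same number of bits as the weight data element.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if not 0 <= weight < (1 << width):
        raise ValueError("weight does not fit in the given width")
    product = 0
    for i in range(width):
        weight_bit = (weight >> i) & 1
        product |= (weight_bit & bit) << i  # one AND gate per weight bit
    return product
```

With an input bit of 1 the product equals the weight; with an input bit of 0 it is zero, which is why no extra output bits are needed.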

By including the storage element coupled to the multiplier and configured to store a weight data element, and the multiplier coupled to selection circuit 110 and configured to receive one bit of the selected set of kth bits A1k to ANk, each memory cell BCX is configured to generate one of product data elements P11 to PMN based on the one bit of the selected set of kth bits A1k to ANk and the weight data element corresponding to the position of the given memory cell BCX within memory array 120A. In some embodiments, memory cell BCX includes memory cell 300A discussed below with respect to FIG. 3A.

Memory cell BX2 includes a first storage element coupled to a first multiplier, a second storage element coupled to a second multiplier, and an adder coupled to the first and second multipliers (not shown in FIGS. 1A and 1B). The first storage element and multiplier are configured to generate a first product data element as discussed above with respect to memory cell BCX, and the second storage element and multiplier are configured to generate a second product data element as discussed above with respect to memory cell BCX.

An adder is an electronic circuit including one or more logic gates configured to perform a mathematical operation, e.g., addition, based on received first and second data elements, e.g., the first and second product data elements generated by the first and second multipliers, thereby generating a sum data element equal to the sum of the received first and second data elements. In some embodiments, the adder is configured to generate the sum data element including a number of bits one greater than the number of bits of each of the received first and second data elements. In various embodiments, the adder includes one or more full adder gates, half adder gates, ripple-carry adder circuits, carry-save adder circuits, carry-select adder circuits, carry-look-ahead adder circuits, or other circuits suitable for performing some or all of an addition operation.

By including the first multiplier configured to generate the first product data element based on a first bit of the selected set of kth bits A1k to ANk and the first stored weight data element, the second multiplier configured to generate the second product data element based on a second bit of the selected set of kth bits A1k to ANk, and the adder coupled to each of the first and second multipliers, each memory cell BX2 is configured to generate one of sum data elements S11 to SML based on the first and second bits of the selected set of kth bits A1k to ANk and the first and second weight data elements corresponding to the position of the given memory cell BX2 within memory array 120B. In some embodiments, memory cell BX2 includes memory cell 300B discussed below with respect to FIG. 3B.
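
As a rough illustration of the two-product cell, the sketch below models memory cell BX2 behaviorally. The function name `bx2_sum` and the unsigned-weight assumption are not from the patent; the sketch only shows why one extra output bit is enough to hold the sum of two equal-width products.

```python
from typing import Tuple


def bx2_sum(bit_a: int, weight_a: int, bit_b: int, weight_b: int,
            width: int) -> Tuple[int, int]:
    """Return (sum data element, number of bits of the sum).

    Hypothetical behavioral model, assuming unsigned `width`-bit weights:
    two single-bit-by-weight products are added, and the sum is one bit
    wider than either product.
    """
    for weight in (weight_a, weight_b):
        if not 0 <= weight < (1 << width):
            raise ValueError("weight does not fit in the given width")
    product_a = weight_a if bit_a else 0  # first multiplier
    product_b = weight_b if bit_b else 0  # second multiplier
    return product_a + product_b, width + 1  # adder output, widened by one bit
```

In the worst case both products are 2^width - 1, so the sum is 2^(width+1) - 2, which still fits in width + 1 bits.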

Adder tree 122 is an electronic circuit including multiple adder layers (not shown in FIGS. 1A and 1B), a first layer of which is configured to receive a plurality of data elements, e.g., product data elements P11 to PMN or sum data elements S11 to SML, and a last layer of which includes a single adder configured to generate a data element, e.g., one of summation data elements SD1 to SDM, based on the received plurality of data elements. In some embodiments, each of one or more successive layers between the first and last layers is configured to receive a first number of sum data elements generated by the preceding layer and to generate a second number of sum data elements based on the first number of sum data elements, the second number being half the first number. The total number of layers thus includes the first and last layers and each successive layer, if present. In some embodiments, adder tree 122 includes adder tree 400 discussed below with respect to FIG. 4.

Adder tree 122 is thereby configured to receive a plurality of data elements whose number equals two raised to a power equal to the total number of layers; the number of data elements is thus an exponential function, base two, of the total number of layers. In the embodiment depicted in FIG. 1A, memory array 120A includes instances of adder tree 122, each including the total number of layers such that two to the power of the total number of layers equals the N product data elements, e.g., P11 to P1N. In the embodiment depicted in FIG. 1B, memory array 120B includes instances of adder tree 122, each including the total number of layers such that two to the power of the total number of layers equals the L sum data elements, e.g., S11 to S1L.

In some embodiments, adder tree 122 includes a total number of layers ranging from 2 to 9. In some embodiments, adder tree 122 includes a total number of layers ranging from 4 to 7.

In some embodiments, each adder in each layer of adder tree 122 is configured to generate a corresponding sum data element including a number of bits one greater than the number of bits of the sum data elements of the preceding layer (or, in the case of the first layer, of the data elements of the received plurality of data elements).

In some embodiments depicted in FIG. 1A, adder tree 122 includes the first layer configured to receive product data elements P11 to PMN including a first number of bits equal to the number of bits of the weight data element stored in each memory cell BCX, and the last layer configured to generate summation data elements SD1 to SDM including a second number of bits equal to the first number plus a value equal to the total number of layers in adder tree 122.

In some embodiments depicted in FIG. 1B, adder tree 122 includes the first layer configured to receive sum data elements S11 to SML including a first number of bits one greater than the number of bits of the weight data elements stored in each memory cell BX2, and the last layer configured to generate summation data elements SD1 to SDM including a second number of bits equal to the first number plus a value equal to the total number of layers in adder tree 122.
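
The bit-width bookkeeping in the two preceding paragraphs reduces to simple arithmetic. The helper below is a hypothetical sketch (the name `summation_bits` and its parameters are illustrative, not from the patent) that applies the stated rule: output width equals the first-layer input width plus the total number of layers.

```python
def summation_bits(weight_bits: int, total_layers: int,
                   two_per_cell: bool = False) -> int:
    """Number of bits of a summation data element at the adder tree output.

    Illustrative helper only. `two_per_cell=False` models the FIG. 1A case
    (first-layer inputs as wide as the weight); `two_per_cell=True` models
    the FIG. 1B case (first-layer inputs one bit wider than the weight).
    """
    if weight_bits < 1 or total_layers < 1:
        raise ValueError("weight_bits and total_layers must be positive")
    first_number = weight_bits + (1 if two_per_cell else 0)
    return first_number + total_layers  # one extra bit per adder layer
```

As a worked case under these assumptions, 4-bit weights with N = 64 data rows need 6 layers in the FIG. 1A arrangement (4 + 6 = 10 bits), and L = 32 sum data elements need 5 layers in the FIG. 1B arrangement ((4 + 1) + 5 = 10 bits), so both arrangements arrive at the same output width.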

I/O circuit 130 is an electronic circuit coupled to control signal bus CTRLB and, through one or more word lines, one or more bit lines, and/or one or more data lines (not shown), to the one or more I/O connections of each storage element of each memory cell BCX of memory array 120A or each memory cell BX2 of memory array 120B. I/O circuit 130 is thereby configured to, in response to one or more control signals CTRL received from control signal bus CTRLB, program each memory cell BCX or BX2 to one or more logical states in a write operation and cause the one or more logical states stored in each memory cell BCX or BX2 to be accessed in a read operation.

Accumulator 140 is an electronic circuit coupled to control signal bus CTRLB and including one or more adders, one or more data registers, and one or more shifters (not shown in FIGS. 1A and 1B) collectively coupled in a feedback configuration. The one or more adders are coupled to adder tree 122 and thereby configured to receive one of summation data elements SD1 to SDM, each summation data element SD1 to SDM being one of a sequence of H summation data elements SD1 to SDM corresponding to the sequentially selected sets of kth bits A1k to ANk output from selection circuit 110 based on counter k.

The one or more adders are further configured to receive a shifted data element output from the one or more shifters and to generate an internal sum data element based on the shifted data element and the one of summation data elements SD1 to SDM. The one or more data registers are configured to receive the internal sum data element from the one or more adders, store the internal sum data element, and output the stored internal sum data element to the one or more shifters and to a corresponding one of output ports O1 to OM. The one or more shifters are configured to receive the stored internal data element output from the one or more data registers and to generate the shifted data element by shifting the stored internal data element by one bit in the MSB direction or the LSB direction.

Accumulator 140 is thereby configured to perform an accumulation operation in response to one or more control signals CTRL received on control signal bus CTRLB, whereby the stored internal summation data element is increased as each one of the sequence of summation data elements SD1 to SDM is received. The one or more control signals CTRL are based on and/or include counter k information and are thereby configured to coordinate the accumulation operation with the sequential selection of the sets of kth bits A1k to ANk, such that the stored internal data element is shifted and added to the selected summation data element SD1 to SDM in synchronization with the timing and the MSB/LSB direction in which the sets of kth bits are sequentially generated.

In operation, execution of the accumulation operation based on cycling counter k over the span of the H bits of the sets of kth bits A1k to ANk, and on the corresponding H instances of summation data elements SD1 to SDM, causes the internal data element stored in the one or more data registers to be output on the corresponding output port O1 to OM as a corresponding one of partial sums PS1 to PSM.
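
The shift-and-add accumulation described above can be sketched as follows. This is a behavioral model under stated assumptions, not the patent's circuit: the function name `accumulate` is hypothetical, values are unsigned, and the LSB-first branch is modeled as arithmetic weighting by a power of two rather than as a literal one-bit shift of a register toward the LSB.

```python
from typing import Iterable


def accumulate(summations: Iterable[int], msb_first: bool = True) -> int:
    """Combine H per-bit summation values into one partial sum.

    Hypothetical model of the adder/register/shifter feedback loop.
    `summations` holds one value per selected bit position, in the order
    the bit positions were selected.
    """
    stored = 0  # contents of the data register
    for step, summation in enumerate(summations):
        if msb_first:
            # Shift the stored value one bit toward the MSB, then add.
            stored = (stored << 1) + summation
        else:
            # LSB-first order: weight the new value by its bit position
            # (arithmetic equivalent, not a literal shift toward the LSB).
            stored = stored + (summation << step)
    return stored
```

For the sequence 3, 1, 2 the MSB-first result is ((3 * 2) + 1) * 2 + 2 = 16, while treating the same sequence as LSB-first gives 3 + 1 * 2 + 2 * 4 = 13, which shows why the accumulation direction has to match the bit selection order.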

Control circuit 150 is an electronic circuit configured to control operation of memory circuit 100A or 100B by generating control signals CTRL and outputting control signals CTRL on control signal bus CTRLB. In operation, control signals CTRL are received from control signal bus CTRLB by selection circuit 110, memory array 120A or 120B, I/O circuit 130, and accumulator 140 in accordance with the embodiments discussed above and below. In some embodiments, control circuit 150 is configured to generate control signals CTRL including and/or based on one or more timing signals.

In various embodiments, control circuit 150 includes a hardware processor 152 and a non-transitory computer-readable storage medium 154. Computer-readable storage medium 154, among other things, is encoded with, i.e., stores, computer program code, i.e., a set of executable instructions. Execution of the instructions by hardware processor 152 represents (at least in part) a memory circuit operation tool that implements a portion or all of, e.g., method 900 discussed below with respect to FIG. 9 and/or method 1000 discussed below with respect to FIG. 10 (hereinafter, the noted processes and/or methods).

In various embodiments, processor 152 is electrically coupled to computer-readable storage medium 154 through an I/O interface and to a network through a bus (details not shown). A network interface is connected to the network (not shown), so that processor 152 and computer-readable storage medium 154 are capable of connecting to external elements via the network. Processor 152 is configured to execute the computer program code encoded in computer-readable storage medium 154 in order to cause control circuit 150 and memory circuit 100A or 100B to be usable for performing a portion or all of the noted processes and/or methods. In one or more embodiments, processor 152 is a central processing unit (CPU), a multi-processor, a distributed processing system, an application specific integrated circuit (ASIC), and/or a suitable processing unit.

In one or more embodiments, computer-readable storage medium 154 is an electronic, magnetic, optical, electromagnetic, infrared, and/or semiconductor system (or apparatus or device). For example, computer-readable storage medium 154 includes a semiconductor or solid-state memory, a magnetic tape, a removable computer diskette, a RAM, an SRAM, a DRAM, a read-only memory (ROM), a rigid magnetic disk, and/or an optical disk. In one or more embodiments using optical disks, computer-readable storage medium 154 includes a compact disk-read only memory (CD-ROM), a compact disk-read/write (CD-R/W), and/or a digital video disc (DVD).

In one or more embodiments, computer-readable storage medium 154 stores computer program code configured to cause control circuit 150 to generate the control signals so as to be usable for performing a portion or all of the noted processes and/or methods. In one or more embodiments, computer-readable storage medium 154 also stores information that facilitates performing a portion or all of the noted processes and/or methods.

By the configuration discussed above, each of memory circuits 100A and 100B is capable of, in operation, receiving input data elements A1 to AN on input data bus IDB, using selection circuit 110 to sequentially select the sets of kth bits A1k to ANk, receiving the sequence of selected sets of bits A1k to ANk at each column C1 to CM of memory cells BCX or BX2, and using memory cells BCX or BX2 and the corresponding adder tree 122 to perform a synchronized series of mathematical operations, thereby outputting partial sums PS1 to PSM on output ports O1 to OM. By including memory array 120A or 120B, each of memory circuits 100A and 100B is configured to perform in-memory computation, whereby at least one of partial sums PS1 to PSM is generated based on input data elements A1 to AN and the stored weight data elements. Compared to approaches in which a memory array does not include elements configured to perform in-memory computation, such memory circuits are capable of generating partial sums using a smaller area and lower power levels.
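
The overall flow for a single column can be sketched end to end. The model below is an assumption-laden illustration rather than the patented implementation: `partial_sum` is a hypothetical name, inputs and weights are treated as unsigned integers, bits are selected MSB first, and the adder tree is collapsed into a plain `sum`.

```python
from typing import Sequence


def partial_sum(inputs: Sequence[int], weights: Sequence[int], h: int) -> int:
    """Bit-serial partial sum for one column of single-weight cells.

    Hypothetical behavioral model: for each of the H bit positions
    (MSB first) select that bit of every input, gate each stored weight
    with its bit, add the products, then shift-and-add into a register.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if any(not 0 <= a < (1 << h) for a in inputs):
        raise ValueError("every input must fit in h bits")
    stored = 0
    for k in range(h - 1, -1, -1):
        bits = [(a >> k) & 1 for a in inputs]            # selection circuit
        products = [w if b else 0 for b, w in zip(bits, weights)]  # cells
        summation = sum(products)                        # adder tree
        stored = (stored << 1) + summation               # accumulator
    return stored
```

Under these assumptions the result equals the ordinary dot product of the inputs and the weights; for inputs 5, 3, 7, 1 and weights 2, 4, 6, 8 with H = 3, the three per-bit summations are 8, 10, and 20, and the accumulated value is 72.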

FIG. 2 is a diagram of a selection circuit, in accordance with some embodiments. Selection circuit 200, also referred to as multiplexing circuit 200 in some embodiments, is usable as selection circuit 110 discussed above with respect to FIGS. 1A and 1B. Selection circuit 200 includes a data register 200R coupled to input data bus IDB, and a plurality of N multiplexers M1 to MN coupled to data register 200R and control signal bus CTRLB.

Data register 200R includes a first set of terminals (not shown) coupled to input data bus IDB and is thereby configured to receive bit data including the H bits of each input data element A1 to AN, and to temporarily store the bit data. In various embodiments, data register 200R is configured to, in operation, receive the bit data in parallel or in series. Data register 200R includes a second set of terminals (not shown) coupled to multiplexers M1 to MN and is thereby configured to, in operation, output each of the H bits of each input data element A1 to AN (depicted in FIG. 2 as A11 to A1H, A21 to A2H, ..., AN1 to ANH) to multiplexers M1 to MN.

Multiplexers M1 to MN correspond to input data elements A1 to AN such that each multiplexer M1 to MN includes a set of terminals (not labeled) configured to receive the H bits of the corresponding data element A1 to AN. Each multiplexer M1 to MN includes a corresponding output terminal M1O to MNO and is thereby configured to, in operation, output the selected kth bit A1k to ANk of the corresponding data element A1 to AN on the corresponding output terminal M1O to MNO in response to one or more control signals CTRL received from control signal bus CTRLB. Multiplexers M1 to MN and the one or more control signals CTRL are configured to, in operation, simultaneously output the same kth bit of each data element A1 to AN, thereby generating the set of kth bits A1k to ANk based on counter k discussed above.
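
The bit selection performed by the multiplexers can be pictured with a small sketch. The helper name `select_kth_bits` is hypothetical, and the numbering convention is an assumption: the text does not say whether k = 1 denotes the least or most significant bit, so the sketch arbitrarily treats k = 1 as the least significant bit.

```python
from typing import List, Sequence


def select_kth_bits(elements: Sequence[int], k: int, h: int) -> List[int]:
    """Return the kth bit of every H-bit input data element.

    Hypothetical model of N multiplexers driven by one shared counter k.
    Assumption: k is 1-based and k = 1 is the least significant bit.
    """
    if not 1 <= k <= h:
        raise ValueError("k must be between 1 and h")
    if any(not 0 <= a < (1 << h) for a in elements):
        raise ValueError("every element must fit in h bits")
    # Every multiplexer uses the same k, so the outputs form one set
    # containing the same bit position of each element.
    return [(a >> (k - 1)) & 1 for a in elements]
```

Stepping k through 1 to H produces the H sets of bits that the memory array consumes one set at a time.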

Selection circuit 200 is thereby configured to be capable of performing the operations discussed above with respect to selection circuit 110 and FIGS. 1A and 1B. By including selection circuit 200 as selection circuit 110, each of memory circuits 100A and 100B is capable of realizing the benefits discussed above.

FIGS. 3A and 3B are diagrams of respective memory cells 300A and 300B, in accordance with some embodiments. Memory cell 300A, also referred to as bit cell 300A in some embodiments, is usable as one or more instances of memory cell BCX discussed above with respect to FIG. 1A, and memory cell 300B, also referred to as bit cell 300B in some embodiments, is usable as one or more instances of memory cell BX2 discussed above with respect to FIG. 1B.

Each of memory cells 300A and 300B is simplified for the purpose of illustration. In various embodiments, one or both of memory cells 300A or 300B includes various elements in addition to those depicted in FIGS. 3A and 3B, or is otherwise arranged so as to perform the operations discussed below. In various embodiments, memory cell 300A or 300B includes a plurality of electrical connections to one or more word lines, one or more bit lines, and/or one or more data lines (not shown), and thereby to I/O circuit 130 discussed above with respect to FIGS. 1A and 1B, through which weight data elements WTmn and WTm(n+1) discussed below are stored and/or accessed.

Each of memory cells 300A and 300B includes a storage unit SU1 coupled to a multiplier MUL1. Memory cell 300B also includes a storage unit SU2 coupled to a multiplier MUL2, and an adder ADD coupled to each of multipliers MUL1 and MUL2.

Storage unit SU1 is configured to store weight data element WTmn, and storage unit SU2 is configured to store weight data element WTm(n+1). In some embodiments, the indicator m corresponds to one of the M columns C1 to CM, and the indicator n corresponds to one of the N data rows of memory array 120A or 120B.

In various embodiments, each of storage units SU1 and SU2 is configured to store the respective weight data element WTmn or WTm(n+1) including a single bit or multiple bits. In some embodiments, one or both of storage units SU1 or SU2 is configured to store the corresponding weight data element WTmn or WTm(n+1) including a number of bits ranging from 1 to 16. In some embodiments, one or both of storage units SU1 or SU2 is configured to store the corresponding weight data element WTmn or WTm(n+1) including a number of bits ranging from 4 to 8. In some embodiments, one or both of storage units SU1 or SU2 is configured to store the corresponding weight data element WTmn or WTm(n+1) including a programmable number of bits.

Each of multipliers MUL1 and MUL2 is configured to perform a multiplication operation including a number of bits equal to the number of bits of the corresponding storage unit SU1 or SU2 to which the given multiplier MUL1 or MUL2 is coupled. Multiplier MUL1 is configured to receive weight data element WTmn from storage unit SU1 and a first one of kth bits A1k to ANk, represented as Ank in FIGS. 3A and 3B, and to output the product as product data element Pmn.

In some embodiments, e.g., those in which memory cell 300A is used as memory cell BCX, product data element Pmn, based on indicators m and n, corresponds to one of product data elements P11 to PMN discussed above with respect to FIG. 1A. Memory cell 300A is thereby configured to be capable of performing the operations discussed above with respect to memory cell BCX and FIG. 1A.

Multiplier MUL2 is configured to receive weight data element WTm(n+1) from storage unit SU2 and a second one of kth bits A1k to ANk, represented as A(n+1)k in FIG. 3B, and to output the product as product data element Pm(n+1).

Adder ADD is configured to receive each of product data elements Pmn and Pm(n+1) having the number of bits of the corresponding multiplier MUL1 or MUL2, perform an addition operation, and output the sum as sum data element Sml having a number of bits one greater than the number of bits of each of product data elements Pmn and Pm(n+1). In some embodiments, the indicator l corresponds to one of the L rows of memory cells BX2 of memory array 120B.

In some embodiments, e.g., those in which memory cell 300B is used as memory cell BX2, sum data element Sml, based on indicators m and l, corresponds to one of sum data elements S11 to SML discussed above with respect to FIG. 1B. Memory cell 300B is thereby configured to be capable of performing the operations discussed above with respect to memory cell BX2 and FIG. 1B.

By including memory cell 300A as one or more instances of memory cell BCX, or memory cell 300B as one or more instances of memory cell BX2, the corresponding memory circuit 100A or 100B is capable of realizing the benefits discussed above.

FIG. 4 is a diagram of an adder tree 400, in accordance with some embodiments. Adder tree 400 is usable as adder tree 122 discussed above with respect to FIGS. 1A and 1B. Adder tree 400 includes a number u of adder layers ADD1 to ADDu.

The first adder layer includes adders ADD1 configured to receive a number U (= 2^u) of sum data elements SUM11 to SUM1U; the first layer thereby includes a number U/2 of adders ADD1. In some embodiments, e.g., those in which adder tree 400 is used as adder tree 122 in columns C1 to CM of memory array 120A, sum data elements SUM11 to SUM1U correspond to the plurality of product data elements output by the corresponding column of memory cells BCX, e.g., product data elements P11 to P1N output by column C1 discussed above with respect to FIG. 1A. In some embodiments, e.g., those in which adder tree 400 is used as adder tree 122 in columns C1 to CM of memory array 120B, sum data elements SUM11 to SUM1U correspond to the plurality of sum data elements output by the corresponding column of memory cells BX2, e.g., sum data elements S11 to S1L output by column C1 discussed above with respect to FIG. 1B.

Each adder ADD1 is configured to perform an addition operation on a corresponding received pair of sum data elements, e.g., SUM11 and SUM12 of sum data elements SUM11 to SUM1U, and to output the sum as a corresponding one of sum data elements SUM21 to SUM2(U/2). Adders ADD1 are configured to receive sum data elements SUM11 to SUM1U including a first number of bits, e.g., the number of bits of product data elements P11 to PMN discussed above with respect to FIG. 1A or of sum data elements S11 to SML discussed above with respect to FIG. 1B, and to output sum data elements SUM21 to SUM2(U/2) including a second number of bits one greater than the first number of bits.

The second adder layer includes U/4 adders ADD2. Each adder ADD2 is configured to perform an addition operation on a corresponding received pair of sum data elements (e.g., SUM21 and SUM22 of sum data elements SUM21 to SUM2(U/2)) and to output the sum as a corresponding one of sum data elements SUM31 to SUM3(U/4). Adders ADD2 are configured to receive sum data elements SUM21 to SUM2(U/2) having the second number of bits and to output sum data elements SUM31 to SUM3(U/4) having a third number of bits one greater than the second number of bits.

The final adder layer includes a single adder ADDu configured to perform an addition operation on a pair of sum data elements SUMu1 and SUMu2 received from the adders of the preceding layer and to output the sum as summation data element SDm. Adder ADDu is configured to receive sum data elements SUMu1 and SUMu2 having a fourth number of bits and to output summation data element SDm having a fifth number of bits that is one greater than the fourth number of bits and equal to the first number of bits plus the number u. In some embodiments, e.g., those in which adder tree 400 is used as adder tree 122, summation data element SDm corresponds to one of summation data elements SD1 to SDM discussed above with respect to FIGS. 1A and 1B.
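The layer structure described above can be sketched behaviorally as follows. This is a minimal illustration, not the circuit of FIG. 4: the function name `adder_tree` and its return tuple are hypothetical, and Python integers stand in for fixed-width data elements. It shows that U = 2^u inputs are reduced in u layers and that the output is u bits wider than the inputs.

```python
def adder_tree(values, in_bits):
    """Reduce U = 2**u values pairwise; return (sum, output_bit_width, layers)."""
    if len(values) < 2 or len(values) & (len(values) - 1):
        raise ValueError("number of inputs must be a power of two, at least 2")
    layer, bits, layers = list(values), in_bits, 0
    while len(layer) > 1:
        # One adder layer: each adder sums one adjacent pair of inputs.
        layer = [layer[i] + layer[i + 1] for i in range(0, len(layer), 2)]
        bits += 1      # each layer's output is one bit wider than its input
        layers += 1
    return layer[0], bits, layers


# Eight 4-bit inputs at their maximum value (U = 8, u = 3).
total, width, depth = adder_tree([15] * 8, in_bits=4)
print(total, width, depth)  # 120 7 3
```

With all inputs at the 4-bit maximum, the result 120 fits in 4 + 3 = 7 bits, which is why each layer needs only one extra bit.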

In various embodiments, adder tree 400 includes one or more additional adder layers between the second layer and the final layer depicted in FIG. 4, each additional layer having a configuration consistent with those of the first, second, and final layers discussed above, such that, in operation, summation data element SDm is generated based on received sum data elements SUM11 to SUM1U. In some embodiments, adder tree 400 does not include the second adder layer of adders ADD2 and thereby includes a total of u = 2 layers, such that, in operation, summation data element SDm is generated based on a total of U = 4 sum data elements SUM11 to SUM1U.

In some embodiments, adder tree 400 thereby includes a total number of layers ranging from 2 to 9. In some embodiments, adder tree 400 thereby includes a total number of layers ranging from 4 to 7.

Adder tree 400 is thereby configured to be capable of performing the operations discussed above with respect to adder tree 122 and FIGS. 1A and 1B. By including adder tree 400 as adder tree 122, each of memory circuits 100A and 100B is capable of realizing the benefits discussed above.

FIG. 5 is a diagram of an accumulator 500, in accordance with some embodiments. Accumulator 500, also referred to as partial sum circuit 500 in some embodiments, is usable as accumulator 140 discussed above with respect to FIGS. 1A and 1B. Accumulator 500 includes an adder ADDA coupled to each of a data register R1 and a shifter SH1. Shifter SH1 is also coupled to data register R1, such that adder ADDA, data register R1, and shifter SH1 are collectively coupled in a feedback configuration.

Adder ADDA is configured to receive, in operation, summation data element SDm discussed above with respect to FIG. 4. In some embodiments, e.g., those in which accumulator 500 is used as accumulator 140, summation data element SDm corresponds to one of summation data elements SD1 to SDM discussed above with respect to FIGS. 1A and 1B.

Adder ADDA is further configured to receive, in operation, a shifted data element SDE output from shifter SH1, and to generate an internal sum data element IDE based on shifted data element SDE and summation data element SDm. Data register R1 is configured to receive internal sum data element IDE from adder ADDA, store internal sum data element IDE, and output the stored internal sum data element IDE to shifter SH1 and to an output port Om. Shifter SH1 is configured to receive the stored internal sum data element IDE output from data register R1, and to generate shifted data element SDE by shifting the stored internal sum data element IDE by one bit in the MSB direction or the LSB direction.

Accumulator 500 is thereby configured to perform an accumulation operation in response to one or more control signals CTRL received on control signal bus CTRLB (not shown in FIG. 5), whereby the stored internal sum data element IDE is increased with each of a sequence of received summation data elements SDm. Performing the accumulation operation on a plurality of instances of summation data element SDm thereby causes the internal sum data element IDE stored in data register R1 to be output on output port Om as partial sum PSm of those instances of summation data element SDm.
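The feedback loop of adder ADDA, data register R1, and shifter SH1 can be modeled behaviorally as below. This is a sketch under stated assumptions: the class name `Accumulator`, the `shift_left` flag, and the `step` method are hypothetical, control signals CTRL are not modeled, and an integer right shift simply discards the bit shifted out, which an actual embodiment may handle differently.

```python
class Accumulator:
    """Behavioral model of adder ADDA, register R1, and shifter SH1."""

    def __init__(self, shift_left=True):
        self.ide = 0                  # stored internal data element IDE (register R1)
        self.shift_left = shift_left  # True: shift toward the MSB; False: toward the LSB

    def step(self, sd):
        # Shifter SH1: shift the stored value by one bit to form SDE.
        sde = self.ide << 1 if self.shift_left else self.ide >> 1
        # Adder ADDA adds the incoming summation data element to SDE,
        # and register R1 stores the result.
        self.ide = sde + sd
        return self.ide               # value presented on output port Om


acc = Accumulator()
for sd in [1, 0, 1, 1]:               # hypothetical sequence of SDm values
    ps = acc.step(sd)
print(ps)  # 11
```

With a shift toward the MSB, the sequence 1, 0, 1, 1 accumulates as ((1·2 + 0)·2 + 1)·2 + 1 = 11, i.e. each earlier value gains one bit of significance per step.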

In some embodiments, e.g., those in which accumulator 500 is used as accumulator 140, partial sum PSm output on output port Om corresponds to one of partial sums PS1 to PSM output on corresponding output ports O1 to OM discussed above with respect to FIGS. 1A and 1B.

Accumulator 500 is thereby configured to be capable of performing the operations discussed above with respect to accumulator 140 and FIGS. 1A and 1B. By including accumulator 500 as accumulator 140, each of memory circuits 100A and 100B is capable of realizing the benefits discussed above.

FIG. 6 is a diagram of a portion of memory array 120A or 120B (120A/120B), in accordance with some embodiments. FIG. 6 includes multiple instances of memory cell BCX or BX2 (BCX/BX2) and an instance of adder tree 122, each discussed above with respect to FIGS. 1A and 1B. In the embodiment depicted in FIG. 6, memory array 120A/120B also includes a multiplexer MA coupled between memory cells BCX/BX2 and adder tree 122. FIG. 6 is simplified for the purpose of illustration.

Multiplexer MA is configured to selectively couple one or more of memory cells BCX/BX2 to adder tree 122, such that, in operation, data elements output from memory cells BCX/BX2 (e.g., product data elements P11 to PMN or sum data elements S11 to SML discussed above with respect to FIGS. 1A and 1B) are selectively propagated to adder tree 122 in response to one or more control signals CTRL received on control signal bus CTRLB (not shown in FIG. 6). In various embodiments, memory cells BCX/BX2 are included in a same one of columns C1 to CM or in separate ones of columns C1 to CM, adder tree 122 thereby being shared between two of columns C1 to CM.

By the configuration discussed above, memory circuit 100A or 100B includes memory array 120A/120B including at least one adder tree 122 shared among multiple memory cells BCX/BX2. In such embodiments, memory circuit 100A or 100B is thereby capable of generating partial sums using a smaller area than approaches in which a memory array does not include at least one adder tree shared among multiple memory cells.

FIGS. 7A and 7B are diagrams of portions of memory circuit 100A or 100B (100A/100B), in accordance with some embodiments. Each of FIGS. 7A and 7B depicts a non-limiting example in which two or more of partial sums PS1 to PSM are combined, and each is simplified for the purpose of illustration.

In the embodiment depicted in FIG. 7A, memory circuit 100A/100B includes two instances of a corresponding memory array 120A/120B and accumulator 140, each discussed above with respect to FIGS. 1A and 1B. In the embodiment depicted in FIG. 7A, output port O2 of a first instance of accumulator 140 is coupled to a second instance of accumulator 140, such that, in operation, partial sum PS2 is received by the second instance of accumulator 140 and included in partial sum PS1 output on output port O1. In some embodiments, the two instances of accumulator 140 are configured to, in operation, selectively output partial sums PS1 and PS2 without partial sum PS2 being included in partial sum PS1, e.g., in response to one or more control signals CTRL received on control signal bus CTRLB (not shown in FIGS. 7A and 7B).

In the embodiment depicted in FIG. 7B, memory circuit 100A/100B includes each of columns C1 to C4 (including corresponding instances of accumulator 140), each of columns C1 to C4 being configured to, in operation, receive input data elements A1 to AN on input data bus IDB and output a corresponding one of partial sums PS1 to PS4, as discussed above with respect to FIGS. 1A and 1B. In the embodiment depicted in FIG. 7B, memory circuit 100A/100B also includes an adder ADDSUM configured to, in operation, receive each of partial sums PS1 to PS4 and generate a combined partial sum OSUM based on each of partial sums PS1 to PS4. In some embodiments, memory circuit 100A/100B is configured to, in operation, selectively output partial sums PS1 to PS4 without outputting partial sum OSUM, e.g., in response to one or more control signals CTRL received on control signal bus CTRLB.

In each of the non-limiting examples depicted in FIGS. 7A and 7B, in operation, partial sum PS1 or OSUM output from memory circuit 100A/100B is based on each of input data elements A1 to AN being combined with memory cells BCX/BX2 in two or more of columns C1 to CM. Partial sum PS1 or OSUM is thereby generated based on the combined bits of the weight data elements stored in memory cells BCX/BX2, such that the resolution or precision of partial sum PS1 or OSUM is increased compared to embodiments in which partial sums PS1 to PSM are based on input data elements A1 to AN being combined with a single one of columns C1 to CM.

In some embodiments, memory cells BCX/BX2 include weight data elements having a total of four bits, such that, in operation, in the embodiment depicted in FIG. 7A, partial sum PS1 is output based on weight data elements totaling eight bits, and in the embodiment depicted in FIG. 7B, partial sum OSUM is output based on weight data elements totaling sixteen bits.
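One way to read the eight-bit and sixteen-bit cases is that a wide weight is split into four-bit slices stored in separate columns, and the per-column partial sums are recombined with power-of-two weighting. The sketch below illustrates that reading only; the text does not state how the partial sums are weighted when combined, so the 2^(4i) scaling and the helper names `split_weight`, `column_partial_sum`, and `combine` are assumptions.

```python
def split_weight(weight, parts, bits=4):
    """Split a weight into `parts` slices of `bits` bits, least significant first."""
    mask = (1 << bits) - 1
    return [(weight >> (bits * i)) & mask for i in range(parts)]


def column_partial_sum(inputs, column_weights):
    """Partial sum of one column: sum of input * weight products."""
    return sum(a * w for a, w in zip(inputs, column_weights))


def combine(partial_sums, bits=4):
    """Recombine per-column partial sums (assumed weighting: 2**(bits * i))."""
    return sum(ps << (bits * i) for i, ps in enumerate(partial_sums))


inputs = [3, 1, 2]                      # hypothetical input data elements
weights = [0xA7, 0x1C, 0xF0]            # hypothetical 8-bit weights
slices = [split_weight(w, parts=2) for w in weights]
columns = list(zip(*slices))            # columns[0]: low nibbles, columns[1]: high nibbles
partial_sums = [column_partial_sum(inputs, col) for col in columns]

# The recombined value matches the dot product with the full 8-bit weights.
assert combine(partial_sums) == column_partial_sum(inputs, weights)
```

The same helpers cover the four-column case by calling `split_weight` with `parts=4` on sixteen-bit weights.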

The embodiments depicted in FIGS. 7A and 7B are non-limiting examples provided for the purpose of illustration. In various embodiments, memory circuit 100A/100B is otherwise configured to generate one or more partial sums based on combined stored weight data elements, thereby increasing resolution compared to embodiments in which partial sums are not based on combined weight data elements.

FIG. 8 is a diagram of a memory circuit operating voltage VDD, in accordance with some embodiments. In the embodiment depicted in FIG. 8, operating voltage VDD is the power supply voltage of the power domain in which memory circuit 100A or 100B operates, as discussed above with respect to FIGS. 1A and 1B. Operating voltage VDD includes three power supply voltage levels, 0 V, VDD1, and VDD2, of which power supply voltage level VDD1 is greater than power supply voltage level VDD2. The voltage levels and timing relationships depicted in FIG. 8, e.g., relative durations and/or magnitudes and sequence, are non-limiting examples provided for the purpose of illustration.

Power supply voltage level 0 V represents a power-down mode in which no memory circuit operations are performed. In some embodiments, memory array 120A or 120B includes storage units SU1 and SU2 (if present) that include non-volatile memory cells, such that weight data elements WTmn and/or WTm(n+1) are retained during one or more durations in which operating voltage VDD has the 0 V voltage level.

Power supply voltage level VDD1 represents an I/O mode during which one or more weight data elements WTmn and/or WTm(n+1) are stored in memory cells BCX and/or BX2 in one or more write operations, and/or accessed in one or more read operations.

Power supply voltage level VDD2 represents a computation mode during which operations of one or more in-memory computations are performed, as discussed above with respect to FIGS. 1A and 1B and/or as discussed below with respect to methods 900 and 1000.

In the embodiment depicted in FIG. 8, by performing in-memory computations at power supply voltage level VDD2, which is less than power supply voltage level VDD1, power consumption is reduced compared to approaches in which in-memory computations are performed in a computation mode having the same voltage level as the I/O mode.
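As a rough first-order illustration of why a lower supply level reduces power (a standard CMOS approximation, not a statement from this disclosure), dynamic switching power scales with the square of the supply voltage. The activity factor α, switched capacitance C, clock frequency f, and the example voltages 1.0 V and 0.7 V are assumptions introduced only for illustration, and α, C, and f are taken to be equal in both modes.

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V_{DD}^{2}\, f
\quad\Longrightarrow\quad
\frac{P_{\mathrm{dyn}}(V_{DD2})}{P_{\mathrm{dyn}}(V_{DD1})}
\approx \left(\frac{V_{DD2}}{V_{DD1}}\right)^{2},
\qquad \text{e.g. } \left(\frac{0.7\ \mathrm{V}}{1.0\ \mathrm{V}}\right)^{2} = 0.49 .
```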

FIG. 9 is a flowchart of a method 900 of operating a memory circuit, in accordance with some embodiments. Method 900 is usable with a memory circuit, e.g., memory circuit 100A or 100B discussed above with respect to FIGS. 1A and 1B.

The sequence in which the operations of method 900 are depicted in FIG. 9 is for illustration only; the operations of method 900 are capable of being executed simultaneously or in sequences that differ from that depicted in FIG. 9. In some embodiments, operations in addition to those depicted in FIG. 9 are performed before, between, during, and/or after the operations depicted in FIG. 9. In some embodiments, the operations of method 900 are a subset of a method of operating an IC, e.g., a sensor, RF device, processor, logic or signal processing circuit, or the like. In various embodiments, one or more operations of method 900 are a subset of method 1000 discussed below with respect to FIG. 10.

Method 900 is a non-limiting example of a partial sum calculation in which an instance PSm of partial sums PS1 to PSM is calculated for the corresponding m-th one of columns C1 to CM, as discussed above with respect to FIGS. 1A to 5. In the embodiment depicted in FIG. 9, a counter k cycles through each of the H bits of each of input data elements A1 to AN. At each value of counter k, a summation data element Pk, corresponding to an instance of summation data elements SD1 to SDM, is calculated as the sum of N products of the corresponding k-th bits and weight data elements Wmn. Partial sum PSm is generated by accumulating summation data elements Pk, as discussed below.

At operation 910, counter k is initialized to zero. In some embodiments, initializing counter k includes using control circuit 150 discussed above with respect to FIGS. 1A to 5.

In some embodiments, initializing counter k to zero includes setting the contents of one or more data registers to zero. In some embodiments, initializing counter k to zero includes setting internal data element IDE of data register R1 to zero, as discussed above with respect to accumulator 500 and FIG. 5.

At operation 920, counter k is incremented by one, and summation data element Pk is generated based on the value of counter k. Generating summation data element Pk includes summing, over the range from n = 1 to N, the product data elements corresponding to each of the N data rows in memory array 120A or 120B. Each n-th product data element is the product of the k-th bit Ank of input data element An, corresponding to counters n and k, and the corresponding weight data element Wmn or Wm(n+1). The resulting product data elements are summed over the range n = 1 to N, thereby generating summation data element Pk corresponding to an instance of summation data elements SD1 to SDM.

In some embodiments, generating summation data element Pk includes using the adder tree 122 corresponding to the m-th one of columns C1 to CM to sum, over the range n = 1 to n = N, product data elements Pmn output by memory cells BCX, as discussed above with respect to memory circuit 100A and FIGS. 1A to 5. In some embodiments, generating summation data element Pk includes using memory cells BX2 to generate sum data elements Sm1 to SmL over the range l = 1 to l = L, and using the adder tree 122 corresponding to the m-th one of columns C1 to CM to sum the sum data elements Sm1 to SmL output by memory cells BX2, as discussed above with respect to memory circuit 100B and FIG. 1B.

At operation 930, a partial sum data element Ok is generated based on the value of counter k. Generating partial sum data element Ok includes, when counter k has the value 1, initializing partial sum data element Ok to the first value of summation data element Pk, and, when counter k has a value other than 1, shifting the previous value O(k-1) of the partial sum data element and adding it to the current value of summation data element Pk.

Shifting the previous value of partial sum data element Ok corresponds to increasing or decreasing the previous value by one bit of significance. In some embodiments, incrementing counter k from 1 to H corresponds to increasing the significance of the bits of input data elements A1 to AN, and shifting the previous value of partial sum data element Ok corresponds to increasing the previous value by one bit of significance, i.e., multiplying the previous value by 2. In some embodiments, incrementing counter k from 1 to H corresponds to decreasing the significance of the bits of input data elements A1 to AN, and shifting the previous value of partial sum data element Ok corresponds to decreasing the previous value by one bit of significance, i.e., dividing the previous value by 2.

In some embodiments, generating partial sum data element Ok includes setting partial sum data elements PS1 to PSM to first instances of corresponding summation data elements SD1 to SDM by storing the first instance of the corresponding summation data element SD1 to SDM as internal data element IDE in data register R1, shifting internal data element IDE using shifter SH1, and adding subsequent instances of summation data elements SD1 to SDM to shifted data element SDE, as discussed above with respect to FIGS. 1A to 5.

At operation 940, counter k is compared to the number H. If counter k is less than the number H, method 900 returns to operation 920; if counter k is equal to the number H, method 900 continues to operation 950.

At operation 950, partial sum data element PSm is set to the final value of partial sum data element Ok corresponding to counter k = H. In some embodiments, the number H = 4, incrementing counter k corresponds to increasing the significance of the bits of input data elements A1 to AN, and setting partial sum data element PSm to the final value of partial sum data element Ok is given by

$$PS_m = \sum_{k=1}^{4} 2^{k-1} P_k = \sum_{k=1}^{4} 2^{k-1} \sum_{n=1}^{N} A_{nk} \times W_{mn},$$

where counter k = 1 corresponds to the LSB and coefficient $2^0$, and counter k = 4 corresponds to the MSB and coefficient $2^3$.

In some embodiments, setting partial sum data element PSm to the final value of partial sum data element Ok includes outputting the m-th one of partial sum data elements PS1 to PSM on the corresponding m-th one of output ports O1 to OM, as discussed above with respect to FIGS. 1A to 5.
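The arithmetic of operations 910 to 950 can be checked with a short bit-serial sketch. This is one arithmetic reading, not the control sequence of FIG. 9: the function names are hypothetical, and the sketch assumes the bit planes are visited MSB first so that a one-bit left shift of the running value reproduces the coefficients 2^0 for k = 1 through 2^(H-1) for k = H. The mapping between counter k and bit significance in an actual embodiment may differ.

```python
def kth_bit(value, k):
    """Return the k-th bit of value, where k = 1 is the LSB."""
    return (value >> (k - 1)) & 1


def summation_element(inputs, weights, k):
    """Pk: sum over n of (k-th bit of An) * Wmn for one column m."""
    return sum(kth_bit(a, k) * w for a, w in zip(inputs, weights))


def partial_sum_bit_serial(inputs, weights, H):
    """Shift-and-add accumulation over H bit planes, MSB plane first (assumed order)."""
    o = 0
    for k in range(H, 0, -1):
        # Shift the previous value by one bit, then add the current Pk.
        o = (o << 1) + summation_element(inputs, weights, k)
    return o


inputs = [0b1011, 0b0110, 0b1111]   # hypothetical 4-bit input data elements A1..A3
weights = [3, 5, 2]                 # hypothetical weight data elements Wm1..Wm3
H = 4

serial = partial_sum_bit_serial(inputs, weights, H)
weighted = sum(2 ** (k - 1) * summation_element(inputs, weights, k) for k in range(1, H + 1))
direct = sum(a * w for a, w in zip(inputs, weights))
assert serial == weighted == direct
```

The three results agree because the H-step shift-and-add is Horner's rule applied to the weighted sum of the Pk values, which in turn equals the ordinary dot product of inputs and weights.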

By performing some or all of the operations of method 900 using memory circuit 100A or 100B, partial sums are generated based on in-memory computation, thereby realizing the benefits discussed above with respect to memory circuits 100A and 100B.

FIG. 10 is a flowchart of a method 1000 of operating a memory circuit, in accordance with some embodiments. Method 1000 is usable with a memory circuit, e.g., memory circuit 100A or 100B discussed above with respect to FIGS. 1A and 1B.

The sequence in which the operations of method 1000 are depicted in FIG. 10 is for illustration only; the operations of method 1000 are capable of being executed in sequences that differ from that depicted in FIG. 10. In some embodiments, operations in addition to those depicted in FIG. 10 are performed before, between, during, and/or after the operations depicted in FIG. 10. In some embodiments, the operations of method 1000 are a subset of a method of operating an IC, e.g., a sensor, RF device, processor, logic or signal processing circuit, or the like. In some embodiments, the operations of method 1000 are a subset of a method of operating a CNN or other neural network.

At operation 1010, in some embodiments, a first weight data element is stored in each memory cell of a column of memory cells. In some embodiments, storing the first weight data element in each memory cell of the column of memory cells includes storing weight data in a plurality of columns of memory cells. In various embodiments, storing the first weight data element in each memory cell of each column of memory cells includes using I/O circuit 130 to store weight data elements WTmn and/or WTm(n+1) in memory cells BCX or BX2 of columns C1 to CM, as discussed above with respect to FIGS. 1A to 5.

In some embodiments, storing the first weight data element in each memory cell of the column of memory cells includes operating the memory circuit at a first power supply voltage level greater than a second power supply voltage level at which some or all of operations 1020 to 1070 are performed. In some embodiments, operating the memory circuit at the first power supply voltage level includes operating the memory circuit at power supply voltage level VDD1, and operating the memory circuit at the second power supply voltage level includes operating the memory circuit at power supply voltage level VDD2, as discussed above with respect to FIG. 8.

At operation 1020, in some embodiments, a set of k-th bits of the H bits of each input data element of a plurality of input data elements is simultaneously output from a selection circuit. In some embodiments, simultaneously outputting the set of k-th bits of the H bits of each input data element of the plurality of input data elements includes outputting the set of k-th bits A1k to ANk of input data elements A1 to AN from selection circuit 110, as discussed above with respect to FIGS. 1A to 5.

In various embodiments, simultaneously outputting the set of k-th bits of the H bits of each input data element of the plurality of input data elements is part of sequentially outputting sets of k-th bits in order from the LSB to the MSB or from the MSB to the LSB.

In some embodiments, simultaneously outputting the set of k-th bits of the H bits of each input data element of the plurality of input data elements includes receiving the plurality of input data elements at the selection circuit. In some embodiments, simultaneously outputting the set of k-th bits includes storing the plurality of input data elements in the selection circuit, e.g., in one or more data registers. In some embodiments, simultaneously outputting the set of k-th bits includes receiving and storing input data elements A1 to AN using selection circuit 110 discussed above with respect to FIGS. 1A and 1B. In some embodiments, simultaneously outputting the set of k-th bits includes using selection circuit 200 discussed above with respect to FIG. 2.
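The bit-slicing performed at operation 1020 can be illustrated with a small generator that stores the input data elements and emits one set of k-th bits per step. This is a behavioral sketch only; the name `bit_planes` and the `msb_first` flag are hypothetical and do not describe the internals of selection circuit 110 or 200.

```python
def bit_planes(inputs, H, msb_first=False):
    """Yield, per step, the list of k-th bits [A1k, ..., ANk] of H-bit inputs."""
    order = range(H - 1, -1, -1) if msb_first else range(H)
    for pos in order:
        # All N bits of one plane are produced together, one per input data element.
        yield [(a >> pos) & 1 for a in inputs]


inputs = [0b101, 0b011, 0b110]      # hypothetical 3-bit input data elements A1..A3
for plane in bit_planes(inputs, H=3):
    print(plane)
# [1, 1, 0]
# [0, 1, 1]
# [1, 0, 1]
```

Passing `msb_first=True` yields the same planes in the opposite order, corresponding to the MSB-to-LSB sequence mentioned above.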

In some embodiments, simultaneously outputting the set of k-th bits of the H bits of each input data element of the plurality of input data elements includes generating and responding to one or more control signals, e.g., one or more control signals CTRL generated by control circuit 150 discussed above with respect to FIGS. 1A to 5.

In some embodiments, simultaneously outputting the set of k-th bits of the H bits of each input data element of the plurality of input data elements includes performing some or all of method 900 discussed above with respect to FIG. 9.

At operation 1030, the set of k-th bits is received at a column of memory cells. In various embodiments, receiving the set of k-th bits at the column of memory cells includes receiving the set of k-th bits A1k to ANk at a column of memory cells BCX or BX2.

In some embodiments, receiving the set of k-th bits at the column of memory cells includes receiving the set of k-th bits at each column of a plurality of columns. In various embodiments, receiving the set of k-th bits at the plurality of columns includes receiving the set of k-th bits A1k to ANk at each of columns C1 to CM discussed above with respect to FIGS. 1A to 5.

In some embodiments, receiving the set of k-th bits at the column of memory cells includes performing some or all of method 900 discussed above with respect to FIG. 9.

At operation 1040, each memory cell of the column of memory cells is used to multiply the k-th bit of a corresponding input data element by the first weight data element stored in the memory cell, thereby generating a corresponding first product data element. In various embodiments, using the memory cell to multiply the k-th bit of the corresponding input data element by the first weight data element stored in the memory cell includes using memory cells BCX or BX2 to multiply k-th bits A1k to ANk by the first weight data elements, as discussed above with respect to FIGS. 1A to 5.

In some embodiments, multiplying the k-th bit of the corresponding input data element by the first weight data element stored in the memory cell, thereby generating the corresponding first product data element, includes multiplying bit Ank by weight data element WTmn, thereby generating product data element Pmn, as discussed above with respect to memory cells 300A and 300B and FIGS. 3A and 3B.
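The product of a single bit and a multi-bit weight can be illustrated with a short sketch. The function name `multiply_bit_by_weight` is hypothetical, and the sketch assumes an unsigned weight; it models only the arithmetic of Pmn = Ank × WTmn, not the circuit of memory cells 300A and 300B.

```python
def multiply_bit_by_weight(input_bit: int, weight: int) -> int:
    """Return the product of a single input bit and a non-negative weight."""
    if input_bit not in (0, 1):
        raise ValueError("input_bit must be 0 or 1")
    if weight < 0:
        raise ValueError("this sketch assumes an unsigned weight")
    # Multiplying by one bit either passes the weight through or forces zero,
    # which is equivalent to AND-ing every weight bit with the input bit.
    mask = -input_bit  # 0 -> all zeros, 1 -> all ones in two's complement
    return weight & mask
```

For example, an input bit of 1 and a weight of 0b1011 give a product of 11, while an input bit of 0 gives 0.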

In some embodiments, using each memory cell of the column of memory cells to multiply the k-th bit of the corresponding input data element of the plurality of input data elements by the first weight data element includes using each memory cell of the column of memory cells to multiply the k-th bit of another corresponding input data element of the plurality of input data elements by a second weight data element stored in the memory cell, thereby generating a second product data element, and to add the first product data element to the second product data element to generate a sum data element.

In some embodiments, multiplying the k-th bit of the other corresponding input data element of the plurality of input data elements by the second weight data element stored in the memory cell, thereby generating the second product data element, and adding the first product data element to the second product data element to generate the sum data element includes multiplying bit A(n+1)k by weight data element WTm(n+1), thereby generating product data element Pm(n+1), and adding product data element Pmn to product data element Pm(n+1) to generate sum data element Sm1 discussed above with respect to memory cell 300B and FIG. 3B.
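A minimal sketch of the two-weight case follows. The function name `cell_sum` and its argument names are hypothetical; the sketch only reproduces the arithmetic Sm1 = Pmn + Pm(n+1) described above and makes no claim about how memory cell 300B implements it.

```python
def cell_sum(bit_n: int, weight_n: int, bit_n1: int, weight_n1: int) -> int:
    """Model a two-weight cell: S = A_n * WT_n + A_(n+1) * WT_(n+1)."""
    for bit in (bit_n, bit_n1):
        if bit not in (0, 1):
            raise ValueError("input bits must be 0 or 1")
    product_n = bit_n * weight_n      # first product data element, Pmn
    product_n1 = bit_n1 * weight_n1   # second product data element, Pm(n+1)
    return product_n + product_n1     # sum data element, Sm1
```

With weights 5 and 3, the sum data element is 8 when both input bits are 1 and 5 when only the first bit is 1.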

In various embodiments, using the memory cells to multiply the k-th bit of the corresponding input data element by the first weight data element stored in the memory cell includes using a plurality of columns of memory cells (e.g., columns C1 through CM discussed above with respect to FIGS. 1A through 5) to multiply the k-th bits A1k through ANk of the corresponding input data elements by corresponding first weight data elements of a plurality of first weight data elements.

In various embodiments, using the column of memory cells to multiply the k-th bit of the corresponding input data element by the first weight data element stored in the memory cell includes generating and responding to one or more control signals, e.g., one or more control signals CTRL generated by control circuit 150 discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the column of memory cells to multiply the k-th bit of the corresponding input data element by the first weight data element stored in the memory cell includes performing some or all of method 900 discussed below with respect to FIG. 9.

At operation 1050, an adder tree is used to generate a summation data element based on each of the first product data elements. In some embodiments, using the adder tree to generate the summation data element based on each of the first product data elements includes using adder tree 122 to generate an instance of summation data elements SD1 through SDM based on product data elements Pmn and/or Pm(n+1) discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the adder tree to generate the summation data element includes using adder tree 400 discussed above with respect to FIG. 4.

In some embodiments, using the adder tree to generate the summation data element includes using a plurality of adder trees to generate a plurality of summation data elements, e.g., summation data elements SD1 through SDM discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the adder tree to generate the summation data element includes receiving the first product data elements at the adder tree. In some embodiments, receiving the first product data elements at the adder tree includes receiving product data elements P11 through PMN at adder tree 122, as discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the adder tree to generate the summation data element includes receiving sum data elements at the adder tree. In some embodiments, receiving the sum data elements at the adder tree includes receiving sum data elements S11 through SML at adder tree 122, as discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the adder tree to generate the summation data element includes using a multiplexer to couple the adder tree to selected memory cells, e.g., using multiplexer MA discussed above with respect to FIG. 6.

In some embodiments, using the adder tree to generate the summation data element based on each of the first product data elements includes generating and responding to one or more control signals, e.g., one or more control signals CTRL generated by control circuit 150 discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the adder tree to generate the summation data element based on each of the first product data elements includes performing some or all of method 900 discussed below with respect to FIG. 9.
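The reduction performed at operation 1050 can be sketched as a level-by-level pairwise addition. The function name `adder_tree` is hypothetical, and the handling of an odd number of inputs (carrying the last value to the next level) is an assumption made for this sketch; the source does not describe that case here.

```python
def adder_tree(values: list[int]) -> int:
    """Reduce product (or sum) data elements to one summation data element."""
    if not values:
        return 0
    level = list(values)
    while len(level) > 1:
        # Add neighbouring pairs to form the next, narrower level of the tree.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # Assumption: an unpaired element passes through unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

Four product data elements 1, 2, 3, and 4 reduce to 3 and 7 at the first level and to a summation data element of 10 at the second.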

At operation 1060, an accumulator is used to generate a partial sum based on the summation data element. In some embodiments, using the accumulator to generate the partial sum based on the summation data element includes using accumulator 140 to generate partial sums PS1 through PSM based on corresponding summation data elements SD1 through SDM, as discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the accumulator to generate the partial sum includes adding a first summation data element to a second summation data element stored in a data register and shifted by a shifter. In some embodiments, adding the first summation data element to the second summation data element is synchronized with the selection circuit sequentially outputting the sets of k-th bits. In some embodiments, using the accumulator to generate the partial sum includes using accumulator 500 to generate partial sum PSm, as discussed above with respect to FIG. 5.
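The shift-and-add behavior can be sketched as follows. This is an illustration under stated assumptions, not a description of accumulator 500: the function name `accumulate_lsb_first` is hypothetical, the summation data elements are assumed to arrive in LSB-to-MSB order, the stored value is assumed to be shifted right by one position per step, and each new element is assumed to enter at the most significant weight.

```python
def accumulate_lsb_first(summation_elements: list[int]) -> int:
    """Combine H summation data elements, delivered LSB first, into one partial sum."""
    h = len(summation_elements)
    register = 0  # models the data register
    for sd in summation_elements:
        shifted = register >> 1                 # shifter acts on the stored value
        register = shifted + (sd << (h - 1))    # new element enters at weight 2**(h-1)
    # After h steps, the element for bit k has been shifted down to weight 2**k.
    return register
```

For summation data elements 1, 2, and 3 (bit positions 0, 1, and 2), the result is 1 + 2·2 + 3·4 = 17. No bits are lost by the right shifts, because every stored term still carries at least one factor of two when it is shifted.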

In some embodiments, using the accumulator to generate the partial sum based on the summation data element includes using a plurality of accumulators to generate a plurality of partial sums, e.g., partial sums PS1 through PSM discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the plurality of accumulators to generate the plurality of partial sums includes using a first accumulator to generate a first partial sum based on a second partial sum generated by a second accumulator, e.g., using a first instance of accumulator 140 to generate partial sum PS1 based on partial sum PS2, as discussed above with respect to FIG. 7A.

In some embodiments, using the plurality of accumulators to generate the plurality of partial sums includes using an adder to generate a partial sum based on multiple partial sums generated by multiple accumulators, e.g., using adder ADDSUM to generate partial sum OSUM based on partial sums PS1 through PS4, as discussed above with respect to FIG. 7B.

In some embodiments, using the accumulator to generate the partial sum based on the summation data element includes generating and responding to one or more control signals, e.g., one or more control signals CTRL generated by control circuit 150 discussed above with respect to FIGS. 1A through 5.

In some embodiments, using the accumulator to generate the partial sum based on the summation data element includes performing some or all of method 900 discussed below with respect to FIG. 9.

At operation 1070, in some embodiments, some or all of operations 1010 through 1060 are repeated. In some embodiments, repeating some or all of operations 1010 through 1060 includes synchronizing the execution of some or all of operations 1010 through 1060. In some embodiments, repeating some or all of operations 1010 through 1060 includes incrementing a counter, e.g., counter k discussed above with respect to FIGS. 1A through 9. In some embodiments, repeating some or all of operations 1010 through 1060 includes generating one or more control signals, e.g., using control circuit 150 to generate one or more of control signals CTRL, as discussed above with respect to FIGS. 1A through 5.

In some embodiments, repeating some or all of operations 1010 through 1060 includes performing some or all of method 900 discussed above with respect to FIG. 9.

In some embodiments, repeating some or all of operations 1010 through 1060 includes using the accumulator to generate a partial sum based on H summation data elements, e.g., using accumulator 140 to generate partial sums PS1 through PSM based on H instances of corresponding summation data elements SD1 through SDM, as discussed above with respect to FIGS. 1A through 5.

In some embodiments, repeating some or all of operations 1010 through 1060 includes sequentially multiplying the sets of k-th bits output by the selection circuit by the corresponding first weight data elements, thereby generating a plurality of first product data elements, e.g., first product data elements Pmn discussed above with respect to FIGS. 3A and 3B.

In some embodiments, repeating some or all of operations 1010 through 1060 includes sequentially multiplying the sets of k-th bits output by the selection circuit by the corresponding second weight data elements, thereby generating a plurality of second product data elements, e.g., second product data elements Pm(n+1) discussed above with respect to FIG. 3B.

In some embodiments, repeating some or all of operations 1010 through 1060 includes using the adder tree to generate H summation data elements based on the plurality of first product data elements and, in some embodiments, further based on the plurality of second product data elements.
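The repeated operations can be put together in a behavioral sketch of one column. The function name `bit_serial_partial_sum` is hypothetical, the inputs are assumed to be unsigned H-bit integers, and the accumulation step is written in an arithmetically equivalent form (weighting each summation data element by 2^k) rather than as a model of the accumulator hardware.

```python
def bit_serial_partial_sum(inputs: list[int], weights: list[int], h: int) -> int:
    """Compute sum(inputs[n] * weights[n]) one input bit position at a time."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is required per input data element")
    if any(a < 0 or a >= (1 << h) for a in inputs):
        raise ValueError("inputs must be unsigned values of at most h bits")
    partial_sum = 0
    for k in range(h):
        # Selection: take the k-th bit of every input data element.
        kth_bits = [(a >> k) & 1 for a in inputs]
        # Operation 1040: each cell multiplies its bit by its stored weight.
        products = [bit * w for bit, w in zip(kth_bits, weights)]
        # Operation 1050: the adder tree yields one summation data element.
        summation = sum(products)
        # Operation 1060: accumulate with the binary weight of bit position k.
        partial_sum += summation << k
    return partial_sum
```

For inputs 3, 5, and 6 with weights 2, 7, and 1 and H = 3, the three summation data elements are 9, 3, and 8, and the partial sum is 9 + 3·2 + 8·4 = 47, which equals the dot product 3·2 + 5·7 + 6·1.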

In some embodiments, the plurality of input data elements is a first plurality of input data elements of a set of pluralities of input data elements, and repeating some or all of operations 1010 through 1060 includes sequentially receiving each plurality of input data elements of the set, and performing some or all of operations 1010 through 1060 to generate one or more partial sums based on each plurality of input data elements of the set and a single plurality of weight data elements.

By performing some or all of the operations of method 1000, partial sums are generated based on in-memory computation, thereby realizing the benefits discussed above with respect to memory circuits 100A and 100B. In embodiments in which one or more partial sums are generated based on each plurality of input data elements of a set of pluralities of input data elements and a single plurality of weight data elements, power levels are further reduced compared to approaches in which a single plurality of weight data elements is not reused for multiple in-memory partial sum computations.
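Weight reuse can be illustrated with a small model in which the weights are stored once and then applied to several pluralities of input data elements. The class name `WeightColumn` and the `weight_writes` counter are hypothetical and exist only to make the reuse visible; the sketch says nothing about actual power consumption.

```python
class WeightColumn:
    """Behavioral model of one column whose weights are written a single time."""

    def __init__(self, weights: list[int]) -> None:
        self.weights = list(weights)
        self.weight_writes = len(self.weights)  # counted once, at storage time

    def partial_sum(self, inputs: list[int], h: int) -> int:
        """Bit-serial partial sum of unsigned h-bit inputs against the stored weights."""
        if len(inputs) != len(self.weights):
            raise ValueError("one input data element is required per weight")
        total = 0
        for k in range(h):
            summation = sum(((a >> k) & 1) * w for a, w in zip(inputs, self.weights))
            total += summation << k
        return total
```

A column created with weights 2, 7, and 1 can process the input sets (3, 5, 6) and (1, 0, 7) in turn, giving partial sums 47 and 9, while `weight_writes` stays at 3.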

In some embodiments, a memory circuit includes a selection circuit configured to receive a plurality of input data elements, each input data element of the plurality of input data elements including a number of bits equal to H, and to output a selected set of k-th bits of the H bits of each input data element of the plurality of input data elements; a column of memory cells, each memory cell of the column of memory cells including a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first k-th bit of the selected set of k-th bits; and an adder tree configured to generate a summation data element based on each of the first product data elements. In some embodiments, the first weight data element is a multi-bit data element. In some embodiments, each memory cell of the column of memory cells includes a second storage unit configured to store a second weight data element, a second multiplier configured to generate a second product data element based on the second weight data element and a second k-th bit of the selected set of k-th bits, and an adder configured to generate a sum data element from the first and second product data elements, wherein the adder tree is configured to generate the summation data element based on each of the sum data elements. In some embodiments, the summation data element is one summation data element of H summation data elements, the selection circuit is configured to sequentially output the sets of k-th bits from a first bit through an H-th bit, the adder tree is configured to generate each of the H summation data elements based on the sequentially output sets of k-th bits, and the memory circuit includes an accumulator configured to generate a partial sum based on the H summation data elements. In some embodiments, the memory circuit includes a control circuit configured to generate one or more control signals received by the selection circuit and the accumulator, the memory circuit thereby being configured to generate the partial sum in synchronization with the selection circuit sequentially outputting the sets of k-th bits. In some embodiments, the column of memory cells is one column of a plurality of columns of memory cells, each column of memory cells being configured to receive the selected set of k-th bits of the H bits; the adder tree is one adder tree of a plurality of adder trees, each adder tree being coupled to a corresponding column of the plurality of columns of memory cells; the accumulator is one accumulator of a plurality of accumulators, each accumulator being coupled to a corresponding adder tree of the plurality of adder trees; and each accumulator of the plurality of accumulators is configured to generate a corresponding partial sum based on the H summation data elements generated by the corresponding adder tree. In some embodiments, at least one accumulator of the plurality of accumulators is configured to generate the corresponding partial sum based on a partial sum generated by another accumulator of the plurality of accumulators. In some embodiments, each first storage unit includes an SRAM device configured to store some or all of the first weight data element. In some embodiments, the memory circuit includes an I/O circuit configured to store each first weight data element in the corresponding first storage unit.

In some embodiments, a method of operating a memory circuit includes receiving, at a column of memory cells, a set of k-th bits of a number H of bits of each input data element of a plurality of input data elements; using each memory cell of the column of memory cells to multiply the k-th bit of a corresponding input data element of the plurality of input data elements by a first weight data element stored in the memory cell, thereby generating a corresponding first product data element; and using an adder tree to generate a summation data element based on each of the first product data elements. In some embodiments, using each memory cell of the column of memory cells to multiply the k-th bit of the corresponding input data element by the first weight data element includes using each memory cell of the column of memory cells to multiply the k-th bit of another corresponding input data element of the plurality of input data elements by a second weight data element stored in the memory cell, thereby generating a second product data element, and to add the first product data element to the second product data element to generate a sum data element, wherein using the adder tree to generate the summation data element is based on each of the corresponding sum data elements. In some embodiments, the method includes using a selection circuit to sequentially output the sets of k-th bits of the H bits of each input data element of the plurality of input data elements, and using an accumulator to generate a partial sum based on H summation data elements, wherein using each memory cell of the column of memory cells to multiply the k-th bit of the input data element by the first weight data element includes sequentially multiplying each k-th bit by the first weight data element, thereby generating a plurality of first product data elements, and using the adder tree to generate the summation data element based on each of the first product data elements includes using the adder tree to generate the H summation data elements based on the plurality of first product data elements. In some embodiments, receiving the k-th bit of each input data element of the plurality of input data elements includes receiving the set of k-th bits at each column of memory cells of a plurality of columns of memory cells; using each memory cell of the column of memory cells to multiply the k-th bit by the first weight data element includes using each memory cell of each column of memory cells of the plurality of columns of memory cells to multiply the k-th bit by a corresponding first weight data element stored in the memory cell, thereby generating a corresponding first product data element; using the adder tree to generate the summation data element includes using a plurality of adder trees to generate a plurality of summation data elements based on the first product data elements; and using the accumulator to generate the partial sum includes using a plurality of accumulators to generate a plurality of partial sums based on the corresponding H summation data elements. In some embodiments, using the plurality of accumulators to generate the plurality of partial sums includes using a first accumulator to generate a first partial sum based on a second partial sum generated by a second accumulator. In some embodiments, using the accumulator to generate the partial sum includes adding a first summation data element to a second summation data element stored in a data register and shifted by a shifter, and adding the first summation data element to the second summation data element is synchronized with the selection circuit sequentially outputting the sets of k-th bits. In some embodiments, using the selection circuit to sequentially output the sets of k-th bits of the H bits of each input data element of the plurality of input data elements includes outputting the sets of k-th bits from LSB to MSB. In some embodiments, the method includes storing the first weight data element in each memory cell of the column of memory cells based on a first power supply voltage level, wherein each of using each memory cell of the column of memory cells to multiply the k-th bit by the first weight data element and using the adder tree to generate the summation data element is based on a second power supply voltage level lower than the first power supply voltage level.

In some embodiments, a memory circuit includes a selection circuit configured to, for a plurality of input data elements each including H bits, sequentially output selected sets of k-th bits to corresponding memory cells of each column of memory cells of a plurality of columns of memory cells; a plurality of adder trees, each adder tree of the plurality of adder trees being coupled to a corresponding column of memory cells of the plurality of columns of memory cells; and a plurality of accumulators, each accumulator of the plurality of accumulators being coupled to a corresponding adder tree of the plurality of adder trees. Each memory cell of each column of memory cells includes a multiplier configured to generate a product data element based on a corresponding k-th bit of the selected set of k-th bits and a weight data element stored in the memory cell; each adder tree of the plurality of adder trees is configured to, for each sequentially output set of k-th bits, generate a summation data element based on each of the product data elements of the corresponding column of memory cells; and each accumulator of the plurality of accumulators is configured to generate a partial sum based on the summation data elements generated by the corresponding adder tree of the plurality of adder trees. In some embodiments, each adder tree of the plurality of adder trees includes a first adder configured to receive first and second sum data elements and to output the summation data elements having a first number of bits, and second and third adders configured to output the first and second sum data elements based on the product data elements of the corresponding column of memory cells, each of the first and second sum data elements having a second number of bits one less than the first number of bits. In some embodiments, at least one adder tree of the plurality of adder trees is coupled to the corresponding column of memory cells of the plurality of columns of memory cells via a multiplexer.
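The one-bit difference between adder levels follows from the fact that the sum of two unsigned b-bit values needs at most b + 1 bits. The sketch below tabulates the output width at each level of a binary adder tree; the function name `adder_tree_output_widths` is hypothetical, and the sketch assumes unsigned product data elements of equal width.

```python
def adder_tree_output_widths(product_bits: int, num_products: int) -> list[int]:
    """Return the output width in bits after each level of a binary adder tree."""
    if product_bits < 1 or num_products < 1:
        raise ValueError("product_bits and num_products must be positive")
    widths = []
    width = product_bits
    remaining = num_products
    while remaining > 1:
        width += 1                        # adding two b-bit values needs b + 1 bits
        remaining = (remaining + 1) // 2  # each level halves the number of operands
        widths.append(width)
    return widths
```

For four 4-bit product data elements, the widths are 5 bits after the first level and 6 bits after the second, so the adders feeding the final adder output one bit fewer than the final adder does.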

The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

1000: method

1010: operation

1020: operation

1030: operation

1040: operation

1050: operation

1060: operation

1070: operation

Claims (10)

一種記憶體電路,包含:一選擇電路,用以接收複數個輸入資料元,該些輸入資料元中的各輸入資料元包含等於H的數目個位元,以及輸出該些輸入資料元中的各輸入資料元的該H個位元的多個第k個位元的一經選擇的集合;一記憶體單元行,該記憶體單元行的各記憶體單元包含:一第一儲存單元,用以儲存一第一權重資料元;及一第一乘法器,用以基於該第一權重資料元及多個第k個位元的該經選擇的集合中的一對應第一第k個位元產生一第一乘積資料元;及一加法器樹,用以基於該些第一乘積資料元中的各者產生一求和資料元。 A memory circuit includes: a selection circuit for receiving a plurality of input data elements, each of the input data elements including a number of bits equal to H, and outputting each of the input data elements a selected set of kth bits of the H bits of the input data element; a row of memory cells, each memory cell of the row of memory cells comprising: a first storage unit for storing a first weight data element; and a first multiplier for generating a corresponding first kth bit based on the first weight data element and the selected set of kth bits first product data elements; and an adder tree for generating a summation data element based on each of the first product data elements. 如請求項1所述之記憶體電路,其中該記憶體單元行的各記憶體單元進一步包含:一第二儲存單元,用以儲存一第二權重資料元;一第二乘法器,用以基於該第二權重資料元及多個第k 個位元的該經選擇的集合中的一對應第二第k個位元產生一第二乘積資料元;及一加法器,用以自該第一乘積資料元及該第二乘積資料元產生一和資料元,其中該加法器樹用以基於該些和資料元中的各者產生該求和資料元。 The memory circuit of claim 1, wherein each memory cell of the memory cell row further comprises: a second storage unit for storing a second weight data element; a second multiplier for based on the second weight data element and a plurality of kth A corresponding second kth bit in the selected set of bits generates a second product data element; and an adder for generating from the first product data element and the second product data element A sum data element, wherein the adder tree is used to generate the sum data element based on each of the sum data elements. 
如請求項1所述之記憶體電路,其中該求和資料元係H個求和資料元中的一個求和資料元,該選擇電路用以自一第一個位元至一第H個位元依序輸出多個第k個位元的集合,該加法器樹用以基於該依序輸出多個第k個位元的集合產生該H個求和資料元中的各者,且該記憶體電路進一步包含一累加器,該累加器用以基於該H個求和資料元產生一部分和。 The memory circuit of claim 1, wherein the summation data element is a summation data element among H summation data elements, and the selection circuit is used to select from a first bit to an Hth bit element sequentially outputs a plurality of sets of k-th bits, the adder tree is used to generate each of the H summation data elements based on the sets of outputs of a plurality of k-th bits in sequence, and the memory The bulk circuit further includes an accumulator for generating a partial sum based on the H summed data elements. 一種操作一記憶體電路的方法,包含:在一記憶體單元行處接收複數個輸入資料元中的各輸入資料元的H數目個位元的多個第k個位元的一集合;使用該記憶體單元行的各記憶體單元將該些資料元中的 一對應輸入資料元的該第k個位元與儲存在該記憶體單元中的一第一權重資料元相乘,以產生一對應第一乘積資料元;及使用一加法器樹基於該些第一乘積資料元中的各者產生一求和資料元。 A method of operating a memory circuit, comprising: receiving a set of a plurality of kth bits of H number of bits of each input data element in a plurality of input data elements at a memory cell row; using the Each memory cell of a memory cell row associates the The k-th bit of a corresponding input data element is multiplied by a first weight data element stored in the memory unit to generate a corresponding first product data element; and an adder tree is used based on the k-th data elements Each of a product data element yields a summation data element. 
The method of claim 4, wherein the using each memory cell of the column of memory cells to multiply the kth bit of the corresponding input data element of the data elements with the first weight data element comprises using each memory cell of the column of memory cells to: multiply the kth bit of another corresponding input data element of the data elements with a second weight data element stored in the memory cell, thereby generating a second product data element; and add the first product data element and the second product data element to generate a sum data element, wherein the using the adder tree to generate the summation data element is based on each of the corresponding sum data elements.

The method of claim 4, further comprising: using a selection circuit to sequentially output sets of kth bits of the H bits of each input data element of the input data elements; and using an accumulator to generate a partial sum based on H summation data elements, wherein the using each memory cell of the column of memory cells to multiply the kth bit of the corresponding input data element of the data elements with the first weight data element comprises sequentially multiplying each kth bit with the first weight data element to generate a plurality of first product data elements; and the using the adder tree to generate the summation data element based on each of the first product data elements comprises using the adder tree to generate the H summation data elements based on the first product data elements.

The method of claim 4, further comprising: storing the first weight data element in each memory cell of the column of memory cells based on a first power supply voltage level, wherein each of the using each memory cell of the column of memory cells to multiply the kth bit with the first weight data element and the using the adder tree to generate the summation data element is based on a second power supply voltage level lower than the first power supply voltage level.

A memory circuit, comprising: a selection circuit configured to, for a plurality of input data elements each including H bits, sequentially output selected sets of kth bits to corresponding memory cells of each column of memory cells of a plurality of columns of memory cells; a plurality of adder trees, each adder tree of the adder trees being coupled to a corresponding column of memory cells of the columns of memory cells; and a plurality of accumulators, each accumulator of the accumulators being coupled to a corresponding adder tree of the adder trees, wherein each memory cell of each column of memory cells includes a multiplier configured to generate a product data element based on the corresponding kth bit of the selected set of kth bits and a weight data element stored in the memory cell; each adder tree of the adder trees is configured to generate, for each sequentially output set of kth bits, a summation data element based on each of the product data elements of the corresponding column of memory cells; and each accumulator of the accumulators is configured to generate a partial sum based on the summation data elements generated by the corresponding adder tree of the adder trees.

The memory circuit of claim 8, wherein each adder tree of the adder trees comprises: a first adder configured to receive a first sum data element and a second sum data element, and to output the summation data elements having a first number of bits; and a second adder and a third adder configured to output the first sum data element and the second sum data element based on the product data elements of the corresponding column of memory cells, each of the first sum data element and the second sum data element having a second number of bits that is one less than the first number of bits.

The memory circuit of claim 8, wherein at least one adder tree of the adder trees is coupled to the corresponding column of memory cells of the columns of memory cells through a multiplexer.
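The bit-serial multiply-accumulate flow recited in the claims above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the function names (`select_kth_bits`, `adder_tree`, `column_mac`) are hypothetical, the input data elements are assumed to be unsigned H-bit integers processed least-significant bit first, and the accumulator is assumed to weight the kth summation data element by 2^k, which the claims do not spell out.

```python
from typing import List


def select_kth_bits(inputs: List[int], k: int) -> List[int]:
    """Selection-circuit model: pick bit k of every input data element."""
    return [(x >> k) & 1 for x in inputs]


def adder_tree(values: List[int]) -> int:
    """Pairwise reduction, mirroring a tree of two-input adders."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def column_mac(inputs: List[int], weights: List[int], h_bits: int) -> int:
    """Bit-serial dot product for one column of memory cells.

    Assumption: unsigned inputs, LSB first, shift-and-add accumulation.
    """
    partial_sum = 0
    for k in range(h_bits):
        bits = select_kth_bits(inputs, k)
        # Each memory cell multiplies its stored weight by a single input bit.
        products = [b * w for b, w in zip(bits, weights)]
        summation = adder_tree(products)
        # Accumulator: weight the kth summation data element by 2**k (assumed).
        partial_sum += summation << k
    return partial_sum
```

For example, `column_mac([3, 5, 7, 2], [1, 2, 3, 4], h_bits=3)` returns 42, the same value as the ordinary dot product 3·1 + 5·2 + 7·3 + 2·4. Processing one bit position per step is what lets each cell's multiplier reduce to gating the stored weight with a single bit.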
TW110118621A 2020-07-14 2021-05-24 Memory circuit and operating method thereof TWI771014B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063051497P 2020-07-14 2020-07-14
US63/051,497 2020-07-14
US17/203,130 2021-03-16
US17/203,130 US20220019407A1 (en) 2020-07-14 2021-03-16 In-memory computation circuit and method

Publications (2)

Publication Number Publication Date
TW202203053A TW202203053A (en) 2022-01-16
TWI771014B TWI771014B (en) 2022-07-11

Family

ID=76920518

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110118621A TWI771014B (en) 2020-07-14 2021-05-24 Memory circuit and operating method thereof

Country Status (7)

Country Link
US (1) US20220019407A1 (en)
EP (1) EP3940527A1 (en)
JP (1) JP2022018112A (en)
KR (1) KR102555621B1 (en)
CN (1) CN113571109A (en)
DE (1) DE102021107093A1 (en)
TW (1) TWI771014B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022117853A (en) * 2021-02-01 2022-08-12 パナソニックIpマネジメント株式会社 Diagnostic circuit, electronic device and diagnostic method

Citations (3)

Publication number Priority date Publication date Assignee Title
TW202016803A (en) * 2018-10-29 2020-05-01 旺宏電子股份有限公司 Neural Network System
TW202024960A (en) * 2018-12-07 2020-07-01 南韓商三星電子股份有限公司 Tensor computation dataflow accelerator semiconductor circuit
TWI698884B (en) * 2019-02-19 2020-07-11 旺宏電子股份有限公司 Memory devices and methods for operating the same

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
EP0847552B1 (en) * 1995-08-31 2002-10-30 Intel Corporation An apparatus for performing multiply-add operations on packed data
TW200520225A (en) * 2003-10-24 2005-06-16 Matsushita Electric Ind Co Ltd Pixel arranging apparatus, solid-state image sensing apparatus, and camera
KR102408858B1 (en) * 2017-12-19 2022-06-14 삼성전자주식회사 A nonvolatile memory device, a memory system including the same and a method of operating a nonvolatile memory device
US10678507B2 (en) * 2017-12-22 2020-06-09 Alibaba Group Holding Limited Programmable multiply-add array hardware
US10515689B2 (en) * 2018-03-20 2019-12-24 Taiwan Semiconductor Manufacturing Company, Ltd. Memory circuit configuration and method
CN110673824B (en) * 2018-07-03 2022-08-19 赛灵思公司 Matrix vector multiplication circuit and circular neural network hardware accelerator
US10831446B2 (en) * 2018-09-28 2020-11-10 Intel Corporation Digital bit-serial multi-multiply-and-accumulate compute in memory
KR20200039930A (en) * 2018-10-08 2020-04-17 삼성전자주식회사 Memory device performing in-memory prefetching and system including the same

Also Published As

Publication number Publication date
US20220019407A1 (en) 2022-01-20
EP3940527A1 (en) 2022-01-19
KR102555621B1 (en) 2023-07-13
CN113571109A (en) 2021-10-29
TW202203053A (en) 2022-01-16
DE102021107093A1 (en) 2022-01-20
KR20220008743A (en) 2022-01-21
JP2022018112A (en) 2022-01-26
