TWI741766B

TWI741766B - In-memory computation device

Info

Publication number: TWI741766B
Application number: TW109129491A
Authority: TW
Inventors: 許柏凱; 葉騰豪; 徐子軒; 呂函庭
Original assignee: 旺宏電子股份有限公司
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2021-10-01
Also published as: TW202209318A

Abstract

An in-memory computation device including a memory array, p×q analog-to-digital converters (ADCs) and a ladder adder is provided. The memory array is divided into p×q memory tiles, where p and q are positive integers larger than 1. Each of the memory tiles has a plurality of local bit lines coupled to a global bit line respectively through a plurality of bit line selection switches. The bit line selection switches are turned on or cut off according to a plurality of control signals. The memory array receives a plurality of input signals. The ADCs are respectively coupled to a plurality of global bit lines of the memory tiles. The ADCs respectively convert electrical signals on the global bit lines to generate a plurality of sub-output signals. The ladder adder is coupled to the ADCs, and performs an addition operation on the sub-output signals to generate a calculation result.

Description

In-memory computing device

The present invention relates to an in-memory computing device, and more particularly to an in-memory computing device that can reduce data storage requirements.

In recent years, artificial intelligence (AI) accelerators for deep neural networks (DNNs) used in edge computing have become increasingly important for the integration and implementation of artificial intelligence of things (AIoT) applications. In addition to the traditional Von Neumann computing architecture, a computation-in-memory (CIM) architecture that can further improve computing efficiency has been proposed.

However, multiply-accumulate operations on multiple input signals and multiple weights inevitably generate a large amount of data spanning a wide range of values. How to reduce the data storage requirements and power consumption of an in-memory computing device has therefore become an important issue for engineers in this field.

The invention provides an in-memory computing device that can reduce data storage requirements.

The in-memory computing device of the present invention includes a memory array, p×q analog-to-digital converters and a ladder adder. The memory array is divided into p×q memory partitions (tiles), where p and q are both positive integers greater than 1. Each memory partition has a plurality of local bit lines, which are respectively coupled to the corresponding global bit line through a plurality of bit line selection switches. The bit line selection switches are turned on or off according to a plurality of control signals, respectively. The memory array receives a plurality of input signals. The analog-to-digital converters are respectively coupled to the global bit lines of the memory partitions, and respectively convert the electrical signals on the global bit lines to generate p×q digital sub-output values. The ladder adder is coupled to the analog-to-digital converters, and performs an addition operation on the sub-output values to generate an operation result.

Based on the above, the present invention divides the memory array into a plurality of memory partitions, and each memory partition can adjust the number of weight bits by adjusting how many of its bit line selection switches are turned on. The memory partitions respectively generate a plurality of sub-output values according to the received input signals. The ladder adder then performs an addition operation on the sub-output values to generate an operation result. With this architecture, the data storage required for multiply-accumulate operations in the in-memory computing device can be reduced, which effectively lowers hardware cost and power consumption and increases computation speed.

Please refer to FIG. 1, which is a schematic diagram of an in-memory computing device according to an embodiment of the present invention. The in-memory computing device 100 can be applied to deep neural network (DNN) computation. The in-memory computing device 100 includes a memory array 110, p×q analog-to-digital converters AD11~ADqp and a ladder adder 120. In this embodiment, the memory array 110 can be divided into p×q memory partitions MB11~MBqp, where p and q are both positive integers greater than 1. Each of the memory partitions MB11~MBqp can receive a plurality of input signals. The memory cells in each of the memory partitions MB11~MBqp provide a plurality of weight values, and each memory partition performs a multiply-accumulate operation according to the input signals and the weight values.

The analog-to-digital converters AD11~ADqp are respectively coupled to the memory partitions MB11~MBqp. In this embodiment, each of the memory partitions MB11~MBqp has a global bit line. The analog-to-digital converters AD11~ADqp are respectively coupled to the global bit lines of the memory partitions MB11~MBqp, and respectively perform analog-to-digital conversion on the electrical signals on those global bit lines to generate p×q sub-output values SV11~SVqp. The electrical signals can be voltage signals or current signals.

In this embodiment, each of the memory partitions MB11~MBqp has a plurality of local bit lines. All local bit lines of the same memory partition are coupled to the corresponding global bit line.

The ladder adder 120 is coupled to the analog-to-digital converters AD11~ADqp. The ladder adder 120 receives the sub-output values SV11~SVqp and performs an addition operation on the sub-output values SV11~SVqp to generate the operation result CR.

Please refer to FIG. 2A, which is a schematic diagram of how the memory array in the in-memory computing device is partitioned according to an embodiment of the present invention. In this embodiment, the memory array 200 includes a plurality of NOR flash memory cells. The memory array 200 can be divided into p×q memory partitions MB11~MBqp, arranged in p memory partition rows and q memory partition columns, where p and q are both positive integers greater than 1. The memory partitions MB11~MBqp respectively have global bit lines GBL_11~GBL_qp. Memory partitions arranged in the same row receive the same input signals. For example, the memory partitions MB11 and MBq1 in the same row receive the input signals IN11~IN14, and the memory partitions MB1p and MBqp in the same row receive the input signals INp1~INp4.
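The partitioning can be pictured with a short Python sketch. It assumes, purely for illustration, that the i input lines split evenly over the p groups of partitions that share input signals, and that the j bit-line positions split evenly over the q partitions inside each group. The function name `tile_bounds` and the zero-based indices are hypothetical and do not appear in the source.

```python
def tile_bounds(i, j, p, q, a, b):
    """Return the input-line range and bit-line range covered by one tile.

    a -- index (0..p-1) of the group of tiles sharing the same input signals
    b -- index (0..q-1) of the tile inside that group
    Assumes i is divisible by p and j by q (an illustrative simplification).
    """
    if i % p or j % q:
        raise ValueError("i must be divisible by p and j by q")
    inputs_per_tile = i // p
    bit_lines_per_tile = j // q
    input_lines = range(a * inputs_per_tile, (a + 1) * inputs_per_tile)
    bit_lines = range(b * bit_lines_per_tile, (b + 1) * bit_lines_per_tile)
    return input_lines, bit_lines
```

With i=j=8 and p=q=4, every tile covers two input lines and two bit-line positions, and tiles with the same index `a` see the same input lines.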

Corresponding to the memory partitions MB11~MBqp, the in-memory computing device of this embodiment is provided with p×q analog-to-digital converters AD11~ADqp. The analog-to-digital converters AD11~ADqp are respectively coupled to the global bit lines GBL_11~GBL_qp, and perform analog-to-digital conversion on the electrical signals on the global bit lines GBL_11~GBL_qp to respectively generate p×q sub-output values.

Here, the memory cells in the memory array 200 can be programmed in advance to a value of 0 or 1, and the word line voltage received by each memory cell is set to selected or unselected so that the memory cell provides the required weight value.

Incidentally, in the embodiment of FIG. 2A, if the memory array 200 has j memory cell columns (which can provide j-bit weight values), the largest weight value that the memory array 200 can provide is $2^{j} - 1$, which creates a large data storage requirement. In this embodiment, the memory array 200 is divided into p×q memory partitions MB11~MBqp, and the global bit lines GBL_11~GBL_qp are respectively provided for the memory partitions MB11~MBqp. While still providing j-bit weight values, the largest weight value that the memory partitions MB11~MBqp can provide then becomes $q \times (2^{j/q} - 1)$, which effectively reduces the data storage requirement.

On the other hand, when the memory array 200 has i memory cell rows (which can provide i-bit input signals), dividing the memory array 200 into p×q memory partitions MB11~MBqp reduces the number of input signals corresponding to a single memory partition to i/p. Therefore, the maximum value of the input signals corresponding to the p memory partitions in this embodiment is $p \times (2^{i/p} - 1)$.

Based on the above, partitioning the memory array 200, which has i memory cell rows and j memory cell columns, into the p×q memory partitions MB11~MBqp reduces its data storage requirement from $(2^{i} - 1) \times (2^{j} - 1)$ to $p \times q \times (2^{i/p} - 1) \times (2^{j/q} - 1)$. Taking i=j=8 and p=q=4 as an example, the data storage requirement is reduced from 65025 to 144.
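The figures in this example can be checked with a few lines of Python. The sketch below simply evaluates the expression p·q·(2^(i/p) − 1)·(2^(j/q) − 1), with p=q=1 standing for the unpartitioned array. The function name `storage_requirement` is hypothetical, and the sketch assumes that p divides i and q divides j.

```python
def storage_requirement(i, j, p=1, q=1):
    """Evaluate p*q*(2**(i/p) - 1)*(2**(j/q) - 1) for an array with i-bit
    inputs and j-bit weights that is split into p x q partitions.

    Assumes p divides i and q divides j so that both exponents are integers.
    """
    if i % p or j % q:
        raise ValueError("p must divide i and q must divide j")
    return p * q * (2 ** (i // p) - 1) * (2 ** (j // q) - 1)


print(storage_requirement(8, 8))        # unpartitioned array: 65025
print(storage_requirement(8, 8, 4, 4))  # 4 x 4 partitions: 144
```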

In addition, please refer to FIG. 2B and FIG. 2C, which are schematic diagrams of how the number of weight bits is adjusted according to an embodiment of the present invention. FIG. 2B takes the memory partition MB11 of FIG. 2A as an example. The memory partition MB11 has a plurality of bit line selection switches BLT1~BLT6. The local bit lines BL1~BL6 are respectively coupled to the global bit line GBL_11 through the bit line selection switches BLT1~BLT6. The bit line selection switches BLT1~BLT6 are respectively controlled by the control signals CT1~CT6 to be turned on or off. The number of bit line selection switches BLT1~BLT6 that are turned on represents the number of weight bits of the memory partition MB11.

Taking FIG. 2C as an example, the bit line selection switches BLT1~BLT3 are turned on according to the control signals CT1~CT3, and the bit line selection switches BLT4~BLT6 are turned off according to the control signals CT4~CT6. Under this condition, the three local bit lines BL1~BL3 that are effectively connected to the global bit line GBL_11 can represent a two-bit weight encoded as 000, 001, 011 or 111. When the number of weight bits needs to be adjusted, this can be done by changing how many of the bit line selection switches BLT1~BLT6 are turned on.
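One way to read the FIG. 2C example is as a thermometer code: with three conducting switches, the four codes 000, 001, 011 and 111 stand for four weight levels, which is what two binary bits can express. This reading is inferred from the listed codes rather than stated outright, and the function names below are hypothetical. A minimal Python sketch under that assumption:

```python
def thermometer_codes(n_on):
    """List the codes available when n_on bit line selection switches conduct.

    Weight level w (0..n_on) is written as w ones in the low positions,
    so n_on = 3 gives 000, 001, 011, 111.
    """
    return ["0" * (n_on - w) + "1" * w for w in range(n_on + 1)]


def weight_bits(n_on):
    """Whole number of binary bits that the n_on + 1 levels can express."""
    return (n_on + 1).bit_length() - 1


print(thermometer_codes(3))  # ['000', '001', '011', '111']
print(weight_bits(3))        # 2
```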

Regarding the implementation of each memory partition, please refer to FIG. 3A and FIG. 3B, which are schematic diagrams of different implementations of the memory partition according to embodiments of the present invention. In FIG. 3A, the memory partition MB11 includes a plurality of memory cells MC1~MCK+1, where the memory cells MC1~MCK+1 are two-transistor (2T) NOR flash memory cells.

The memory cells MC1~MCK+1 are arranged in a planar manner. The memory partition MB11 further includes a plurality of selection switches ST1~STK+1 respectively corresponding to the memory cells MC1~MCK+1. The selection switches ST1~STK+1 are formed of transistors. Each of the memory cells MC1~MCK+1 and its corresponding selection switch ST1~STK+1 are connected in series between the corresponding local bit line and the common source line CSL. Taking the memory cells MC1 and MC2 as examples, the memory cell MC1 and the selection switch ST1 are connected in series between the local bit line BL1 and the common source line CSL, and the memory cell MC2 and the selection switch ST2 are likewise connected in series between the local bit line BL1 and the common source line CSL. In addition, all the local bit lines BL1~BLL in the memory partition MB11 are coupled to the global bit line GBL.

In the memory partition MB11, the control terminals of the selection switches ST1 and ST2 receive the input signals IN11 and IN12, respectively. The gates of the memory cells MC1 and MC2 receive the signals MG1 and MG2, respectively. The memory cells MC1 and MC2 form NOR flash memory elements with a 2T structure. During an in-memory computing operation, the selection switches ST1 and ST2 respectively provide currents according to the input signals IN11 and IN12, the transconductance values provided by the memory cells MC1 and MC2 serve as the weight values, and a multiply-accumulate result is generated. The local bit line BL1 transmits the voltage generated by the multiply-accumulate operation to the global bit line GBL.
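Behaviorally, the operation of one memory partition amounts to a sum of input-times-weight products. The Python sketch below is an idealized model only: plain numbers stand in for the input signals and for the transconductance of the cells, and analog effects are ignored. The function name `tile_mac` is hypothetical.

```python
def tile_mac(inputs, weights):
    """Idealized multiply-accumulate of one memory partition.

    inputs  -- values applied to the selection switches (e.g. IN11, IN12)
    weights -- numbers standing in for the transconductance of the cells
    Returns the sum that the global bit line would carry before the ADC.
    """
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    return sum(x * w for x, w in zip(inputs, weights))
```

For instance, `tile_mac([1, 0, 1, 1], [2, 3, 1, 0])` evaluates 1·2 + 0·3 + 1·1 + 1·0.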

In this embodiment, the global bit line GBL is coupled to the analog-to-digital converter AD11. The analog-to-digital converter AD11 converts the voltage on the global bit line GBL to obtain a sub-output value in digital format.

It is worth mentioning that the hardware architecture of the analog-to-digital converter AD11 can be implemented with any analog-to-digital conversion circuit well known to those skilled in the art, without specific limitation.

In addition, in FIG. 3B, the memory partition MB11 is implemented with a three-dimensional structure, and each of the local bit lines BL1~BLL can be coupled to two or more memory cells. The memory partition MB11 of FIG. 3B operates in the same way as that of FIG. 3A, so the details are not repeated here.

Please refer to FIG. 4, which is a schematic diagram of an implementation of the ladder adder according to an embodiment of the present invention. The ladder adder 400 includes a plurality of first sub-ladder adders 411~41p and a second sub-ladder adder 420. For a memory array divided into p×q memory partitions, the number of first sub-ladder adders 411~41p can be p. Following the example of FIG. 2A, the first sub-ladder adders 411~41p respectively correspond to the p memory partition rows, and each of the first sub-ladder adders 411~41p is coupled to the q analog-to-digital converters of the corresponding memory partition row. Taking the memory partition row of FIG. 2A in which the memory partitions MB11~MBq1 are located as an example, the first sub-ladder adder 411 is coupled to the analog-to-digital converters AD11~ADq1 respectively corresponding to the memory partitions MB11~MBq1.

The first sub-ladder adder 411 performs an addition operation on the sub-output values generated by the analog-to-digital converters AD11~ADq1 to generate the first-direction operation result CDR1. Similarly, the first sub-ladder adders 412~41p respectively generate the first-direction operation results CDR2~CDRp through the addition operations they perform.

The second sub-ladder adder 420 is coupled to the first sub-ladder adders 411~41p. The second sub-ladder adder 420 performs an addition operation on the first-direction operation results CDR1~CDRp respectively generated by the first sub-ladder adders 411~41p, and generates the operation result CR accordingly.

For implementation details of the first sub-ladder adders 411~41p and the second sub-ladder adder 420, please refer to the embodiments of FIG. 5 and FIG. 6 below.

FIG. 5 is a schematic diagram of an implementation of the first sub-ladder adder according to an embodiment of the present invention. The first sub-ladder adder 500 is coupled to the q analog-to-digital converters AD11~AD1q corresponding to the same memory partition row. The first sub-ladder adder 500 has N layers LA1~LAN, where $N = \log_2 q$. In this embodiment, each of the layers LA1~LAN has one or more full adders and shifters. The first layer LA1 includes full adders FAD11~FAD1A and shifters SF11~SF1A. The full adders FAD11~FAD1A respectively correspond to the shifters SF11~SF1A, are interleaved with them one by one, and are coupled in sequence to the analog-to-digital converters AD11~AD1q. There are q/2 full adders FAD11~FAD1A and likewise q/2 shifters SF11~SF1A. The full adders FAD11~FAD1A respectively receive the sub-output values generated by the odd-numbered analog-to-digital converters AD11, AD13, …, while the shifters SF11~SF1A respectively receive the sub-output values generated by the even-numbered analog-to-digital converters AD12, …, AD1q and perform a bit shift.

In this embodiment, the shifters SF11~SF1A shift the received sub-output values toward the higher-order bits. The shifters SF11~SF1A of the first layer LA1 each shift by j/q bits, where j is the total number of memory cell columns of the memory array. The full adders FAD11~FAD1A also respectively receive the outputs of the shifters SF11~SF1A and perform a full-add operation.

Incidentally, in this embodiment, the shifters of the second layer each shift by $2 \times j/q$ bits, and so on for the following layers. In addition, the r-th layer of the first sub-ladder adder 500 has $q/2^{r}$ full adders and $q/2^{r}$ shifters. The full adders and shifters in the same layer are interleaved in sequence and are respectively coupled to the output terminals of the full adders of the previous layer, where $1 < r \le N$.

The N-th layer LAN includes a single full adder FADN1 and a single shifter SFN1. The shifter SFN1 shifts by $2^{N-1} \times j/q$ bits. The full adder FADN1 generates the first-direction operation result CDR1.

The hardware architectures of the full adders FAD11~FADN1 and the shifters SF11~SFN1 in this embodiment can be implemented with full-adder circuits and digital shift circuits well known to those skilled in the art, without specific limitation.
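The layered shift-and-add structure of FIG. 5 can be modeled in software. The Python sketch below is a behavioral illustration, not the circuit itself: in each layer the even-numbered value of every pair is shifted toward the high bits and added to the odd-numbered one. It assumes that the number of inputs is a power of two and that the shift doubles from one layer to the next, which is how the sketch reads "and so on" after the first two layers. The function name `ladder_add` is hypothetical.

```python
def ladder_add(values, base_shift):
    """Combine values pairwise, layer by layer, as a ladder adder would.

    The second value of each pair is shifted left and added to the first.
    The shift is base_shift in the first layer and doubles in later layers.
    Assumes len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of values must be a power of two")
    shift = base_shift
    while len(values) > 1:
        values = [values[k] + (values[k + 1] << shift)
                  for k in range(0, len(values), 2)]
        shift *= 2
    return values[0]


# q = 4 sub-output values and j = 8, so the first-layer shift is j/q = 2.
print(ladder_add([1, 2, 3, 0], 2))  # 1 + (2 << 2) + (3 << 4) + (0 << 6) = 57
```

Under these assumptions, the k-th input (counting from zero) ends up shifted by k times the first-layer shift, so each sub-output value lands at its own bit position in the result.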

FIG. 6 is a schematic diagram of an implementation of the second sub-ladder adder according to an embodiment of the present invention. The second sub-ladder adder 600 has M layers LB1~LBM, each of which includes at least one full adder and at least one shifter, and $M = \log_2 p$, where p is the number of first-direction operation results CDR1~CDRp. In this embodiment, the first layer LB1 has full adders FAD11a~FAD1Ba and shifters SF11a~SF1Ba. The full adders FAD11a~FAD1Ba respectively correspond to the shifters SF11a~SF1Ba and are interleaved with them one by one. The first input terminals of the full adders FAD11a~FAD1Ba respectively receive the odd-numbered first-direction operation results CDR1, CDR3, …, CDRp-1, and the second input terminals of the full adders FAD11a~FAD1Ba are respectively coupled to the output terminals of the shifters SF11a~SF1Ba. The input terminals of the shifters SF11a~SF1Ba respectively receive the even-numbered first-direction operation results CDR2, CDR4, …, CDRp. The number of full adders FAD11a~FAD1Ba and the number of shifters SF11a~SF1Ba are both equal to p/2.

In addition, the s-th layer of the second sub-ladder adder 600 has $p/2^{s}$ full adders and $p/2^{s}$ shifters. The full adders and shifters in the same layer are interleaved in sequence and are respectively coupled to the output terminals of the full adders of the previous layer, where $1 < s \le M$. The last layer LBM has a single full adder FADM1a and a single shifter SFM1a. The single full adder FADM1a generates the operation result CR.

In this embodiment, the shifters SF11a~SF1Ba of the second sub-ladder adder 600 shift the values they receive toward the higher-order bits. The shifters SF11a~SF1Ba in the first layer LB1 all shift by the same number of bits, i/p, where i is the number of bits of the input signal. The shifters in the second layer of the second sub-ladder adder 600 can each shift by $2 \times i/p$ bits, and so on, and the shifter SFM1a of the last layer LBM can shift by $2^{M-1} \times i/p$ bits.

The hardware architectures of the full adders FAD11a~FADM1a and the shifters SF11a~SFM1a in this embodiment can be implemented with full-adder circuits and digital shift circuits well known to those skilled in the art, without specific limitation. The hardware architecture of the full adders FAD11a~FADM1a may be the same as or different from that of the full adders FAD11~FADN1 in the embodiment of FIG. 5, and the hardware architecture of the shifters SF11a~SFM1a may be the same as or different from that of the shifters SF11~SFN1 in the embodiment of FIG. 5.
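Putting FIG. 4 to FIG. 6 together, the whole ladder adder can be sketched as two stages of the same pairwise shift-and-add. The Python sketch below is a literal reading of the shift amounts given in the text (j/q for the first stage, i/p for the second, doubling per layer) and is not a verified model of the circuit. The names `ladder` and `two_stage_ladder` are hypothetical, and p and q are assumed to be powers of two that divide i and j.

```python
def ladder(values, shift):
    """Pairwise shift-and-add; the shift doubles from layer to layer.
    Assumes len(values) is a power of two."""
    while len(values) > 1:
        values = [values[k] + (values[k + 1] << shift)
                  for k in range(0, len(values), 2)]
        shift *= 2
    return values[0]


def two_stage_ladder(sub_outputs, i, j):
    """sub_outputs[a] lists the q sub-output values handled by the a-th
    first sub-ladder adder; there are p such lists."""
    p, q = len(sub_outputs), len(sub_outputs[0])
    # First stage: p ladders over q values each, first-layer shift j/q.
    cdr = [ladder(list(group), j // q) for group in sub_outputs]
    # Second stage: one ladder over the p results, first-layer shift i/p.
    return ladder(cdr, i // p)


# p = q = 2 and i = j = 4, so both first-layer shifts are 2 bits.
print(two_stage_ladder([[1, 0], [0, 1]], 4, 4))  # (1) + ((1 << 2) << 2) = 17
```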

Please refer to FIG. 7, which is a schematic diagram of part of the circuit of an in-memory computing device according to another embodiment of the present invention. The in-memory computing device 700 further includes a normalization circuit 720 and a quantizer 730. The normalization circuit 720 is coupled to the output terminal of the ladder adder 710 to receive the operation result CR generated by the ladder adder 710. The normalization circuit 720 includes a multiplier 721 and a full adder 722. The multiplier 721 receives the operation result CR and a scaling factor SF, and multiplies the operation result CR by the scaling factor SF. The full adder 722 receives the output of the multiplier 721 and an offset parameter BF, and adds the output of the multiplier 721 to the offset parameter BF to generate an adjusted operation result NCR.

The scaling factor SF and the offset parameter BF can be set by the designer, and are used to normalize the operation result CR to a reasonable numerical range to facilitate subsequent operations.

The quantizer 730 is coupled to the normalization circuit 720, receives the adjusted operation result NCR, and divides the adjusted operation result NCR by a reference value DEN to generate an output operation result OCR. In this embodiment, the quantizer 730 can be a divider 731. The reference value DEN can be a non-zero value preset by the designer, without specific limitation.

The hardware architectures of the full adder 722, the multiplier 721 and the divider 731 can be implemented with full-adder, multiplier and divider circuits well known in the art, respectively, without specific limitation.
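The arithmetic of FIG. 7 reduces to NCR = CR × SF + BF followed by OCR = NCR / DEN. A minimal Python sketch follows. The function name is hypothetical, and the use of integer floor division is an assumption of this sketch, since the text only says that the quantizer divides by a non-zero reference value.

```python
def normalize_and_quantize(cr, sf, bf, den):
    """Model the normalization circuit and the quantizer of FIG. 7.

    NCR = CR * SF + BF   (multiplier 721 followed by full adder 722)
    OCR = NCR // DEN     (divider 731; floor division is assumed here)
    """
    if den == 0:
        raise ValueError("DEN must be non-zero")
    ncr = cr * sf + bf
    return ncr // den
```

For example, with CR = 57, SF = 2, BF = 6 and DEN = 8, the adjusted result is 120 and the output is 15.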

Incidentally, the in-memory computing device 700 of this embodiment can be applied to a convolutional neural network (CNN).

In summary, the present invention divides the memory array into p×q memory partitions and uses a ladder adder to complete the required multiply-accumulate operation. Under the architecture of the present invention, the number of weight bits can be adjusted according to how many bit line selection switches are turned on. Moreover, the magnitude of the values generated during the operation can be reduced, so the data storage requirement is effectively lowered, which reduces the hardware burden and increases computational efficiency.

100, 700: in-memory computing device
110, 200: memory array
120, 400: ladder adder
411~41p, 500: first sub-ladder adder
420, 600: second sub-ladder adder
720: normalization circuit
721: multiplier
722: full adder
730: quantizer
731: divider
AD11~ADqp: analog-to-digital converter
BF: offset parameter
BL1~BLL: local bit line
BLT1~BLT6: bit line selection switch
CDR1~CDRp: first-direction operation result
CR: operation result
CSL: common source line
CT1~CT6: control signal
DEN: reference value
FAD11~FADN1, FAD11a~FADM1a: full adder
GBL, GBL_11~GBL_qp: global bit line
IN11~INp4: input signal
LA1~LAN, LB1~LBM: layer
MB11~MBqp: memory partition
MC1~MCK+1: memory cell
NCR: adjusted operation result
OCR: output operation result
SF: scaling factor
SF11~SFN1, SF11a~SFM1a: shifter
ST1~STK+1: selection switch
SV11~SVqp: sub-output value

FIG. 1 is a schematic diagram of an in-memory computing device according to an embodiment of the present invention.
FIG. 2A is a schematic diagram of how the memory array in the in-memory computing device is partitioned according to an embodiment of the present invention.
FIG. 2B and FIG. 2C are schematic diagrams of how the number of weight bits is adjusted according to an embodiment of the present invention.
FIG. 3A and FIG. 3B are schematic diagrams of different implementations of the memory partition according to embodiments of the present invention.
FIG. 4 is a schematic diagram of an implementation of the ladder adder according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of an implementation of the first sub-ladder adder according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an implementation of the second sub-ladder adder according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of part of the circuit of an in-memory computing device according to another embodiment of the present invention.

100: in-memory arithmetic device

110: memory array

120: ladder adder

AD11~ADqp: analog-to-digital converter

CR: operation result

MB11~MBqp: memory partition

SV11~SVqp: sub-output value

Claims

An in-memory arithmetic device, comprising: a memory array divided into p×q memory partitions, where p and q are both positive integers greater than 1, wherein the memory array receives a plurality of input signals, the memory partitions are respectively coupled to a plurality of global bit lines, each of the memory partitions has a plurality of partition bit lines, the partition bit lines are respectively coupled to the corresponding global bit line through a plurality of bit line selection switches, and the bit line selection switches are respectively turned on or off according to a plurality of control signals; p×q analog-to-digital converters respectively coupled to the global bit lines, wherein the analog-to-digital converters respectively convert electrical signals on the global bit lines to generate a plurality of sub-output values; and a ladder adder coupled to the analog-to-digital converters and performing an addition operation on the sub-output values to generate an operation result, wherein the number of turned-on bit line selection switches of each memory partition represents the number of weight bits of that memory partition.
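As a rough behavioral illustration of the claim above, the following Python sketch models a single memory partition feeding its analog-to-digital converter. It is not taken from the specification: the function names, the current units, the uniform ADC model, and the parameters `adc_lsb` and `adc_bits` are all assumptions made for illustration only.

```python
def tile_sub_output(cell_currents, control_signals, adc_lsb=1.0, adc_bits=4):
    """Model one memory partition driving its ADC (illustrative only).

    cell_currents   -- current on each partition bit line (hypothetical units)
    control_signals -- one bool per bit line selection switch (True = turned on)
    adc_lsb, adc_bits -- hypothetical ADC step size and resolution
    """
    if len(cell_currents) != len(control_signals):
        raise ValueError("one control signal per partition bit line is required")
    # Only partition bit lines whose selection switch is on reach the global bit line.
    gbl_current = sum(i for i, on in zip(cell_currents, control_signals) if on)
    # Idealized ADC: uniform quantization, clipped to the available code range.
    code = round(gbl_current / adc_lsb)
    return max(0, min(code, 2 ** adc_bits - 1))


def weight_bits(control_signals):
    """The count of turned-on switches represents the number of weight bits."""
    return sum(bool(on) for on in control_signals)


# Six switches, four of them turned on (a 4-bit weight in this toy model).
currents = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
controls = [True, True, True, True, False, False]
sub_output = tile_sub_output(currents, controls)
```

In this toy model, changing which control signals are asserted changes both the sub-output value and the weight bit count without rewriting the stored cells.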

The in-memory arithmetic device according to claim 1, wherein the ladder adder includes: p first sub-ladder adders, each of the first sub-ladder adders being coupled to q of the analog-to-digital converters, the first sub-ladder adders respectively generating p first-direction operation results; and a second sub-ladder adder, coupled to the first sub-ladder adders, generating the operation result according to the first-direction operation results.

The in-memory arithmetic device according to claim 2, wherein each of the first sub-ladder adders has N layers, and each layer includes at least one full adder and at least one shifter, where N = log₂ q.

The in-memory arithmetic device according to claim 3, wherein the first layer of each of the first sub-ladder adders has q/2 of the at least one full adder and q/2 of the at least one shifter, and the at least one full adder and the at least one shifter are arranged alternately in sequence to respectively receive the corresponding sub-output values.

The in-memory arithmetic device according to claim 4, wherein the r-th layer of each of the first sub-ladder adders has q/2^r of the at least one full adder and q/2^r of the at least one shifter, and the at least one full adder and the at least one shifter in the same layer are arranged alternately in sequence and are respectively coupled to the output terminals of the at least one full adder of the previous layer, where 1 < r ≤ N.

The in-memory arithmetic device according to claim 2, wherein the second sub-ladder adder has M layers, and each layer includes at least one full adder and at least one shifter, where M = log₂ p, and the at least one full adder and the at least one shifter are arranged alternately to respectively receive the first-direction operation results.

The in-memory arithmetic device according to claim 6, wherein the first layer of the second sub-ladder adder has p/2 of the at least one full adder and p/2 of the at least one shifter, and the at least one full adder and the at least one shifter are arranged alternately in sequence to respectively receive the first-direction operation results.

The in-memory arithmetic device according to claim 6, wherein the s-th layer of the second sub-ladder adder has p/2^s of the at least one full adder and p/2^s of the at least one shifter, and the at least one full adder and the at least one shifter in the same layer are arranged alternately in sequence and are respectively coupled to the output terminals of the at least one full adder of the previous layer, where 1 < s ≤ M.
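The layered structure recited in the preceding claims (p first sub-ladder adders of N = log₂ q layers each, followed by one second sub-ladder adder of M = log₂ p layers) can be sketched in Python as a shift-and-add reduction tree. This is a minimal sketch under stated assumptions: the claims do not give the shift amounts, so `col_shift` and `row_shift` are hypothetical parameters, and the rule that the shift doubles from one layer to the next is likewise an assumption chosen so that each input ends up weighted by a power of two.

```python
def ladder_reduce(values, shift_bits):
    """Reduce a power-of-two-length list with log2(len) shift-and-add layers.

    Layer r uses len/2**r shifters and len/2**r full adders, matching the
    per-layer counts in the claims. shift_bits is a hypothetical parameter.
    """
    n = len(values)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    layer, step = list(values), shift_bits
    while len(layer) > 1:
        # Each pair uses one shifter (on the higher-order operand) and one full adder.
        layer = [lo + (hi << step) for lo, hi in zip(layer[0::2], layer[1::2])]
        step *= 2  # assumed: the shift doubles as partial results widen
    return layer[0]


def ladder_adder(sub_outputs, col_shift, row_shift):
    """sub_outputs is a p-by-q grid of ADC codes (p rows of q values each)."""
    # p first sub-ladder adders, each reducing the q values of one row.
    first_direction = [ladder_reduce(row, col_shift) for row in sub_outputs]
    # One second sub-ladder adder reducing the p first-direction results.
    return ladder_reduce(first_direction, row_shift)


# q = 4 sub-output values with a 1-bit shift: 1 + 2*2 + 3*4 + 4*8
single_row = ladder_reduce([1, 2, 3, 4], 1)
# p = 2 rows; row_shift = 0 makes the second stage a plain sum.
total = ladder_adder([[1, 2, 3, 4], [1, 1, 1, 1]], col_shift=1, row_shift=0)
```

With a shift of zero the tree degenerates to ordinary addition, which is one way to see that the shifters only assign bit significance to the partial sums.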

The in-memory arithmetic device according to claim 1, further comprising: a normalization circuit, coupled to the ladder adder, performing a normalization operation on the operation result according to a scaling factor and an offset parameter to generate an adjusted operation result.

The in-memory arithmetic device according to claim 9, further comprising: a quantizer, coupled to the normalization circuit, performing a quantization operation on the adjusted operation result according to a reference value to generate an output operation result.

The in-memory arithmetic device according to claim 10, wherein the quantizer is a divider that divides the adjusted operation result by the reference value to generate the output operation result.

The in-memory arithmetic device according to claim 9, wherein the normalization circuit includes: a multiplier for multiplying the operation result by the scaling factor; and a full adder for adding the output of the multiplier and the offset parameter to generate the adjusted operation result.
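The normalization and quantization steps recited above reduce to simple arithmetic: the multiplier and full adder compute NCR = CR × SF + BF, and the divider computes OCR = NCR / DEN. The Python sketch below illustrates this with integers. The function names are hypothetical, and the use of floor division for the divider is an assumption, since the claims do not specify a rounding behavior.

```python
def normalize(cr, sf, bf):
    """Multiplier followed by full adder: NCR = CR * SF + BF."""
    return cr * sf + bf


def quantize(ncr, den):
    """Divider-style quantizer: OCR = NCR / DEN.

    Floor division is an assumption; the rounding mode is not specified.
    """
    if den == 0:
        raise ValueError("reference value must be non-zero")
    return ncr // den


# Example with an operation result CR = 49, scaling factor SF = 3,
# offset parameter BF = 5 and reference value DEN = 16.
ncr = normalize(49, 3, 5)
ocr = quantize(ncr, 16)
```

Choosing the reference value as a power of two would let the divider be realized as a right shift, although the claims do not require this.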

The in-memory arithmetic device according to claim 1, wherein the memory array includes a plurality of two-transistor (2T) NOR flash memory cells.

The in-memory arithmetic device according to claim 13, wherein a control terminal of a selection transistor in each of the two-transistor NOR flash memory cells receives one of the input signals.