TWI741766B - In-memory computation device - Google Patents
- Publication number
- TWI741766B
- Authority
- TW
- Taiwan
- Prior art keywords
- memory
- sub
- adder
- ladder
- arithmetic device
Landscapes
- Analogue/Digital Conversion (AREA)
Abstract
Description
The present invention relates to an in-memory computation device, and more particularly to an in-memory computation device that can reduce data storage requirements.
In recent years, artificial intelligence (AI) accelerators for deep neural networks (DNNs) in edge computing have become increasingly important for integrating and implementing artificial intelligence of things (AIoT) applications. In addition to the traditional Von Neumann computing architecture, the computation-in-memory (CIM) architecture has been proposed to further improve computing efficiency.
However, multiply-accumulate operations on multiple input signals and multiple weights inevitably produce a large amount of data spanning a wide range. How to reduce the data storage requirements and power consumption of an in-memory computation device has therefore become an important issue for engineers in this field.
The invention provides an in-memory computation device that can reduce data storage requirements.
The in-memory computation device of the invention includes a memory array, p × q analog-to-digital converters, and a ladder adder. The memory array is divided into p × q memory tiles, where p and q are both positive integers greater than 1. Each memory tile has a plurality of local bit lines, which are respectively coupled to a corresponding global bit line through a plurality of bit line selection switches. The bit line selection switches are turned on or off according to a plurality of control signals, respectively. The memory array receives a plurality of input signals. The analog-to-digital converters are respectively coupled to the global bit lines of the memory tiles and respectively convert the electrical signals on the global bit lines to generate p × q digital sub-output values. The ladder adder is coupled to the analog-to-digital converters and performs addition on the sub-output values to generate an operation result.
Based on the above, the invention divides the memory array into a plurality of memory tiles, and each memory tile can adjust its number of weight bits by adjusting how many bit line selection switches are turned on. The memory tiles respectively generate a plurality of sub-output values according to the received input signals, and the ladder adder then performs addition on the sub-output values to generate the operation result. With this architecture, the data storage required for multiply-accumulate operations in the in-memory computation device can be reduced, which lowers hardware cost and power consumption and increases computation speed.
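Please refer to FIG. 1, which is a schematic diagram of an in-memory computation device according to an embodiment of the invention. The in-memory computation device 100 can be applied to deep neural network (DNN) computation. The in-memory computation device 100 includes a memory array 110, p × q analog-to-digital converters AD11~ADqp, and a ladder adder 120. In this embodiment, the memory array 110 can be divided into p × q memory tiles MB11~MBqp, where p and q are both positive integers greater than 1. Each of the memory tiles MB11~MBqp can receive a plurality of input signals. The memory cells in each of the memory tiles MB11~MBqp provide a plurality of weight values, and a multiply-accumulate operation is performed according to the input signals and the weight values.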
The analog-to-digital converters AD11~ADqp are respectively coupled to the memory tiles MB11~MBqp. In this embodiment, each of the memory tiles MB11~MBqp has a global bit line, and the analog-to-digital converters AD11~ADqp are respectively coupled to the global bit lines of the memory tiles MB11~MBqp. The analog-to-digital converters AD11~ADqp respectively perform analog-to-digital conversion on the electrical signals on the global bit lines of the memory tiles MB11~MBqp, thereby generating p × q sub-output values SV11~SVqp. The electrical signals can be voltage signals or current signals.
In this embodiment, each of the memory tiles MB11~MBqp has a plurality of local bit lines. All local bit lines of the same memory tile are coupled to the corresponding global bit line.
The ladder adder 120 is coupled to the analog-to-digital converters AD11~ADqp. The ladder adder 120 receives the sub-output values SV11~SVqp and performs addition on the sub-output values SV11~SVqp to generate an operation result CR.
Please refer to FIG. 2A, which is a schematic diagram of how the memory array is divided in the in-memory computation device according to an embodiment of the invention. In this embodiment, the memory array 200 includes a plurality of NOR flash memory cells. The memory array 200 can be divided into p × q memory tiles MB11~MBqp, arranged in p memory tile rows and q memory tile columns, where p and q are both positive integers greater than 1. The memory tiles MB11~MBqp respectively have global bit lines GBL_11~GBL_qp. Memory tiles arranged in the same row receive the same input signals. For example, the memory tiles MB11 and MBq1 in the same row receive the input signals IN11~IN14, and the memory tiles MB1p and MBqp in the same row receive the input signals INp1~INp4.
Corresponding to the memory tiles MB11~MBqp, the in-memory computation device of this embodiment is provided with p × q analog-to-digital converters AD11~ADqp. The analog-to-digital converters AD11~ADqp are respectively coupled to the global bit lines GBL_11~GBL_qp and perform analog-to-digital conversion on the electrical signals on the global bit lines GBL_11~GBL_qp to respectively generate p × q sub-output values.
Here, the memory cells in the memory array 200 can be pre-programmed to a value of 0 or 1, and each memory cell provides the required weight value depending on whether the word line voltage it receives is a selected or an unselected voltage.
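Incidentally, in the embodiment of FIG. 2A, if the memory array 200 has j memory cell columns (providing j-bit weight values), the largest weight value the memory array 200 can provide is $2^j - 1$, which creates a large data storage requirement. By dividing the memory array 200 into p × q memory tiles MB11~MBqp and providing the global bit lines GBL_11~GBL_qp for the memory tiles MB11~MBqp respectively, the largest weight value that the memory tiles MB11~MBqp can provide, while still providing j-bit weight values, is $q \times (2^{j/q} - 1)$, which effectively reduces the data storage requirement.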
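On the other hand, when the memory array 200 has i memory cell rows (providing i-bit input signals), dividing the memory array 200 into p × q memory tiles MB11~MBqp reduces the number of input signals corresponding to a single memory tile to i/p. The maximum value of the input signals corresponding to the p memory tiles in this embodiment can therefore be $p \times (2^{i/p} - 1)$.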
Based on the above, dividing the memory array 200, which has i memory cell rows and j memory cell columns, into p × q memory tiles MB11~MBqp reduces its data storage requirement from $(2^i - 1) \times (2^j - 1)$ to $p \times q \times (2^{i/p} - 1) \times (2^{j/q} - 1)$. Taking i = j = 8 and p = q = 4 as an example, the data storage requirement is reduced from 65025 to 144.
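The worked figures in the preceding paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `storage_demand` is hypothetical, and the code assumes i is divisible by p and j is divisible by q.

```python
def storage_demand(i: int, j: int, p: int = 1, q: int = 1) -> int:
    """Evaluate p * q * (2^(i/p) - 1) * (2^(j/q) - 1).

    With p = q = 1 this is the undivided case (2^i - 1) * (2^j - 1).
    The function name and signature are illustrative, not from the patent.
    """
    if i % p or j % q:
        raise ValueError("i must be divisible by p, and j by q")
    return p * q * (2 ** (i // p) - 1) * (2 ** (j // q) - 1)


undivided = storage_demand(8, 8)        # (2^8 - 1) * (2^8 - 1)
tiled = storage_demand(8, 8, p=4, q=4)  # 4 * 4 * (2^2 - 1) * (2^2 - 1)
print(undivided, tiled)  # 65025 144
```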
In addition, please refer to FIG. 2B and FIG. 2C, which are schematic diagrams of how the number of weight bits is adjusted according to an embodiment of the invention. FIG. 2B takes the memory tile MB11 of FIG. 2A as an example. The memory tile MB11 has a plurality of bit line selection switches BLT1~BLT6. The local bit lines BL1~BL6 are respectively coupled to the global bit line GBL_11 through the bit line selection switches BLT1~BLT6. The bit line selection switches BLT1~BLT6 are respectively controlled by the control signals CT1~CT6 to be turned on or off. The number of bit line selection switches BLT1~BLT6 that are turned on determines the number of weight bits of the memory tile MB11.
Taking FIG. 2C as an example, the bit line selection switches BLT1~BLT3 are turned on according to the control signals CT1~CT3, and the bit line selection switches BLT4~BLT6 are turned off according to the control signals CT4~CT6. Under this condition, the three local bit lines BL1~BL3 that are effectively connected to the global bit line GBL_11 can represent a two-bit weight that can be encoded as 000, 001, 011, or 111. When the number of weight bits needs to be adjusted, this can be done by changing how many of the bit line selection switches BLT1~BLT6 are turned on.
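The four codes listed for FIG. 2C form a thermometer (unary) code over the three connected local bit lines. The following sketch generates those codes; the mapping of the codes to the weight values 0 to 3 and the function name `thermometer_code` are assumptions made for illustration, since the text lists only the codes themselves.

```python
def thermometer_code(weight: int, enabled_lines: int) -> str:
    """Return a thermometer code with `weight` ones across the enabled bit lines.

    Assumption: weight value w is stored as w cells set to 1, filled from the
    right, so that the bit line current grows with the weight.
    """
    if not 0 <= weight <= enabled_lines:
        raise ValueError("weight must lie between 0 and the number of enabled lines")
    return "0" * (enabled_lines - weight) + "1" * weight


enabled_lines = 3  # BLT1~BLT3 on, BLT4~BLT6 off, as in FIG. 2C
codes = [thermometer_code(w, enabled_lines) for w in range(enabled_lines + 1)]
print(codes)  # ['000', '001', '011', '111']

# Three enabled lines give four levels, which is a two-bit weight.
weight_bits = (enabled_lines + 1).bit_length() - 1
print(weight_bits)  # 2
```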
For implementations of the memory tiles, please refer to FIG. 3A and FIG. 3B, which are schematic diagrams of different implementations of a memory tile according to embodiments of the invention. In FIG. 3A, the memory tile MB11 includes a plurality of memory cells MC1~MCK+1, which are two-transistor (2T) NOR flash memory cells.
The memory cells MC1~MCK+1 are arranged in a planar manner. The memory tile MB11 further includes a plurality of selection switches ST1~STK+1 respectively corresponding to the memory cells MC1~MCK+1. The selection switches ST1~STK+1 are formed by transistors. Each of the memory cells MC1~MCK+1 and its corresponding selection switch are connected in series between the corresponding local bit line and a common source line CSL. Taking the memory cells MC1 and MC2 as examples, the memory cell MC1 and the selection switch ST1 are connected in series between the local bit line BL1 and the common source line CSL, and the memory cell MC2 and the selection switch ST2 are likewise connected in series between the local bit line BL1 and the common source line CSL. In addition, all local bit lines BL1~BLL in the memory tile MB11 are coupled to the global bit line GBL.
In the memory tile MB11, the control terminals of the selection switches ST1 and ST2 receive the input signals IN11 and IN12, respectively. The gates of the memory cells MC1 and MC2 receive the signals MG1 and MG2, respectively. The memory cells MC1 and MC2 form 2T NOR flash memory elements. During an in-memory computation, the selection switches ST1 and ST2 respectively provide currents according to the input signals IN11 and IN12, the transconductance values provided by the memory cells MC1 and MC2 serve as the weight values, and a multiply-accumulate result is generated. The local bit line BL1 can transmit the voltage produced by the multiply-accumulate operation to the global bit line GBL.
In this implementation, the global bit line GBL is coupled to the analog-to-digital converter AD11. The analog-to-digital converter AD11 converts the voltage on the global bit line GBL to obtain a sub-output value in digital format.
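As a rough behavioural sketch of the tile operation described above, the following Python models the multiply-accumulate on the bit line as an ideal sum of input times weight, followed by a uniform quantizer standing in for the analog-to-digital converter. The names `tile_mac` and `adc`, the integer signal model, and the choice of full-scale value are all assumptions; the patent does not specify the converter's resolution or transfer function.

```python
def tile_mac(inputs: list[int], weights: list[int]) -> int:
    """Idealised bit line quantity: the sum of input * weight over the cells.

    Inputs stand for the signals on the selection switches and weights for the
    cell transconductances; real devices are analog and non-ideal.
    """
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    return sum(x * w for x, w in zip(inputs, weights))


def adc(value: int, full_scale: int, bits: int) -> int:
    """Hypothetical uniform ADC: map 0..full_scale onto 0..2^bits - 1, rounding."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0), full_scale)
    return (clipped * levels + full_scale // 2) // full_scale


inputs = [1, 0, 1, 1]   # e.g. IN11~IN14 as binary inputs
weights = [3, 2, 1, 0]  # e.g. two-bit weights held by the cells
analog_sum = tile_mac(inputs, weights)                # 3 + 0 + 1 + 0
sub_output = adc(analog_sum, full_scale=12, bits=4)   # 12 = 4 inputs * max weight 3
print(analog_sum, sub_output)  # 4 5
```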
It is worth noting that the hardware of the analog-to-digital converter AD11 can be implemented with any analog-to-digital conversion circuit known to a person of ordinary skill in the art, without particular limitation.
In FIG. 3B, the memory tile MB11 is implemented with a three-dimensional structure, and each of the local bit lines BL1~BLL can be coupled to two or more memory cells. The memory tile MB11 of FIG. 3B operates in the same way as that of FIG. 3A, so the details are not repeated here.
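Please refer to FIG. 4, which is a schematic diagram of an implementation of the ladder adder according to an embodiment of the invention. The ladder adder 400 includes a plurality of first sub-ladder adders 411~41p and a second sub-ladder adder 420. For a memory array divided into p × q memory tiles, the number of first sub-ladder adders 411~41p can be p. Following the example of FIG. 2A, the first sub-ladder adders 411~41p respectively correspond to the p memory tile rows, and each of the first sub-ladder adders 411~41p is coupled to the q analog-to-digital converters of the corresponding memory tile row. Taking the memory tile row containing the memory tiles MB11~MBq1 in FIG. 2A as an example, the first sub-ladder adder 411 is coupled to the analog-to-digital converters AD11~ADq1 respectively corresponding to the memory tiles MB11~MBq1.
The first sub-ladder adder 411 performs addition on the sub-output values generated by the analog-to-digital converters AD11~ADq1 to generate a first-direction operation result CDR1. Similarly, the first sub-ladder adders 412~41p perform addition to respectively generate the first-direction operation results CDR2~CDRp.
The second sub-ladder adder 420 is coupled to the first sub-ladder adders 411~41p. The second sub-ladder adder 420 performs addition on the first-direction operation results CDR1~CDRp respectively generated by the first sub-ladder adders 411~41p and thereby generates the operation result CR.
For implementation details of the first sub-ladder adders 411~41p and the second sub-ladder adder 420, refer to the implementations of FIG. 5 and FIG. 6 below.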
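FIG. 5 is a schematic diagram of an implementation of the first sub-ladder adder according to an embodiment of the invention. The first sub-ladder adder 500 is coupled to the q analog-to-digital converters AD11~AD1q corresponding to the same memory tile row. The first sub-ladder adder 500 has N layers LA1~LAN, where $N = \log_2 q$. In this implementation, each of the layers LA1~LAN has one or more full adders and shifters. The first layer LA1 includes full adders FAD11~FAD1A and shifters SF11~SF1A. The full adders FAD11~FAD1A respectively correspond to the shifters SF11~SF1A; the full adders and shifters are interleaved and coupled in sequence to the analog-to-digital converters AD11~AD1q. There are q/2 full adders FAD11~FAD1A and likewise q/2 shifters SF11~SF1A. The full adders FAD11~FAD1A respectively receive the sub-output values generated by the odd-numbered analog-to-digital converters AD11, AD13, ..., and the shifters SF11~SF1A respectively receive the sub-output values generated by the even-numbered analog-to-digital converters AD12, ..., AD1q and shift their bits.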
In this implementation, the shifters SF11~SF1A shift the received sub-output values toward the higher-order bits. The bit shift amount of the shifters SF11~SF1A in the first layer LA1 is j/q, where j is the total number of memory cell columns of the memory array. The full adders FAD11~FAD1A also respectively receive the outputs of the shifters SF11~SF1A and perform full addition.
Incidentally, in this implementation the bit shift amount of the shifters in the second layer is $2 \times j/q$, and so on for the remaining layers. In addition, the r-th layer of the first sub-ladder adder 500 has $q/2^r$ full adders and $q/2^r$ shifters. The full adders and shifters in the same layer are interleaved in sequence and respectively coupled to the outputs of the full adders of the previous layer, where $1 < r \le N$.
The N-th layer LAN includes a single full adder FADN1 and a single shifter SFN1. The bit shift amount of the shifter SFN1 is $2^{N-1} \times j/q$. The full adder FADN1 generates the first-direction operation result CDR1.
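The layered shift-and-add structure of the first sub-ladder adder can be sketched in software as a pairwise tree. This is an illustrative model only: the function name `ladder_add` is hypothetical, the number of inputs is assumed to be a power of two, and the shift amount is assumed to double from one layer to the next (j/q, 2 × j/q, and so on).

```python
def ladder_add(values: list[int], base_shift: int) -> int:
    """Pairwise shift-and-add tree over a power-of-two number of values.

    In each layer the odd-numbered value (1st, 3rd, ...) goes straight to a
    full adder, while the even-numbered value is first shifted toward the
    higher-order bits. The shift amount doubles with every layer.
    """
    n = len(values)
    if n < 1 or n & (n - 1):
        raise ValueError("the number of values must be a power of two")
    layer = list(values)
    shift = base_shift
    while len(layer) > 1:
        layer = [layer[k] + (layer[k + 1] << shift) for k in range(0, len(layer), 2)]
        shift *= 2
    return layer[0]


j, q = 8, 4                # 8 weight columns split over 4 tiles, 2 bits each
sub_outputs = [1, 2, 3, 0]  # example sub-output values of one tile row
cdr = ladder_add(sub_outputs, j // q)
print(cdr)  # 57, i.e. 1 + (2 << 2) + (3 << 4) + (0 << 6)
```

With q = 4 the tree has two layers, and the last layer shifts by 4 bits, which matches the doubling pattern assumed above.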
The hardware of the full adders FAD11~FADN1 and the shifters SF11~SFN1 in this implementation can be implemented with full adder circuits and digital shift circuits known to a person of ordinary skill in the art, without particular limitation.
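FIG. 6 is a schematic diagram of an implementation of the second sub-ladder adder according to an embodiment of the invention. The second sub-ladder adder 600 includes M layers LB1~LBM, each of which includes at least one full adder and at least one shifter, where $M = \log_2 p$ and p is the number of first-direction operation results CDR1~CDRp. In this implementation, the first layer LB1 has full adders FAD11a~FAD1Ba and shifters SF11a~SF1Ba. The full adders FAD11a~FAD1Ba respectively correspond to the shifters SF11a~SF1Ba and are interleaved with them. The first inputs of the full adders FAD11a~FAD1Ba respectively receive the odd-numbered first-direction operation results CDR1, CDR3, ..., CDRp-1, and the second inputs of the full adders FAD11a~FAD1Ba are respectively coupled to the outputs of the shifters SF11a~SF1Ba. The inputs of the shifters SF11a~SF1Ba respectively receive the even-numbered first-direction operation results CDR2, CDR4, ..., CDRp. The number of full adders FAD11a~FAD1Ba and the number of shifters SF11a~SF1Ba are both equal to p/2.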
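In addition, the s-th layer of the second sub-ladder adder 600 has $p/2^s$ full adders and $p/2^s$ shifters. The full adders and shifters in the same layer are interleaved in sequence and respectively coupled to the outputs of the full adders of the previous layer, where $1 < s \le M$. The last layer LBM has a single full adder FADM1a and a single shifter SFM1a. The single full adder FADM1a generates the operation result CR.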
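In this implementation, the shifters SF11a~SF1Ba of the second sub-ladder adder 600 shift the received values toward the higher-order bits. The shifters SF11a~SF1Ba in the first layer LB1 all have the same bit shift amount, equal to i/p, where i is the number of bits of the input signals. The bit shift amount of the shifters in the second layer of the second sub-ladder adder 600 can be $2 \times i/p$, and so on, so that the bit shift amount of the shifter SFM1a in the last layer LBM can be $2^{M-1} \times i/p$.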
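To see why the two adder stages recover a full-precision result, the sketch below splits one i-bit input and one j-bit weight into chunks, lets each of the p × q tiles multiply one input chunk by one weight chunk, and then combines the tile outputs row by row and across rows. This is a deliberately simplified model: each tile handles a single product rather than a sum over many cells, the helper names `split` and `ladder_add` are hypothetical, and the shift amounts are assumed to double per layer.

```python
def split(value: int, parts: int, width: int) -> list[int]:
    """Split value into `parts` chunks of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (k * width)) & mask for k in range(parts)]


def ladder_add(values: list[int], base_shift: int) -> int:
    """Pairwise shift-and-add tree; the shift amount doubles on every layer."""
    layer, shift = list(values), base_shift
    while len(layer) > 1:
        layer = [layer[k] + (layer[k + 1] << shift) for k in range(0, len(layer), 2)]
        shift *= 2
    return layer[0]


i = j = 8
p = q = 4
x, w = 201, 173                 # example 8-bit input and 8-bit weight
x_chunks = split(x, p, i // p)  # one chunk per tile row
w_chunks = split(w, q, j // q)  # one chunk per tile column

# Each tile produces a small sub-output: at most (2^2 - 1) * (2^2 - 1) = 9 here.
sub_outputs = [[xa * wb for wb in w_chunks] for xa in x_chunks]

# First stage: one ladder adder per tile row, shifting by j/q on the first layer.
cdr = [ladder_add(row, j // q) for row in sub_outputs]
# Second stage: combine the row results, shifting by i/p on the first layer.
cr = ladder_add(cdr, i // p)

assert cr == x * w
print(cr)  # 34773
```

The largest value any tile has to hold stays small, while the shifts in the adder tree restore each partial product to its proper bit position.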
The hardware of the full adders FAD11a~FADM1a and the shifters SF11a~SFM1a in this implementation can be implemented with full adder circuits and digital shift circuits known to a person of ordinary skill in the art, without particular limitation. The hardware of the full adders FAD11a~FADM1a may be the same as or different from that of the full adders FAD11~FADN1 in the implementation of FIG. 5. Likewise, the hardware of the shifters SF11a~SFM1a may be the same as or different from that of the shifters SF11~SFN1 in the implementation of FIG. 5.
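Please refer to FIG. 7, which is a schematic diagram of part of the circuit of an in-memory computation device according to another embodiment of the invention. The in-memory computation device 700 further includes a normalization circuit 720 and a quantizer 730. The normalization circuit 720 is coupled to the output of the ladder adder 710 to receive the operation result CR generated by the ladder adder 710. The normalization circuit 720 includes a multiplier 721 and a full adder 722. The multiplier 721 receives the operation result CR and a scaling factor SF and multiplies them. The full adder 722 receives the output of the multiplier 721 and an offset parameter BF, and adds the two to generate an adjusted operation result NCR.
The scaling factor SF and the offset parameter BF can be set by the designer to normalize the operation result CR to a reasonable numerical range for subsequent computation.
The quantizer 730 is coupled to the normalization circuit 720, receives the adjusted operation result NCR, and divides the adjusted operation result NCR by a reference value DEN to generate an output operation result OCR. In this embodiment, the quantizer 730 can be a divider 731. The reference value DEN can be a non-zero preset value set in advance by the designer, without particular limitation.
The hardware of the full adder 722, the multiplier 721, and the divider 731 can be implemented with full adder, multiplier, and divider circuits well known in the art, without particular limitation.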
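The normalization and quantization steps amount to OCR = (CR × SF + BF) / DEN. A minimal sketch follows, assuming integer arithmetic and floor division in the quantizer; the text says only that NCR is divided by DEN, so the rounding behaviour and the function names are assumptions.

```python
def normalize(cr: int, scale: int, bias: int) -> int:
    """Normalization circuit 720: multiplier 721 followed by full adder 722."""
    return cr * scale + bias


def quantize(ncr: int, den: int) -> int:
    """Quantizer 730 modelled as an integer divider; DEN must be non-zero.

    Floor division is an assumption about the rounding mode.
    """
    if den == 0:
        raise ValueError("the reference value DEN must be non-zero")
    return ncr // den


cr = 57                                # example operation result
ncr = normalize(cr, scale=2, bias=6)   # 57 * 2 + 6
ocr = quantize(ncr, den=8)             # 120 // 8
print(ncr, ocr)  # 120 15
```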
Incidentally, the in-memory computation device 700 of this embodiment can be applied to a convolutional neural network (CNN).
In summary, the invention divides the memory array into p × q memory tiles and uses a ladder adder to complete the required multiply-accumulate operation. Under this architecture, the number of weight bits can be adjusted according to how many bit line selection switches are turned on. Moreover, the magnitude of the values generated during computation is reduced, so the data storage requirement is effectively lowered, which reduces the burden on the hardware and increases computation efficiency.
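100, 700: in-memory computation device
110, 200: memory array
120, 400: ladder adder
411~41p, 500: first sub-ladder adder
420, 600: second sub-ladder adder
720: normalization circuit
721: multiplier
722: full adder
730: quantizer
731: divider
AD11~ADqp: analog-to-digital converter
BF: offset parameter
BL1~BLL: local bit line
BLT1~BLT6: bit line selection switch
CDR1~CDRp: first-direction operation result
CR: operation result
CSL: common source line
CT1~CT6: control signal
DEN: reference value
FAD11~FADN1, FAD11a~FADM1a: full adder
GBL, GBL_11~GBL_qp: global bit line
IN11~INp4: input signal
LA1~LAN, LB1~LBM: layer
MB11~MBqp: memory tile
MC1~MCK+1: memory cell
NCR: adjusted operation result
OCR: output operation result
SF: scaling factor
SF11~SFN1, SF11a~SFM1a: shifter
ST1~STK+1: selection switch
SV11~SVqp: sub-output value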
FIG. 1 is a schematic diagram of an in-memory computation device according to an embodiment of the invention.
FIG. 2A is a schematic diagram of how the memory array is divided in the in-memory computation device according to an embodiment of the invention.
FIG. 2B and FIG. 2C are schematic diagrams of how the number of weight bits is adjusted according to an embodiment of the invention.
FIG. 3A and FIG. 3B are schematic diagrams of different implementations of a memory tile according to embodiments of the invention.
FIG. 4 is a schematic diagram of an implementation of the ladder adder according to an embodiment of the invention.
FIG. 5 is a schematic diagram of an implementation of the first sub-ladder adder according to an embodiment of the invention.
FIG. 6 is a schematic diagram of an implementation of the second sub-ladder adder according to an embodiment of the invention.
FIG. 7 is a schematic diagram of part of the circuit of an in-memory computation device according to another embodiment of the invention.
100: in-memory computation device
110: memory array
120: ladder adder
AD11~ADqp: analog-to-digital converter
CR: operation result
MB11~MBqp: memory tile
SV11~SVqp: sub-output value
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109129491A TWI741766B (en) | 2020-08-28 | 2020-08-28 | In-memory computation device |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI741766B true TWI741766B (en) | 2021-10-01 |
TW202209318A TW202209318A (en) | 2022-03-01 |
Family
ID=80782408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109129491A TWI741766B (en) | 2020-08-28 | 2020-08-28 | In-memory computation device |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI741766B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6198682B1 (en) * | 1999-02-13 | 2001-03-06 | Integrated Device Technology, Inc. | Hierarchical dynamic memory array architecture using read amplifiers separate from bit line sense amplifiers |
-
2020
- 2020-08-28 TW TW109129491A patent/TWI741766B/en active
Non-Patent Citations (2)
Title |
---|
C. Xue, et al.,"Embedded 1-Mb ReRAM-Based Computing-in-Memory Macro With Multibit Input and Weight for CNN-Based AI Edge Processors,"IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 55, NO. 1, JANUARY 2020. |
R. Guo, et al. "A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS,"VLSI Symp., 2019 |
Also Published As
Publication number | Publication date |
---|---|
TW202209318A (en) | 2022-03-01 |