TW202211217A

TW202211217A - Memory device and operation method thereof

Info

Publication number: TW202211217A
Application number: TW110125919A
Authority: TW
Inventors: 胡瀚文; 李永駿; 林柏榕; 王淮慕
Original assignee: 旺宏電子股份有限公司
Priority date: 2020-09-08
Filing date: 2021-07-14
Publication date: 2022-03-16
Also published as: TWI783573B

Abstract

A memory device and an operation method thereof are provided. The memory device includes: a memory array including a plurality of memory cells for storing a plurality of weights; a multiplication circuit for performing bitwise multiplication on a plurality of input data and the weights to generate a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents during the bitwise multiplication; a digital accumulation circuit for performing digital accumulation on the multiplication results; an analog accumulation circuit for performing analog accumulation on the memory cell currents to generate a first MAC operation result; and a decision unit for deciding whether to perform the analog accumulation, the digital accumulation, or a hybrid accumulation, wherein, in the hybrid accumulation, whether the digital accumulation circuit is triggered is based on the first MAC operation result.

Description

Memory device and method of operating the same

The present invention relates to a memory device with in-memory computing (IMC) and an operation method thereof.

Artificial intelligence (AI) has become a highly effective solution in many fields. The key operation in AI is the multiply-and-accumulate (MAC) operation performed on large amounts of input data (such as input feature maps) and weight values.

However, current AI architectures tend to suffer from an input/output (IO) bottleneck and an inefficient MAC operation flow.

To achieve high accuracy, MAC operations with multi-bit inputs and multi-bit weight values can be performed. However, the IO bottleneck then becomes more severe and efficiency drops further.

In-memory computing (IMC) can be used to accelerate MAC operations, because IMC reduces the need for the complex arithmetic logic units (ALUs) required in a processor-centric architecture and provides high parallelism for in-memory MAC operations.

Non-volatile-memory-based IMC (NVM-based IMC) offers advantages such as non-volatile storage and reduced data movement.

IMC performance would benefit if both operation speed and operation accuracy could be achieved at the same time.

According to one example of the present application, a memory device is provided, comprising: a memory array including a plurality of memory cells for storing a plurality of weight values; a multiplication circuit, coupled to the memory array, that multiplies a plurality of input data by the weight values to obtain a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents during the multiplication; a digital accumulation circuit, coupled to the multiplication circuit, that performs digital accumulation on the multiplication results; an analog accumulation circuit, coupled to the memory array, that performs analog accumulation on the memory cell currents to generate a first multiply-and-accumulate (MAC) operation result; and a decision unit, coupled to the digital accumulation circuit and the analog accumulation circuit, that decides whether to perform the analog accumulation, the digital accumulation, or a hybrid accumulation, wherein, in the hybrid accumulation, whether the digital accumulation circuit is triggered is determined according to the first MAC operation result.

According to another example of the present application, an operation method of a memory device is provided, comprising: storing a plurality of weight values in a plurality of memory cells of a memory array of the memory device; performing bitwise multiplication on a plurality of input data and the weight values to obtain a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents during the multiplication; and deciding whether to perform an analog accumulation, a digital accumulation, or a hybrid accumulation, wherein, in the analog accumulation, the memory cell currents are accumulated in analog form to generate a first multiply-and-accumulate (MAC) operation result; in the digital accumulation, the multiplication results are accumulated digitally to generate a second MAC operation result; and, in the hybrid accumulation, whether the digital accumulation is triggered is determined according to the first MAC operation result.

For a better understanding of the above and other aspects of the present invention, embodiments are described in detail below with reference to the accompanying drawings.

The technical terms in this specification follow the customary usage in the technical field. Where this specification explains or defines a term, that explanation or definition governs. Each embodiment of the present disclosure has one or more technical features. Where implementation is possible, a person having ordinary skill in the art may selectively implement some or all of the technical features of any embodiment, or selectively combine some or all of the technical features of these embodiments.

Please refer to FIG. 1, which shows a functional block diagram of a memory device 100 with an in-memory computing (IMC) function according to an embodiment of the present application. The memory device 100 includes a memory array 110, a multiplication circuit 120, an input/output circuit 130, a digital accumulation circuit 135, an analog accumulation circuit 160, a decision unit 170, and a comparator 163. The digital accumulation circuit 135 includes a grouping circuit 140 and a counting unit 150. The analog accumulation circuit 160 includes an analog-to-digital conversion unit 161. The memory array 110, the multiplication circuit 120, the analog accumulation circuit 160, and the analog-to-digital conversion unit 161 are analog, whereas the digital accumulation circuit 135, the grouping circuit 140, and the counting unit 150 are digital.

The memory array 110 includes a plurality of memory cells 111. In an embodiment of the present application, the memory cells 111 are, for example but not limited to, non-volatile memory cells. During a MAC operation, the memory cells 111 are used to store the weight values.

The multiplication circuit 120 is coupled to the memory array 110. The multiplication circuit 120 includes a plurality of single-bit multiplication units 121. Each single-bit multiplication unit 121 includes an input latch 121A, a sense amplifier (SA) 121B, an output latch 121C, and a common data latch (CDL) 121D. The input latch 121A is coupled to the memory array 110. The sense amplifier 121B is coupled to the input latch 121A. The output latch 121C is coupled to the sense amplifier 121B. The common data latch 121D is coupled to the output latch 121C.

The input/output circuit 130 is coupled to the multiplication circuit 120, the grouping circuit 140, and the counting unit 150. It receives the input data and outputs the output data obtained by the memory device 100.

The digital accumulation circuit 135 performs digital accumulation; its details are described below.

The analog accumulation circuit 160 performs analog accumulation; its details are described below.

The decision unit 170 decides whether the memory device 100 performs analog accumulation, digital accumulation, or hybrid accumulation. The decision unit 170 outputs enable signals EN1 and EN2 to the analog accumulation circuit 160 and the digital accumulation circuit 135, respectively, to determine whether the analog accumulation circuit 160 or the digital accumulation circuit 135 is enabled.

"Analog accumulation" means that the analog accumulation circuit 160 is enabled and the digital accumulation circuit 135 is not. "Digital accumulation" means that the digital accumulation circuit 135 is enabled and the analog accumulation circuit 160 is not. "Hybrid accumulation" means that both the digital accumulation circuit 135 and the analog accumulation circuit 160 are enabled.

The analog-to-digital conversion unit 161 is coupled to the memory cells 111 of the memory array 110. The cell currents of the memory cells 111 are summed and input to the analog-to-digital conversion unit 161, which converts the sum into the first MAC operation result OUT1.

The comparator 163 is coupled to the analog-to-digital conversion unit 161 and compares the first MAC operation result OUT1 with a trigger reference value. When hybrid accumulation is selected and OUT1 is lower than the trigger reference value, the comparator 163 does not output a trigger signal TS to the digital accumulation circuit 135 (that is, the digital accumulation circuit 135 is not triggered); when OUT1 is higher than the trigger reference value, the comparator 163 outputs the trigger signal TS to the digital accumulation circuit 135 to trigger it to perform digital accumulation. When analog accumulation is selected, the trigger signal TS output by the comparator 163 is ignored by the digital accumulation circuit 135.

In an embodiment of the present application, analog accumulation can be used to filter out useless data quickly, which speeds up the MAC operation. Digital accumulation accumulates the data that was not filtered out, which increases the accuracy of the MAC operation. Hybrid accumulation uses a low-resolution quantization operation and therefore reduces the variation influence; it also avoids accumulating useless data while maintaining resolution. In other words, hybrid accumulation combines the advantages of analog accumulation and digital accumulation while reducing their disadvantages.

The grouping circuit 140 is coupled to the multiplication circuit 120. The grouping circuit 140 includes a plurality of grouping units 141. The grouping units 141 perform a grouping operation on the multiplication results of the single-bit multiplication units 121 to obtain a plurality of grouping results. In one possible embodiment, the grouping operation is implemented by a majority technique, such as a majority function technique; the grouping circuit 140 is then implemented as a majority grouping circuit based on the majority function technique, and the grouping units 141 are implemented as distributed majority grouping units. The present application is not limited thereto, and the grouping may be implemented by other similar techniques. In an embodiment of the present application, the grouping circuit 140 is optional.

The counting unit 150 is coupled to the grouping circuit 140 or to the multiplication circuit 120. In an embodiment of the present application, when the memory device 100 does not include the grouping circuit 140, the counting unit 150 performs bitwise counting (or bitwise accumulation) on the multiplication results of the multiplication circuit 120 to generate the second MAC operation result OUT2. When the memory device 100 includes the grouping circuit 140, the counting unit 150 instead performs bitwise counting or bitwise accumulation on the grouping results (for example, the majority results) of the grouping circuit 140 to generate the second MAC operation result OUT2. In an embodiment of the present application, the counting unit 150 may be implemented by a known counting circuit, such as, but not limited to, a ripple counter. In this description, "counting" and "accumulation" have essentially the same meaning, as do "counter" and "accumulator".

In detail, when the decision unit 170 decides that the memory device 100 performs analog accumulation, the first MAC operation result OUT1 is taken as the MAC operation result. When the decision unit 170 decides that the memory device 100 performs digital accumulation, the second MAC operation result OUT2 is taken as the MAC operation result. When the decision unit 170 decides that the memory device 100 performs hybrid accumulation, the first MAC operation result OUT1 is taken as the MAC operation result until the digital accumulation circuit 135 is triggered by the trigger signal TS; once the digital accumulation circuit 135 has been triggered by the trigger signal TS, the second MAC operation result OUT2 is taken as the MAC operation result.
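The selection among the three accumulation modes can be summarized in a short behavioral sketch. The following Python model is illustrative only and is not part of the disclosed circuit: the names `Mode`, `select_mac_result`, `compute_out2`, and `trigger_ref` are hypothetical, and treating the case where OUT1 equals the trigger reference value as "not triggered" is an assumption, since the description only covers the lower and higher cases.

```python
from enum import Enum


class Mode(Enum):
    ANALOG = "analog"
    DIGITAL = "digital"
    HYBRID = "hybrid"


def select_mac_result(mode, out1, compute_out2, trigger_ref):
    """Return (mac_result, digital_triggered) for one MAC operation.

    out1         -- first MAC result OUT1 from the analog path (ADC output)
    compute_out2 -- zero-argument callable standing in for the digital
                    accumulation circuit; it is called only when needed
    trigger_ref  -- trigger reference value used by the comparator
    """
    if mode is Mode.ANALOG:
        # Only the analog path is enabled; TS is ignored.
        return out1, False
    if mode is Mode.DIGITAL:
        # Only the digital path is enabled.
        return compute_out2(), True
    # Hybrid: TS is asserted only when OUT1 is higher than the reference.
    # Assumption: OUT1 == trigger_ref is treated as "not triggered".
    if out1 > trigger_ref:
        return compute_out2(), True
    return out1, False
```

In this sketch the digital result is computed lazily, which mirrors the idea that the analog path screens out low-valued results so that the slower digital accumulation runs only when it is triggered.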

Please now refer to FIG. 2, which shows a schematic diagram of data mapping according to an embodiment of the present application. As shown in FIG. 2, each input data (or each weight value) is taken, by way of example and without limitation, to be an 8-bit value with N dimensions (N being a positive integer).

The data mapping of the input data is described below as an example, but the present application is not limited thereto. The same description applies to the data mapping of the weight values.

When the input data is represented as an 8-bit binary value, the input data (or weight value) is divided into a most significant bit (MSB) vector and a least significant bit (LSB) vector. The MSB vector of the 8-bit input data (or weight value) contains the four bits B7 to B4, and the LSB vector contains the four bits B3 to B0.

Each bit of the MSB vector and the LSB vector of the input data is represented in unary coding (that is, in value format). For example, bit B7 of the MSB vector is represented as B7₀ to B7₇, bit B6 as B6₀ to B6₃, bit B5 as B5₀ to B5₁, and bit B4 simply as B4.

Each bit of the unary-coded (value-format) MSB vector and each bit of the unary-coded LSB vector of the input data is then repeated multiple times to form the unfolding dot product (unFDP) format. For example, each bit of the MSB vector of the input data is repeated (2⁴ - 1) times, and likewise each bit of the LSB vector of the input data is repeated (2⁴ - 1) times. In this way the input data is represented in unFDP format.
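The mapping steps above (splitting into MSB and LSB vectors, unary coding, and repetition) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names are hypothetical, and bits are emitted most significant first, which is an ordering chosen here for readability.

```python
def split_msb_lsb(value):
    """Split an 8-bit value into its 4-bit MSB vector (B7..B4)
    and its 4-bit LSB vector (B3..B0)."""
    return (value >> 4) & 0xF, value & 0xF


def unary_encode(value, bits):
    """Unary (value-format) code: bit k of `value` is written 2**k times,
    most significant bit first, giving 2**bits - 1 symbols in total."""
    out = []
    for k in reversed(range(bits)):
        bit = (value >> k) & 1
        out.extend([bit] * (2 ** k))
    return out


def unfold_input(value, bits, weight_bits):
    """unFDP form of an input vector: each unary symbol is repeated
    2**weight_bits - 1 times."""
    repeat = 2 ** weight_bits - 1
    return [b for b in unary_encode(value, bits) for _ in range(repeat)]
```

For a 4-bit vector, `unary_encode` produces 8 + 4 + 2 + 1 = 15 symbols, and the number of 1s among them equals the value of the vector, which is why counting 1s after bitwise multiplication recovers an arithmetic product.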

A multiplication operation is performed on the input data (in unFDP format) and the weight values to obtain the multiplication result.

For ease of understanding, an example is given below; it is not intended to limit the present application.

Please now refer to FIG. 3A, which shows an example of one-dimensional data mapping according to an embodiment of the present application. As shown in FIG. 3A, the input data are (IN₁, IN₂) = (2, 1) and the weight values are (We₁, We₂) = (1, 2). The MSB and LSB of the input data are expressed in binary, so IN₁ = 10 and IN₂ = 01. Similarly, the MSB and LSB of the weight values are expressed in binary, so We₁ = 01 and We₂ = 10.

The MSB and LSB of the input data and the MSB and LSB of the weight values are then encoded in unary coding (value format). That is, the MSB of the input data is encoded as 110 and the LSB of the input data is encoded as 001; similarly, the MSB of the weight values is encoded as 001 and the LSB of the weight values is encoded as 110.

Next, each bit of the unary-coded MSB of the input data (110) and each bit of the unary-coded LSB of the input data (001) is repeated multiple times to form the unfolding dot product (unFDP) format. For example, each bit of the MSB (110) is repeated three times, giving the unFDP form 111111000. Each bit of the LSB (001) is likewise repeated three times, giving the unFDP form 000000111.

A MAC operation is performed on the input data (in unFDP format) and the weight values to obtain the MAC operation result. The 18 bitwise products are: 1*0=0, 1*0=0, 1*1=1, 1*0=0, 1*0=0, 1*1=1, 0*0=0, 0*0=0, 0*1=0, 0*1=0, 0*1=0, 0*0=0, 0*1=0, 0*1=0, 0*0=0, 1*1=1, 1*1=1, 1*0=0. Adding these values gives 0+0+1+0+0+1+0+0+0+0+0+0+0+0+0+1+1+0 = 4.
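The arithmetic of this example can be checked with a few lines of Python. The sketch below takes the unfolded input strings directly from the text; the unfolded weight strings are an assumption inferred from the listed product terms (the unary codes 001 and 110, each tiled three times), and `and_popcount` is a hypothetical helper name.

```python
def and_popcount(input_bits, weight_bits):
    """Bitwise-multiply two equal-length bit strings and count the 1s."""
    if len(input_bits) != len(weight_bits):
        raise ValueError("operands must have the same length")
    return sum(int(a) & int(b) for a, b in zip(input_bits, weight_bits))


# Unfolded inputs, as given in the text.
input_msb_unfolded = "111111000"
input_lsb_unfolded = "000000111"

# Assumption: each unary weight code is tiled three times, which reproduces
# the 18 product terms listed in the text.
weight_msb_unfolded = "001" * 3
weight_lsb_unfolded = "110" * 3

mac = (and_popcount(input_msb_unfolded, weight_msb_unfolded)
       + and_popcount(input_lsb_unfolded, weight_lsb_unfolded))
print(mac)  # 4, which equals 2*1 + 1*2
```

Each half contributes 2, so the bit count reproduces the ordinary dot product of (2, 1) and (1, 2).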

As can be seen from the above, if the input data has i bits and the weight value has j bits (i and j both being positive integers), the number of memory cells used is (2ⁱ - 1) × (2ʲ - 1).

Please now refer to FIG. 3B, which shows another possible example of data mapping according to an embodiment of the present application. In FIG. 3B, the input data is (IN₁) = (2) and the weight value is (We₁) = (1). The input data and the weight value are 4-bit values.

Expressed in binary, the input data is IN₁ = 0010. Similarly, expressed in binary, the weight value is We₁ = 0001.

The input data and the weight value are encoded in unary coding (value format). For example, the most significant bit “0” of the input data is encoded as “00000000”, the least significant bit “0” of the input data is encoded as “0”, and so on. Similarly, the most significant bit “0” of the weight value is encoded as “00000000”, and the least significant bit “1” of the weight value is encoded as “1”.

Each bit of the unary-coded input data is copied multiple times to form the unFDP format. For example, the most significant bit 301A of the unary-coded input data is copied 15 times to become bits 303A, and the least significant bit 301B of the unary-coded input data is copied 15 times to become bits 303B.

The unary-coded weight value 302 is also copied 15 times so that it is represented in unFDP format.

The input data in unFDP format is multiplied by the weight value in unFDP format to generate the MAC operation result. In detail, bits 303A of the input data are multiplied by the weight value 302, bits 303B of the input data are multiplied by the weight value 302, and so on. Summing the products gives the MAC operation result (“2”).

Please now refer to FIG. 3C, which shows another possible example of data mapping according to an embodiment of the present application. In FIG. 3C, the input data is (IN₁) = (1) and the weight value is (We₁) = (5). The input data and the weight value are 4-bit values.

Expressed in binary, the input data is IN₁ = 0001. Similarly, expressed in binary, the weight value is We₁ = 0101.

The input data and the weight value are encoded in unary coding (value format).

Each bit of the unary-coded input data is copied multiple times to form the unFDP format. In FIG. 3C, a bit “0” is added when each bit of the input data and each bit of the weight value is copied. For example, the most significant bit 311A of the unary-coded input data is copied 15 times and a bit “0” is added to become bits 313A, and the least significant bit 311B of the unary-coded input data is copied 15 times and a bit “0” is added to become bits 313B. The input data is thereby represented in unFDP format.

Similarly, the unary-coded weight value 312 is copied 15 times, and an additional bit “0” is added to each weight value 314. The weight value is thereby represented in unFDP format.

The input data in unFDP format is multiplied by the weight value in unFDP format to generate the MAC operation result. In detail, bits 313A of the input data are multiplied by the weight value 314, bits 313B of the input data are multiplied by the weight value 314, and so on. Summing the products gives the MAC operation result (“5”).

In the prior art, when a MAC operation is performed on 8-bit input data and 8-bit weight values using the direct MAC algorithm, the number of memory cells used is 255*255*512 = 33,292,800.

In contrast, as described above, when a MAC operation is performed on 8-bit input data and 8-bit weight values in the embodiments of the present application, the number of memory cells used is 15*15*512*2 = 115,200*2 = 230,400. The number of memory cells used for the MAC operation in the embodiments of the present application is therefore about 0.7% of that of the prior art.
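The two cell counts can be reproduced with the (2ⁱ - 1) × (2ʲ - 1) formula. The Python sketch below is illustrative only; the function names are hypothetical, and reading the factor 512 as the number of dimensions and the factor 2 as the two 4-bit vectors (MSB and LSB) is an assumption, since the text quotes the products without naming the factors.

```python
def cells_direct(input_bits, weight_bits, dims):
    """Cells for a direct unary mapping of full-width operands."""
    return (2 ** input_bits - 1) * (2 ** weight_bits - 1) * dims


def cells_split(total_bits, dims, vectors=2):
    """Cells when each operand is split into `vectors` equal-width vectors
    (for example, a 4-bit MSB vector and a 4-bit LSB vector)."""
    width = total_bits // vectors
    return vectors * cells_direct(width, width, dims)


DIMS = 512  # assumption: the factor 512 in the text is the dimension count

direct = cells_direct(8, 8, DIMS)   # 255 * 255 * 512
split = cells_split(8, DIMS)        # 15 * 15 * 512 * 2
ratio_percent = 100 * split / direct

print(direct, split, round(ratio_percent, 1))  # 33292800 230400 0.7
```

The saving comes from the exponential term: halving the operand width reduces each (2ⁿ - 1) factor from 255 to 15, which outweighs the doubling from using two vectors.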

In the embodiments of the present application, unFDP-based data mapping reduces the number of memory cells used in the operation, which lowers the operation cost and the error correction code (ECC) cost. It also makes the operation tolerant of the fail-bit effect.

Please refer to FIG. 1 again. In the embodiments of the present application, during the multiplication operation, the weight values (transconductance values) are stored in the memory cells 111 of the memory array 110, while the input data (voltages) are read by the input/output circuit 130 and transferred to the common data latches 121D. The common data latches 121D transfer the input data to the input latches 121A.

To better understand the multiplication operation of the embodiments of the present application, please refer to FIG. 4, which shows a schematic diagram of an example of the multiplication operation. FIG. 4 applies to a memory device that supports the selected bit-line read function. In FIG. 4, the input latch 121A includes a latch (first latch) 405 and a bit line switch 410.

As shown in FIG. 4, the weight value is represented in unary coding (value format), as in FIG. 2. Accordingly, the most significant bit of the weight value is stored in eight memory cells 111, the second most significant bit is stored in four memory cells 111, the third most significant bit is stored in two memory cells 111, and the least significant bit is stored in one memory cell 111.

Likewise, the input data is represented in unary coding (value format), as in FIG. 2. Accordingly, the most significant bit of the input data is stored in eight common data latches 121D, the second most significant bit is stored in four common data latches 121D, the third most significant bit is stored in two common data latches 121D, and the least significant bit is stored in one common data latch 121D. The input data is sent from the common data latches 121D to the latches 405.

In FIG. 4, the bit line switches 410 are coupled between the memory cells 111 and the sense amplifiers 121B. Each bit line switch 410 is controlled by a latch 405. For example, when the latch 405 outputs bit 1, the bit line switch 410 is turned on, and when the latch 405 outputs bit 0, the bit line switch 410 is turned off.

Furthermore, when the weight value in the memory cell 111 is bit 1 and the bit line switch 410 is on (input data is bit 1), the sense amplifier 121B senses the memory cell current and generates the multiplication result “1”. When the weight value in the memory cell 111 is bit 0 and the bit line switch 410 is on (input data is bit 1), the sense amplifier 121B does not sense a memory cell current and generates the multiplication result “0”. When the weight value in the memory cell 111 is bit 1 and the bit line switch 410 is off (input data is bit 0), the sense amplifier 121B does not sense a memory cell current and generates the multiplication result “0”. When the weight value in the memory cell 111 is bit 0 and the bit line switch 410 is off (input data is bit 0), the sense amplifier 121B does not sense a memory cell current and generates the multiplication result “0”.

That is, with the layout of FIG. 4, when the input data is bit 1 and the weight value is bit 1, the sense amplifier 121B senses the memory cell current and generates the multiplication result "1". In all other cases, the sense amplifier 121B senses no memory cell current and generates the multiplication result "0".
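Logically, the behavior described above is a bitwise AND of the input bit (the switch state) and the weight bit (the cell state). The following Python sketch shows this for a row of cells; `bitwise_multiply` is a hypothetical helper name, not a term from the specification.

```python
def bitwise_multiply(input_bits: list[int], weight_bits: list[int]) -> list[int]:
    """Per-cell sense result: 1 only when the switch is on and the weight is 1."""
    if len(input_bits) != len(weight_bits):
        raise ValueError("input and weight vectors must have the same length")
    return [x & w for x, w in zip(input_bits, weight_bits)]


# The four input/weight combinations discussed in the text.
print(bitwise_multiply([0, 0, 1, 1], [0, 1, 0, 1]))  # [0, 0, 0, 1]
```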

The memory cell currents IMC generated by the memory cells 111 are jointly input to the analog-to-digital conversion unit 161.

The relationship among the input data, the weight value, the digital multiplication result and the analog memory cell current IMC is shown in the following table:

| Input data | Weight value | Digital multiplication result | Analog memory cell current IMC |
|---|---|---|---|
| 0 | 0 (HVT) | 0 | 0 |
| 0 | +1 (LVT) | 0 | 0 |
| 1 | 0 (HVT) | 0 | IHVT |
| 1 | +1 (LVT) | 1 | ILVT |

In the table above, HVT and LVT denote a high-threshold memory cell and a low-threshold memory cell, respectively. IHVT and ILVT denote the analog memory cell currents IMC generated, when the input data is logic 1, by the high-threshold memory cell (weight value 0 (HVT)) and the low-threshold memory cell (weight value +1 (LVT)), respectively.
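The analog column of the table can be sketched as a per-cell current lookup followed by a sum, which is the quantity fed to the analog-to-digital conversion unit 161. The current values below (in nanoamperes) are purely hypothetical placeholders chosen for illustration; the specification gives no numeric values for IHVT or ILVT.

```python
I_HVT_NA = 10     # hypothetical current of a high-threshold cell (weight 0), in nA
I_LVT_NA = 1000   # hypothetical current of a low-threshold cell (weight +1), in nA


def cell_current_na(input_bit: int, weight_bit: int) -> int:
    """Analog cell current per the table: zero whenever the input bit is 0."""
    if input_bit == 0:
        return 0
    return I_LVT_NA if weight_bit == 1 else I_HVT_NA


def total_current_na(input_bits: list[int], weight_bits: list[int]) -> int:
    """Sum of the cell currents that are jointly input to the ADC."""
    return sum(cell_current_na(x, w) for x, w in zip(input_bits, weight_bits))


print(total_current_na([0, 0, 1, 1], [0, 1, 0, 1]))  # 0 + 0 + 10 + 1000 = 1010
```

Under these assumed values, the row with input 1 and weight 0 contributes a small current to the analog sum even though its digital multiplication result is 0.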

In the embodiments of the present application, the selected bit line read (SBL-read) command can be reused when performing the multiplication. The embodiments can therefore reduce the variation influence caused by single-bit representation.

Please refer to FIG. 5A, which is a schematic diagram of the grouping operation (majority operation) and bitwise counting according to an embodiment of the present application. As shown in FIG. 5A, reference symbol GM1 denotes the first multiplication result obtained by bitwise multiplication of the first MSB vector of the input data and the weight values; GM2 denotes the second multiplication result obtained by bitwise multiplication of the second MSB vector of the input data and the weight values; GM3 denotes the third multiplication result obtained by bitwise multiplication of the third MSB vector of the input data and the weight values; and GL denotes the fourth multiplication result obtained by bitwise multiplication of the LSB of the input data and the weight values. After the grouping operation (majority operation), grouping the first multiplication result GM1 yields the first grouping result CB1 (accumulation weight 2²); grouping the second multiplication result GM2 yields the second grouping result CB2 (accumulation weight 2²); grouping the third multiplication result GM3 yields the third grouping result CB3 (accumulation weight 2²); and grouping the fourth multiplication result GL yields the fourth grouping result CB4 (accumulation weight 2⁰).

FIG. 5B shows an accumulation example for FIG. 3C. Please refer to FIG. 3C and FIG. 5B. As shown in FIG. 5B, the bits 313B of the input data (FIG. 3C) are multiplied by the weight value 314. The first four bits ("0000") of the resulting multiplication result are grouped as the first multiplication result GM1. Similarly, the fifth to eighth bits ("0000") are grouped as the second multiplication result GM2, and the ninth to twelfth bits ("1111") are grouped as the third multiplication result GM3. The thirteenth to sixteenth bits ("0010") are counted directly.

After the grouping operation (majority operation), the first grouping result CB1 is "0" (accumulation weight 2²), the second grouping result CB2 is "0" (accumulation weight 2²), and the third grouping result CB3 is "1" (accumulation weight 2²). During counting, the grouping results CB1 to CB4 are multiplied by their respective accumulation weights and accumulated to produce the MAC operation result. For example, as shown in FIG. 5B, the MAC operation result (the second MAC operation result OUT2) is CB1*2² + CB2*2² + CB3*2² + CB4*2⁰ = 0*2² + 0*2² + 1*2² + 1*2⁰ = 0000 0000 0000 0000 0000 0000 0000 0101 = 5.
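The FIG. 5B arithmetic can be reproduced with a short Python sketch. It assumes 4-bit groups, that the last group is the directly counted LSB group, and that a tied group resolves to 0; the names `majority` and `mac_from_products` are illustrative and do not come from the specification.

```python
def majority(bits: str) -> int:
    """Return 1 when more than half of the bits are 1 (a tie resolves to 0 here)."""
    return 1 if bits.count("1") * 2 > len(bits) else 0


def mac_from_products(products: str, group_size: int = 4,
                      msb_weight: int = 2 ** 2, lsb_weight: int = 2 ** 0) -> int:
    """Majority-vote each MSB group, count the LSB group directly, then weight and add."""
    groups = [products[i:i + group_size] for i in range(0, len(products), group_size)]
    *msb_groups, lsb_group = groups                  # last group is counted directly
    msb_sum = sum(majority(g) for g in msb_groups)   # CB1 + CB2 + CB3
    lsb_count = lsb_group.count("1")                 # direct count of the LSB group
    return msb_sum * msb_weight + lsb_count * lsb_weight


# Multiplication result bits from the FIG. 5B example: GM1, GM2, GM3, then the LSB group.
print(mac_from_products("0000" "0000" "1111" "0010"))  # 0*4 + 0*4 + 1*4 + 1*1 = 5
```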

In an embodiment of the present application, the grouping rule (majority rule) may be as follows:

| Group bits | Grouping result (majority result) |
|---|---|
| 1111 (condition A) | 1 |
| 1110 (condition B) | 1 |
| 1100 (condition C) | 1 or 0 |
| 1000 (condition D) | 0 |
| 0000 (condition E) | 0 |

In the table above, in condition A, all bits of the group are correct ("1111" has no erroneous bit), so the majority result is 1. In condition E, all bits of the group are correct ("0000" has no erroneous bit), so the majority result is 0.

In condition B, one bit of the group is erroneous (the "0" in "1110"); by majority, "1110" is decided as "1". In condition D, one bit of the group is erroneous (the "1" in "1000"); by majority, "1000" is decided as "0".

In condition C, two bits of the group are erroneous (either the "00" or the "11" in "1100"); by majority, "1100" may be decided as either "1" or "0".

Therefore, in the embodiments of the present application, the grouping (majority) function reduces erroneous bits.
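The five conditions of the grouping rule can be checked with a small Python sketch. Since the text allows a tied group (condition C) to resolve to either value, the tie policy is exposed as a parameter; `majority_vote` and `tie_value` are hypothetical names introduced for this example.

```python
def majority_vote(group: str, tie_value: int = 0) -> int:
    """Majority decision over a bit group; `tie_value` is used when 1s and 0s are equal."""
    ones = group.count("1")
    zeros = len(group) - ones
    if ones == zeros:
        return tie_value          # condition C: either outcome is permitted
    return 1 if ones > zeros else 0


for label, group in zip("ABCDE", ["1111", "1110", "1100", "1000", "0000"]):
    print(label, group, majority_vote(group))
```

A single flipped bit in a 4-bit group (conditions B and D) leaves the decision unchanged, which is how the grouping step absorbs isolated bit errors.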

The grouping results of the grouping circuit 140 are input to the counting unit 150 for bit counting.

During counting, the count of the multiplication results of the MSB vector and the count of the multiplication results of the LSB vector are accumulated. In the case of FIG. 5A, two kinds of accumulators are used. The first accumulator is assigned a higher accumulation weight (for example, 2²). It accumulates the grouping (majority) result of the multiplication result GM1 (1 bit), plus the grouping (majority) result of the multiplication result GM2 (1 bit), plus the grouping (majority) result of the multiplication result GM3 (1 bit). The count obtained by the first accumulator is then multiplied by the higher accumulation weight (for example, 2²). The second accumulator is assigned a lower accumulation weight (for example, 2⁰) and directly counts the multiplication result GL (multiple bits). Adding the two weighted accumulation results gives the MAC result. For example, suppose grouping the multiplication result GM1 yields 1 (1 bit), grouping GM2 yields 0 (1 bit), and grouping GM3 yields 1 (1 bit). The count of the first accumulator (1+0+1) multiplied by 2² equals 2*2² = 8. The multiplication result GL is counted directly as 4 (3 bits). Adding the two weighted accumulation results gives a MAC result of 8 + 4 = 12.
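The two-accumulator scheme can be sketched in Python as two weighted counters. The `Accumulator` class is a hypothetical software stand-in for the hardware accumulators, and the inputs are the values from the worked example (grouping results 1, 0, 1 and a direct count of 4).

```python
from dataclasses import dataclass


@dataclass
class Accumulator:
    """A counter whose total is scaled by a fixed accumulation weight."""
    weight: int
    count: int = 0

    def add(self, n: int) -> None:
        self.count += n

    def weighted(self) -> int:
        return self.count * self.weight


high = Accumulator(weight=2 ** 2)   # accumulates the 1-bit grouping results of GM1..GM3
low = Accumulator(weight=2 ** 0)    # directly counts the multi-bit result GL

for grouping_result in (1, 0, 1):   # grouping results of GM1, GM2, GM3
    high.add(grouping_result)
low.add(4)                          # direct count of GL

mac = high.weighted() + low.weighted()
print(high.weighted(), low.weighted(), mac)  # 8 4 12
```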

As described above, in the embodiments of the present application, because the input data has been unfolded into the unFDP format, the data stored in the common data latches can be grouped (that is, divided into an MSB vector and an LSB vector) during counting or accumulation, and the grouping (majority) mechanism reduces erroneous bits within the MSB vector and the LSB vector.

In addition, even when a conventional accumulator (counter) is used, the embodiments of the present application can still reduce the counting/accumulation time, because they use a digital counting command (fail bit count) and assign different accumulation weights to the accumulation results of different vectors (the MSB vector and the LSB vector). In one example, the accumulation time can be reduced to about 40%.

FIG. 6 shows MAC operation flows according to an embodiment of the present application. In FIG. 6, DMAC denotes the first type of digital accumulation (without the grouping operation, that is, the memory device 100 does not include the grouping circuit 140), mDMAC denotes the second type of digital accumulation (with the grouping operation, that is, the memory device 100 includes the grouping circuit 140), AMAC denotes analog accumulation, and HMAC denotes hybrid accumulation.

In the first digital accumulation flow, the input data is transferred to the memory device. Bit line setup and word line setup are performed simultaneously. After the bit line setup is complete, sensing is performed, followed by the digital accumulation operation, and the result of the digital accumulation is returned. These operations are repeated until all input data has been processed.

In the second digital accumulation flow, the grouping operation speeds up the digital accumulation.

In the analog accumulation flow, the ADC conversion and the comparison can be completed while sensing is performed, which further improves the MAC operation.

In the hybrid accumulation flow, because both analog accumulation and digital accumulation are performed, hybrid accumulation is slower than analog accumulation but faster than digital accumulation. However, its accuracy can be nearly equal to that of digital accumulation and is higher than that of analog accumulation.

As FIG. 6 shows, the MAC operation of the embodiments can be divided into two types of sub-operation. The first is multiplication, which multiplies the input data by the weight values and is performed by the selected bit line read command. The second is accumulation (data counting), in particular fail bit counting. In other possible embodiments of the present application, more counting units may be added to speed up the counting/accumulation.

The operation time of digital accumulation depends mainly on the accumulation speed of the counting unit 150, because the counting unit 150 counts bit by bit. The quantization accuracy of the analog-to-digital conversion unit 161 depends mainly on the variation tolerance of the memory cells. Digital accumulation therefore offers higher accuracy but lower accumulation speed than analog accumulation.

In addition, the read voltage can also be adjusted in the embodiments of the present application. FIG. 7A is a flowchart of programming a fixed memory page, and FIG. 7B is a flowchart of adjusting the read voltage.

In FIG. 7A, in step 710, known input data is programmed into a fixed memory page. For example, but not limited to, the bit ratio of the known input data is 75% bit 0 and 25% bit 1.

In FIG. 7B, in step 720, the fixed memory page is read and the ADC is enabled. In step 730, it is determined whether the ADC output value is close to a reference test value (for an 8-bit ADC, the reference test value is 127, although the present application is not limited thereto). If the determination in step 730 is negative, the flow proceeds to step 740. If it is affirmative, the flow proceeds to step 750.

In step 740, the read voltage is increased if the ADC output value is less than the reference test value, and decreased if the ADC output value is greater than the reference test value. After step 740, the flow returns to step 720.

In step 750, the current read voltage is recorded for use in subsequent read operations.
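Steps 720 to 750 form a simple feedback loop, sketched below in Python. The sketch rests on several assumptions not stated in the specification: "close to" is modeled as a tolerance of ±2 codes, the voltage is adjusted in fixed 10 mV steps, and the ADC is replaced by a toy function `toy_adc` whose output rises with the read voltage. All names and numeric values other than the reference test value 127 are hypothetical.

```python
from typing import Callable


def calibrate_read_voltage(read_adc: Callable[[int], int], v_start_mv: int,
                           step_mv: int = 10, target: int = 127,
                           tolerance: int = 2, max_iters: int = 1000) -> int:
    """Adjust the read voltage until the ADC code is close to the reference test value."""
    v_mv = v_start_mv
    for _ in range(max_iters):
        code = read_adc(v_mv)                  # step 720: read the fixed page, ADC enabled
        if abs(code - target) <= tolerance:    # step 730: close to the reference value?
            return v_mv                        # step 750: record the read voltage
        # step 740: raise the voltage if the code is too low, lower it if too high
        v_mv += step_mv if code < target else -step_mv
    raise RuntimeError("read voltage calibration did not converge")


def toy_adc(v_mv: int) -> int:
    """Hypothetical 8-bit ADC response: one code per 10 mV, clipped to 0..255."""
    return max(0, min(255, v_mv // 10))


v_cal = calibrate_read_voltage(toy_adc, v_start_mv=1000)
print(v_cal, toy_adc(v_cal))  # 1250 125
```

The iteration cap is a safeguard added for the sketch; the flowchart itself simply loops back to step 720.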

As is known, the read voltage affects the ADC output value and the reading of bit 1. Therefore, in the embodiments of the present application, the read voltage can be calibrated periodically according to operating conditions (for example, but not limited to, program cycles, temperature or read disturbance) to maintain high accuracy and reliability.

FIG. 8 shows a MAC operation flow according to an embodiment of the present application. In step 810, a plurality of weight values are stored in a plurality of memory cells of a memory array of the memory device. In step 820, bitwise multiplication is performed on a plurality of input data and the weight values to obtain a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents during the multiplication. In step 830, it is decided whether to perform analog accumulation, digital accumulation or hybrid accumulation. In step 840, when analog accumulation is performed, the memory cell currents are accumulated in analog form to obtain a first multiply-accumulate (MAC) operation result. In step 850, when digital accumulation is performed, the multiplication results are digitally accumulated to obtain a second MAC operation result. In step 860, when hybrid accumulation is performed, whether to trigger the digital accumulation is decided according to the first MAC operation result.
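Steps 830 to 860 can be sketched as a mode dispatch in Python. This is a sketch under stated assumptions: the ADC is modeled as integer division of the summed current by a hypothetical step size, the trigger condition is assumed to be "first result greater than or equal to a trigger reference value" (the specification says only that the first result is compared with a trigger reference value), and all names and numeric values are illustrative.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class Mode(Enum):
    ANALOG = "analog"
    DIGITAL = "digital"
    HYBRID = "hybrid"


def analog_accumulate(currents_na: Sequence[int], adc_lsb_na: int) -> int:
    """Sum the cell currents and quantize the sum (hypothetical ADC model)."""
    return sum(currents_na) // adc_lsb_na


def digital_accumulate(products: Sequence[int]) -> int:
    """Count the 1 bits among the multiplication results."""
    return sum(products)


def run_mac(mode: Mode, products: Sequence[int], currents_na: Sequence[int],
            adc_lsb_na: int = 1000, trigger_ref: int = 2
            ) -> Tuple[Optional[int], Optional[int]]:
    """Return (first_result, second_result); None marks a path that did not run."""
    if mode is Mode.ANALOG:                        # step 840
        return analog_accumulate(currents_na, adc_lsb_na), None
    if mode is Mode.DIGITAL:                       # step 850
        return None, digital_accumulate(products)
    first = analog_accumulate(currents_na, adc_lsb_na)
    if first >= trigger_ref:                       # step 860 (assumed comparison direction)
        return first, digital_accumulate(products)
    return first, None


products = [0, 0, 0, 1, 1, 1]
currents_na = [0, 0, 10, 1000, 1000, 1000]         # hypothetical cell currents in nA
print(run_mac(Mode.HYBRID, products, currents_na))                 # (3, 3)
print(run_mac(Mode.HYBRID, products, currents_na, trigger_ref=5))  # (3, None)
```

In the hybrid case the analog result acts as a coarse first pass, and the slower digital accumulation runs only when the comparison triggers it.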

The embodiments of the present application can be applied to NAND flash memory, or to memory devices that are sensitive to retention and thermal variation, for example, but not limited to, NOR flash memory, phase-change (PCM) flash memory, magnetic random access memory (magnetic RAM) or resistive RAM.

The embodiments of the present application can be applied to both 3D memory and 2D memory, for example, but not limited to, 2D/3D NAND flash memory, 2D/3D NOR flash memory, 2D/3D phase-change (PCM) flash memory, 2D/3D magnetic random access memory (magnetic RAM) or 2D/3D resistive RAM.

Although the above embodiments divide the input data and/or the weight values into an MSB vector and an LSB vector (two vectors), the present application is not limited thereto. In other possible embodiments, the input data and/or the weight values may be divided into more vectors, which is also within the spirit of the present application.

The embodiments of the present application can use not only the majority grouping technique but also other grouping techniques to speed up accumulation.

The embodiments of the present application can be applied to, for example, but not limited to, AI technologies such as face recognition.

In the embodiments of the present application, the analog-to-digital conversion unit 161 may be a current-mode, voltage-mode or mixed-mode analog-to-digital conversion unit.

The embodiments of the present application can be applied not only to serial MAC operations but also to parallel MAC operations.

In summary, although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Those of ordinary skill in the art to which the present invention pertains may make various changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

100: memory device
110: memory array
120: multiplication circuit
130: input/output circuit
140: grouping circuit
150: counting unit
111: memory cell
121: single-bit multiplication unit
121A: input latch
121B: sense amplifier
121C: output latch
121D: common data latch
141: grouping unit
135: digital accumulation circuit
160: analog accumulation circuit
170: decision unit
161: analog-to-digital conversion unit
163: comparator
301A, 303A, 301B, 303B, 311A, 313A, 311B, 313B: bits
302, 312, 314: weight values
405: latch
410: bit line switch
710-750: steps
810-860: steps

FIG. 1 is a functional block diagram of a memory device with an in-memory computing function according to an embodiment of the present application.
FIG. 2 is a schematic diagram of data mapping according to an embodiment of the present application.
FIGS. 3A to 3C show several examples of data mapping according to an embodiment of the present application.
FIG. 4 is a schematic diagram of two examples of the multiplication operation according to an embodiment of the present application.
FIGS. 5A and 5B are schematic diagrams of the grouping operation (majority operation) and counting according to an embodiment of the present application.
FIG. 6 compares several MAC operation flows according to an embodiment of the present application.
FIG. 7A is a flowchart of programming a fixed memory page according to an embodiment of the present application, and FIG. 7B is a flowchart of adjusting the read voltage according to an embodiment of the present application.
FIG. 8 shows a MAC operation flow according to an embodiment of the present application.


Claims

A memory device, comprising: a memory array including a plurality of memory cells, the memory cells of the memory array storing a plurality of weight values; a multiplication circuit, coupled to the memory array, that multiplies a plurality of input data by the weight values to obtain a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents when the multiplication is performed; a digital accumulation circuit, coupled to the multiplication circuit, that performs a digital accumulation on the multiplication results; an analog accumulation circuit, coupled to the memory array, that performs an analog accumulation on the memory cell currents to generate a first multiply-accumulate (MAC) operation result; and a decision unit, coupled to the digital accumulation circuit and the analog accumulation circuit, that decides whether to perform the analog accumulation, the digital accumulation or a hybrid accumulation, wherein, when the hybrid accumulation is performed, whether to trigger the digital accumulation circuit is decided according to the first MAC operation result.

The memory device of claim 1, wherein: when the analog accumulation is performed, the decision unit enables the analog accumulation circuit but does not enable the digital accumulation circuit; when the digital accumulation is performed, the decision unit enables the digital accumulation circuit but disables the analog accumulation circuit; and when the hybrid accumulation is performed, the decision unit enables both the digital accumulation circuit and the analog accumulation circuit.

The memory device of claim 1, further comprising: a comparator, coupled to the analog accumulation circuit and the digital accumulation circuit, that compares the first MAC operation result with a trigger reference value and outputs a trigger signal to the digital accumulation circuit to trigger the digital accumulation circuit to perform the digital accumulation, wherein the analog accumulation circuit includes an analog-to-digital conversion unit coupled to the memory array, and the memory cell currents of the memory cells are accumulated and input to the analog-to-digital conversion unit to be converted into the first MAC operation result.

The memory device of claim 1, wherein the digital accumulation circuit comprises: a counting unit, coupled to the multiplication circuit, that performs bit counting on the multiplication results to obtain a second MAC operation result.

The memory device of claim 4, further comprising a grouping circuit coupled to the multiplication circuit and the counting unit, wherein the grouping circuit performs a grouping operation on the multiplication results of the multiplication circuit to obtain a plurality of grouping results and inputs the grouping results to the counting unit.

The memory device of claim 1, wherein: a plurality of bits of each of the input data or of each of the weight values are divided into a plurality of bit vectors; the bits of the bit vectors are converted from a binary format to a unary-code representation; the bits of the bit vectors represented in unary code are repeated a plurality of times to form an unfolded dot product format; and the multiplication circuit multiplies the input data in the unfolded dot product format by the weight values in the unfolded dot product format to obtain the multiplication results.

The memory device of claim 5, wherein: when the grouping operation is performed on the multiplication results, the grouping circuit performs the grouping operation separately on the multiplication results of each of the bit vectors to obtain the grouping results; when bit counting is performed, different accumulation weights are assigned to the grouping results, which are then accumulated to obtain the second MAC operation result; and the grouping circuit is a majority circuit including a plurality of majority units.

A method of operating a memory device, comprising: storing a plurality of weight values in a plurality of memory cells of a memory array of the memory device; performing bitwise multiplication on a plurality of input data and the weight values to obtain a plurality of multiplication results, wherein the memory cells generate a plurality of memory cell currents during the multiplication; and deciding whether to perform an analog accumulation, a digital accumulation or a hybrid accumulation, wherein: when the analog accumulation is performed, the analog accumulation is performed on the memory cell currents to generate a first multiply-accumulate (MAC) operation result; when the digital accumulation is performed, the digital accumulation is performed on the multiplication results to generate a second MAC operation result; and when the hybrid accumulation is performed, whether to trigger the digital accumulation is decided according to the first MAC operation result.

The method of operating a memory device of claim 8, wherein the memory cell currents of the memory cells are accumulated and then converted into the first MAC operation result by analog-to-digital conversion; and the first MAC operation result is compared with a trigger reference value to decide whether to trigger the digital accumulation.

The method of operating a memory device of claim 8, wherein: each of the input data or each of the weight values is divided into a plurality of bit vectors; the bits of the bit vectors are converted from a binary format to a unary-code representation; the bits of the bit vectors represented in unary code are repeated a plurality of times to form an unfolded dot product format; the input data in the unfolded dot product format is multiplied by the weight values in the unfolded dot product format to obtain the multiplication results; when bit accumulation is performed, different accumulation weights are assigned to a plurality of grouping results, which are then accumulated to obtain the second MAC operation result; and the grouping operation performed on the multiplication results to obtain the grouping results is a majority operation.