TWI798798B - In-memory computing method and in-memory computing apparatus - Google Patents
- Publication number: TWI798798B
- Authority: TW (Taiwan)
Description
The present disclosure relates to a computing method and apparatus, and more particularly, to an in-memory computing method and apparatus.
When a traditional computing system executes a data-intensive application, it must perform a large number of computations and frequently move data between the processor and the memory. The heavy computation degrades system performance, while the large volume of data movement causes high power consumption.
To address these performance and power-consumption problems, new algorithms and/or memory architectures have been proposed in recent years, including nearest neighbor search, decision tree learning, distributed systems, and in-memory computing. However, decision tree learning still requires a large amount of data movement, distributed systems suffer from high cost and inter-device communication overhead, and in-memory computing cannot support complex operations.
In view of the above, the present disclosure provides an in-memory computing method and an in-memory computing apparatus capable of improving the performance of a computing system.
The present disclosure provides an in-memory computing method, suitable for a processor to perform multiply-accumulate (MAC) operations on a memory. The memory includes a plurality of input lines and a plurality of output lines intersecting each other, a plurality of memory cells respectively arranged at the intersections of the input lines and the output lines, and a plurality of sense amplifiers respectively connected to the output lines. The method includes the following steps: pre-processing the input data and the weight data to be written to the input lines and the memory cells, respectively, dividing each into a major part and a minor part; writing the divided input data and weight data to the input lines and the memory cells in batches to perform MAC operations, obtaining a plurality of operation results; filtering the operation results according to their numerical values; and post-processing the filtered operation results according to the parts to which they correspond, to obtain output data.
In an embodiment of the present disclosure, the step of filtering the operation results according to their numerical values includes filtering out the operation results whose values are not greater than a preset threshold, sorting the remaining operation results, and selecting at least one top-ranked operation result for post-processing.
In an embodiment of the present disclosure, the method further includes encoding the input data and the weight data when pre-processing them, and performing a weighting operation corresponding to the encoding on the operation results when post-processing the filtered operation results.
In an embodiment of the present disclosure, the step of performing the weighting operation corresponding to the encoding includes: in response to an operation result corresponding to the major part of the input data and the major part of the weight data, multiplying the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the major part of the input data and the minor part of the weight data, multiplying it by a second weight to obtain a second product; in response to an operation result corresponding to the minor part of the input data and the major part of the weight data, multiplying it by a third weight to obtain a third product; in response to an operation result corresponding to the minor part of the input data and the minor part of the weight data, multiplying it by a fourth weight to obtain a fourth product; and accumulating the first, second, third, and fourth products and outputting the accumulated result as the output data.
The present disclosure further provides an in-memory computing apparatus, which includes a memory and a processor. The memory includes a plurality of input lines and a plurality of output lines intersecting each other, a plurality of memory cells respectively arranged at the intersections of the input lines and the output lines, and a plurality of sense amplifiers respectively connected to the output lines. The processor is coupled to the memory and configured to: pre-process the input data and the weight data to be written to the input lines and the memory cells, respectively, dividing each into a major part and a minor part; write the divided input data and weight data to the input lines and the memory cells in batches to perform MAC operations, accumulating the sensed values of the sense amplifiers to obtain a plurality of operation results; filter the operation results according to their numerical values; and post-process the filtered operation results according to the parts to which they correspond, to obtain output data.
In an embodiment of the present disclosure, the major part consists of the most significant bits (MSBs) of the multi-bit processed data, and the minor part consists of the least significant bits (LSBs) of the multi-bit processed data.
In an embodiment of the present disclosure, the in-memory computing apparatus further includes a filter for filtering out operation results whose values are not greater than a preset threshold, and the processor sorts the remaining operation results and selects at least one top-ranked result for post-processing.
In an embodiment of the present disclosure, the processor encodes the input data and the weight data when pre-processing them, and performs a weighting operation corresponding to the encoding on the operation results when post-processing the filtered operation results.
In an embodiment of the present disclosure, the processor is configured to: in response to an operation result corresponding to the major part of the input data and the major part of the weight data, multiply the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the major part of the input data and the minor part of the weight data, multiply it by a second weight to obtain a second product; in response to an operation result corresponding to the minor part of the input data and the major part of the weight data, multiply it by a third weight to obtain a third product; in response to an operation result corresponding to the minor part of the input data and the minor part of the weight data, multiply it by a fourth weight to obtain a fourth product; and accumulate the first, second, third, and fourth products, outputting the accumulated result as the output data.
To make the aforementioned features and advantages of the present disclosure more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an in-memory computing apparatus according to an embodiment of the disclosure. Referring to FIG. 1, the in-memory computing apparatus 10 of this embodiment is, for example, a memristor device configured to implement processing-in-memory (PIM), suitable for data-intensive applications such as face search. The computing apparatus 10 includes a memory 12 and a processor 14, whose functions are described below.
The memory 12 is, for example, NAND flash memory, NOR flash memory, phase change memory (PCM), spin-transfer torque random-access memory (STT-RAM), or resistive random-access memory (ReRAM) of a 2D or 3D structure, which is not limited herein. In some embodiments, various volatile memories, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and various non-volatile memories, such as ReRAM, PCM, flash, magnetoresistive RAM, and ferroelectric RAM, may be integrated to perform in-memory computation, which is not limited herein.
The memory 12 includes a plurality of input lines IL_i and a plurality of output lines OL_j intersecting each other, a plurality of memory cells (represented by resistors R_ij) respectively arranged at the intersections of the input lines IL_i and the output lines OL_j, and a plurality of sense amplifiers SA respectively connected to the output lines OL_j for sensing the currents I_j output from the output lines OL_j. In some embodiments, the input lines IL_i are word lines and the output lines OL_j are bit lines, and in some embodiments, the input lines IL_i are bit lines and the output lines OL_j are word lines, which is not limited herein.
The processor 14 is, for example, a central processing unit (CPU) or another programmable general-purpose or special-purpose microprocessor, a microcontroller (MCU), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), another similar device, or a combination of such devices, which is not limited in this embodiment. In this embodiment, the processor 14 is configured to execute instructions for performing in-memory operations. The in-memory operations may be applied to various artificial intelligence (AI) applications, such as fully connected layers, convolution layers, multilayer perceptrons, support vector machines, or other applications implemented with memristors, which is not limited herein.
FIG. 2 is a flowchart of an in-memory computing method according to an embodiment of the disclosure. Referring to FIG. 1 and FIG. 2, the method of this embodiment is suitable for the above in-memory computing apparatus 10, and the detailed steps of the method are described below with reference to the devices and components of the in-memory computing apparatus 10.
First, in step S202, the processor 14 pre-processes the input data and the weight data to be written to the input lines and the memory cells, respectively, dividing each into a major part and a minor part. In an embodiment, the processor 14 divides the input data into multi-bit MSBs and multi-bit LSBs, and likewise divides the weight data into multi-bit MSBs and multi-bit LSBs. For 8-bit input data, for example, the processor 14 divides the input data into a 4-bit MSB part and a 4-bit LSB part, and divides the weight data into a 4-bit MSB part and a 4-bit LSB part. In other cases, the processor 14 may, according to implementation requirements, divide the input data and the weight data into the same or different numbers of one or more MSBs and one or more LSBs, which is not limited in this embodiment. In other embodiments, the processor 14 may also mask or filter out one or more unimportant bits of the input data (i.e., the minor part) and retain only the more important bits (i.e., the major part) for subsequent operations, which is likewise not limited.
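As a concrete illustration, the 8-bit split described above can be sketched in a few lines of Python (`split_nibbles` is a hypothetical helper name for illustration, not part of the disclosure):

```python
def split_nibbles(value):
    """Split an 8-bit value into its 4-bit major (MSB) and minor (LSB) parts."""
    msb = (value >> 4) & 0xF  # bits B7..B4
    lsb = value & 0xF         # bits B3..B0
    return msb, lsb

# e.g. 0xAB splits into major 0xA and minor 0xB
```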
In other embodiments, the processor 14 may further encode the input data and the weight data, for example by converting the multi-bit MSBs and multi-bit LSBs of the input data or weight data from binary format into a unary-code value format. The processor 14 may then replicate the converted unary codes to unfold them into a dot-product format.
For example, FIG. 3 shows an example of data encoding according to an embodiment of the disclosure. Referring to FIG. 3, this embodiment assumes N-dimensional input data and weight data to be written, where N is a positive integer and each datum has 8 bits B0–B7 in binary representation. Taking the N-dimensional input data <1>–<N> as an example, each input datum is divided into an MSB vector and an LSB vector, where the MSB vector includes the 4 bits B7–B4 and the LSB vector includes the 4 bits B3–B0. Next, each bit of the MSB vector and the LSB vector is converted into a unary code according to its value: bit B7 is converted into bits B7_0–B7_7, bit B6 into B6_0–B6_3, bit B5 into B5_0–B5_1, and bit B4 is kept unchanged, so each 4-bit vector expands into 2^4 − 1 = 15 bits. The converted unary codes are then replicated to unfold them into a dot-product format; for example, the 15 converted unary-code bits of each input datum's MSB vector are replicated 2^4 − 1 = 15 times to expand into 225 bits, generating data in the unfolding dot product (unFDP) format shown in FIG. 3. Similarly, the weight data may be pre-processed with the same encoding as the input data, which is not repeated here.
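A behavioral sketch of this encoding, under the assumption that bit k of a nibble contributes 2^k copies of itself and that inputs are tiled while weights are element-repeated (the exact bit ordering of the unFDP layout in FIG. 3 may differ; function names are illustrative only):

```python
def to_unary(nibble):
    """Expand a 4-bit value into a 15-bit unary vector: bit k contributes
    2**k copies of itself, so the vector's popcount equals the nibble's value."""
    bits = []
    for k in range(3, -1, -1):       # analogous to B7..B4 (or B3..B0)
        b = (nibble >> k) & 1
        bits.extend([b] * (1 << k))  # 8, 4, 2, 1 copies
    return bits

def unfold_input(nibble):
    """unFDP for an input: tile the 15-bit unary vector 15 times -> 225 bits."""
    return to_unary(nibble) * 15

def unfold_weight(nibble):
    """unFDP for a weight: repeat each unary bit 15 times -> 225 bits."""
    return [b for b in to_unary(nibble) for _ in range(15)]

# Bitwise multiply-and-sum over the two 225-bit vectors then equals the
# product of the two nibble values, which is what lets the crossbar compute
# a multiplication as a count of coinciding 1-bits.
```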
Returning to the flow of FIG. 2, in step S204, the processor 14 writes the input data and the weight data, each divided into a major part and a minor part, to the input lines and the memory cells in batches to perform MAC operations and obtain a plurality of operation results. In detail, the processor 14, for example, writes the major part of the weight data to the corresponding memory cells of the memory 12 and feeds the major part of the input data to the corresponding input lines IL_i, so that the sense amplifiers SA connected to the output lines OL_j sense the currents I_j output from the output lines OL_j; the sensed values of the sense amplifiers SA are then accumulated by a counter or an accumulator to obtain the result of the MAC operation of the input data and the weight data. Similarly, the processor 14 writes the major part of the weight data to the corresponding memory cells and feeds the minor part of the input data to the corresponding input lines IL_i to obtain a MAC result; writes the minor part of the weight data to the corresponding memory cells and feeds the major part of the input data to the corresponding input lines IL_i to obtain a MAC result; and writes the minor part of the weight data to the corresponding memory cells and feeds the minor part of the input data to the corresponding input lines IL_i to obtain a MAC result.
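The four batched passes described above can be modeled in software as four dot products over the nibble parts. This is a behavioral sketch only; the real apparatus accumulates sense-amplifier currents on the bit lines, and `batched_mac` is a made-up name:

```python
def batched_mac(inputs, weights):
    """Model the four MAC batches: (MSB,MSB), (MSB,LSB), (LSB,MSB), (LSB,LSB)."""
    def nibbles(vals, shift):
        return [(v >> shift) & 0xF for v in vals]
    xm, xl = nibbles(inputs, 4), nibbles(inputs, 0)
    wm, wl = nibbles(weights, 4), nibbles(weights, 0)
    # Each dot product stands in for one crossbar pass plus accumulation.
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(xm, wm), dot(xm, wl), dot(xl, wm), dot(xl, wl)
```

For a single pair such as input 0xAB against weight 0x3C, the four partial results are the nibble products 10·3, 10·12, 11·3, and 11·12, which step S208 later recombines into the full 8-bit product.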
In some embodiments, the memory 12 may also support inverse (NOT), AND, OR, XOR, and XNOR operations, and is not limited to MAC operations. In addition, the memory 12 is not limited to a digital-circuit implementation and may also be implemented with analog circuits; this embodiment does not limit the implementation.
For example, in a digital circuit, the processor 14 may divide the input data into multi-bit MSBs and multi-bit LSBs (with no limit on the number of bits), process them with different encoding (i.e., pre-processing) methods, and then send them to the memory 12 to perform inverse, AND, OR, XOR, XNOR, or MAC operations or a combination thereof; after the corresponding post-processing and filtering, the final operation result is obtained. In an analog circuit, the processor 14 may mask or filter out some bits of the input data (i.e., pre-processing) before sending them to the memory 12 to perform inverse, AND, OR, XOR, XNOR, or MAC operations or a combination thereof; after the corresponding post-processing and filtering, the final operation result is obtained. The above is merely illustrative; the processor 14 may apply any kind of pre-processing and post-processing to the input data to obtain a dedicated operation result.
In step S206, the processor 14 filters the operation results according to their numerical values. In an embodiment, the in-memory computing apparatus 10 includes, for example, a filter (not shown) for filtering out operation results whose values are not greater than a preset threshold. The processor 14 then sorts the remaining operation results and selects the top-N results for post-processing, where N is, for example, 3, 5, 10, 20, or any positive integer, and is not limited here.
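The threshold-then-rank filtering of step S206 can be sketched as follows (a minimal illustration; `filter_and_rank` is a hypothetical name, and the disclosure places the threshold stage in a hardware filter rather than software):

```python
def filter_and_rank(results, threshold, top_n):
    """Drop results not greater than the threshold, then keep the top-N by value."""
    kept = [r for r in results if r > threshold]
    return sorted(kept, reverse=True)[:top_n]

# e.g. with threshold 10 and top_n 2, [5, 42, 17, 3, 99] reduces to [99, 42]
```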
In step S208, the processor 14 post-processes the filtered operation results according to the parts to which they correspond, to obtain output data. In an embodiment, when pre-processing the input data and the weight data, the processor 14, for example, encodes them, and when post-processing the filtered operation results, performs a weighting operation corresponding to that encoding on the results.
In detail, in response to an operation result corresponding to the major part of the input data and the major part of the weight data, the processor 14 multiplies the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the major part of the input data and the minor part of the weight data, it multiplies the result by a second weight to obtain a second product; in response to an operation result corresponding to the minor part of the input data and the major part of the weight data, it multiplies the result by a third weight to obtain a third product; and in response to an operation result corresponding to the minor part of the input data and the minor part of the weight data, it multiplies the result by a fourth weight to obtain a fourth product. Finally, the processor 14 accumulates the first, second, third, and fourth products obtained by weighting the operation results, and outputs the accumulated result as the output data.
For example, FIG. 4 shows an example of data post-processing according to an embodiment of the disclosure. Referring to FIG. 4, this embodiment illustrates the post-processing corresponding to the encoding of FIG. 3. An operation result corresponding to the major part (i.e., the MSBs) of the input data and the major part of the weight data has a weight value of 16 × 16; an operation result corresponding to the major part of the input data and the minor part (i.e., the LSBs) of the weight data has a weight value of 16 × 1; an operation result corresponding to the minor part of the input data and the major part of the weight data has a weight value of 1 × 16; and an operation result corresponding to the minor part of the input data and the minor part of the weight data has a weight value of 1 × 1. By multiplying the operation results of the input data and weight data written to the memory 12 in batches by the corresponding weight values, the result of the MAC operation of the original input data and weight data can be recovered.
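The weighted recombination follows directly from the identity x·w = (16·x_m + x_l)·(16·w_m + w_l) = 256·x_m·w_m + 16·x_m·w_l + 16·x_l·w_m + x_l·w_l for a 4-bit split. A minimal sketch (`recombine` is an illustrative name, not from the disclosure):

```python
def recombine(mac_mm, mac_ml, mac_lm, mac_ll):
    """Recombine the four partial MAC results of a 4-bit MSB/LSB split:
    weights 16*16, 16*1, 1*16, 1*1 on the (major,major), (major,minor),
    (minor,major), (minor,minor) batches, respectively."""
    return 256 * mac_mm + 16 * mac_ml + 16 * mac_lm + mac_ll

# For input 171 (0xAB) and weight 60 (0x3C) the partials are (30, 120, 33, 132),
# and 256*30 + 16*120 + 16*33 + 132 = 10260 = 171 * 60.
```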
After the MAC operation of each batch of input data and weight data is completed and its operation result obtained, the processor 14 returns to step S204 and continues to write the next batch of input data and weight data to the memory 12 for MAC operations, until the operation results of all input data and weight data are obtained, thereby completing the in-memory computation.
In summary, the in-memory computing method and apparatus of the disclosed embodiments combine in-memory computing with a hierarchical filtering scheme. By pre-processing the input data and weight data to be written to the memory, operations on the bits contributing less to the data value (i.e., the LSBs) can be selectively pruned, while operations on the bits contributing more (i.e., the MSBs) are prioritized; by filtering the operation results, those with higher values are selected for the corresponding data post-processing to finally obtain the output data. In this way, the performance of the computing system can be improved without excessively affecting the values of the operation results.
Although the present disclosure has been described by way of the above embodiments, they are not intended to limit the disclosure. It will be apparent to those of ordinary skill in the art that various modifications and changes may be made to the structures of the disclosure without departing from its scope or spirit. The scope of protection of the disclosure therefore falls within the appended claims.
10: computing apparatus; 12: memory; 14: processor; B7–B0, B7_0–B7_7, B6_0–B6_3: bits; IL_i: input line; OL_j: output line; R_ij: resistor; S202–S208: steps; SA: sense amplifier
FIG. 1 is a schematic diagram of an in-memory computing apparatus according to an embodiment of the disclosure. FIG. 2 is a flowchart of an in-memory computing method according to an embodiment of the disclosure. FIG. 3 is an example of data encoding according to an embodiment of the disclosure. FIG. 4 is an example of data post-processing according to an embodiment of the disclosure.
Claims (10)
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063075309P | 2020-09-08 | 2020-09-08 | |
| US 63/075,309 | 2020-09-08 | | |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| TW202211216A | 2022-03-16 |
| TWI798798B | 2023-04-11 |
Family ID: 81731795
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW110131424A | In-memory computing method and in-memory computing apparatus | 2020-09-08 | 2021-08-25 |

Country Status (1)

| Country | Link |
|---|---|
| TW | TWI798798B |
Families Citing this family (2)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| TWI849566B | 2022-11-07 | 2024-07-21 | 國立陽明交通大學 | Memory array for compute-in-memory and operating method thereof |
| TWI849732B | 2023-02-09 | 2024-07-21 | 旺宏電子股份有限公司 | Hyperdimension computing device |
Citations (4)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| US5014235A | 1987-12-15 | 1991-05-07 | Steven G. Morton | Convolution memory |
| TW201714091A | 2015-10-08 | 2017-04-16 | 上海兆芯集成電路有限公司 | Neural network unit that performs concurrent LSTM cell calculations |
| CN108805793A | 2017-04-28 | 2018-11-13 | 英特尔公司 | Multiply-accumulate "0" data gate |
| US20190358515A1 | 2016-05-02 | 2019-11-28 | Bao Tran | Blockchain |
Legal events: 2021-08-25, application TW110131424A filed in Taiwan; granted as patent TWI798798B (status: active).