TWI798798B - In-memory computing method and in-memory computing apparatus

Info

Publication number: TWI798798B
Application number: TW110131424A
Authority: TW
Inventors: 林柏榕; 李永駿; 胡瀚文; 王淮慕
Original assignee: 旺宏電子股份有限公司
Priority date: 2020-09-08
Filing date: 2021-08-25
Publication date: 2023-04-11
Also published as: TW202211216A

Abstract

An in-memory computing method and an in-memory computing apparatus, adapted for a processor to perform multiply-accumulate (MAC) operations on a memory, are provided. In the method, input data and weight data to be written to the input lines and memory cells of the memory are each preprocessed to divide them into a primary portion and a secondary portion. Next, the divided input data and weight data are written into the input lines and the memory cells in batches to perform the MAC operations and obtain multiple calculation results. Then, the calculation results are filtered according to the numerical value of each calculation result. Finally, post-processing is performed on the filtered calculation results according to the portions to which they correspond, to obtain output data.

Description

In-memory computing method and device

The present disclosure relates to a computing method and device, and more particularly to an in-memory computing method and device.

When a conventional computing system executes a data-intensive application, it must perform a large number of computations and frequently move data between the processor and the memory. The heavy computation reduces system performance, and the heavy data movement causes high power consumption.

To address these performance and power limitations, new algorithms and/or memory architectures have been proposed in recent years, including nearest neighbor search, decision tree learning, distributed systems, and in-memory computing. However, decision tree learning still requires a large amount of data movement, distributed systems suffer from high cost and inter-device communication problems, and in-memory computing cannot support complex operations.

In view of the above, the present disclosure provides an in-memory computing method and an in-memory computing device capable of improving the performance of a computing system.

The present disclosure provides an in-memory computing method adapted for a processor to perform multiply-accumulate (MAC) operations on a memory. The memory includes a plurality of input lines and a plurality of output lines crossing each other, a plurality of memory cells respectively disposed at the intersections of the input lines and the output lines, and a plurality of sense amplifiers respectively connected to the output lines. The method includes the following steps: preprocessing input data and weight data to be written into the input lines and the memory cells, respectively, to divide each into a primary portion and a secondary portion; writing the divided input data and weight data into the input lines and the memory cells in batches to perform the MAC operations and obtain a plurality of operation results; filtering the operation results according to the numerical value of each operation result; and post-processing the filtered operation results according to the portions to which they correspond, to obtain output data.

In an embodiment of the present disclosure, the step of filtering the operation results according to the numerical value of each operation result includes filtering out the operation results whose values are not greater than a preset threshold, sorting the remaining operation results, and selecting at least one top-ranked operation result for post-processing.

In an embodiment of the present disclosure, the method further includes encoding the input data and the weight data when preprocessing them, and performing a weighting operation corresponding to the encoding on the operation results when post-processing the filtered operation results.

In an embodiment of the present disclosure, the step of performing the weighting operation corresponding to the encoding on the operation results includes: in response to an operation result corresponding to the primary portion of the input data and the primary portion of the weight data, multiplying the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the primary portion of the input data and the secondary portion of the weight data, multiplying the operation result by a second weight to obtain a second product; in response to an operation result corresponding to the secondary portion of the input data and the primary portion of the weight data, multiplying the operation result by a third weight to obtain a third product; in response to an operation result corresponding to the secondary portion of the input data and the secondary portion of the weight data, multiplying the operation result by a fourth weight to obtain a fourth product; and accumulating the first product, the second product, the third product, and the fourth product obtained from the weighting operation, and outputting the accumulated result as the output data.

The present disclosure also provides an in-memory computing device that includes a memory and a processor. The memory includes a plurality of input lines and a plurality of output lines crossing each other, a plurality of memory cells respectively disposed at the intersections of the input lines and the output lines, and a plurality of sense amplifiers respectively connected to the output lines. The processor is coupled to the memory and is configured to: preprocess input data and weight data to be written into the input lines and the memory cells, respectively, to divide each into a primary portion and a secondary portion; write the divided input data and weight data into the input lines and the memory cells in batches to perform MAC operations, and accumulate the sensed values of the sense amplifiers to obtain a plurality of operation results; filter the operation results according to the numerical value of each operation result; and post-process the filtered operation results according to the portions to which they correspond, to obtain output data.

In an embodiment of the present disclosure, the primary portion is the multi-bit most significant bits (MSB) of the processed data, and the secondary portion is the multi-bit least significant bits (LSB) of the processed data.

In an embodiment of the present disclosure, the in-memory computing device further includes a filter for filtering out the operation results whose values are not greater than a preset threshold, and the processor sorts the remaining operation results and selects at least one top-ranked operation result for post-processing.

In an embodiment of the present disclosure, the processor encodes the input data and the weight data when preprocessing them, and performs a weighting operation corresponding to the encoding on the operation results when post-processing the filtered operation results.

In an embodiment of the present disclosure, the processor, in response to an operation result corresponding to the primary portion of the input data and the primary portion of the weight data, multiplies the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the primary portion of the input data and the secondary portion of the weight data, multiplies the operation result by a second weight to obtain a second product; in response to an operation result corresponding to the secondary portion of the input data and the primary portion of the weight data, multiplies the operation result by a third weight to obtain a third product; in response to an operation result corresponding to the secondary portion of the input data and the secondary portion of the weight data, multiplies the operation result by a fourth weight to obtain a fourth product; and accumulates the first product, the second product, the third product, and the fourth product obtained from the weighting operation, and outputs the accumulated result as the output data.

To make the foregoing features and advantages of the present disclosure more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an in-memory computing device according to an embodiment of the present disclosure. Referring to FIG. 1, the in-memory computing device 10 of this embodiment is, for example, a memristor configured to implement process-in-memory (PIM), and is suitable for data-intensive applications such as face search. The computing device 10 includes a memory 12 and a processor 14, whose functions are described below.

The memory 12 is, for example, a NAND flash memory, a NOR flash memory, a phase change memory (PCM), a spin-transfer torque random-access memory (STT-RAM), or a resistive random-access memory (ReRAM) with a 2D or 3D structure, which is not limited herein. In some embodiments, various volatile memories, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and various non-volatile memories, such as ReRAM, PCM, flash, magnetoresistive RAM, and ferroelectric RAM, may be integrated to perform in-memory computing, which is not limited herein.

The memory 12 includes a plurality of input lines IL_i and a plurality of output lines OL_j crossing each other, a plurality of memory cells (represented by resistors R_ij) respectively disposed at the intersections of the input lines IL_i and the output lines OL_j, and a plurality of sense amplifiers SA respectively connected to the output lines OL_j for sensing the currents I_j output from the output lines OL_j. In some embodiments, the input lines IL_i are word lines and the output lines OL_j are bit lines; in other embodiments, the input lines IL_i are bit lines and the output lines OL_j are word lines. This is not limited herein.
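As a rough illustration of how such a crossbar can yield a multiply-accumulate result, the following sketch models each memory cell as an ideal resistor R_ij and sums the per-cell currents on each output line OL_j. The ideal-resistor model, the function name `crossbar_currents`, and the sample voltages and resistances are assumptions made for illustration and are not taken from the disclosure.

```python
def crossbar_currents(voltages, resistances):
    """Return the current I_j on each output line of an idealized crossbar.

    voltages[i]       -- voltage applied to input line IL_i (assumed values)
    resistances[i][j] -- resistance R_ij of the cell at IL_i / OL_j
    """
    num_outputs = len(resistances[0])
    currents = [0.0] * num_outputs
    for i, v in enumerate(voltages):
        for j in range(num_outputs):
            # Ohm's law per cell; Kirchhoff's current law sums along OL_j.
            currents[j] += v / resistances[i][j]
    return currents


if __name__ == "__main__":
    v = [1.0, 0.0, 1.0]                        # hypothetical input voltages
    r = [[1.0, 2.0], [1.0, 1.0], [4.0, 2.0]]   # hypothetical cell resistances
    print(crossbar_currents(v, r))
```

In this idealized view the input voltages play the role of the input data and the cell conductances 1/R_ij play the role of the weight data, so each sensed current is one sum of products.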

The processor 14 is, for example, a central processing unit (CPU) or another programmable general-purpose or special-purpose microprocessor, a microcontroller (MCU), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), another similar device, or a combination of these devices, which is not limited in this embodiment. In this embodiment, the processor 14 is configured to execute instructions for performing in-memory computing. The in-memory computing can be applied to various artificial intelligence (AI) applications, such as fully connected layers, convolution layers, multilayer perceptrons, support vector machines, or other applications implemented using memristors, which is not limited herein.

FIG. 2 is a flowchart of an in-memory computing method according to an embodiment of the present disclosure. Referring to FIG. 1 and FIG. 2, the method of this embodiment is applicable to the in-memory computing device 10 described above, and its detailed steps are described below with reference to the components of the in-memory computing device 10.

First, in step S202, the processor 14 preprocesses the input data and the weight data to be written into the input lines and the memory cells, respectively, to divide each into a primary portion and a secondary portion. In one embodiment, the processor 14 divides the input data into multi-bit most significant bits (MSB) and multi-bit least significant bits (LSB), and likewise divides the weight data into a multi-bit MSB and a multi-bit LSB. When the input data is 8 bits wide, the processor 14 divides, for example, the input data into a 4-bit MSB and a 4-bit LSB, and the weight data into a 4-bit MSB and a 4-bit LSB. In other cases, the processor 14 may, according to implementation requirements, divide the input data and the weight data into an MSB of one or more bits and an LSB of one or more bits, with the same or different bit counts, which is not limited in this embodiment. In other embodiments, the processor 14 may instead mask or filter out one or more unimportant bits of the input data (that is, the secondary portion) and keep only the more important bits (that is, the primary portion) for subsequent operations, which is likewise not limited in this embodiment.
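The division of step S202 can be sketched as a shift and a mask. The function name `split_msb_lsb` and its parameters are hypothetical and chosen for illustration; the 8-bit width and 4/4 split follow the example in the text, and other splits are covered by changing the assumed `msb_bits` parameter.

```python
def split_msb_lsb(value, total_bits=8, msb_bits=4):
    """Divide an unsigned value into its primary (MSB) and secondary (LSB) portions."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    lsb_bits = total_bits - msb_bits
    msb = value >> lsb_bits                # upper msb_bits bits
    lsb = value & ((1 << lsb_bits) - 1)    # lower lsb_bits bits
    return msb, lsb


if __name__ == "__main__":
    print(split_msb_lsb(0xB7))  # 4-bit MSB and 4-bit LSB of an 8-bit datum
```

Masking out the secondary portion entirely, as in the last alternative described above, would correspond to keeping only the first element of the returned pair.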

In other embodiments, the processor 14 may further encode the input data and the weight data, for example by converting the multi-bit MSB and the multi-bit LSB of the input data or the weight data from a binary format into a unary-code value format. The processor 14 may then replicate the converted unary code to unfold it into a dot-product format.

For example, FIG. 3 shows an example of data encoding according to an embodiment of the present disclosure. Referring to FIG. 3, this embodiment assumes N-dimensional input data and weight data to be written, where N is a positive integer and each datum has 8 bits B0~B7 in binary. Taking the N-dimensional input data ＜1＞~＜N＞ as an example, each input datum ＜1＞~＜N＞ is divided into an MSB vector and an LSB vector, where the MSB vector includes the 4 MSBs B7~B4 and the LSB vector includes the 4 LSBs B3~B0. Next, each bit of the MSB vector and the LSB vector is converted into a unary code according to its place value: for example, bit B7 is converted into bits B7₀~B7₇, bit B6 into B6₀~B6₃, and bit B5 into B5₀~B5₁, while bit B4 is left unchanged. The converted unary code is then replicated and unfolded into the dot-product format: for example, the (2⁴-1) unary-code bits of the MSB vector of each input datum are replicated (2⁴-1) times to unfold into 225 bits, generating data in the unfolding dot product (unFDP) format shown in FIG. 3. The weight data can be preprocessed with the same encoding as the input data, which is not repeated here.
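The encoding above can be sketched as follows for a single 4-bit portion. Each bit is replicated as many times as its place value (8, 4, 2, 1), so the 15-bit code contains as many ones as the value itself, and the code is then replicated 15 times into 225 bits. The text does not spell out how the input and weight copies are arranged relative to each other; the sketch assumes that one operand is tiled as a whole and the other is repeated bit by bit, so that every pair of code bits meets exactly once. All function names are hypothetical.

```python
def unary_encode(nibble):
    """Replicate each bit of a 4-bit value by its place value (8, 4, 2, 1)."""
    if not 0 <= nibble < 16:
        raise ValueError("expected a 4-bit value")
    bits = []
    for k in (3, 2, 1, 0):
        bit = (nibble >> k) & 1
        bits.extend([bit] * (1 << k))
    return bits  # 15 bits; the number of ones equals the value


def unfold_tiled(code):
    """Assumed layout for one operand: the whole code repeated len(code) times."""
    return code * len(code)


def unfold_repeated(code):
    """Assumed layout for the other operand: each bit repeated len(code) times."""
    return [b for b in code for _ in range(len(code))]


def unfolded_product(x_nibble, w_nibble):
    """Multiply two 4-bit values using only bitwise AND and counting."""
    xs = unfold_tiled(unary_encode(x_nibble))      # 225 bits
    ws = unfold_repeated(unary_encode(w_nibble))   # 225 bits
    return sum(a & b for a, b in zip(xs, ws))


if __name__ == "__main__":
    print(len(unfold_tiled(unary_encode(9))), unfolded_product(9, 13))
```

Under this assumed layout, the count of positions where both unfolded vectors hold a one equals the product of the two 4-bit values, which is what lets a binary memory array evaluate the multiplication as a sum of single-bit products.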

Returning to the flow of FIG. 2, in step S204, the processor 14 writes the input data and the weight data, each divided into a primary portion and a secondary portion, into the input lines and the memory cells in batches to perform the MAC operations and obtain a plurality of operation results. Specifically, the processor 14, for example, writes the primary portion of the weight data into the corresponding memory cells of the memory 12 and inputs the primary portion of the input data into the corresponding input lines IL_i of the memory 12, so that the sense amplifiers SA connected to the output lines OL_j sense the currents I_j output from the output lines OL_j; the sensed values of the sense amplifiers SA are then accumulated by a counter or an accumulator to obtain the operation result of the MAC operation on the input data and the weight data. Similarly, the processor 14 writes the primary portion of the weight data into the corresponding memory cells and inputs the secondary portion of the input data into the corresponding input lines IL_i to obtain an operation result; writes the secondary portion of the weight data into the corresponding memory cells and inputs the primary portion of the input data into the corresponding input lines IL_i to obtain an operation result; and writes the secondary portion of the weight data into the corresponding memory cells and inputs the secondary portion of the input data into the corresponding input lines IL_i to obtain an operation result.
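The four batches of step S204 can be sketched in software as four separate sums of products, one per combination of input portion and weight portion. This sketch replaces the memory array, sense amplifiers, and accumulator with plain integer arithmetic, which is an assumption made purely for illustration; the function names and the dictionary keys are hypothetical.

```python
def mac(inputs, weights):
    """Sum of element-wise products, standing in for one pass through the array."""
    return sum(x * w for x, w in zip(inputs, weights))


def batched_partial_macs(inputs, weights, lsb_bits=4):
    """Return one MAC result per (input portion, weight portion) batch."""
    mask = (1 << lsb_bits) - 1
    x_msb = [x >> lsb_bits for x in inputs]
    x_lsb = [x & mask for x in inputs]
    w_msb = [w >> lsb_bits for w in weights]
    w_lsb = [w & mask for w in weights]
    return {
        ("msb", "msb"): mac(x_msb, w_msb),
        ("msb", "lsb"): mac(x_msb, w_lsb),
        ("lsb", "msb"): mac(x_lsb, w_msb),
        ("lsb", "lsb"): mac(x_lsb, w_lsb),
    }


if __name__ == "__main__":
    print(batched_partial_macs([0x12, 0x34], [0x56, 0x78]))
```

Each key names the input portion first and the weight portion second, mirroring the four write-and-sense passes described above.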

In some embodiments, the memory 12 may also support operations such as inversion, AND, OR, exclusive OR (XOR), and exclusive NOR (XNOR), and is not limited to MAC operations. In addition, the memory 12 is not limited to a digital-circuit implementation and may instead be implemented with analog circuits; this embodiment does not limit the implementation.

For example, in a digital circuit, the processor 14 may divide the input data into a multi-bit MSB and a multi-bit LSB (with no limit on the number of bits), process them with different encoding (that is, preprocessing) methods, and then feed them into the memory 12 to perform inversion, AND, OR, XOR, XNOR, or MAC operations, or a combination of these operations; after filtering by the corresponding post-processing, the final operation result is obtained. In an analog circuit, the processor 14 may mask or filter out some bits of the input data (that is, preprocessing) and then feed the data into the memory 12 to perform inversion, AND, OR, XOR, XNOR, or MAC operations, or a combination of these operations; after filtering by the corresponding post-processing, the final operation result is obtained. The above is only an example, and the processor 14 may apply any kind of preprocessing and post-processing to the input data to obtain application-specific operation results.

In step S206, the processor 14 filters the operation results according to the numerical value of each operation result. In one embodiment, the in-memory computing device 10 includes, for example, a filter (not shown) for filtering out the operation results whose values are not greater than a preset threshold. The processor 14 then sorts the remaining operation results and selects the top N operation results for post-processing, where N is, for example, 3, 5, 10, 20, or any positive integer, which is not limited herein.
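The two-stage filtering of step S206 can be sketched as a threshold test followed by a sort and a cut. The function name `filter_results`, the descending sort order, and the choice to return each result together with its original index are assumptions for illustration; the text only specifies that results not greater than the threshold are removed and that the top-ranked results are kept.

```python
def filter_results(results, threshold, top_n):
    """Drop results not greater than threshold, then keep the top_n largest.

    Returns (index, value) pairs so the caller can tell which MAC results survived.
    """
    kept = [(idx, val) for idx, val in enumerate(results) if val > threshold]
    # Python's sort is stable, so ties keep their original relative order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    print(filter_results([5, 40, 12, 40, 3, 27], threshold=10, top_n=3))
```

In a search-style workload such as the face search mentioned earlier, the surviving indices would identify the candidates that proceed to post-processing, though that mapping is an interpretation rather than something the text states.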

In step S208, the processor 14 post-processes the filtered operation results according to the portions to which they correspond, to obtain output data. In one embodiment, the processor 14 encodes the input data and the weight data when preprocessing them, and performs a weighting operation corresponding to that encoding on the operation results when post-processing the filtered operation results.

Specifically, in response to an operation result corresponding to the primary portion of the input data and the primary portion of the weight data, the processor 14 multiplies the operation result by a first weight to obtain a first product; in response to an operation result corresponding to the primary portion of the input data and the secondary portion of the weight data, the processor 14 multiplies the operation result by a second weight to obtain a second product; in response to an operation result corresponding to the secondary portion of the input data and the primary portion of the weight data, the processor 14 multiplies the operation result by a third weight to obtain a third product; and in response to an operation result corresponding to the secondary portion of the input data and the secondary portion of the weight data, the processor 14 multiplies the operation result by a fourth weight to obtain a fourth product. Finally, the processor 14 accumulates the first product, the second product, the third product, and the fourth product obtained from the weighting operation and outputs the accumulated result as the output data.

For example, FIG. 4 shows an example of data post-processing according to an embodiment of the present disclosure. Referring to FIG. 4, this embodiment illustrates the post-processing corresponding to the encoding of FIG. 3. When an operation result corresponds to the primary portion (that is, the MSB) of the input data and the primary portion of the weight data, the corresponding weight value is 16*16; when it corresponds to the primary portion of the input data and the secondary portion (that is, the LSB) of the weight data, the corresponding weight value is 16*1; when it corresponds to the secondary portion of the input data and the primary portion of the weight data, the corresponding weight value is 1*16; and when it corresponds to the secondary portion of the input data and the secondary portion of the weight data, the corresponding weight value is 1*1. By multiplying the operation results obtained from the input data and weight data written into the memory 12 in batches by the corresponding weight values, the operation result of the MAC operation on the original input data and weight data can be restored.
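The weighting and accumulation can be checked numerically. The sketch below recomputes the four partial MAC results for a 4-bit/4-bit split, applies the weights 16*16, 16*1, 1*16, and 1*1, and compares the sum with a direct MAC on the original 8-bit values. The function names and sample data are hypothetical, and plain integer arithmetic stands in for the memory array.

```python
def mac(inputs, weights):
    """Sum of element-wise products."""
    return sum(x * w for x, w in zip(inputs, weights))


def partial_macs(inputs, weights, lsb_bits=4):
    """Four partial results keyed by (input portion, weight portion)."""
    mask = (1 << lsb_bits) - 1
    x = {"msb": [v >> lsb_bits for v in inputs], "lsb": [v & mask for v in inputs]}
    w = {"msb": [v >> lsb_bits for v in weights], "lsb": [v & mask for v in weights]}
    return {(a, b): mac(x[a], w[b]) for a in ("msb", "lsb") for b in ("msb", "lsb")}


def recombine(partials, lsb_bits=4):
    """Weight each surviving partial result by its place value and accumulate."""
    scale = {"msb": 1 << lsb_bits, "lsb": 1}   # 16 and 1 for a 4-bit LSB
    return sum(scale[a] * scale[b] * value for (a, b), value in partials.items())


if __name__ == "__main__":
    xs, ws = [0x12, 0x34], [0x56, 0x78]
    print(recombine(partial_macs(xs, ws)), mac(xs, ws))
```

Because `recombine` sums only the entries present in `partials`, passing a dictionary from which low-value entries have been filtered out yields an approximation of the full result rather than the exact value.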

After the MAC operation on each set of input data and weight data is completed and its operation result is obtained, the processor 14 returns to step S204 and continues to write the next set of input data and weight data into the memory 12 to perform the MAC operation, until the operation results for all of the input data and weight data have been obtained and the in-memory computing is complete.

In summary, the in-memory computing method and device of the embodiments of the present disclosure combine in-memory computing with a hierarchical filtering scheme. By preprocessing the input data and the weight data to be written into the memory, operations on the bits that contribute less to the data value (that is, the LSB) are selectively pruned, and operations on the bits that contribute more (that is, the MSB) are prioritized. By filtering the operation results, the results with higher values are selected for the corresponding post-processing, and the output data is finally obtained. In this way, the performance of the computing system can be improved without unduly affecting the values of the operation results.

Although the present disclosure has been described by way of the above embodiments, the embodiments are not intended to limit the present disclosure. It will be apparent to those having ordinary skill in the art that various modifications and changes can be made to the structures of the present disclosure without departing from its scope or spirit. Therefore, the scope of protection of the present disclosure is defined by the appended claims.

10: computing device
12: memory
14: processor
B7~B0, B7₀~B7₇, B6₀~B6₃: bits
IL_i: input line
OL_j: output line
R_ij: resistor
S202~S208: steps
SA: sense amplifier

FIG. 1 is a schematic diagram of an in-memory computing device according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of an in-memory computing method according to an embodiment of the present disclosure.
FIG. 3 is an example of data encoding according to an embodiment of the present disclosure.
FIG. 4 is an example of data post-processing according to an embodiment of the present disclosure.

S202~S208: steps

Claims

An in-memory computing method, adapted for a processor to perform multiply-accumulate (MAC) operations using a memory, wherein the memory includes a plurality of input lines and a plurality of output lines crossing each other, a plurality of memory cells respectively disposed at the intersections of the input lines and the output lines, and a plurality of sense amplifiers respectively connected to the output lines, the method comprising: preprocessing input data and weight data to be written into the input lines and the memory cells, respectively, to divide each into a primary portion and a secondary portion; writing the input data and the weight data divided into the primary portion and the secondary portion into the input lines and the memory cells in batches to perform the MAC operations and obtain a plurality of operation results, including writing the weight data of the primary portion into the corresponding memory cells and inputting the input data of the primary portion into the corresponding input lines, writing the weight data of the primary portion into the corresponding memory cells and inputting the input data of the secondary portion into the corresponding input lines, writing the weight data of the secondary portion into the corresponding memory cells and inputting the input data of the primary portion into the corresponding input lines, and writing the weight data of the secondary portion into the corresponding memory cells and inputting the input data of the secondary portion into the corresponding input lines, so as to obtain the operation results of the MAC operations on the input data and the weight data; filtering the operation results according to the numerical value of each of the operation results; and performing post-processing on the filtered operation results according to the portions to which the operation results correspond, to obtain output data.

The in-memory computing method according to claim 1, wherein the primary portion is the most significant bit (MSB) of the processed data, and the secondary portion is the least significant bit (LSB) of the processed data.

The in-memory operation method according to claim 1, wherein the step of filtering out the operation results according to the numerical value of each of the operation results comprises: filtering out the operation results whose numerical values are not greater than a preset threshold; and sorting the filtered operation results, and selecting at least one top-ranked operation result for the post-processing.
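A minimal sketch of this filtering step follows. It assumes that "top-ranked" means the largest numerical value, so the results are sorted in descending order; the function name `filter_and_rank` and the `top_k` parameter are hypothetical.

```python
def filter_and_rank(results, threshold, top_k=1):
    """Discard results not greater than the threshold, then keep the top_k largest.

    Returns (index, value) pairs so each surviving result can still be
    traced back to the batch that produced it.
    """
    kept = [(i, v) for i, v in enumerate(results) if v > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # largest value first
    return kept[:top_k]


# Hypothetical operation results read out from five output lines.
selected = filter_and_rank([3, 12, 7, 20, 5], threshold=6, top_k=2)
```

Only the selected results go on to post-processing, which reduces the amount of data that has to be handled outside the memory.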

The in-memory operation method according to claim 1, further comprising: encoding the input data and the weight data when performing the pre-processing on the input data and the weight data; and performing a weighting operation corresponding to the encoding on the operation results when performing the post-processing on the filtered operation results.

The in-memory operation method according to claim 4, wherein the step of performing the weighting operation corresponding to the encoding on the operation results comprises: in response to the operation result corresponding to the main part of the input data and the main part of the weight data, multiplying the operation result by a first weight to obtain a first product; in response to the operation result corresponding to the main part of the input data and the secondary part of the weight data, multiplying the operation result by a second weight to obtain a second product; in response to the operation result corresponding to the secondary part of the input data and the main part of the weight data, multiplying the operation result by a third weight to obtain a third product; in response to the operation result corresponding to the secondary part of the input data and the secondary part of the weight data, multiplying the operation result by a fourth weight to obtain a fourth product; and accumulating the first product, the second product, the third product, and the fourth product obtained by performing the weighting operation on the operation results, and outputting the accumulation result as the output data.
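The weighted accumulation can be sketched as follows. The claim does not specify the four weights; the values used here (256, 16, 16, 1) are an assumption that holds for a plain binary split of 8-bit unsigned values into two 4-bit parts, where the main part carries a factor of 16. The function name `weighted_accumulate` and the dictionary layout are hypothetical.

```python
def weighted_accumulate(results, weights):
    """Multiply each operation result by the weight of its part pair and sum.

    Both dictionaries are keyed by (input_part, weight_part).
    """
    return sum(results[key] * weights[key] for key in results)


# Assumed weights for a 4-bit/4-bit binary split (not taken from the claim).
weights = {
    ("main", "main"): 16 * 16,      # first weight
    ("main", "secondary"): 16,      # second weight
    ("secondary", "main"): 16,      # third weight
    ("secondary", "secondary"): 1,  # fourth weight
}

# Hypothetical operation results for inputs [0x12, 0x34] and weights [0x56, 0x78].
results = {
    ("main", "main"): 1 * 5 + 3 * 7,
    ("main", "secondary"): 1 * 6 + 3 * 8,
    ("secondary", "main"): 2 * 5 + 4 * 7,
    ("secondary", "secondary"): 2 * 6 + 4 * 8,
}

output = weighted_accumulate(results, weights)
```

Under these assumed weights, and when no result has been filtered away, the accumulated output equals the full-precision MAC of the original values (0x12 * 0x56 + 0x34 * 0x78). If some results are discarded by the filtering step, the output becomes an approximation.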

An in-memory computing device, comprising: a memory, including: a plurality of input lines and a plurality of output lines crossing each other; a plurality of memory units respectively disposed at the intersections of the input lines and the output lines; and a plurality of sense amplifiers respectively connected to the output lines; and a processor, coupled to the memory and configured to: perform pre-processing respectively on input data and weight data to be written into the input lines and the memory units, so as to divide each of the input data and the weight data into a main part and a secondary part; write the input data and the weight data that are divided into the main part and the secondary part into the input lines and the memory units in batches to perform a multiply-accumulate (MAC) operation, and accumulate the sensing values of the sense amplifiers to obtain a plurality of operation results, including writing the weight data of the main part into the corresponding memory units and inputting the input data of the main part into the corresponding input lines, writing the weight data of the main part into the corresponding memory units and inputting the input data of the secondary part into the corresponding input lines, writing the weight data of the secondary part into the corresponding memory units and inputting the input data of the main part into the corresponding input lines, and writing the weight data of the secondary part into the corresponding memory units and inputting the input data of the secondary part into the corresponding input lines, so as to obtain the operation results of the MAC operation on the input data and the weight data; filter out the operation results according to a numerical value of each of the operation results; and perform post-processing on the filtered operation results according to the parts to which the operation results correspond, to obtain output data.

The in-memory computing device according to claim 6, wherein the main part is the most significant bit among the multiple bits of the processed data, and the secondary part is the least significant bit among the multiple bits of the processed data.

The in-memory computing device according to claim 6, further comprising: a filter, configured to filter out the operation results whose numerical values are not greater than a preset threshold, wherein the processor is further configured to sort the filtered operation results and select at least one top-ranked operation result for the post-processing.

The in-memory computing device according to claim 6, wherein the processor is further configured to encode the input data and the weight data when performing the pre-processing on the input data and the weight data, and to perform a weighting operation corresponding to the encoding on the operation results when performing the post-processing on the filtered operation results.

The in-memory computing device according to claim 9, wherein the processor is further configured to: in response to the operation result corresponding to the main part of the input data and the main part of the weight data, multiply the operation result by a first weight to obtain a first product; in response to the operation result corresponding to the main part of the input data and the secondary part of the weight data, multiply the operation result by a second weight to obtain a second product; in response to the operation result corresponding to the secondary part of the input data and the main part of the weight data, multiply the operation result by a third weight to obtain a third product; in response to the operation result corresponding to the secondary part of the input data and the secondary part of the weight data, multiply the operation result by a fourth weight to obtain a fourth product; and accumulate the first product, the second product, the third product, and the fourth product obtained by performing the weighting operation on the operation results, and output the accumulation result as the output data.