TW202014895A - Memory with processing in memory architecture and operating method thereof - Google Patents
- Publication number
- TW202014895A
- Authority
- TW
- Taiwan
- Prior art keywords
- memory
- artificial intelligence
- core
- areas
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Description
The present invention relates to a circuit architecture, and in particular to a memory having a processing in memory (PIM) architecture and an operating method thereof.
As artificial intelligence (AI) computing has evolved, its applications have become increasingly widespread, for example neural network operations such as image data analysis, voice data analysis, and natural language processing performed with neural network models. Moreover, as the computational complexity of neural networks keeps rising, the computer equipment currently used to perform AI operations is gradually becoming unable to meet the demands of neural network operations and deliver effective, fast computing performance.
Dedicated processing cores have therefore been designed to perform neural network operations. Although having a dedicated processing core execute neural network operations independently makes full use of that core's computing power, the processing speed of the dedicated core is still limited by data access speed. Because the dedicated processing core and the other special function processing cores read data from memory over the same general-purpose bus, the dedicated core cannot obtain the data needed for AI operations in real time while another special function processing core occupies the bus. In view of this, several embodiments are presented below as solutions to the problem of designing a processing architecture that can perform AI operations quickly.
The invention provides a memory with a processing in memory architecture and an operating method thereof, in which an artificial intelligence (AI) core integrated in the memory directly reads the data required for neural network operations stored in the memory chip, thereby achieving fast neural network operations.
The memory with a processing in memory architecture of the invention includes a memory array, a mode register, a memory interface, and an AI core. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The AI core is coupled to the memory array and the mode register. The memory regions are selectively addressed to the special function processing core and the AI core, respectively, according to the memory mode settings of the mode register, so that the special function processing core and the AI core access different memory regions in the memory array according to those settings.
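The role of the mode register described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ModeRegister`, the `Owner` enum, and the region names `region_0` and `region_1` are hypothetical and do not appear in the patent, which describes hardware rather than software.

```python
from enum import Enum


class Owner(Enum):
    """Hypothetical labels for the two kinds of core that can own a region."""
    SPECIAL_FUNCTION_CORE = "special_function_core"
    AI_CORE = "ai_core"


class ModeRegister:
    """Hypothetical model: one memory mode setting per memory region."""

    def __init__(self, settings):
        # settings maps a region name to the core it is addressed to
        self.settings = dict(settings)

    def regions_for(self, owner):
        """Return the regions currently addressed to the given core."""
        return sorted(r for r, o in self.settings.items() if o is owner)

    def check_access(self, owner, region):
        """Allow an access only if the region is addressed to that core."""
        if self.settings[region] is not owner:
            raise PermissionError(
                f"{region} is not addressed to {owner.value}")
        return True


mode_register = ModeRegister({
    "region_0": Owner.SPECIAL_FUNCTION_CORE,
    "region_1": Owner.AI_CORE,
})
```

Because each region has exactly one owner at a time in this model, the two cores never touch the same region, which is the property the text relies on for simultaneous access.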
In an embodiment of the invention, the special function processing core and the AI core simultaneously access different memory regions of the memory array through their own dedicated memory buses.
In an embodiment of the invention, the memory regions include a first memory region and a second memory region. The first memory region is for exclusive access by the AI core. The second memory region is for exclusive access by the special function processing core.
In an embodiment of the invention, the memory regions further include a plurality of data buffer regions. The artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.
In an embodiment of the invention, when the AI core performs a neural network operation, the AI core reads input data from one of the data buffer regions as an input parameter and reads weight data from the first memory region. The AI core outputs feature data to the first memory region.
In an embodiment of the invention, when the AI core performs a neural network operation, the AI core reads the feature data in the first memory region as the next input parameter and reads further weight data from the first memory region. The AI core outputs the next feature map data to one of the data buffer regions, overwriting that data buffer region.
In an embodiment of the invention, the data buffer regions can be alternately addressed to the special function processing core and the AI core, so that a first memory space corresponding to the AI core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.
In an embodiment of the invention, the width of the bus dedicated between the AI core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.
In an embodiment of the invention, the memory regions correspond respectively to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of a bus dedicated between the AI core and the memory regions is greater than or equal to the amount of data in one entire row of the memory banks.
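To see why a dedicated bus at least as wide as an entire row matters, the following worked arithmetic compares the number of transfers needed to move one full row. All figures (8 banks, 1024 bits per bank row, a 64-bit external bus) are hypothetical values chosen for illustration; the patent does not specify any of them.

```python
import math


def transfers_per_row(row_bits, bus_bits):
    """Number of bus transfers needed to move one entire row."""
    return math.ceil(row_bits / bus_bits)


# Hypothetical figures, not taken from the patent
BANKS = 8
ROW_BITS_PER_BANK = 1024
EXTERNAL_BUS_BITS = 64

full_row_bits = BANKS * ROW_BITS_PER_BANK   # one entire row across all banks
internal_bus_bits = full_row_bits           # dedicated bus as wide as the row

internal_transfers = transfers_per_row(full_row_bits, internal_bus_bits)
external_transfers = transfers_per_row(full_row_bits, EXTERNAL_BUS_BITS)
```

Under these assumed numbers the dedicated bus moves the row in a single transfer, while the external bus needs 128.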
In an embodiment of the invention, the memory is a dynamic random access memory (DRAM) chip.
The operating method of the invention is suitable for a memory with a processing in memory architecture that includes a memory array, a mode register, a memory interface, and an AI core. The method includes the following steps: selectively addressing a plurality of memory regions in the memory to a special function processing core and the AI core, respectively, according to a plurality of memory mode settings of the mode register; and accessing, by the special function processing core and the AI core, different memory regions in the memory array according to the memory mode settings.
In an embodiment of the invention, the special function processing core and the AI core simultaneously access different memory regions of the memory array through their own dedicated memory buses.
In an embodiment of the invention, the memory regions include a first memory region and a second memory region; the first memory region is for exclusive access by the AI core, and the second memory region is for exclusive access by the special function processing core.
In an embodiment of the invention, the memory regions further include a plurality of data buffer regions, and the artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.
In an embodiment of the invention, when the AI core performs a neural network operation, the step of accessing different memory regions in the memory array by the special function processing core and the AI core according to the memory mode settings of the mode register includes: reading, by the AI core, input data from one of the data buffer regions as an input parameter; reading, by the AI core, weight data from the first memory region; and outputting, by the AI core, feature data to the first memory region.
In an embodiment of the invention, when the AI core performs a neural network operation, the step of accessing different memory regions in the memory array by the special function processing core and the AI core according to the memory mode settings of the mode register further includes: reading, by the AI core, the feature data in the first memory region as the next input parameter; reading, by the AI core, further weight data from the first memory region; and outputting, by the AI core, the next feature map data to one of the data buffer regions, overwriting that data buffer region.
In an embodiment of the invention, the data buffer regions can be alternately addressed to the special function processing core and the AI core, so that a first memory space corresponding to the AI core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.
In an embodiment of the invention, the width of the bus dedicated between the AI core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.
In an embodiment of the invention, the memory regions correspond respectively to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of the bus dedicated between the AI core and the memory regions is greater than or equal to the amount of data in one entire row of the memory banks.
In an embodiment of the invention, the memory is a dynamic random access memory (DRAM) chip.
Based on the above, the memory and operating method of the invention allow an external special function processing core and the AI core disposed in the memory to access different memory regions in the memory array simultaneously. The memory of the invention can therefore perform neural network operations quickly.
To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
To make the content of the invention easier to understand, the following embodiments are given as examples by which the invention can actually be implemented. Wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments represent the same or similar parts.
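FIG. 1 is a block diagram of a memory according to an embodiment of the invention. Referring to FIG. 1, the memory 100 includes a memory array 110, a mode register 120, an artificial intelligence (AI) core 130, and a memory interface 140. The memory array 110 is coupled to the AI core 130 and the memory interface 140. The mode register 120 is coupled to the memory array 110, the AI core 130, and the memory interface 140. The memory array 110 includes a plurality of memory regions, each used to store specific data (also called a dataset). In one embodiment, the memory 100 may further include a plurality of dedicated memory control units, which correspond one-to-one to the memory regions and perform data access operations for them respectively. In this embodiment, the memory interface 140 may be externally coupled to a special function processing core. The memory regions are selectively addressed to the special function processing core and the AI core 130, respectively, according to a plurality of memory mode settings recorded in the mode register 120, so that the special function processing core and the AI core 130 can access different memory regions in the memory array 110 according to those settings. The memory 100 of this embodiment is thus capable of performing AI operations.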
In this embodiment, the memory 100 may be a dynamic random access memory (DRAM) chip, and may, for example, be a processing in memory (PIM) architecture built from circuit elements such as control logic, arithmetic logic, and cache units. The AI core 130 may be integrated in the peripheral circuit region of the memory 100 so as to access the memory banks of the memory array 110 directly through a dedicated memory controller and a dedicated bus. The AI core 130 may also be designed in advance to have the functions and characteristics needed to perform specific neural network operations. In other words, the memory 100 of this embodiment can perform AI operations, and the AI core 130 and the external special function processing core can access the memory array 110 simultaneously, providing efficient data access and computation.
In this embodiment, the special function processing core may be, for example, a central processing unit (CPU) core, an image signal processor (ISP) core, a digital signal processor (DSP) core, a graphics processing unit (GPU) core, or another similar special function processing core. The special function processing core is coupled to the memory interface 140 through a general-purpose bus (or standard bus) and accesses the memory array 110 through the memory interface 140. The AI core 130, by contrast, accesses the memory array 110 through a dedicated bus inside the memory, so it is not limited by the width or speed of the memory interface 140, and it can access the memory array 110 quickly according to a specific data access pattern.
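FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the invention. Referring to FIG. 2, the memory 200 includes memory regions 211 and 213, row buffer blocks 212 and 214, a mode register 220, an AI core 230, and a memory interface 240. In this embodiment, the mode register 220 is coupled to the AI core 230 and the memory interface 240 and provides a plurality of memory mode settings to each of them. The AI core 230 and the memory interface 240 operate independently of each other to access the memory array. The memory array includes the memory regions 211 and 213 and the row buffer blocks 212 and 214. Each of the memory regions 211 and 213 includes a plurality of memory banks, and the memory regions 211 and 213 may be data buffer regions. In this embodiment, the memory interface 240 is externally coupled to another memory interface 340. The memory interface 340 is coupled, for example through a bus, to a central processing unit core 351, a graphics processing unit core 352, and a digital signal processor core 353.
In this embodiment, when the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353 need to access the row buffer block 212 or the row buffer block 214, they must do so through the memory interfaces 240 and 340 in sequence or in queue order. However, regardless of how these special function processing cores are currently accessing the memory array, the AI core 230 can simultaneously access a different memory region in the memory array. In one embodiment, the memory region 211 or the memory region 213 may, for example, be used to access the digitized input data, weight data, or feature map data needed for neural network operations or other machine learning operations.
Note that the special function processing cores and the AI core 230 simultaneously access different memory regions of the memory array through their own dedicated memory buses. That is, while the special function processing cores access data in the memory region 211 through the row buffer block 212, the AI core 230 can access data in the memory region 213 through the row buffer block 214. Likewise, while the special function processing cores access data in the memory region 213 through the row buffer block 214, the AI core 230 can access data in the memory region 211 through the row buffer block 212. In other words, the special function processing cores and the AI core 230 can alternately access different data in the memory regions 211 and 213, which serve as data buffer regions. In addition, in one embodiment, the AI core 230 may further include a plurality of caches or queues, through which it can access data in the memory region 211 or the memory region 213 quickly in a pipelined manner.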
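FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the invention. Referring to FIG. 3, the memory 400 of this embodiment includes memory regions 411, 413, 415, and 417, row buffer blocks 412, 414, 416, and 418, a mode register 420, an AI core 430, and a memory interface 440. In this embodiment, the mode register 420 is coupled to the AI core 430 and the memory interface 440 and provides a plurality of memory mode settings to each of them. The memory interface 440 is coupled, for example through a bus, to the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353. The AI core 430 and the memory interface 440 operate independently of each other to access the memory array. The memory array includes the memory regions 411, 413, 415, and 417 and the row buffer blocks 412, 414, 416, and 418, and each of the memory regions 411, 413, 415, and 417 includes a plurality of memory banks.
In this embodiment, the memory regions 413 and 415 may be data buffer regions. The memory region 411 is for exclusive access by the special function processing cores, which may be, for example, the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353. The memory region 417 is for exclusive access by the AI core 430. That is, when the special function processing cores and the AI core 430 exclusively access the memory region 411 and the memory region 417, respectively, their access operations do not interfere with each other. Taking a neural network operation as an example, one entire row of the memory banks of the memory region 417 may store a plurality of weight values of the weight data. The AI core 430 can read each row of the memory banks of its dedicated memory region 417 sequentially and in an interleaved manner through the row buffer block 418, so as to quickly obtain the data needed for the neural network operation.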
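The text does not spell out what "sequentially and in an interleaved manner" means at the command level. One plausible reading, sketched below purely as an assumption, is that the AI core rotates across the banks of its region before advancing to the next row, so that consecutive reads target different banks. The function name `interleaved_row_order` and the bank and row counts are hypothetical.

```python
def interleaved_row_order(num_banks, rows_per_bank):
    """Yield (bank, row) pairs, visiting every bank before advancing the row.

    Assumed interpretation only: the patent does not define the exact
    interleaving scheme.
    """
    for row in range(rows_per_bank):
        for bank in range(num_banks):
            yield bank, row


# Hypothetical region with 2 banks of 3 rows each
order = list(interleaved_row_order(num_banks=2, rows_per_bank=3))
```

With these assumed sizes, no two consecutive reads in `order` hit the same bank.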
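FIG. 4A and FIG. 4B are schematic diagrams of swapped addressing of different memory blocks in different memory spaces according to an embodiment of the invention. Refer to FIG. 3, FIG. 4A, and FIG. 4B. One access method of the memory 400 is described below with reference to FIG. 4A and FIG. 4B, taking the continuous execution of neural network operations on a series of image data as an example. The AI operations performed by the AI core 430 may be, for example, deep neural network (DNN) operations, convolutional neural network (CNN) operations, or recurrent neural network (RNN) operations; the invention is not limited in this respect. In one implementation scenario, the memory region 417 includes sub-memory regions 417_1 and 417_2. The sub-memory region 417_1 is used, for example, to store weight data comprising a plurality of weight values, and the sub-memory region 417_2 is used, for example, to store feature map data comprising a plurality of feature values. In this scenario, the memory region 413 is addressed, for example, to a special function processing core 354, and the memory region 415 is addressed, for example, to the AI core 430. The special function processing core 354 may be, for example, the central processing unit core 351, the graphics processing unit core 352, or the digital signal processor core 353 of FIG. 3. Therefore, as shown in FIG. 4A, the memory space 450 corresponding to the special function processing core 354 includes the memory regions 411 and 413, and the memory space 460 corresponding to the AI core 430 includes the memory regions 415 and 417.
In this scenario, assume that the special function processing core 354 is the digital signal processor core 353 of FIG. 3, so the memory region 415 holds digitized input data, such as image data, previously stored by the digital signal processor core 353. The AI core 430 may, for example, perform a neural network operation to carry out image recognition on the current image data stored in the memory region 415. The AI core 430 reads the weight data of the memory region 417 through the dedicated bus and reads the image data of the memory region 415 as the input parameter required by the neural network operation. Meanwhile, the digital signal processor core 353 can store the next image data in the memory region 413 through the memory interfaces 340 and 440.
Next, once the AI core 430 has finished recognizing the image data in the memory region 415, the mode register 420 can be set to swap the cores to which the memory regions 413 and 415 are addressed, and thereby swap the memory spaces in which the memory regions 413 and 415 reside. After the addressing swap, as shown in FIG. 4B, the memory space 450' corresponding to the digital signal processor core 353 includes the memory regions 411 and 415, and the memory space 460' corresponding to the AI core 430 includes the memory regions 413 and 417. The AI core 430 can then continue the neural network operation to carry out image recognition on the new image data stored in the memory region 413. The AI core 430 reads the weight data of the sub-memory region 417_1 through the dedicated bus and reads the next image data of the memory region 413 as the input parameter required by the neural network operation. Meanwhile, the digital signal processor core 353 can overwrite the memory region 415 through the memory interfaces 340 and 440 to store the image data after next in the memory region 415. Accordingly, the memory 400 of this embodiment provides efficient data access operations and can carry out neural network operations at high speed.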
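The addressing swap of FIG. 4A and FIG. 4B behaves like a ping-pong (double) buffer. The sketch below models it in software under stated assumptions: the class `PingPongMemory`, the owner labels `"dsp"` and `"ai"`, and the placeholder recognition string are all hypothetical, and the two cores run one after the other here even though the patent describes them as working at the same time.

```python
class PingPongMemory:
    """Hypothetical model of buffer regions 413 and 415 and their owners."""

    def __init__(self):
        # FIG. 4A state: region 413 -> DSP core, region 415 -> AI core
        self.mode_register = {"413": "dsp", "415": "ai"}
        self.regions = {"413": None, "415": None}

    def region_for(self, owner):
        """Return the buffer region currently addressed to the given core."""
        return next(r for r, o in self.mode_register.items() if o == owner)

    def swap(self):
        """Swap the owners of the two buffer regions (FIG. 4A <-> FIG. 4B)."""
        self.mode_register = {
            r: ("ai" if o == "dsp" else "dsp")
            for r, o in self.mode_register.items()
        }


def run(frames):
    """Feed frames through the two buffers, swapping after each one."""
    mem = PingPongMemory()
    results = []
    mem.regions[mem.region_for("dsp")] = frames[0]  # DSP stores first frame
    mem.swap()
    for next_frame in frames[1:] + [None]:
        # In hardware these two accesses overlap; here they run in sequence.
        current = mem.regions[mem.region_for("ai")]
        results.append(f"recognized:{current}")     # stand-in for inference
        if next_frame is not None:
            mem.regions[mem.region_for("dsp")] = next_frame
        mem.swap()
    return results


results = run(["img0", "img1", "img2"])
```

Only the owner labels change on each swap; no image data is copied between regions, which is where the efficiency gain described in the text comes from.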
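FIG. 5A and FIG. 5B are schematic diagrams of swapped access to different memory blocks in the same memory space according to an embodiment of the invention. Refer to FIG. 3, FIG. 5A, and FIG. 5B. Another access method of the memory 400 is described below with reference to FIG. 5A and FIG. 5B, taking the execution of a neural network operation on image data as an example. In this scenario, at the input layer stage of the neural network operation, the memory space 550 corresponding to the AI core 430 may include, for example, the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 reads the memory region 415 to obtain input data, which serves as the input parameter. The memory region 415 holds image data previously stored by the digital signal processor core 353. The AI core 430 also reads the weight data of the sub-memory region 417_1. The AI core 430 then performs the neural network operation on the input parameter and the weight data to produce feature map data, and stores the feature map data in the sub-memory region 417_2.
Next, at the following hidden layer stage of the neural network operation, the memory space 550' corresponding to the AI core 430 includes the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 reads the feature map data previously stored in the sub-memory region 417_2 as the input parameter of the current hidden layer, and reads the weight data of the sub-memory region 417_1. The AI core 430 then performs the neural network operation on the input parameter and the weight data to produce new feature map data, and writes the new feature map data over the memory region 415. In other words, the memory regions addressed to the AI core 430 do not change, but the read and store target addresses of the AI core 430 are swapped. By extension, the AI core 430 of this embodiment can use the memory region 415 and the sub-memory region 417_2 in rotation to read previously produced feature map data and to store the feature map data produced in the current stage of the neural network operation. Because each memory region has its own independent bus, the AI core 430 of this embodiment can obtain input data and weight data quickly, perform the neural network operation quickly, and store the output data.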
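The layer-by-layer rotation between the memory region 415 and the sub-memory region 417_2 can be sketched as follows. This is a toy illustration under stated assumptions: the function `run_layers`, the dictionary standing in for the memory regions, and the "layer" itself (a weighted sum followed by ReLU) are hypothetical and are not the operation performed by the AI core in the patent.

```python
def run_layers(input_data, layer_weights):
    """Alternate read and write targets between regions 415 and 417_2."""
    memory = {
        "415": list(input_data),   # input data, later reused for outputs
        "417_1": layer_weights,    # weight data for every layer
        "417_2": None,             # feature map data
    }
    src, dst = "415", "417_2"
    trace = []
    for weights in memory["417_1"]:
        x = memory[src]
        # Toy layer: one weighted sum per weight row, followed by ReLU
        memory[dst] = [
            max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights
        ]
        trace.append((src, dst))
        src, dst = dst, src        # swap read and store targets
    return memory[src], trace      # src now names the last region written


output, trace = run_layers(
    [1.0, 2.0],
    [
        [[1.0, 0.0], [0.0, 1.0]],  # layer 1: identity weights
        [[1.0, 1.0]],              # layer 2: sums both features
    ],
)
```

The `trace` list records which region was read and which was written at each layer, mirroring the statement that the addressed regions stay the same while the read and store targets swap.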
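FIG. 6 is a flowchart of a memory operating method according to an embodiment of the invention. Referring to FIG. 6, the operating method of this embodiment is applicable at least to the memory 100 of FIG. 1, so that the memory 100 performs steps S610 and S620. The memory interface 140 of the memory 100 may be externally coupled to a special function processing core. In step S610, the memory regions of the memory array 110 are selectively addressed to the memory spaces of the special function processing core and the AI core 130, respectively, according to the memory mode settings of the mode register 120. In step S620, the special function processing core and the AI core 130 access different memory regions in the memory array 110 according to the memory mode settings. The operating method of this embodiment therefore allows the memory 100 to be accessed by the special function processing core and the AI core 130 simultaneously, providing efficient memory operation.
For the relevant internal components, implementations, and technical details of the memory 100 of this embodiment, sufficient teaching, suggestions, and implementation guidance can be found in the descriptions of the embodiments of FIG. 1 to FIG. 5B above, so they are not repeated here.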
In summary, in the memory and operating method of the invention, the mode register is designed with a plurality of specific memory mode settings, so that the memory regions of the memory array can be selectively addressed to an external special function processing core and to the AI core according to those settings, allowing the external special function processing core and the AI core to access different memory regions in the memory array simultaneously. The AI core disposed in the memory can therefore perform neural network operations quickly.
Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the technical field may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.
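100, 200, 400: memory
110: memory array
120, 220, 420: mode register
130, 230, 430: artificial intelligence core
140, 240, 440: memory interface
211, 213, 411, 413, 415, 417: memory region
212, 214, 412, 414, 416, 418: row buffer block
340: memory interface
351: central processing unit core
352: graphics processing unit core
353: digital signal processor core
354: special function processing core
417_1, 417_2: sub-memory region
450, 450', 460, 460', 550, 550': memory space
S610, S620: step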
FIG. 1 is a block diagram of a memory according to an embodiment of the invention.
FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the invention.
FIG. 4A and FIG. 4B are schematic diagrams of swapped addressing of different memory blocks in different memory spaces according to an embodiment of the invention.
FIG. 5A and FIG. 5B are schematic diagrams of swapped access to different memory blocks in the same memory space according to an embodiment of the invention.
FIG. 6 is a flowchart of a memory operating method according to an embodiment of the invention.
100: memory
110: memory array
120: mode register
130: artificial intelligence core
140: memory interface
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/563,956 US10990524B2 (en) | 2018-10-11 | 2019-09-09 | Memory with processing in memory architecture and operating method thereof |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862744140P | 2018-10-11 | 2018-10-11 | |
US62/744,140 | 2018-10-11 | ||
US201862785234P | 2018-12-27 | 2018-12-27 | |
US62/785,234 | 2018-12-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202014895A (en) | 2020-04-16 |
TWI749331B (en) | 2021-12-11 |
Family
ID=70231709
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW108119618A TWI749331B (en) | 2018-10-11 | 2019-06-06 | Memory with processing in memory architecture and operating method thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111047029B (en) |
TW (1) | TWI749331B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI821148B (en) * | 2023-04-26 | 2023-11-01 | 旺宏電子股份有限公司 | Electronic device and method for operating the same |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2284947C (en) * | 1999-10-04 | 2005-12-20 | Storagequest Inc. | Apparatus and method for managing data storage |
KR100816053B1 (en) * | 2006-11-21 | 2008-03-21 | 엠텍비젼 주식회사 | Memory device, memory system and dual port memory device with self-copy function |
US8719516B2 (en) * | 2009-10-21 | 2014-05-06 | Micron Technology, Inc. | Memory having internal processors and methods of controlling memory access |
CN105654419A (en) * | 2016-01-25 | 2016-06-08 | 上海华力创通半导体有限公司 | Operation processing system and operation processing method of image |
CN109074845B (en) * | 2016-03-23 | 2023-07-14 | Gsi 科技公司 | In-memory matrix multiplication and use thereof in neural networks |
KR102650828B1 (en) * | 2016-05-20 | 2024-03-26 | 삼성전자주식회사 | Memory device shared by two or more processors and system including the same |
-
2019
- 2019-06-06 TW TW108119618A patent/TWI749331B/en active
- 2019-06-24 CN CN201910547680.1A patent/CN111047029B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111047029A (en) | 2020-04-21 |
CN111047029B (en) | 2023-04-18 |
TWI749331B (en) | 2021-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10990524B2 (en) | Memory with processing in memory architecture and operating method thereof | |
US11294599B1 (en) | Registers for restricted memory | |
TWI766396B (en) | Data temporary storage apparatus, data temporary storage method and operation method | |
WO2017124642A1 (en) | Device and method for executing forward calculation of artificial neural network | |
JP6335335B2 (en) | Adaptive partition mechanism with arbitrary tile shapes for tile-based rendering GPU architecture | |
US11645533B2 (en) | IR drop prediction with maximum convolutional neural network | |
JP2018120549A (en) | Processor, information processing device, and operation method for processor | |
WO2019118363A1 (en) | On-chip computational network | |
US20200184002A1 (en) | Hardware accelerated convolution | |
TW202134861A (en) | Interleaving memory requests to accelerate memory accesses | |
WO2020073801A1 (en) | Data reading/writing method and system in 3d image processing, storage medium, and terminal | |
TW202127461A (en) | Concurrent testing of a logic device and a memory device within a system package | |
TWI749331B (en) | Memory with processing in memory architecture and operating method thereof | |
TWI714003B (en) | Memory chip capable of performing artificial intelligence operation and method thereof | |
JP6912535B2 (en) | Memory chips capable of performing artificial intelligence operations and their methods | |
CN110837483B (en) | Tensor dimension transformation method and device | |
Zhou et al. | Hygraph: Accelerating graph processing with hybrid memory-centric computing | |
WO2023124304A1 (en) | Chip cache system, data processing method, device, storage medium, and chip | |
CN113741977B (en) | Data operation method, data operation device and data processor | |
US9189448B2 (en) | Routing image data across on-chip networks | |
WO2021243489A1 (en) | Data processing method and apparatus for neural network | |
CN113407258A (en) | Self-adaptive resource allocation layout and wiring method and system of storage and computation integrated architecture | |
JP2021507368A (en) | Multiple pipeline architecture with special number detection | |
CN110826704B (en) | Processing device and system for preventing overfitting of neural network | |
US20230267992A1 (en) | Keeper-free volatile memory system |