TW202014895A - Memory with processing in memory architecture and operating method thereof - Google Patents

Memory with processing in memory architecture and operating method thereof

Info

Publication number
TW202014895A
Authority
TW
Taiwan
Prior art keywords
memory
artificial intelligence
core
areas
data
Prior art date
Application number
TW108119618A
Other languages
Chinese (zh)
Other versions
TWI749331B (en)
Inventor
黃崇仁
葛永年
Original Assignee
力晶積成電子製造股份有限公司
Priority date
Filing date
Publication date
Application filed by 力晶積成電子製造股份有限公司
Priority to US16/563,956 (US10990524B2)
Publication of TW202014895A
Application granted
Publication of TWI749331B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 Data buffering arrangements
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A memory with processing in memory architecture and an operating method thereof are provided. The memory includes a memory array, a mode register, an artificial intelligence core, and a memory interface. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The artificial intelligence core is coupled to the memory array and the mode register. The plurality of memory regions are selectively addressed to the special function processing core and the artificial intelligence core according to the plurality of memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to the plurality of memory mode settings.

Description

Memory with processing in memory architecture and operating method thereof

The present invention relates to a circuit architecture, and particularly to a memory having a processing-in-memory (PIM) architecture and an operating method thereof.

With the evolution of artificial intelligence (AI) computing, AI applications have become increasingly widespread, for example neural network operations such as image data analysis, voice data analysis, and natural language processing performed through neural network models. Moreover, as the computational complexity of neural networks continues to grow, the computer equipment currently used to perform AI computing has gradually become unable to meet present neural network computing demands and to provide effective, fast computing performance.

Dedicated processing cores have therefore been designed so that neural network operations can be executed on them. However, although executing neural network operations independently on a dedicated processing core can fully exploit the core's computing power, the processing speed of the dedicated core remains limited by the data access speed. Because the dedicated processing core and the other special function processing cores read data from memory through the same general-purpose bus, the dedicated core cannot obtain the data required for AI operations in real time while another special function processing core occupies the bus. In view of this, the embodiments below propose solutions for designing a processing architecture that can perform AI operations quickly.

The invention provides a memory with a processing-in-memory architecture and an operating method thereof, in which an artificial intelligence (AI) core integrated into the memory directly reads the data required for neural network operations stored in the memory chip, thereby achieving fast neural network computation.

The memory with a processing-in-memory architecture of the invention includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The artificial intelligence core is coupled to the memory array and the mode register. The memory regions are selectively addressed to the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to those settings.
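To make the addressing scheme concrete, the following is a minimal behavioral sketch in Python of how memory mode settings could map regions to cores. The class names, the dictionary-based setting layout, and the access check are illustrative assumptions for exposition, not the patent's circuit-level implementation.

```python
# Behavioral sketch: a mode register that records which core each memory
# region is currently addressed to, and a memory array that enforces it.
AI_CORE = "ai_core"
SF_CORE = "special_function_core"

class ModeRegister:
    """Holds one memory mode setting per memory region."""
    def __init__(self, settings):
        # settings: {region_name: core currently addressed to that region}
        self.settings = dict(settings)

    def owner(self, region):
        return self.settings[region]

    def swap(self, region_a, region_b):
        """Exchange the addressed owner of two regions (e.g. data buffers)."""
        self.settings[region_a], self.settings[region_b] = (
            self.settings[region_b], self.settings[region_a])

class MemoryArray:
    def __init__(self, regions, mode_register):
        self.regions = {name: {} for name in regions}   # addr -> data
        self.mode_register = mode_register

    def access(self, core, region, addr, data=None):
        # A core may only touch regions currently addressed to it.
        if self.mode_register.owner(region) != core:
            raise PermissionError(f"{region} is not addressed to {core}")
        if data is None:
            return self.regions[region].get(addr)
        self.regions[region][addr] = data

mr = ModeRegister({"region_A": SF_CORE, "region_B": AI_CORE})
mem = MemoryArray(["region_A", "region_B"], mr)
mem.access(AI_CORE, "region_B", 0x0, data=b"weights")  # allowed
mem.access(SF_CORE, "region_A", 0x0, data=b"frame")    # allowed, concurrent in HW
```

In hardware the two `access` calls at the end would proceed in parallel over separate buses; the serialized calls here only model the permission rule.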

In an embodiment of the invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array through their own dedicated memory buses.

In an embodiment of the invention, the memory regions include a first memory region and a second memory region. The first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.

In an embodiment of the invention, the memory regions further include a plurality of data buffer regions. The artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.

In an embodiment of the invention, when the artificial intelligence core performs a neural network operation, the artificial intelligence core reads input data from one of the data buffer regions as an input parameter and reads weight data from the first memory region. The artificial intelligence core outputs feature data to the first memory region.

In an embodiment of the invention, when the artificial intelligence core performs the neural network operation, the artificial intelligence core reads the feature data of the first memory region as the next input parameter and reads another weight data of the first memory region. The artificial intelligence core outputs the next feature map data to one of the data buffer regions, overwriting that data buffer region.
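As an illustration of the data flow in the two preceding paragraphs, here is a minimal sketch assuming dense layers with a ReLU activation. The dict-based "memory", the region names (borrowed from FIG. 3), and the layer math are assumptions for exposition only.

```python
import numpy as np

# Sketch of the access pattern: the input layer reads a data buffer region
# and the AI core's weight region, writes features back to the AI region,
# and later layers re-read those features as the next input parameter.
rng = np.random.default_rng(0)
memory = {
    "buffer_415": rng.standard_normal((1, 8)),            # input data (e.g. an image)
    "weights_417_1": [rng.standard_normal((8, 8)) for _ in range(3)],
    "features_417_2": None,                                # feature map scratch
}

# Input layer: input parameter from a data buffer region, weight data from
# the region exclusive to the AI core; feature data is written back there.
x = memory["buffer_415"]
memory["features_417_2"] = np.maximum(x @ memory["weights_417_1"][0], 0.0)

# Hidden layers: the stored feature data becomes the next input parameter.
for w in memory["weights_417_1"][1:]:
    memory["features_417_2"] = np.maximum(memory["features_417_2"] @ w, 0.0)

# Output: the final feature map overwrites the data buffer region.
memory["buffer_415"] = memory["features_417_2"]
```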

In an embodiment of the invention, the data buffer regions are alternately addressable to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another one of the data buffer regions.

In an embodiment of the invention, the width of the bus dedicated between the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.

In an embodiment of the invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each of the memory regions includes a plurality of memory banks. The width of the bus dedicated between the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.
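A small worked example of this width condition follows. The bank count, row size, and external bus width are hypothetical numbers chosen only to make the inequality concrete.

```python
# Worked example: dedicated bus width >= one entire row of the memory banks,
# and wider than the external bus, per the preceding paragraph. All values
# are assumed for illustration.
BANKS_PER_REGION = 4
ROW_BITS_PER_BANK = 2048        # assumed DRAM row size per bank
EXTERNAL_BUS_BITS = 32          # assumed host-side interface width

full_row_bits = BANKS_PER_REGION * ROW_BITS_PER_BANK    # 8192 bits
dedicated_bus_bits = full_row_bits                      # satisfies the condition

assert dedicated_bus_bits >= full_row_bits
assert dedicated_bus_bits > EXTERNAL_BUS_BITS
print(f"dedicated bus: {dedicated_bus_bits} bits, "
      f"external bus: {EXTERNAL_BUS_BITS} bits, "
      f"ratio: {dedicated_bus_bits // EXTERNAL_BUS_BITS}x")
```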

In an embodiment of the invention, the memory is a dynamic random access memory chip.

The operating method for a memory with a processing-in-memory architecture of the invention is adapted to a memory that includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The method includes the following steps: selectively addressing a plurality of memory regions in the memory to a special function processing core and the artificial intelligence core according to the memory mode settings of the mode register; and respectively accessing, by the special function processing core and the artificial intelligence core, different memory regions in the memory array according to the memory mode settings.

In an embodiment of the invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array through their own dedicated memory buses.

In an embodiment of the invention, the memory regions include a first memory region and a second memory region; the first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.

In an embodiment of the invention, the memory regions further include a plurality of data buffer regions, and the artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.

In an embodiment of the invention, when the artificial intelligence core performs a neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register includes: reading, by the artificial intelligence core, input data from one of the data buffer regions as an input parameter; reading, by the artificial intelligence core, weight data from the first memory region; and outputting, by the artificial intelligence core, feature data to the first memory region.

In an embodiment of the invention, when the artificial intelligence core performs the neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register further includes: reading, by the artificial intelligence core, the feature data of the first memory region as the next input parameter; reading, by the artificial intelligence core, another weight data of the first memory region; and outputting, by the artificial intelligence core, the next feature map data to one of the data buffer regions to overwrite that data buffer region.

In an embodiment of the invention, the data buffer regions are alternately addressable to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another one of the data buffer regions.

In an embodiment of the invention, the width of the bus dedicated between the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.

In an embodiment of the invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each of the memory regions includes a plurality of memory banks. The width of the bus dedicated between the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.

In an embodiment of the invention, the memory is a dynamic random access memory chip.

Based on the above, the memory and operating method of the invention allow the external special function processing core and the artificial intelligence core disposed in the memory to simultaneously access different memory regions in the memory array. The memory of the invention can therefore perform neural network operations quickly.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

To make the content of the invention easier to understand, the following embodiments are given as examples by which the invention can indeed be implemented. In addition, wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments represent the same or similar parts.

FIG. 1 is a block diagram of a memory according to an embodiment of the invention. Referring to FIG. 1, the memory 100 includes a memory array 110, a mode register 120, an artificial intelligence (AI) core 130, and a memory interface 140. The memory array 110 is coupled to the AI core 130 and the memory interface 140. The mode register 120 is coupled to the memory array 110, the AI core 130, and the memory interface 140. The memory array 110 includes a plurality of memory regions, each of which is used to store specific data (or a dataset). Moreover, in an embodiment, the memory 100 may further include a plurality of dedicated memory control units, which correspond one-to-one to the memory regions and respectively perform data access operations. In this embodiment, the memory interface 140 may be externally coupled to a special function processing core. The memory regions are selectively addressed to the special function processing core and the AI core 130 according to the memory mode settings recorded in the mode register 120, so that the special function processing core and the AI core 130 can respectively access different memory regions in the memory array 110 according to those settings. The memory 100 of this embodiment thus has the ability to perform AI operations.

In this embodiment, the memory 100 may be a dynamic random access memory (DRAM) chip, and may be, for example, a processing-in-memory (PIM) architecture built from circuit elements such as control logic, arithmetic logic, and cache units. The AI core 130 may be integrated into the peripheral circuit area of the memory 100 so as to directly access the memory banks of the memory array 110 through a dedicated memory controller and a dedicated bus. Moreover, the AI core 130 may be pre-designed with the functions and characteristics needed to perform specific neural network operations. In other words, the memory 100 of this embodiment can perform AI operations, and the AI core 130 and the external special function processing core can access the memory array 110 simultaneously to provide highly efficient data access and computation.

In this embodiment, the special function processing core may be, for example, a central processing unit (CPU) core, an image signal processor (ISP) core, a digital signal processor (DSP) core, a graphics processing unit (GPU) core, or another similar special function processing core. The special function processing core is coupled to the memory interface 140 via a general-purpose bus (or standard bus) to access the memory array 110 through the memory interface 140. By contrast, the AI core 130 accesses the memory array 110 through a dedicated bus inside the memory, so it is not limited by the width or speed of the memory interface 140 and can rapidly access the memory array 110 according to a specific data access pattern.

FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the invention. Referring to FIG. 2, the memory 200 includes memory regions 211 and 213, row buffer blocks 212 and 214, a mode register 220, an AI core 230, and a memory interface 240. In this embodiment, the mode register 220 is coupled to the AI core 230 and the memory interface 240 to provide the memory mode settings to each of them. The AI core 230 and the memory interface 240 operate independently to access the memory array, which includes the memory regions 211 and 213 and the row buffer blocks 212 and 214. The memory regions 211 and 213 each include a plurality of memory banks and may serve as data buffer regions. In this embodiment, the memory interface 240 is externally coupled to another memory interface 340, which is coupled, for example via a bus, to a central processing unit core 351, a graphics processor core 352, and a digital signal processor core 353.

In this embodiment, when the central processing unit core 351, the graphics processor core 352, or the digital signal processor core 353 needs to access the row buffer block 212 or 214, it must do so through the memory interfaces 240 and 340, in order or in a queue. However, regardless of how these various special function processing cores are currently accessing the memory array, the AI core 230 can simultaneously access a different memory region in the memory array. In an embodiment, the memory region 211 or 213 may be used, for example, to hold the digitized input data, weight data, or feature map data required for neural network operations or other machine learning operations.

Notably, the various special function processing cores and the AI core 230 simultaneously access different memory regions of the memory array through their own dedicated memory buses. That is, when the special function processing cores access data in the memory region 211 through the row buffer block 212, the AI core 230 can access data in the memory region 213 through the row buffer block 214; and when the special function processing cores access data in the memory region 213 through the row buffer block 214, the AI core 230 can access data in the memory region 211 through the row buffer block 212. In other words, the special function processing cores and the AI core 230 can alternately access different data in the memory regions 211 and 213 serving as data buffer regions. Furthermore, in an embodiment, the AI core 230 may further include a plurality of caches or queues, through which it can rapidly access the data in the memory region 211 or 213 in a pipelined manner.
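The alternating schedule can be sketched as follows. The frame strings, the region labels, and the serialized loop stand in for operations that proceed concurrently in hardware, since each region has its own row buffer block and bus.

```python
# Ping-pong sketch: while host-side cores fill one buffer region through
# its row buffer, the AI core consumes the other; roles flip every pass.
regions = {"211": None, "213": None}

def host_write(region, frame):
    regions[region] = frame          # via row buffer block 212 or 214

def ai_read(region):
    return regions[region]           # via the AI core's dedicated bus

host, ai = "211", "213"
regions[ai] = "frame_0"              # assume a first frame is already staged
for t in range(4):
    # These two lines model accesses that overlap in hardware.
    host_write(host, f"frame_{t + 1}")   # special function cores stage next data
    result = ai_read(ai)                 # AI core processes current data
    print(f"pass {t}: AI core consumed {result} from region {ai}")
    host, ai = ai, host                  # swap roles for the next pass
```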

FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the invention. Referring to FIG. 3, the memory 400 of this embodiment includes memory regions 411, 413, 415, and 417, row buffer blocks 412, 414, 416, and 418, a mode register 420, an AI core 430, and a memory interface 440. In this embodiment, the mode register 420 is coupled to the AI core 430 and the memory interface 440 to provide the memory mode settings to each of them. The memory interface 440 is coupled, for example via a bus, to the central processing unit core 351, the graphics processor core 352, and the digital signal processor core 353. In this embodiment, the AI core 430 and the memory interface 440 operate independently to access the memory array, which includes the memory regions 411, 413, 415, and 417 and the row buffer blocks 412, 414, 416, and 418; the memory regions 411, 413, 415, and 417 each include a plurality of memory banks.

In this embodiment, the memory regions 413 and 415 may be data buffer regions. The memory region 411 is for exclusive access by the various special function processing cores described above, such as the central processing unit core 351, the graphics processor core 352, and the digital signal processor core 353. The memory region 417 is for exclusive access by the AI core 430. That is, when the special function processing cores and the AI core 430 exclusively access the memory region 411 and the memory region 417, respectively, their access operations do not interfere with each other. For example, when a neural network operation is executed, an entire row of the memory banks in the memory region 417 may store a plurality of weight values of the weight data. The AI core 430 can sequentially and interleavedly read each row of the memory banks of its exclusive memory region 417 through the row buffer block 418 to quickly obtain the data required for the neural network operation.
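A sketch of this sequential, interleaved row read pattern is shown below. The bank geometry and the string placeholders for weight rows are assumptions for illustration.

```python
# Interleaved row reads: the AI core walks row 0 of every bank, then row 1,
# and so on, so one bank's row buffer can activate while another is drained.
NUM_BANKS = 4
ROWS_PER_BANK = 3
# weight_banks[b][r] stands for one full row of weight values in bank b, row r.
weight_banks = [[f"w[bank{b}][row{r}]" for r in range(ROWS_PER_BANK)]
                for b in range(NUM_BANKS)]

def interleaved_weight_stream(banks):
    """Yield whole rows, interleaving across banks row-index by row-index."""
    for row in range(len(banks[0])):
        for bank in banks:
            yield bank[row]   # one full-row transfer over the dedicated bus

for chunk in interleaved_weight_stream(weight_banks):
    print("fetched", chunk)
```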

FIGS. 4A and 4B are schematic diagrams of the swap addressing of different memory blocks in different memory spaces according to an embodiment of the invention. Please refer to FIG. 3, FIG. 4A, and FIG. 4B. One way of accessing the memory 400 is described below with reference to FIGS. 4A and 4B, taking the continuous execution of neural network operations on a plurality of image data as an example. The AI operation executed by the AI core 430 may be, for example, a deep neural network (DNN) operation, a convolutional neural network (CNN) operation, or a recurrent neural network (RNN) operation; the invention is not limited in this regard. In one scenario, the memory region 417 includes sub-memory regions 417_1 and 417_2. The sub-memory region 417_1 is used, for example, to store weight data having a plurality of weight values, and the sub-memory region 417_2 is used, for example, to store feature map data having a plurality of feature values. In this scenario, the memory region 413 is, for example, addressed to the special function processing core 354, and the memory region 415 is, for example, addressed to the AI core 430. The special function processing core 354 may be, for example, the central processing unit core 351, the graphics processor core 352, or the digital signal processor core 353 of FIG. 3. Therefore, as shown in FIG. 4A, the memory space 450 corresponding to the special function processing core 354 includes the memory regions 411 and 413, and the memory space 460 corresponding to the AI core 430 includes the memory regions 415 and 417.

In this scenario, assume the special function processing core 354 is the digital signal processor core 353 of FIG. 3, so the memory region 415 stores digitized input data previously written by the digital signal processor core 353, such as image data. The AI core 430 may, for example, perform a neural network operation to carry out image recognition on the current image data stored in the memory region 415. The AI core 430 can read the weight data of the memory region 417 through the dedicated bus and read the image data of the memory region 415 as the input parameters required for the neural network operation. At the same time, the digital signal processor core 353 can store the next image data into the memory region 413 through the memory interfaces 340 and 440.

Then, after the image data in the memory region 415 has been recognized by the AI core 430, the addressed owners of the memory regions 413 and 415 can be swapped by setting the mode register 420, exchanging the memory spaces in which the two regions reside. After the addressing swap, as shown in FIG. 4B, the memory space 450' corresponding to the digital signal processor core 353 includes the memory regions 411 and 415, and the memory space 460' corresponding to the AI core 430 includes the memory regions 413 and 417. At this point, the AI core 430 can continue executing the neural network operation to perform image recognition on the new image data stored in the memory region 413: it reads the weight data of the sub-memory region 417_1 through the dedicated bus and reads the next image data from the memory region 413 as the input parameters for the neural network operation. Meanwhile, the digital signal processor core 353 can overwrite the memory region 415 through the memory interfaces 340 and 440 to store the image data after next into the memory region 415. Accordingly, the memory 400 of this embodiment provides highly efficient data access and can realize neural network operations with high-speed execution.
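The frame pipeline of FIGS. 4A and 4B can be sketched as follows. The recognize() placeholder, the frame strings, and the serialized loop are illustrative; in hardware the staging and the inference overlap because each region has its own bus.

```python
# Frame pipeline sketch: the DSP core stages frame N+1 into one buffer
# region while the AI core recognizes frame N in the other; a mode-register
# update then swaps which core each buffer region is addressed to.
buffers = {"413": None, "415": "frame_0"}   # region 415 holds the first frame
addressed_to = {"413": "dsp_core_353", "415": "ai_core_430"}   # FIG. 4A state

def recognize(frame):
    return f"label({frame})"                # stand-in for the NN inference

for n in range(3):
    dsp_region = next(r for r, c in addressed_to.items() if c == "dsp_core_353")
    ai_region = next(r for r, c in addressed_to.items() if c == "ai_core_430")
    # Concurrent in hardware: each side uses its own bus and row buffers.
    buffers[dsp_region] = f"frame_{n + 1}"       # DSP stages the next frame
    print(recognize(buffers[ai_region]))         # AI core processes current frame
    # Mode-register update: exchange the addressed owner of the two buffers.
    addressed_to[dsp_region], addressed_to[ai_region] = (
        addressed_to[ai_region], addressed_to[dsp_region])
```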

FIGS. 5A and 5B are schematic diagrams of the swap access of different memory blocks in the same memory space according to an embodiment of the invention. Please refer to FIG. 3, FIG. 5A, and FIG. 5B. Another way of accessing the memory 400 is described below with reference to FIGS. 5A and 5B, taking the execution of a neural network operation on image data as an example. Continuing the above scenario, at the input layer stage of the neural network operation, the memory space 550 corresponding to the AI core 430 may include, for example, the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 reads the memory region 415 to obtain the input data, which serves as the input parameters; the memory region 415 stores the image data previously written by the digital signal processor core 353. The AI core 430 also reads the weight data of the sub-memory region 417_1. The AI core 430 then executes the neural network operation according to the input parameters and the weight data to produce feature map data, and stores the feature map data into the sub-memory region 417_2.

Then, at the next hidden layer stage of the neural network operation, the memory space 550' corresponding to the AI core 430 includes the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 reads the feature map data previously stored in the sub-memory region 417_2 as the input parameters of the current hidden layer, and reads the weight data of the sub-memory region 417_1. The AI core 430 executes the neural network operation according to the input parameters and the weight data to produce new feature map data, and overwrites the memory region 415 with the new feature map data. In other words, the memory regions addressed to the AI core 430 remain unchanged, but the AI core 430's read target and store target addresses are exchanged. By extension, the AI core 430 of this embodiment can use the memory region 415 and the sub-memory region 417_2 in rotation, alternately reading the previously produced feature map data and storing the current feature map data produced during the ongoing neural network operation. Since each memory region has its own independent bus, the AI core 430 of this embodiment can quickly obtain the input data and the weight data, quickly execute the neural network operation, and store the output data.
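The same-space ping-pong can be sketched as follows, assuming per-layer dense weights and a ReLU activation. The layer sizes and the array-based regions are illustrative only.

```python
import numpy as np

# Same-space ping-pong: the read target and store target alternate between
# region 415 and sub-region 417_2 each layer; weights always come from 417_1.
rng = np.random.default_rng(1)
space = {
    "415": rng.standard_normal((1, 8)),                        # input image data
    "417_1": [rng.standard_normal((8, 8)) for _ in range(4)],  # per-layer weights
    "417_2": None,                                             # feature scratch
}

src, dst = "415", "417_2"
for w in space["417_1"]:
    # Read input/features from src, weights from 417_1, write features to dst.
    space[dst] = np.maximum(space[src] @ w, 0.0)
    src, dst = dst, src          # exchange read and store targets per layer

print("final feature map lives in region", src)
```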

FIG. 6 is a flowchart of a memory operating method according to an embodiment of the invention. Referring to FIG. 6, the memory operating method of this embodiment is applicable at least to the memory 100 of FIG. 1, causing the memory 100 to execute steps S610 and S620. The memory interface 140 of the memory 100 may be externally coupled to a special function processing core. In step S610, the memory regions of the memory array 110 are selectively addressed to the memory spaces of the special function processing core and the AI core 130 according to the memory mode settings of the mode register 120. In step S620, the special function processing core and the AI core 130 respectively access different memory regions in the memory array 110 according to the memory mode settings. The memory operating method of this embodiment therefore allows the memory 100 to be accessed by the special function processing core and the AI core 130 at the same time, providing highly efficient memory operation.

In addition, for the internal components, implementations, and technical details of the memory 100 of this embodiment, sufficient teaching, suggestions, and implementation descriptions can be obtained from the descriptions of the embodiments of FIG. 1 through FIG. 5B above, and they are therefore not repeated here.

In summary, in the memory and operating method of the invention, the mode register is designed with a plurality of specific memory mode settings, so that the memory regions of the memory array can be selectively addressed to the external special function processing core and the artificial intelligence core according to those settings, allowing the external special function processing core and the artificial intelligence core to simultaneously access different memory regions in the memory array. The artificial intelligence core disposed in the memory can therefore execute neural network operations quickly.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the relevant technical field may make some changes and modifications without departing from the spirit and scope of the invention, so the scope of protection of the invention shall be as defined by the appended claims.

100, 200, 400: memory
110: memory array
120, 220, 420: mode register
130, 230, 430: artificial intelligence core
140, 240, 440: memory interface
211, 213, 411, 413, 415, 417: memory region
212, 214, 412, 414, 416, 418: row buffer block
340: memory interface
351: central processing unit core
352: graphics processor core
353: digital signal processor core
354: special function processing core
417_1, 417_2: sub-memory region
450, 450', 460, 460', 550, 550': memory space
S610, S620: steps

FIG. 1 is a block diagram of a memory according to an embodiment of the invention.
FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the invention.
FIGS. 4A and 4B are schematic diagrams of the swap addressing of different memory blocks in different memory spaces according to an embodiment of the invention.
FIGS. 5A and 5B are schematic diagrams of the swap access of different memory blocks in the same memory space according to an embodiment of the invention.
FIG. 6 is a flowchart of a memory operating method according to an embodiment of the invention.

100: memory
110: memory array
120: mode register
130: artificial intelligence core
140: memory interface

Claims (20)

1. A memory with a processing-in-memory architecture, comprising: a memory array including a plurality of memory regions; a mode register configured to store a plurality of memory mode settings; a memory interface coupled to the memory array and the mode register, and externally coupled to a special function processing core; and an artificial intelligence core coupled to the memory array and the mode register, wherein the memory regions are respectively and selectively addressed to the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to the memory mode settings.

2. The memory according to claim 1, wherein the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array through their own dedicated memory buses.

3. The memory according to claim 1, wherein the memory regions include a first memory region and a second memory region, the first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.

4. The memory according to claim 3, wherein the memory regions further include a plurality of data buffer regions, and the artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.

5. The memory according to claim 4, wherein when the artificial intelligence core performs a neural network operation, the artificial intelligence core reads input data of one of the data buffer regions as an input parameter and reads weight data of the first memory region, wherein the artificial intelligence core outputs feature data to the first memory region.

6. The memory according to claim 5, wherein when the artificial intelligence core performs the neural network operation, the artificial intelligence core reads the feature data of the first memory region as a next input parameter and reads another weight data of the first memory region, wherein the artificial intelligence core outputs next feature map data to one of the data buffer regions to overwrite the one of the data buffer regions.
7. The memory according to claim 4, wherein the data buffer regions are alternately addressable to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another one of the data buffer regions.

8. The memory according to claim 1, wherein a width of a bus dedicated between the artificial intelligence core and the memory regions is greater than a width of an external bus between the special function processing core and the memory interface.

9. The memory according to claim 1, wherein the memory regions respectively correspond to a plurality of row buffer blocks, and the memory regions each include a plurality of memory banks, wherein a width of a bus dedicated between the artificial intelligence core and the memory regions is greater than or equal to an amount of data of an entire row of the memory banks.

10. The memory according to claim 1, wherein the memory is a dynamic random access memory chip.

11. An operating method for a memory with a processing-in-memory architecture, the memory including a memory array, a mode register, a memory interface, and an artificial intelligence core, the method comprising: selectively addressing a plurality of memory regions in the memory to a special function processing core and the artificial intelligence core according to a plurality of memory mode settings of the mode register; and respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings.

12. The operating method according to claim 11, wherein the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array through their own dedicated memory buses.

13. The operating method according to claim 11, wherein the memory regions include a first memory region and a second memory region, the first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.
14. The operating method according to claim 13, wherein the memory regions further include a plurality of data buffer regions, and the artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.

15. The operating method according to claim 14, wherein when the artificial intelligence core performs a neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register includes: reading, by the artificial intelligence core, input data of one of the data buffer regions as an input parameter; reading, by the artificial intelligence core, weight data of the first memory region; and outputting, by the artificial intelligence core, feature data to the first memory region.

16. The operating method according to claim 15, wherein when the artificial intelligence core performs the neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register further includes: reading, by the artificial intelligence core, the feature data of the first memory region as a next input parameter; reading, by the artificial intelligence core, another weight data of the first memory region; and outputting, by the artificial intelligence core, next feature map data to one of the data buffer regions to overwrite the one of the data buffer regions.

17. The operating method according to claim 14, wherein the data buffer regions are alternately addressable to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another one of the data buffer regions.

18. The operating method according to claim 11, wherein a width of a bus dedicated between the artificial intelligence core and the memory regions is greater than a width of an external bus between the special function processing core and the memory interface.
19. The operating method according to claim 11, wherein the memory regions respectively correspond to a plurality of row buffer blocks, and the memory regions each include a plurality of memory banks, wherein a width of a bus dedicated between the artificial intelligence core and the memory regions is greater than or equal to an amount of data of an entire row of the memory banks.

20. The operating method according to claim 11, wherein the memory is a dynamic random access memory chip.
TW108119618A 2018-10-11 2019-06-06 Memory with processing in memory architecture and operating method thereof TWI749331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/563,956 US10990524B2 (en) 2018-10-11 2019-09-09 Memory with processing in memory architecture and operating method thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862744140P 2018-10-11 2018-10-11
US62/744,140 2018-10-11
US201862785234P 2018-12-27 2018-12-27
US62/785,234 2018-12-27

Publications (2)

Publication Number Publication Date
TW202014895A 2020-04-16
TWI749331B TWI749331B (en) 2021-12-11

Family

ID=70231709

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108119618A TWI749331B (en) 2018-10-11 2019-06-06 Memory with processing in memory architecture and operating method thereof

Country Status (2)

Country Link
CN (1) CN111047029B (en)
TW (1) TWI749331B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI821148B (en) * 2023-04-26 2023-11-01 旺宏電子股份有限公司 Electronic device and method for operating the same

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2284947C (en) * 1999-10-04 2005-12-20 Storagequest Inc. Apparatus and method for managing data storage
KR100816053B1 (en) * 2006-11-21 2008-03-21 엠텍비젼 주식회사 Memory device, memory system and dual port memory device with self-copy function
US8719516B2 (en) * 2009-10-21 2014-05-06 Micron Technology, Inc. Memory having internal processors and methods of controlling memory access
CN105654419A (en) * 2016-01-25 2016-06-08 上海华力创通半导体有限公司 Operation processing system and operation processing method of image
CN109074845B (en) * 2016-03-23 2023-07-14 Gsi 科技公司 In-memory matrix multiplication and use thereof in neural networks
KR102650828B1 (en) * 2016-05-20 2024-03-26 삼성전자주식회사 Memory device shared by two or more processors and system including the same


Also Published As

Publication number Publication date
CN111047029A (en) 2020-04-21
CN111047029B (en) 2023-04-18
TWI749331B (en) 2021-12-11

Similar Documents

Publication Publication Date Title
US10990524B2 (en) Memory with processing in memory architecture and operating method thereof
US11294599B1 (en) Registers for restricted memory
TWI766396B (en) Data temporary storage apparatus, data temporary storage method and operation method
WO2017124642A1 (en) Device and method for executing forward calculation of artificial neural network
JP6335335B2 (en) Adaptive partition mechanism with arbitrary tile shapes for tile-based rendering GPU architecture
US11645533B2 (en) IR drop prediction with maximum convolutional neural network
JP2018120549A (en) Processor, information processing device, and operation method for processor
WO2019118363A1 (en) On-chip computational network
US20200184002A1 (en) Hardware accelerated convolution
TW202134861A (en) Interleaving memory requests to accelerate memory accesses
WO2020073801A1 (en) Data reading/writing method and system in 3d image processing, storage medium, and terminal
TW202127461A (en) Concurrent testing of a logic device and a memory device within a system package
TWI749331B (en) Memory with processing in memory architecture and operating method thereof
TWI714003B (en) Memory chip capable of performing artificial intelligence operation and method thereof
JP6912535B2 (en) Memory chips capable of performing artificial intelligence operations and their methods
CN110837483B (en) Tensor dimension transformation method and device
Zhou et al. Hygraph: Accelerating graph processing with hybrid memory-centric computing
WO2023124304A1 (en) Chip cache system, data processing method, device, storage medium, and chip
CN113741977B (en) Data operation method, data operation device and data processor
US9189448B2 (en) Routing image data across on-chip networks
WO2021243489A1 (en) Data processing method and apparatus for neural network
CN113407258A (en) Self-adaptive resource allocation layout and wiring method and system of storage and computation integrated architecture
JP2021507368A (en) Multiple pipeline architecture with special number detection
CN110826704B (en) Processing device and system for preventing overfitting of neural network
US20230267992A1 (en) Keeper-free volatile memory system