TWI749331B - Memory with processing in memory architecture and operating method thereof - Google Patents

Memory with processing in memory architecture and operating method thereof

Info

Publication number
TWI749331B
Authority
TW
Taiwan
Prior art keywords
memory
artificial intelligence
core
data
special function
Prior art date
Application number
TW108119618A
Other languages
Chinese (zh)
Other versions
TW202014895A (en)
Inventor
黃崇仁
葛永年
Original Assignee
力晶積成電子製造股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 力晶積成電子製造股份有限公司 filed Critical 力晶積成電子製造股份有限公司
Priority to US16/563,956 priority Critical patent/US10990524B2/en
Publication of TW202014895A publication Critical patent/TW202014895A/en
Application granted granted Critical
Publication of TWI749331B publication Critical patent/TWI749331B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 - Organizing or formatting or addressing of data
    • G06F 3/0644 - Management of space entities, e.g. partitions, extents, pools
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 - Data buffering arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 - In-line storage system
    • G06F 3/0683 - Plurality of storage devices
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A memory with processing in memory architecture and an operating method thereof are provided. The memory includes a memory array, a mode register, an artificial intelligence core, and a memory interface. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The artificial intelligence core is coupled to the memory array and the mode register. The plurality of memory regions are selectively addressed to the special function processing core and the artificial intelligence core according to the plurality of memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to the plurality of memory mode settings.

Description

Memory with processing-in-memory architecture and operating method thereof

The present invention relates to a circuit architecture, and more particularly to a memory with a processing-in-memory (PIM) architecture and an operating method thereof.

With the evolution of artificial intelligence (AI) computing, AI applications have become increasingly widespread, for example, neural network operations such as image data analysis, voice data analysis, and natural language processing performed via neural network models. Moreover, as the computational complexity of neural networks keeps increasing, the computer equipment currently used for AI computation has gradually become unable to cope with present neural network workloads while providing effective and fast computing performance.

Therefore, dedicated processing cores have been designed to perform neural network operations. However, although executing neural network operations on a dedicated processing core allows the core's computing power to be fully exploited, the processing speed of the dedicated core remains limited by the data access speed. Because the dedicated processing core and other special function processing cores read data from the memory through the same general-purpose bus, the dedicated processing core cannot obtain the data required for AI computation in real time when the other special function processing cores occupy the bus. In view of this, several embodiments are proposed below to address how to design a processing architecture that can execute AI computation quickly.

The present invention provides a memory with a processing-in-memory architecture and an operating method thereof, in which an artificial intelligence (AI) core integrated in the memory directly reads the data required for neural network operations stored in the memory chip, thereby achieving fast neural network computation.

The memory with a processing-in-memory architecture of the present invention includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The artificial intelligence core is coupled to the memory array and the mode register. The memory regions are selectively addressed to the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to the memory mode settings.
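
To make the selective addressing concrete, the following minimal Python sketch models a mode register that maps each memory region to the core currently allowed to address it. All class, method, and region names (ModeRegister, PIMMemory, "AI", "SFP") are illustrative assumptions, not the patented circuit.

    # Minimal sketch under assumed names: a mode register maps each memory
    # region to the core ("AI" or "SFP" for special function processing)
    # that may currently address it.

    class ModeRegister:
        def __init__(self, settings):
            # settings: dict mapping region id -> "AI" or "SFP"
            self.settings = dict(settings)

        def owner(self, region):
            return self.settings[region]

        def swap(self, region_a, region_b):
            # Re-address two regions to the opposite cores (used for buffering).
            self.settings[region_a], self.settings[region_b] = (
                self.settings[region_b],
                self.settings[region_a],
            )

    class PIMMemory:
        def __init__(self, regions, mode_register):
            self.regions = regions  # region id -> bytearray backing store
            self.mode = mode_register

        def access(self, core, region, offset, data=None):
            # A core may only touch regions currently addressed to it.
            if self.mode.owner(region) != core:
                raise PermissionError(f"region {region} is not addressed to {core}")
            buf = self.regions[region]
            if data is None:
                return buf[offset]  # read
            buf[offset] = data      # write

Because ownership is checked per region rather than per chip, an access by one core never blocks the other core's regions, which is the property the dedicated buses exploit.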

In an embodiment of the present invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses.

In an embodiment of the present invention, the memory regions include a first memory region and a second memory region. The first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.

In an embodiment of the present invention, the memory regions further include a plurality of data buffer regions. The artificial intelligence core and the memory interface alternately access different data in the data buffer regions.

In an embodiment of the present invention, when the artificial intelligence core executes a neural network operation, the artificial intelligence core reads input data from one of the data buffer regions as an input parameter and reads weight data from the first memory region. The artificial intelligence core outputs feature data to the first memory region.

In an embodiment of the present invention, when the artificial intelligence core executes the neural network operation, the artificial intelligence core reads the feature data of the first memory region as the next input parameter and reads another weight data of the first memory region. The artificial intelligence core outputs the next feature map data to one of the data buffer regions, overwriting that data buffer region.

In an embodiment of the present invention, the data buffer regions can each be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.

In an embodiment of the present invention, the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.

In an embodiment of the present invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.

In an embodiment of the present invention, the memory is a dynamic random access memory chip.

The memory operating method with a processing-in-memory architecture of the present invention is adapted to a memory that includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The method includes the following steps: selectively addressing a plurality of memory regions of the memory to a special function processing core and to the artificial intelligence core according to the plurality of memory mode settings of the mode register; and accessing, by the special function processing core and the artificial intelligence core, different memory regions in the memory array respectively according to the memory mode settings.

In an embodiment of the present invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses.

In an embodiment of the present invention, the memory regions include a first memory region and a second memory region; the first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.

In an embodiment of the present invention, the memory regions further include a plurality of data buffer regions, and the artificial intelligence core and the memory interface alternately access different data in the data buffer regions.

In an embodiment of the present invention, when the artificial intelligence core executes a neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register includes: reading, by the artificial intelligence core, input data from one of the data buffer regions as an input parameter; reading, by the artificial intelligence core, weight data from the first memory region; and outputting, by the artificial intelligence core, feature data to the first memory region.

In an embodiment of the present invention, when the artificial intelligence core executes the neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register further includes: reading, by the artificial intelligence core, the feature data of the first memory region as the next input parameter; reading, by the artificial intelligence core, another weight data of the first memory region; and outputting, by the artificial intelligence core, the next feature map data to one of the data buffer regions to overwrite it.

In an embodiment of the present invention, the data buffer regions can each be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.

In an embodiment of the present invention, the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.

In an embodiment of the present invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.

In an embodiment of the present invention, the memory is a dynamic random access memory chip.

Based on the above, the memory and the operating method of the present invention allow the external special function processing core and the artificial intelligence core disposed in the memory to simultaneously access different memory regions in the memory array. Therefore, the memory of the present invention can execute neural network operations quickly.

To make the aforementioned features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

To make the content of the present invention easier to understand, the following embodiments are given as examples according to which the present invention can indeed be implemented. In addition, wherever possible, elements/components/steps bearing the same reference numerals in the drawings and embodiments represent the same or similar parts.

FIG. 1 is a schematic block diagram of a memory according to an embodiment of the present invention. Referring to FIG. 1, the memory 100 includes a memory array 110, a mode register 120, an artificial intelligence (AI) core 130, and a memory interface 140. The memory array 110 is coupled to the AI core 130 and the memory interface 140. The mode register 120 is coupled to the memory array 110, the AI core 130, and the memory interface 140. The memory array 110 includes a plurality of memory regions, each used to store specific data (or a dataset). Furthermore, in an embodiment, the memory 100 may further include a plurality of dedicated memory control units that correspond one-to-one to the memory regions to respectively perform data access operations. In this embodiment, the memory interface 140 may be externally coupled to a special function processing core. The memory regions are selectively addressed to the special function processing core and the AI core 130 according to the plurality of memory mode settings recorded in the mode register 120, so that the special function processing core and the AI core 130 can respectively access different memory regions in the memory array 110 according to the memory mode settings. The memory 100 of this embodiment thus has the capability to perform AI computation.

In this embodiment, the memory 100 may be a dynamic random access memory (DRAM) chip, and may be, for example, a processing-in-memory (PIM) architecture constructed from circuit elements such as control logic, arithmetic logic, and cache units. The AI core 130 may be integrated into the peripheral circuit area of the memory 100 to directly access the plurality of memory banks of the memory array 110 through a dedicated memory controller and a dedicated bus. Moreover, the AI core 130 may be pre-designed with the functions and characteristics needed to perform specific neural network operations. In other words, the memory 100 of this embodiment has the function of performing AI computation, and the AI core 130 and the external special function processing core can simultaneously access the memory array 110 to provide highly efficient data access and computation.

In this embodiment, the special function processing core may be, for example, a central processing unit (CPU) core, an image signal processor (ISP) core, a digital signal processor (DSP) core, a graphics processing unit (GPU) core, or another similar special function processing core. In this embodiment, the special function processing core is coupled to the memory interface 140 via a general-purpose bus (or standard bus) to access the memory array 110 through the memory interface 140. In contrast, the AI core 130 accesses the memory array 110 via a dedicated bus inside the memory and is therefore not limited by the width or speed of the memory interface 140; the AI core 130 can quickly access the memory array 110 according to a specific data access pattern.

FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the present invention. Referring to FIG. 2, the memory 200 includes memory regions 211 and 213, row buffer blocks 212 and 214, a mode register 220, an AI core 230, and a memory interface 240. In this embodiment, the mode register 220 is coupled to the AI core 230 and the memory interface 240 to respectively provide a plurality of memory mode settings to the AI core 230 and the memory interface 240. The AI core 230 and the memory interface 240 operate independently to respectively access the memory array. The memory array includes the memory regions 211 and 213 and the row buffer blocks 212 and 214. The memory regions 211 and 213 each include a plurality of memory banks and may serve as data buffer regions. In this embodiment, the memory interface 240 is externally coupled to another memory interface 340. The memory interface 340 is coupled, for example via a bus, to a central processing unit core 351, a graphics processor core 352, and a digital signal processor core 353.

In this embodiment, when the central processing unit core 351, the graphics processor core 352, or the digital signal processor core 353 needs to access the row buffer block 212 or the row buffer block 214, it must do so through the memory interfaces 240 and 340, in order or in a queue. However, regardless of how the various special function processing cores are currently accessing the memory array, the AI core 230 can simultaneously access a different memory region in the memory array. In an embodiment, the memory region 211 or the memory region 213 may, for example, be used to store the digitized input data, weight data, or feature map data required for performing neural network operations or other machine learning operations.

It is worth noting that the various special function processing cores described above and the AI core 230 simultaneously access different memory regions of the memory array via their own dedicated memory buses. That is, when the special function processing cores access the data in the memory region 211 through the row buffer block 212, the AI core 230 can access the data in the memory region 213 through the row buffer block 214; and when the special function processing cores access the data in the memory region 213 through the row buffer block 214, the AI core 230 can access the data in the memory region 211 through the row buffer block 212. In other words, the special function processing cores and the AI core 230 can alternately access different data in the memory regions 211 and 213 serving as data buffer regions. In addition, in an embodiment, the AI core 230 may further include a plurality of caches or queues, through which the AI core 230 can quickly access the data in the memory region 211 or the memory region 213 in a pipelined manner.
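
The pipelined cache/queue access mentioned above could be sketched as follows; the helper names (read_row, compute) and the queue depth are assumptions for illustration, showing only how row reads can stay in flight while the AI core computes.

    from collections import deque

    def ai_core_pipeline(read_row, compute, num_rows, depth=2):
        # Warm up: issue the first few row-buffer reads before computing.
        queue = deque(read_row(r) for r in range(min(depth, num_rows)))
        for row in range(num_rows):
            data = queue.popleft()           # oldest prefetched row
            nxt = row + depth
            if nxt < num_rows:
                queue.append(read_row(nxt))  # keep the pipeline full
            compute(data)                    # overlaps with the queued reads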

FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the present invention. Referring to FIG. 3, the memory 400 of this embodiment includes memory regions 411, 413, 415, and 417, row buffer blocks 412, 414, 416, and 418, a mode register 420, an AI core 430, and a memory interface 440. In this embodiment, the mode register 420 is coupled to the AI core 430 and the memory interface 440 to respectively provide a plurality of memory mode settings to the AI core 430 and the memory interface 440. The memory interface 440 is coupled, for example via a bus, to the central processing unit core 351, the graphics processor core 352, and the digital signal processor core 353. In this embodiment, the AI core 430 and the memory interface 440 operate independently to respectively access the memory array. The memory array includes the memory regions 411, 413, 415, and 417 and the row buffer blocks 412, 414, 416, and 418, and each of the memory regions 411, 413, 415, and 417 includes a plurality of memory banks.

In this embodiment, the memory regions 413 and 415 may be data buffer regions. The memory region 411 is for exclusive access by the various special function processing cores described above, which may be, for example, the central processing unit core 351, the graphics processor core 352, and the digital signal processor core 353. The memory region 417 is for exclusive access by the AI core 430. That is, when the special function processing cores and the AI core 430 exclusively access the memory region 411 and the memory region 417 respectively, their access operations do not interfere with each other. For example, taking a neural network operation, an entire row of the memory banks of the memory region 417 may store a plurality of weight values of the weight data. The AI core 430 can sequentially and in an interleaved manner read each row of the memory banks of its dedicated memory region 417 through the row buffer block 418, so as to quickly obtain the data required for the neural network operation.
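
As a rough illustration of such a row-granular weight fetch (the bank count, row width, and function names are assumptions, not figures from the patent), the sketch below shows why a dedicated bus at least as wide as a bank row lets one activation per bank deliver a whole slice of the weight matrix:

    import numpy as np

    NUM_BANKS, NUM_ROWS, ROW_WIDTH = 4, 8, 1024  # assumed geometry

    # Weight region modeled as banks x rows x weights-per-row.
    weight_region = np.random.rand(NUM_BANKS, NUM_ROWS, ROW_WIDTH).astype(np.float32)

    def fetch_weight_rows(row_index):
        # One row activation per bank; each returns an entire row at once
        # because the dedicated bus matches the row width.
        return np.concatenate(
            [weight_region[bank, row_index] for bank in range(NUM_BANKS)]
        )

    weights = fetch_weight_rows(0)  # 4096 weights from four row-buffer reads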

FIG. 4A and FIG. 4B are schematic diagrams of the swap addressing of different memory blocks in different memory spaces according to an embodiment of the present invention. Please refer to FIG. 3, FIG. 4A, and FIG. 4B. An access mode of the memory 400 is described below by taking the continuous execution of neural network operations on a plurality of image data as an example, in conjunction with FIG. 4A and FIG. 4B. The AI computation performed by the AI core 430 may be, for example, a deep neural network (DNN) operation, a convolutional neural network (CNN) operation, or a recurrent neural network (RNN) operation; the present invention is not limited in this regard. In an implementation scenario, the memory region 417 includes sub-memory regions 417_1 and 417_2. The sub-memory region 417_1 is used, for example, to store weight data having a plurality of weight values, and the sub-memory region 417_2 is used, for example, to store feature map data having a plurality of feature values. In this scenario, the memory region 413 is, for example, addressed to a special function processing core 354, and the memory region 415 is, for example, addressed to the AI core 430. The special function processing core 354 may be, for example, the central processing unit core 351, the graphics processor core 352, or the digital signal processor core 353 of FIG. 3. Therefore, as shown in FIG. 4A, the memory space 450 corresponding to the special function processing core 354 includes the memory regions 411 and 413, and the memory space 460 corresponding to the AI core 430 includes the memory regions 415 and 417.

In this scenario, assume that the special function processing core 354 is the digital signal processor core 353 of FIG. 3, so the memory region 415 may store digitized input data previously stored by the digital signal processor core 353, such as image data. The AI core 430 may, for example, perform a neural network operation to carry out image recognition on the current image data stored in the memory region 415. The AI core 430 can read the weight data of the memory region 417 via the dedicated bus and read the image data of the memory region 415 as the input parameters required for the neural network operation. At the same time, the digital signal processor core 353 can store the next image data into the memory region 413 via the memory interfaces 340 and 440.

Then, after the image data in the memory region 415 has been recognized by the AI core 430, the addressed targets of the memory regions 413 and 415 can be exchanged by setting the mode register 420, thereby exchanging the memory spaces in which the memory regions 413 and 415 reside. After the addressing of the memory regions 413 and 415 is exchanged, as shown in FIG. 4B, the memory space 450' corresponding to the digital signal processor core 353 includes the memory regions 411 and 415, and the memory space 460' corresponding to the AI core 430 includes the memory regions 413 and 417. At this time, the AI core 430 can continue performing neural network operations to carry out image recognition on the new image data stored in the memory region 413. The AI core 430 can read the weight data of the sub-memory region 417_1 via the dedicated bus and read the next image data of the memory region 413 as the input parameters required for the neural network operation. At the same time, the digital signal processor core 353 can overwrite the memory region 415 via the memory interfaces 340 and 440 to store the image data for the frame after next in the memory region 415. Accordingly, the memory 400 of this embodiment can provide highly efficient data access operations, and the memory 400 can realize neural network operations with high-speed execution.
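
Reusing the ModeRegister/PIMMemory sketch from earlier, the frame-level flow of FIG. 4A and FIG. 4B could be modeled as below (the region names, starting addressing, and run_inference placeholder are assumptions): each iteration, the DSP-side core stores the next frame while the AI core reads the current one, and a single mode-register swap retargets the two buffer regions.

    def recognize_stream(mem, frames, dsp_region="R413", ai_region="R415"):
        # Assumed starting state (FIG. 4A): dsp_region is addressed to the
        # external core, ai_region to the AI core and already holds a frame.
        for frame in frames:
            # External core stores the next frame via the memory interface...
            for i, byte in enumerate(frame):
                mem.access("SFP", dsp_region, i, byte)
            # ...while the AI core reads the current frame from its region.
            current = mem.access("AI", ai_region, 0)
            # run_inference(current)  # placeholder for the neural network pass
            # Exchange the two buffer regions' addressing (FIG. 4B).
            mem.mode.swap(dsp_region, ai_region)
            dsp_region, ai_region = ai_region, dsp_region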

FIG. 5A and FIG. 5B are schematic diagrams of the swap access of different memory blocks in the same memory space according to an embodiment of the present invention. Please refer to FIG. 3, FIG. 5A, and FIG. 5B. Another access mode of the memory 400 is described below by taking the execution of a neural network operation on image data as an example, in conjunction with FIG. 5A and FIG. 5B. In the above scenario, at the input layer stage of the neural network operation, the memory space 550 corresponding to the AI core 430 may include, for example, the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 can read the memory region 415 to obtain the input data as input parameters; the memory region 415 stores image data previously stored by the digital signal processor core 353. The AI core 430 also reads the weight data of the sub-memory region 417_1. The AI core 430 then performs the neural network operation according to the input parameters and the weight data to generate feature map data, and stores the feature map data in the sub-memory region 417_2.

Then, at the next hidden layer stage of the neural network operation, the memory space 550' corresponding to the AI core 430 includes the memory region 415 and the sub-memory regions 417_1 and 417_2. The AI core 430 reads the feature map data previously stored in the sub-memory region 417_2 as the input parameters of the current hidden layer, and reads the weight data of the sub-memory region 417_1. The AI core 430 then performs the neural network operation according to the input parameters and the weight data to generate new feature map data, and overwrites the memory region 415 with the new feature map data. In other words, the memory regions addressed to the AI core 430 remain unchanged, but the read and store target addresses of the AI core 430 are exchanged. By analogy, the AI core 430 of this embodiment can use the memory region 415 and the sub-memory region 417_2 to alternately read the previously generated feature map data and store the current feature map data produced while performing the neural network operation. Since each memory region has its own independent bus, the AI core 430 of this embodiment can quickly obtain the input data and the weight data, quickly execute the neural network operation, and store the output data.
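
A layer-by-layer sketch of this read/write ping-pong might look as follows (the dense-plus-ReLU layer, square weight matrices, and buffer names are illustrative assumptions): the two feature buffers simply alternate roles each layer, so no copy is needed between layers.

    import numpy as np

    def run_layers(input_data, weights_per_layer):
        # Square weight matrices assumed so both buffers keep one shape.
        buf_415 = np.asarray(input_data, dtype=np.float32)  # input (region 415)
        buf_417_2 = np.zeros_like(buf_415)                  # features (417_2)
        read_buf, write_buf = buf_415, buf_417_2
        for w in weights_per_layer:                         # weights from 417_1
            write_buf[:] = np.maximum(read_buf @ w, 0.0)    # dense layer + ReLU
            read_buf, write_buf = write_buf, read_buf       # exchange targets
        return read_buf                                     # last written features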

FIG. 6 is a flowchart of a memory operating method according to an embodiment of the present invention. Referring to FIG. 6, the memory operating method of this embodiment is applicable at least to the memory 100 of FIG. 1, causing the memory 100 to execute steps S610 and S620. The memory interface 140 of the memory 100 may be externally coupled to a special function processing core. In step S610, the plurality of memory regions of the memory array 110 are selectively addressed to the memory spaces of the special function processing core and the AI core 130 according to the plurality of memory mode settings of the mode register 120. In step S620, the special function processing core and the AI core 130 respectively access different memory regions in the memory array 110 according to the memory mode settings. Therefore, the memory operating method of this embodiment allows the memory 100 to be accessed by the special function processing core and the AI core 130 simultaneously, providing highly efficient memory operation.
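
Tying the earlier sketches to the two steps of FIG. 6 (all region names and settings below are assumptions):

    # S610: address the regions to the two cores per the mode register settings.
    mode = ModeRegister({"R411": "SFP", "R413": "SFP", "R415": "AI", "R417": "AI"})
    mem = PIMMemory({r: bytearray(16) for r in mode.settings}, mode)

    # S620: each core then accesses its own regions independently.
    mem.access("SFP", "R413", 0, 0x2A)   # external core stores input data
    value = mem.access("AI", "R417", 0)  # AI core reads its dedicated region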

In addition, for the relevant internal components, implementations, and technical details of the memory 100 of this embodiment, sufficient teachings, suggestions, and implementation descriptions can be obtained from the descriptions of the embodiments of FIG. 1 to FIG. 5B above, and are therefore not repeated here.

In summary, in the memory and the operating method of the present invention, the mode register can be designed with a plurality of specific memory mode settings, so that the plurality of memory regions of the memory array can be selectively addressed to the external special function processing core and to the artificial intelligence core according to those settings, allowing the external special function processing core and the artificial intelligence core to simultaneously access different memory regions in the memory array. Therefore, the artificial intelligence core disposed in the memory can quickly execute neural network operations.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make some changes and modifications without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the appended claims.

100, 200, 400: memory
110: memory array
120, 220, 420: mode register
130, 230, 430: artificial intelligence core
140, 240, 440: memory interface
211, 213, 411, 413, 415, 417: memory region
212, 214, 412, 414, 416, 418: row buffer block
340: memory interface
351: central processing unit core
352: graphics processor core
353: digital signal processor core
354: special function processing core
417_1, 417_2: sub-memory region
450, 450', 460, 460', 550, 550': memory space
S610, S620: step

FIG. 1 is a schematic block diagram of a memory according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the present invention.
FIG. 4A and FIG. 4B are schematic diagrams of the swap addressing of different memory blocks in different memory spaces according to an embodiment of the present invention.
FIG. 5A and FIG. 5B are schematic diagrams of the swap access of different memory blocks in the same memory space according to an embodiment of the present invention.
FIG. 6 is a flowchart of a memory operating method according to an embodiment of the present invention.

100: memory
110: memory array
120: mode register
130: artificial intelligence core
140: memory interface

Claims (10)

1. A memory with a processing-in-memory architecture, comprising: a memory array including a plurality of memory regions; a mode register for storing a plurality of memory mode settings; a memory interface coupled to the memory array and the mode register and externally coupled to a special function processing core; and an artificial intelligence core coupled to the memory array and the mode register, wherein the memory regions are respectively and selectively addressed to the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register, so that the special function processing core and the artificial intelligence core respectively access different memory regions in the memory array according to the memory mode settings, wherein the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses, wherein the width of a bus dedicated to the artificial intelligence core and the memory regions is greater than the width of an external bus between the special function processing core and the memory interface, wherein the memory regions include a first memory region for exclusive access by the artificial intelligence core and a second memory region for exclusive access by the special function processing core, wherein the memory regions further include a plurality of data buffer regions, and the artificial intelligence core and the memory interface alternately access different data in the data buffer regions, wherein when the artificial intelligence core executes a neural network operation, the artificial intelligence core reads input data from one of the data buffer regions as an input parameter and reads weight data from the first memory region, and wherein the artificial intelligence core outputs feature data to the first memory region.

2. The memory according to claim 1, wherein when the artificial intelligence core executes the neural network operation, the artificial intelligence core reads the feature data of the first memory region as the next input parameter and reads another weight data of the first memory region, wherein the artificial intelligence core outputs the next feature map data to one of the data buffer regions to overwrite it.

3. The memory according to claim 1, wherein the data buffer regions can each be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.

4. The memory according to claim 1, wherein the memory regions respectively correspond to a plurality of row buffer blocks and each include a plurality of memory banks, and wherein the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.

5. The memory according to claim 1, wherein the memory is a dynamic random access memory chip.

6. A memory operating method with a processing-in-memory architecture, the memory comprising a memory array, a mode register, a memory interface, and an artificial intelligence core, the method comprising: selectively addressing a plurality of memory regions of the memory to a special function processing core and to the artificial intelligence core according to a plurality of memory mode settings of the mode register; and accessing, by the special function processing core and the artificial intelligence core, different memory regions in the memory array respectively according to the memory mode settings, wherein the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses, wherein the width of a bus dedicated to the artificial intelligence core and the memory regions is greater than the width of an external bus between the special function processing core and the memory interface, wherein the memory regions include a first memory region for exclusive access by the artificial intelligence core and a second memory region for exclusive access by the special function processing core, wherein the memory regions further include a plurality of data buffer regions, and the artificial intelligence core and the memory interface alternately access different data in the data buffer regions, and wherein, when the artificial intelligence core executes a neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register comprises: reading, by the artificial intelligence core, input data from one of the data buffer regions as an input parameter; reading, by the artificial intelligence core, weight data from the first memory region; and outputting, by the artificial intelligence core, feature data to the first memory region.

7. The memory operating method according to claim 6, wherein when the artificial intelligence core executes the neural network operation, the step of respectively accessing different memory regions in the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register further comprises: reading, by the artificial intelligence core, the feature data of the first memory region as the next input parameter; reading, by the artificial intelligence core, another weight data of the first memory region; and outputting, by the artificial intelligence core, the next feature map data to one of the data buffer regions to overwrite it.

8. The memory operating method according to claim 6, wherein the data buffer regions can each be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.

9. The memory operating method according to claim 6, wherein the memory regions respectively correspond to a plurality of row buffer blocks and each include a plurality of memory banks, and wherein the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in an entire row of the memory banks.

10. The memory operating method according to claim 6, wherein the memory is a dynamic random access memory chip.
TW108119618A 2018-10-11 2019-06-06 Memory with processing in memory architecture and operating method thereof TWI749331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/563,956 US10990524B2 (en) 2018-10-11 2019-09-09 Memory with processing in memory architecture and operating method thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862744140P 2018-10-11 2018-10-11
US62/744,140 2018-10-11
US201862785234P 2018-12-27 2018-12-27
US62/785,234 2018-12-27

Publications (2)

Publication Number Publication Date
TW202014895A TW202014895A (en) 2020-04-16
TWI749331B true TWI749331B (en) 2021-12-11

Family

ID=70231709

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108119618A TWI749331B (en) 2018-10-11 2019-06-06 Memory with processing in memory architecture and operating method thereof

Country Status (2)

Country Link
CN (1) CN111047029B (en)
TW (1) TWI749331B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI821148B (en) * 2023-04-26 2023-11-01 旺宏電子股份有限公司 Electronic device and method for operating the same

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010052061A1 (en) * 1999-10-04 2001-12-13 Storagequest Inc. Apparatus And Method For Managing Data Storage
CN107402901A (en) * 2016-05-20 2017-11-28 三星电子株式会社 The storage device shared by two or more processors and include its system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100816053B1 (en) * 2006-11-21 2008-03-21 엠텍비젼 주식회사 Memory device, memory system and dual port memory device with self-copy function
US8719516B2 (en) * 2009-10-21 2014-05-06 Micron Technology, Inc. Memory having internal processors and methods of controlling memory access
CN105654419A (en) * 2016-01-25 2016-06-08 上海华力创通半导体有限公司 Operation processing system and operation processing method of image
KR102458885B1 (en) * 2016-03-23 2022-10-24 쥐에스아이 테크놀로지 인코포레이티드 In-memory matrix multiplication and its use in neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010052061A1 (en) * 1999-10-04 2001-12-13 Storagequest Inc. Apparatus And Method For Managing Data Storage
CN107402901A (en) * 2016-05-20 2017-11-28 三星电子株式会社 The storage device shared by two or more processors and include its system

Also Published As

Publication number Publication date
TW202014895A (en) 2020-04-16
CN111047029A (en) 2020-04-21
CN111047029B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US10990524B2 (en) Memory with processing in memory architecture and operating method thereof
US11294599B1 (en) Registers for restricted memory
JP6335335B2 (en) Adaptive partition mechanism with arbitrary tile shapes for tile-based rendering GPU architecture
TWI766396B (en) Data temporary storage apparatus, data temporary storage method and operation method
US11657119B2 (en) Hardware accelerated convolution
US11645533B2 (en) IR drop prediction with maximum convolutional neural network
JP2018120549A (en) Processor, information processing device, and operation method for processor
WO2019118363A1 (en) On-chip computational network
KR20200108774A (en) Memory Device including instruction memory based on circular queue and Operation Method thereof
TW202134861A (en) Interleaving memory requests to accelerate memory accesses
TW202127461A (en) Concurrent testing of a logic device and a memory device within a system package
TWI749331B (en) Memory with processing in memory architecture and operating method thereof
WO2020073801A1 (en) Data reading/writing method and system in 3d image processing, storage medium, and terminal
TWI714003B (en) Memory chip capable of performing artificial intelligence operation and method thereof
TWI782403B (en) Shared scratchpad memory with parallel load-store
JP6802480B2 (en) Processor, information processing device and how the processor operates
Zhou et al. Hygraph: Accelerating graph processing with hybrid memory-centric computing
US11443185B2 (en) Memory chip capable of performing artificial intelligence operation and method thereof
US9189448B2 (en) Routing image data across on-chip networks
JP2019191710A (en) Information processing unit, information processing method and information processing program
CN112035056B (en) Parallel RAM access equipment and access method based on multiple computing units
JP4071930B2 (en) Synchronous DRAM
KR102356704B1 (en) Computing apparatus and method for processing operations thereof
JP2008102599A (en) Processor
WO2021196160A1 (en) Data storage management apparatus and processing core