WO2021041644A1 - Transfer data in a memory system with artificial intelligence mode - Google Patents

Transfer data in a memory system with artificial intelligence mode

Info

Publication number
WO2021041644A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory device
data
memory
training
operations
Application number
PCT/US2020/048160
Other languages
French (fr)
Inventor
Alberto TROIA
Original Assignee
Micron Technology, Inc.
Application filed by Micron Technology, Inc.
Priority to KR1020227009913A (published as KR20220052358A)
Priority to CN202080060027.3A (published as CN114303136A)
Priority to EP20859442.4A (published as EP4022525A4)
Publication of WO2021041644A1


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
            • G06F12/02 Addressing or allocation; Relocation
              • G06F12/0207 Addressing or allocation; Relocation with multidimensional access, e.g. row/column, matrix
              • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
                • G06F12/023 Free address space management
                  • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
                • G06F12/0284 Multiple user address space allocation, e.g. using different base addresses
          • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
            • G06F13/14 Handling requests for interconnection or transfer
              • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
                • G06F13/1668 Details of memory controller
                  • G06F13/1678 Details of memory controller using bus width
          • G06F15/00 Digital computers in general; Data processing equipment in general
            • G06F15/76 Architectures of general purpose stored program computers
              • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
                • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
                  • G06F15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                  • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
          • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
            • G06F2212/10 Providing a specific technical effect
              • G06F2212/1016 Performance improvement
                • G06F2212/1024 Latency reduction
            • G06F2212/72 Details relating to flash memory management
              • G06F2212/7208 Multiple device management, e.g. distributing data over multiple flash devices
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
                • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
              • G06N3/08 Learning methods
      • G11 INFORMATION STORAGE
        • G11C STATIC STORES
          • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
            • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
              • G11C7/1006 Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
              • G11C7/1048 Data bus control circuits, e.g. precharging, presetting, equalising
            • G11C7/22 Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
          • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
            • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
              • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
                • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
                  • G11C11/401 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
                    • G11C11/4063 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
                      • G11C11/407 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
                        • G11C11/409 Read-write [R-W] circuits
                          • G11C11/4096 Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
            • G11C11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
          • G11C2207/00 Indexing scheme relating to arrangements for writing information into, or reading information out from, a digital store
            • G11C2207/22 Control and timing of internal memory operations
              • G11C2207/2236 Copy
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates generally to memory devices, and more particularly, to apparatuses and methods for transferring data in a memory system with an artificial intelligence (AI) mode.
  • AI artificial intelligence
  • Memory devices are typically provided as internal, semiconductor, integrated circuits in computers or other electronic devices.
  • Memory can include volatile and non-volatile memory.
  • Volatile memory can require power to maintain its data and includes random-access memory (RAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM), among others.
  • RAM random-access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • Non-volatile memory can provide persistent data by retaining stored data when not powered and can include NAND flash memory, NOR flash memory, read only memory (ROM), Electrically Erasable Programmable ROM (EEPROM), Erasable Programmable ROM (EPROM), and resistance variable memory such as phase change random access memory (PCRAM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM), among others.
  • PCRAM phase change random access memory
  • RRAM resistive random access memory
  • MRAM magnetoresistive random access memory
  • Non-volatile memory may be used in, for example, personal computers, portable memory sticks, digital cameras, cellular telephones, portable music players such as MP3 players, movie players, and other electronic devices.
  • Memory cells can be arranged into arrays, with the arrays being used in memory devices.
  • Figure 1A is a block diagram of an apparatus in the form of a computing system including a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Figure 1B is a block diagram of an apparatus in the form of a computing system including a memory system with memory devices having an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Figure 2 is a block diagram of a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Figures 3A and 3B are block diagrams of a number of bits in a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Figure 4 is a block diagram of a number of blocks of a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • FIG. 5 is a flow diagram illustrating an example artificial intelligence process in a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Figure 6 is a flow diagram illustrating an example method to transfer data in accordance with a number of embodiments of the present disclosure.
  • The present disclosure includes apparatuses and methods related to transferring data in a memory system with an artificial intelligence (AI) mode.
  • An example apparatus can receive a command indicating that the apparatus is to operate in an artificial intelligence (AI) mode, a command to perform AI operations using an AI accelerator based on a status of a number of registers, and a command to transfer data between memory devices that are performing an AI operation.
  • The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations (e.g., logic operations, among other operations) associated with AI operations.
  • The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
  • A memory device can include data, stored in its arrays of memory cells, that is used by the AI accelerator to perform AI operations.
  • Input data, along with data that defines the neural network, such as neuron data, activation function data, and/or bias value data, can be stored in the memory devices, transferred between memory devices, and used to perform AI operations.
  • The memory device can include temporary blocks to store partial results of the AI operations and output blocks to store the results of the AI operations.
  • The host can issue a read command for the output blocks, and the results in the output blocks can be sent to the host to complete performance of a command requesting that an AI operation be performed.
  • The host and/or a controller of a memory system can issue a command to transfer input and/or output data between memory devices performing AI operations.
  • The memory system can transfer output data of a layer and/or neuron of an AI operation from a first memory device to a second memory device, and the second memory device can use the transferred output data as input data for a subsequent layer and/or neuron of the AI operation.
  • The first memory device and the second memory device performing the AI operation can include the same or different neural network data, activation function data, and/or bias data, and neural network data, activation function data, and/or bias data can be transferred between memory devices.
  • The results of the AI operation can be reported to a controller and/or host.
  • Each memory device of a memory system can send input data and neuron data to the AI accelerator and the AI accelerator can perform AI operations on the input data and neuron data.
  • The memory device can store the results of the AI operations in temporary blocks on the memory device.
  • The memory device can send the results from the temporary blocks, along with bias value data, to the AI accelerator.
  • The AI accelerator can perform AI operations on the results from the temporary blocks using the bias value data.
  • The memory device can store the results of the AI operations in temporary blocks on the memory device.
  • The memory device can send the results from the temporary blocks and activation function data to the AI accelerator.
  • The AI accelerator can perform AI operations on the results from the temporary blocks and/or the activation function data.
  • The memory device can store the results of the AI operations in output blocks on the memory device.
  • The AI accelerator can reduce latency and power consumption associated with AI operations when compared to AI operations that are performed on a host.
  • AI operations performed on a host use data that is exchanged between a memory device and the host, which adds latency and power consumption to the AI operations.
  • AI operations performed according to embodiments of the present disclosure can be performed on a memory device using the AI accelerator and the memory arrays, where data is not transferred from the memory device while performing the AI operations.
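The block-by-block flow described above (input and neuron data to the accelerator, partial results to a temporary block, bias applied, activation applied, final results to an output block) can be modeled in software. The following is a minimal sketch under stated assumptions, not the device's implementation: it assumes a single fully connected layer, represents the input, neuron, bias, temporary, and output blocks as Python lists keyed by hypothetical block names, and uses ReLU as a stand-in activation function because the text does not name one.

```python
from typing import Callable, Dict, List


def relu(x: float) -> float:
    """Stand-in activation function (assumption; the text does not fix one)."""
    return max(0.0, x)


def run_layer(blocks: Dict[str, list], activation: Callable[[float], float] = relu) -> List[float]:
    """Model the input -> temporary -> temporary -> output block flow for one layer.

    Block names ("input", "neuron", "bias", "temp", "output") are hypothetical.
    """
    inputs: List[float] = blocks["input"]
    neurons: List[List[float]] = blocks["neuron"]  # one weight row per neuron
    bias: List[float] = blocks["bias"]

    # Step 1: multiply inputs by neuron weights; partial results go to a temporary block.
    blocks["temp"] = [sum(w * x for w, x in zip(row, inputs)) for row in neurons]
    # Step 2: combine temporary results with bias value data; store back in a temporary block.
    blocks["temp"] = [t + b for t, b in zip(blocks["temp"], bias)]
    # Step 3: apply the activation function; final results go to the output block.
    blocks["output"] = [activation(t) for t in blocks["temp"]]
    return blocks["output"]


blocks = {"input": [1.0, 2.0], "neuron": [[0.5, -1.0], [1.0, 1.0]], "bias": [0.0, -1.0]}
result = run_layer(blocks)  # read of the output block
```

In this sketch the dictionary plays the role of the memory arrays, so the intermediate values never leave the "device" until the final read of the output block, which mirrors the latency argument made in the text.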
  • As used herein, “a number of” something can refer to one or more of such things.
  • For example, a number of memory devices can refer to one or more memory devices.
  • Additionally, designators such as “N”, as used herein, particularly with respect to reference numerals in the drawings, indicate that a number of the particular feature so designated can be included with a number of embodiments of the present disclosure.
  • Figure 1A is a block diagram of an apparatus in the form of a computing system 100 including a memory device 120 in accordance with a number of embodiments of the present disclosure.
  • As used herein, a memory device 120, memory arrays 125-1, ..., 125-N, memory controller 122, and/or AI accelerator 124 might also be separately considered an “apparatus.”
  • Host 102 can be coupled to the memory device 120.
  • Host 102 can be a laptop computer, personal computer, digital camera, digital recording and playback device, mobile telephone, PDA, memory card reader, or interface hub, among other host systems, and can include a memory access device, e.g., a processor.
  • A processor can refer to one or more processors, such as a parallel processing system, a number of coprocessors, etc.
  • Host 102 includes a host controller 108 to communicate with memory device 120.
  • The host controller 108 can send commands to the memory device 120.
  • The host controller 108 can communicate with the memory device 120, memory controller 122 on memory device 120, and/or the AI accelerator 124 on memory device 120 to perform AI operations, read data, write data, and/or erase data, among other operations.
  • AI operations may include machine learning or neural network operations, which may include training operations or inference operations, or both.
  • Each memory device 120 may represent a layer within a neural network or deep neural network (e.g., a network having three or more hidden layers).
  • Each memory device 120 may be or include nodes of a neural network, and a layer of the neural network may be composed of multiple memory devices 120 or portions of several memory devices 120.
  • Memory devices 120 may store weights (or models) for AI operations in memory arrays 125.
  • A physical host interface can provide an interface for passing control, address, data, and other signals between memory device 120 and host 102, which has compatible receptors for the physical host interface.
  • The signals can be communicated between host 102 and memory device 120 on a number of buses, such as a data bus and/or an address bus, for example.
  • Memory device 120 can include memory controller 122, AI accelerator 124, and memory arrays 125-1, ..., 125-N.
  • Memory device 120 can be a low-power double data rate dynamic random access memory, such as an LPDDR5 device, and/or a graphics double data rate dynamic random access memory, such as a GDDR6 device, among other types of devices.
  • Memory arrays 125-1, ..., 125-N can include a number of memory cells, such as volatile memory cells (e.g., DRAM memory cells, among other types of volatile memory cells) and/or non-volatile memory cells (e.g., RRAM memory cells, among other types of non-volatile memory cells).
  • Memory device 120 can read data from and/or write data to memory arrays 125-1, ..., 125-N.
  • Memory arrays 125-1, ..., 125-N can store data that is used during AI operations performed on memory device 120.
  • Memory arrays 125-1, ..., 125-N can store inputs, outputs, weight matrix and bias information of a neural network, and/or activation function information used by the AI accelerator to perform AI operations on memory device 120.
  • The host controller 108, memory controller 122, and/or AI accelerator 124 on memory device 120 can include control circuitry, e.g., hardware, firmware, and/or software.
  • The host controller 108, memory controller 122, and/or AI accelerator 124 can be an application specific integrated circuit (ASIC) coupled to a printed circuit board including a physical interface.
  • ASIC application specific integrated circuit
  • Memory controller 122 on memory device 120 can include registers 130. Registers 130 can be programmed to provide information for the AI accelerator to perform AI operations. Registers 130 can include any number of registers. Registers 130 can be written to and/or read by host 102, memory controller 122, and/or AI accelerator 124.
  • Registers 130 can provide input, output, neural network, and/or activation function information for AI accelerator 124.
  • Registers 130 can include mode register 131 to select a mode of operation for memory device 120.
  • The AI mode of operation can be selected by writing a word to register 131, such as 0xAA and/or 0x2AA, for example, which inhibits access to the registers associated with normal operation of memory device 120 and allows access to the registers associated with AI operations.
  • The AI mode of operation can also be selected using a signature that uses a cryptographic algorithm that is authenticated by a key stored in the memory device 120.
  • Registers 130 can also be located in memory arrays 125-1, ..., 125-N and be accessible by memory controller 122.
  • AI accelerator 124 can include hardware 126 and/or software/firmware 128 to perform AI operations.
  • Hardware 126 can include adder/multiplier 127 to perform logic operations associated with AI operations.
  • Memory controller 122 and/or AI accelerator 124 can receive commands from host 102 to perform AI operations.
  • Memory device 120 can perform the AI operations requested in the commands from host 102 using AI accelerator 124.
  • The memory device can report information about the AI operations, such as results and/or error information, back to host 102.
  • The AI operations performed by AI accelerator 124 can be performed without use of an external processing resource.
  • The memory arrays 125-1, ..., 125-N can provide main memory for the memory system or can be used as additional memory or storage throughout the memory system.
  • Each memory array 125-1, ..., 125-N can include a number of blocks of memory cells.
  • The blocks of memory cells can be used to store data that is used during AI operations performed by memory device 120.
  • Memory arrays 125-1, ..., 125-N can include DRAM memory cells, for example.
  • Embodiments are not limited to a particular type of memory device.
  • For example, the memory device can include RAM, ROM, DRAM, SDRAM, PCRAM, RRAM, 3D XPoint, and flash memory, among others.
  • Memory device 120 may perform an AI operation that is or includes one or more inference steps.
  • Memory arrays 125 may be layers of a neural network or may each be individual nodes, and memory device 120 may be a layer; or memory device 120 may be a node within a larger network. Additionally or alternatively, memory arrays 125 may store data or weights, or both, to be used (e.g., summed) within a node. Each node (e.g., memory array 125) may combine an input from data read from cells of the same or a different memory array 125 with weights read from cells of memory array 125.
  • Combinations of weights and data may, for instance, be summed within the periphery of a memory array 125 or within hardware 126 using adder/multiplier 127. In such cases, the summed result may be passed to an activation function represented or instantiated in the periphery of a memory array 125 or within hardware 126. The result may be passed to another memory device 120 or may be used within AI accelerator 124 (e.g., by software/firmware 128) to make a decision or to train a network that includes memory device 120.
  • A network that employs memory device 120 may be capable of or used for supervised or unsupervised learning. This may be combined with other learning or training regimes. In some cases, a trained network or model is imported or used with memory device 120, and the operations of memory device 120 are primarily or exclusively related to inference.
  • Memory device 120 can include address circuitry to latch address signals provided over I/O connections through I/O circuitry. Address signals can be received and decoded by a row decoder and a column decoder to access the memory arrays 125-1, ..., 125-N. It will be appreciated by those skilled in the art that the number of address input connections can depend on the density and architecture of the memory arrays 125-1, ..., 125-N.
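The mode-register behavior for Figure 1A (writing a word such as 0xAA or 0x2AA to mode register 131 inhibits the normal-operation registers and exposes the AI registers) can be sketched as a small software model. This is an illustration under assumptions, not Micron's design: the class and register names are hypothetical, the rule that AI registers are unreachable outside AI mode is assumed rather than stated in the text, and the cryptographic-signature path for entering AI mode is not modeled.

```python
# Example AI-mode words named in the text; any other value is treated as normal mode (assumption).
AI_MODE_WORDS = {0xAA, 0x2AA}


class ModeGatedRegisters:
    """Toy model of mode register 131 gating access to two register groups.

    Hypothetical names. Assumption: AI registers are only reachable in AI mode,
    and normal-operation registers are inhibited while in AI mode.
    """

    def __init__(self) -> None:
        self.mode_register = 0x00
        self.normal_regs: dict = {}
        self.ai_regs: dict = {}

    @property
    def ai_mode(self) -> bool:
        return self.mode_register in AI_MODE_WORDS

    def write_mode(self, word: int) -> None:
        """Write a word to the mode register (models register 131)."""
        self.mode_register = word

    def write(self, name: str, value: int, ai: bool) -> None:
        """Write a register in the AI group (ai=True) or the normal-operation group."""
        if ai and not self.ai_mode:
            raise PermissionError(f"AI register {name!r} requires AI mode")
        if not ai and self.ai_mode:
            raise PermissionError(f"normal-operation register {name!r} is inhibited in AI mode")
        target = self.ai_regs if ai else self.normal_regs
        target[name] = value


regs = ModeGatedRegisters()
regs.write("mr0", 1, ai=False)          # normal mode: normal-operation register accepted
regs.write_mode(0xAA)                   # select AI mode
regs.write("input_size", 64, ai=True)   # AI register is now accessible
```

Raising an exception is simply a convenient way to make the "inhibited" access visible in software; real hardware would more likely ignore or NAK the access.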
  • Figure 1B is a block diagram of an apparatus in the form of a computing system including a memory system with memory devices having an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • As used herein, memory devices 120-1, 120-2, 120-3, and 120-X, controller 105, and/or memory system 104 might also be separately considered an “apparatus.”
  • Host 102 can be coupled to the memory system 104.
  • Host 102 can be a laptop computer, personal computer, digital camera, digital recording and playback device, mobile telephone, PDA, memory card reader, or interface hub, among other host systems, and can include a memory access device, e.g., a processor.
  • A processor can refer to one or more processors, such as a parallel processing system, a number of coprocessors, etc.
  • Host 102 includes a host controller 108 to communicate with memory system 104.
  • The host controller 108 can send commands to the memory system 104.
  • The memory system 104 can include controller 105 and memory devices 120-1, 120-2, 120-3, and 120-X.
  • Memory devices 120-1, 120-2, 120-3, and 120-X can each be the memory device 120 described above in association with Figure 1A and include an AI accelerator with hardware, software, and/or firmware to perform AI operations.
  • The host controller 108 can communicate with controller 105 and/or memory devices 120-1, 120-2, 120-3, and 120-X to perform AI operations, read data, write data, and/or erase data, among other operations.
  • A physical host interface can provide an interface for passing control, address, data, and other signals between memory system 104 and host 102, which has compatible receptors for the physical host interface.
  • The signals can be communicated between host 102 and memory system 104 on a number of buses, such as a data bus and/or an address bus, for example.
  • Memory system 104 can include controller 105 coupled to memory devices 120-1, 120-2, 120-3, and 120-X via bus 121.
  • Bus 121 can be configured such that the full bandwidth of bus 121 can be consumed when operating a portion or all of the memory devices of a memory system.
  • For example, two of the four memory devices 120-1, 120-2, 120-3, and 120-X shown in Figure 1B can be configured to operate while using the full bandwidth of bus 121.
  • Controller 105 can send a command on select line 117 that can select memory devices 120-1 and 120-3 for operation during a particular time period, such as at the same time.
  • Controller 105 can send a command on select line 119 that can select memory devices 120-2 and 120-X for operation during a particular time period, such as at the same time.
  • Controller 105 can be configured to send commands on select lines 117 and 119 to select any combination of the memory devices 120-1, 120-2, 120-3, and 120-X.
  • A command on select line 117 can be used to select memory devices 120-1 and 120-3, and a command on select line 119 can be used to select memory devices 120-2 and 120-X.
  • The selected memory devices can be used during performance of AI operations. Data associated with the AI operation can be copied and/or transferred between the selected memory devices 120-1, 120-2, 120-3, and 120-X on bus 121. For example, a first portion of an AI operation can be performed on memory device 120-1, and an output of the first portion of the AI operation can be transferred to memory device 120-3 on bus 121.
  • The output from a particular layer and/or neuron of an AI operation on a first memory device can be transferred to a second memory device, and the second memory device can continue the AI operation using the transferred data in the next layer and/or neuron of the AI operation.
  • The output of the first portion of the AI operation on memory device 120-1 can be used by memory device 120-3 as an input of a second portion of the AI operation.
  • Neural network data, activation function data, and/or bias data associated with an AI operation can be transferred between memory devices 120-1, 120-2, 120-3, and 120-X on bus 121.
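The Figure 1B example (controller 105 asserts select line 117 to pick memory devices 120-1 and 120-3, device 120-1 computes one layer, and its output travels over bus 121 to become the input of the next layer on 120-3) can be sketched as follows. This is a simplified software model with hypothetical class and method names; the select-line-to-device mapping follows the text, while the rule that transfers are allowed only between currently selected devices, and the omission of bias and activation, are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Device:
    """Hypothetical stand-in for one memory device with its stored layer weights."""
    name: str
    weights: List[List[float]]
    data: List[float] = field(default_factory=list)

    def run_layer(self) -> List[float]:
        # Weighted sum per neuron; bias and activation omitted for brevity (assumption).
        return [sum(w * x for w, x in zip(row, self.data)) for row in self.weights]


class MemorySystem:
    """Toy model of controller 105, select lines 117/119, and bus 121."""

    def __init__(self, devices: Dict[str, Device]) -> None:
        self.devices = devices
        # Mapping taken from the text: line 117 -> 120-1 and 120-3, line 119 -> 120-2 and 120-X.
        self.select_lines: Dict[int, Set[str]] = {117: {"120-1", "120-3"}, 119: {"120-2", "120-X"}}
        self.selected: Set[str] = set()

    def select(self, line: int) -> None:
        self.selected = set(self.select_lines[line])

    def transfer(self, src: str, dst: str, payload: List[float]) -> None:
        # Assumption: bus transfers are only modeled between currently selected devices.
        if not {src, dst} <= self.selected:
            raise RuntimeError(f"{src} and {dst} must both be selected")
        self.devices[dst].data = list(payload)


devices = {
    "120-1": Device("120-1", weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], data=[1.0, 2.0]),
    "120-2": Device("120-2", weights=[]),
    "120-3": Device("120-3", weights=[[1.0, 1.0, 1.0]]),
    "120-X": Device("120-X", weights=[]),
}
system = MemorySystem(devices)
system.select(117)                          # select 120-1 and 120-3
layer1 = devices["120-1"].run_layer()       # first portion of the AI operation
system.transfer("120-1", "120-3", layer1)   # output of layer 1 becomes input of layer 2
layer2 = devices["120-3"].run_layer()       # second portion of the AI operation
```

Keeping each layer's weights resident on its own device means only the layer outputs cross bus 121, which is the data movement the text describes.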
  • Figure 2 is a block diagram of a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Registers 230 can be AI registers and include input information, output information, neural network information, and/or activation function information, among other types of information, for use by an AI accelerator, a controller, and/or memory arrays of a memory device (e.g., AI accelerator 124, memory controller 122, and/or memory arrays 125-1, ..., 125-N in Figure 1A).
  • Registers can be read and/or written to based on commands from a host, an AI accelerator, and/or a controller (e.g., host 102, AI accelerator 124, memory controller 122 in Figure 1A).
  • Register 232-0 can define parameters associated with AI mode of the memory device. Bits in register 232-0 can start AI operations, restart AI operations, indicate content in registers is valid, clear content from registers, and/or exit from AI mode.
  • Registers 232-1, 232-2, 232-3, 232-4, and 232-5 can define the size of inputs used in AI operations, the number of inputs used in AI operations, and the start address and end address of the inputs used in AI operations.
  • Registers 232-7, 232-8, 232-9, 232-10, and 232-11 can define the size of outputs of AI operations, the number of outputs in AI operations, and the start address and end address of the outputs of AI operations.
  • Register 232-12 can be used to enable the usage of the input banks, the neuron banks, the output banks, the bias banks, the activation functions, and the temporary banks used during AI operations.
  • Registers 232-13, 232-14, 232-15, 232-16, 232-17, 232-18, 232-19, 232-20, 232-21, 232-22, 232-23, 232-24, and 232-25 can define the size, number, and location of neurons and/or layers of the neural network used during AI operations.
  • Register 232-26 can enable a debug/hold mode of the AI accelerator and allow an output to be observed at a layer of AI operations.
  • Register 232-26 can indicate that an activation function should be applied during AI operations and that the AI operation can step forward (e.g., perform a next step in an AI operation).
  • Register 232-26 can indicate that the temporary blocks, where the output of the layer is located, are valid. The data in the temporary blocks can be changed by a host and/or a controller on the memory device, such that the changed data can be used in the AI operation as the AI operation steps forward.
  • Registers 232-27, 232-28, and 232-29 can define the layer where the debug/hold mode will stop the AI operation, change the content of the neural network, and/or observe the output of the layer.
  • Registers 232-30, 232-31, 232-32, and 232-33 can define the size of temporary banks used in AI operations and the start address and end address of the temporary banks used in AI operations.
  • Register 232-30 can define the start address and end address of a first temporary bank used in AI operations, and register 232-33 can define the start address and end address of a second temporary bank used in AI operations.
  • Registers 232-31 and 232-32 can define the size of the temporary banks used in AI operations.
  • Registers 232-34, 232-35, 232-36, 232-37, 232-38, and 232-39 can be associated with the activation functions used in AI operations.
  • Register 232-34 can enable usage of the activation function block, usage of the activation function for each neuron, usage of the activation function for each layer, and/or usage of an external activation function.
  • Register 232-35 can define the start address and the end address of the location of the activation functions.
  • Registers 232-36, 232-37, 232-38, and 232-39 can define the resolution of the inputs (e.g., x-axis) and outputs (e.g., y-axis) of the activation functions and/or a custom defined activation function.
  • Registers 232-40, 232-41, 232-42, 232-43, and 232-44 can define the size of bias values used in AI operations, the number of bias values used in AI operations, and the start address and end address of the bias values used in AI operations.
  • Register 232-45 can provide status information for the AI calculations and provide information for the debug/hold mode. Register 232-45 can enable debug/hold mode, indicate that the AI accelerator is performing AI operations, indicate that the full capability of the AI accelerator should be used, indicate that only matrix calculations of the AI operations should be made, and/or indicate that the AI operation can proceed to the next neuron and/or layer.
  • Register 232-46 can provide error information regarding AI operations. Register 232-46 can indicate that there was an error in a sequence of an AI operation, that there was an error in an algorithm of an AI operation, that there was an error in a page of data that ECC was not able to correct, and/or that there was an error in a page of data that ECC was able to correct.
  • Register 232-47 can indicate an activation function to use in AI operations.
  • Register 232-47 can indicate that one of a number of predefined activation functions and/or a custom activation function located in a block can be used in AI operations.
  • Registers 232-48, 232-49, and 232-50 can indicate the neuron and/or layer where the AI operation is executing. In the case where errors occur during the AI operations, registers 232-48, 232-49, and 232-50 can indicate the neuron and/or layer where an error occurred.
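Several of the quantities above (sizes, counts, matrix dimensions, the debug/hold layer) are wider than one 8-bit register and are spread across consecutive registers; the per-register assignment is listed for Figures 3A and 3B. The following is a minimal Python sketch of how host-side software might read and write such multi-register fields. It models the register file as a dictionary of 8-bit values and assumes the lowest-numbered register holds the most significant byte. The disclosure does not state a byte order, and the names `read_field`, `write_field`, and `FIELDS` are hypothetical.

```python
# Hypothetical host-side model of the AI register file: register number -> 8-bit value.
RegisterFile = dict[int, int]

# (first register, number of registers) for multi-register fields, following the
# register assignments given for Figures 3A and 3B.
FIELDS = {
    "input_size": (1, 2),
    "input_count": (3, 2),
    "output_size": (7, 2),
    "output_count": (9, 2),
    "matrix_rows": (13, 2),
    "matrix_columns": (15, 2),
    "neuron_size": (17, 2),
    "neuron_count": (19, 3),
    "layer_count": (23, 3),
    "hold_layer": (27, 3),
    "temp_bank_size": (31, 2),
    "bias_size": (40, 2),
    "bias_count": (42, 2),
}


def read_field(regs: RegisterFile, first: int, count: int) -> int:
    """Concatenate `count` consecutive registers, most significant byte first (assumed)."""
    value = 0
    for number in range(first, first + count):
        byte = regs.get(number, 0)  # unwritten registers read as 0 in this model
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"register {number} does not hold an 8-bit value: {byte}")
        value = (value << 8) | byte
    return value


def write_field(regs: RegisterFile, first: int, count: int, value: int) -> None:
    """Split `value` across `count` consecutive registers, most significant byte first."""
    if not 0 <= value < (1 << (8 * count)):
        raise ValueError(f"{value} does not fit in {count} register(s)")
    for offset in range(count):
        shift = 8 * (count - 1 - offset)
        regs[first + offset] = (value >> shift) & 0xFF
```

For example, under this byte-order assumption, writing 70000 to the three-register `neuron_count` field stores 0x01, 0x11, and 0x70 in registers 19, 20, and 21.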
  • FIGS 3A and 3B are block diagrams of a number of bits in a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Each register 332-0, ..., 332-50 can include a number of bits, bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7, to indicate information associated with performing AI operations.
  • Register 332-0 can define parameters associated with AI mode of the memory device. Bit 334-5 of register 332-0 can be a read/write bit and can indicate that an elaboration of an AI operation can restart 360 at the beginning when programmed to 1b. Bit 334-5 of register 332-0 can be reset to 0b once the AI operation has restarted. Bit 334-4 of register 332-0 can be a read/write bit and can indicate that an elaboration of an AI operation can start 361 when programmed to 1b. Bit 334-4 of register 332-0 can be reset to 0b once the AI operation has started.
  • Bit 334-3 of register 332-0 can be a read/write bit and can indicate that the content of the AI registers is valid 362 when programmed to 1b and invalid when programmed to 0b.
  • Bit 334-2 of register 332-0 can be a read/write bit and can indicate that the content of the AI registers is to be cleared 363 when programmed to 1b.
  • Bit 334-1 of register 332-0 can be a read only bit and can indicate that the AI accelerator is in use 363 and performing AI operations when programmed to 1b.
  • Bit 334-0 of register 332-0 can be a write only bit and can indicate that the memory device is to exit 365 AI mode when programmed to 1b.
  • Registers 332-1, 332-2, 332-3, 332-4, and 332-5 can define the size of inputs used in AI operations, the number of inputs used in AI operations, and the start address and end address of the inputs used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-1 and 332-2 can define the size of the inputs 366 used in AI operations.
  • the size of the inputs can indicate the width of the inputs in terms of number of bits and/or the type of input, such as floating point, integer, and/or double, among other types.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-3 and 332-4 can indicate the number of inputs 367 used in AI operations.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-5 can indicate a start address 368 of the blocks in memory arrays of the inputs used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-5 can indicate an end address 369 of the blocks in memory arrays of the inputs used in AI operations. If the start address 368 and the end address 369 are the same address, only one block of input is indicated for the AI operations.
  • Registers 332-7, 332-8, 332-9, 332-10, and 332-11 can define the size of outputs of AI operations, the number of outputs in AI operations, and the start address and end address of the outputs of AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-7 and 332-8 can define the size 370 of the outputs used in AI operations.
  • the size of the outputs can indicate the width of the outputs in terms of number of bits and/or the type of output, such as floating point, integer, and/or double, among other types.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-9 and 332-10 can indicate the number of outputs 371 used in AI operations.
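  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-11 can indicate a start address of the blocks in memory arrays of the outputs used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-11 can indicate an end address of the blocks in memory arrays of the outputs used in AI operations.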
  • Register 332-12 can be used to enable the usage of the input banks, the neuron banks, the output banks, the bias banks, the activation functions, and the temporary banks used during AI operations.
  • Bit 334-0 of register 332-12 can enable the input banks 380
  • bit 334-1 of register 332-12 can enable the neural network banks 379
  • bit 334-2 of register 332-12 can enable the output banks 378
  • bit 334-3 of register 332-12 can enable the bias banks 377
  • bit 334-4 of register 332-12 can enable the activation function banks 376
  • bits 334-5 and 334-6 of register 332-12 can enable a first temporary bank 375 and a second temporary bank 374.
  • bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-13 and 332-14 can define the number of rows 381 in a matrix used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-15 and 332-16 can define the number of columns 382 in a matrix used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-17 and 332-18 can define the size of the neurons 383 used in AI operations.
  • the size of the neurons can indicate the width of the neurons in terms of number of bits and/or the type of data, such as floating point, integer, and/or double, among other types.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-19, 332-20, and 332-21 can indicate the number of neurons 384 of the neural network used in AI operations.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-22 can indicate a start address 385 of the blocks in memory arrays of the neurons used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-22 can indicate an end address 386 of the blocks in memory arrays of the neurons used in AI operations. If the start address 385 and the end address 386 are the same address, only one block of neurons is indicated for the AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-23, 332-24, and 332-25 can indicate the number of layers 387 of the neural network used in AI operations.
  • Register 332-26 can enable a debug/hold mode of the AI accelerator and an output to be observed at a layer of AI operations.
  • Bit 334-0 of register 332-26 can indicate that the AI accelerator is in a debug/hold mode and that an activation function should be applied 391 during AI operations.
  • Bit 334-1 of register 332-26 can indicate that the AI operation can step forward 390 (e.g., perform a next step in an AI operation).
  • Bit 334-2 and bit 334-3 of register 332-26 can indicate that the temporary blocks, where the output of the layer is located, are valid 388 and 389.
  • the data in the temporary blocks can be changed by a host and/or a controller on the memory device, such that the changed data can be used in the AI operation as the AI operation steps forward.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-27, 332-28, and 332-29 can define the layer where the debug/hold mode will stop 392 the AI operation and observe the output of the layer.
  • Registers 332-30, 332-31, 332-32, and 332-33 can define the size of temporary banks used in AI operations and the start address and end address of the temporary banks used in AI operations.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-30 can define the start address 393 of a first temporary bank used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-30 can define the end address 394 of a first temporary bank used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-31 and 332-32 can define the size 395 of the temporary banks used in AI operations.
  • the size of the temporary banks can indicate the width of the temporary banks in terms of number of bits and/or the type of data, such as floating point, integer, and/or double, among other types.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-33 can define the start address 396 of a second temporary bank used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-33 can define the end address 397 of a second temporary bank used in AI operations.
  • Registers 332-34, 332-35, 332-36, 332-37, 332-38, and 332-39 can be associated with the activation functions used in AI operations.
  • Bit 334-0 of register 332-34 can enable usage of the activation function block 3101.
  • Bit 334-1 of register 332-34 can enable holding the AI operation at a neuron 3100 and usage of the activation function for each neuron.
  • Bit 334-2 of register 332-34 can enable holding the AI operation at a layer 399 and usage of the activation function for each layer.
  • Bit 334-3 of register 332-34 can enable usage of an external activation function 398.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-35 can define the start address 3102 of activation function banks used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-35 can define the end address 3103 of activation function banks used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-36 and 332-37 can define the resolution of the inputs (e.g., x-axis) 3104 of the activation functions.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-38 and 332-39 can define the resolution of the outputs (e.g., y-axis) of the activation functions.
  • Registers 332-40, 332-41, 332-42, 332-43, and 332-44 can define the size of bias values used in AI operations, the number of bias values used in AI operations, and the start address and end address of the bias values used in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-40 and 332-41 can define the size of the bias values 3106 used in AI operations.
  • the size of the bias values can indicate the width of the bias values in terms of number of bits and/or the type of bias values, such as floating point, integer, and/or double, among other types.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-42 and 332-43 can indicate the number of bias values 3107 used in AI operations.
  • Bits 334-4, 334-5, 334-6, and 334-7 of register 332-44 can indicate a start address 3108 of the blocks in memory arrays of the bias values used in AI operations.
  • Bits 334-0, 334-1, 334-2, and 334-3 of register 332-44 can indicate an end address 3109 of the blocks in memory arrays of the bias values used in AI operations. If the start address 3108 and the end address 3109 are the same address, only one block of bias values is indicated for the AI operations.
  • Register 332-45 can provide status information for the AI calculations and provide information for the debug/hold mode.
  • Bit 334-0 of register 332-45 can activate the debug/hold mode 3114.
  • Bit 334-1 of register 332-45 can indicate that the AI accelerator is busy 3113 and performing AI operations.
  • Bit 334-2 of register 332-45 can indicate that the AI accelerator is on 3112 and/or that the full capability of the AI accelerator should be used.
  • Bit 334-3 of register 332-45 can indicate that only matrix calculations 3111 of the AI operations should be made.
  • Bit 334-4 of register 332-45 can indicate that the AI operation can step forward 3110 and proceed to the next neuron and/or layer.
  • Register 332-46 can provide error information regarding AI operations. Bit 334-3 of register 332-46 can indicate that there was an error in a sequence 3115 of an AI operation. Bit 334-2 of register 332-46 can indicate that there was an error in an algorithm 3116 of an AI operation. Bit 334-1 of register 332-46 can indicate there was an error in a page of data that ECC was not able to correct 3117. Bit 334-0 of register 332-46 can indicate there was an error in a page of data that ECC was able to correct 3118.
  • Register 332-47 can indicate an activation function to use in AI operations.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, and 334-6 of register 332-47 can indicate that one of a number of predefined activation functions 3120 can be used in AI operations.
  • Bit 334-7 of register 332-47 can indicate that a custom activation function 3119 located in a block can be used in AI operations.
  • Registers 332-48, 332-49, and 332-50 can indicate the neuron and/or layer where the AI operation is executing.
  • Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-48, 332-49, and 332-50 can indicate the address of the neuron and/or layer where the AI operation is executing.
  • In the case where an error occurs during the AI operations, registers 332-48, 332-49, and 332-50 can indicate the neuron and/or layer where the error occurred.
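The single-bit fields of registers 332-0, 332-45, and 332-46 map naturally onto bit flags. The sketch below uses Python's standard `enum.IntFlag` with the bit positions given above, where bit 334-0 is the least significant bit. The member names and helpers are hypothetical. The `start_command` helper assumes one plausible host sequence: mark the register content valid and set the start bit in the same write. This sequence is an assumption, not a documented one.

```python
from enum import IntFlag


class AIMode(IntFlag):
    """Register 332-0: AI mode control bits (bits 334-6 and 334-7 are not described)."""
    EXIT = 1 << 0     # bit 334-0, write only: exit AI mode
    BUSY = 1 << 1     # bit 334-1, read only: AI accelerator in use
    CLEAR = 1 << 2    # bit 334-2: clear the content of the AI registers
    VALID = 1 << 3    # bit 334-3: AI register content is valid
    START = 1 << 4    # bit 334-4: start the AI operation (reset to 0 once started)
    RESTART = 1 << 5  # bit 334-5: restart from the beginning (reset to 0 once restarted)


class AIStatus(IntFlag):
    """Register 332-45: status and debug/hold bits."""
    HOLD = 1 << 0             # bit 334-0: debug/hold mode active
    BUSY = 1 << 1             # bit 334-1: accelerator busy performing AI operations
    FULL_CAPABILITY = 1 << 2  # bit 334-2: accelerator on / use full capability
    MATRIX_ONLY = 1 << 3      # bit 334-3: perform only matrix calculations
    STEP_FORWARD = 1 << 4     # bit 334-4: proceed to the next neuron and/or layer


class AIError(IntFlag):
    """Register 332-46: error bits."""
    ECC_CORRECTED = 1 << 0      # bit 334-0: page error that ECC corrected
    ECC_UNCORRECTABLE = 1 << 1  # bit 334-1: page error that ECC could not correct
    ALGORITHM = 1 << 2          # bit 334-2: error in the algorithm of the AI operation
    SEQUENCE = 1 << 3           # bit 334-3: error in the sequence of the AI operation


def start_command() -> int:
    """Value a host might write to register 332-0 to begin an AI operation (assumed)."""
    return int(AIMode.VALID | AIMode.START)


def error_names(reg_value: int) -> list[str]:
    """List the error conditions flagged in a raw register 332-46 value."""
    return [flag.name for flag in AIError if reg_value & flag]


def is_busy(status_value: int) -> bool:
    """True when the busy bit of a raw register 332-45 value is set."""
    return bool(status_value & AIStatus.BUSY)
```

For example, a raw error-register value of `0b1010` decodes to `["ECC_UNCORRECTABLE", "SEQUENCE"]`.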
  • FIG 4 is a block diagram of a number of blocks of a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • Input block 440 is a block in the memory arrays where input data is stored. Data in input block 440 can be used as the input for AI operations. The address of input block 440 can be indicated in register 5 (e.g. register 232-5 in Figure 2 and 332-5 in Figure 3A). Embodiments are not limited to one input block as there can be a plurality of input blocks.
  • Data in input block 440 can be sent to the memory device from the host. The data can accompany a command indicating that AI operations should be performed on the memory device using the data.
  • Output block 442 is a block in the memory arrays where output data from AI operations is stored. Output block 442 can be used to store the output from AI operations, which can be sent to the host. The address of output block 442 can be indicated in register 11 (e.g. register 232-11 in Figure 2 and 332-11 in Figure 3A). Embodiments are not limited to one output block as there can be a plurality of output blocks.
  • Data in output block 442 can be sent to the host upon completion and/or holding of an AI operation.
  • Temporary blocks 444-1 and 444-2 can be blocks in memory arrays where data is stored temporarily while AI operations are being performed. Data can be stored in temporary blocks 444-1 and 444-2 while the AI operations are iterating through the neurons and layers of the neural network used for the AI operations.
  • the addresses of temporary blocks 444-1 and 444-2 can be indicated in registers 30 and 33 (e.g. registers 232-30 and 232-33 in Figure 2 and 332-30 and 332-33 in Figure 3B). Embodiments are not limited to two temporary blocks as there can be a plurality of temporary blocks.
  • Activation function block 446 is a block in the memory arrays where the activation functions for the AI operations are stored. Activation function block 446 can store predefined activation functions and/or custom activation functions that are created by the host and/or AI accelerator.
  • the address of activation function block 446 can be indicated in register 35 (e.g. register 232-35 in Figure 2 and 332-35 in Figure 3B). Embodiments are not limited to one activation function block as there can be a plurality of activation function blocks.
  • Bias values block 448 is a block in the memory array where the bias values for the AI operations are stored.
  • the address of bias values block 448 can be indicated in register 44 (e.g. register 232-44 in Figure 2 and 332-44 in Figure 3B).
  • Embodiments are not limited to one bias value block as there can be a plurality of bias value blocks.
  • 450-7, 450-8, 450-9, and 450-10 are a block in the memory array where the neural network for the AI operations are stored.
  • Neural network blocks 450-1, 450-2, 450-3, 450-4, 450-5, 450-6, 450-7, 450-8, 450-9, and 450-10 can store the information for the neurons and layers that are used in the AI operations.
  • the address of neural network blocks 450-1, 450-2, 450-3, 450-4, 450-5, 450-6, 450-7, 450-8, 450-9, and 450-10 can be indicated in register 22 (e.g. register 232-22 in Figure 2 and 332-22 in Figure 3A).
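Each block type above is located through one address register, or two in the case of the temporary blocks. The following minimal sketch resolves that layout from the register values. It uses the encoding described for Figures 3A and 3B, with the start address in bits 334-7 through 334-4 and the end address in bits 334-3 through 334-0. It also assumes that a start/end pair denotes an inclusive, contiguous range of block numbers. The disclosure states only that equal start and end addresses indicate a single block. The names `BLOCK_REGISTERS`, `pack_range`, `block_range`, and `locate_blocks` are hypothetical.

```python
# Address register(s) that locate each block type, per the description of Figure 4.
BLOCK_REGISTERS = {
    "input": (5,),
    "output": (11,),
    "neural_network": (22,),
    "temporary": (30, 33),  # first and second temporary bank
    "activation_function": (35,),
    "bias": (44,),
}


def pack_range(start: int, end: int) -> int:
    """Encode a start address (upper nibble) and end address (lower nibble) in one register."""
    if not (0 <= start <= 0xF and 0 <= end <= 0xF):
        raise ValueError("block addresses must fit in 4 bits")
    return (start << 4) | end


def block_range(reg_value: int) -> range:
    """Decode an address register into an inclusive range of block numbers (assumed)."""
    start, end = (reg_value >> 4) & 0xF, reg_value & 0xF
    if end < start:
        raise ValueError(f"end address {end} precedes start address {start}")
    return range(start, end + 1)  # start == end -> exactly one block


def locate_blocks(regs: dict[int, int]) -> dict[str, list[int]]:
    """Map each block type to the block numbers its address register(s) point at."""
    layout: dict[str, list[int]] = {}
    for role, numbers in BLOCK_REGISTERS.items():
        blocks: list[int] = []
        for number in numbers:
            if number in regs:  # skip address registers the host has not written
                blocks.extend(block_range(regs[number]))
        layout[role] = blocks
    return layout
```

For example, with register 22 set to `pack_range(6, 15)`, `locate_blocks` reports ten neural network blocks. The block numbers here are illustrative.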
  • FIG. 5 is a flow diagram illustrating an example artificial intelligence process in a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
  • an AI accelerator can write input data 540 and neural network data 550 to the input and neural network blocks, respectively.
  • the AI accelerator can perform AI operations using input data 540 and neural network data 550.
  • the results can be stored in temporary banks 544-1 and 544-2.
  • the temporary banks 544-1 and 544-2 can be used to store data while performing matrix calculations, adding bias data, and/or applying activation functions during the AI operations.
  • An AI accelerator can receive the partial results of AI operations stored in temporary banks 544-1 and 544-2 and bias value data 548 and perform AI operations using the partial results of AI operations and bias value data 548.
  • the results can be stored in temporary banks 544-1 and 544-2.
  • An AI accelerator can receive the partial results of AI operations stored in temporary banks 544-1 and 544-2 and activation function data 546 and perform AI operations using the partial results of AI operations and activation function data 546.
  • the results can be stored in output banks 542.
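The three passes above (matrix calculation, bias addition, activation) can be sketched in plain Python for one layer. This is an illustration, not the accelerator's implementation. It assumes that the neural network data form a weight matrix with one row per neuron and that the partial results move from the first temporary bank to the second. It uses a rectified linear unit (ReLU) as a stand-in for whatever activation function the activation function data specify. All function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = list[float]


def matrix_step(inputs: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Matrix calculation: one weighted sum per neuron (one row of `weights` per neuron)."""
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row must have one entry per input")
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def bias_step(partial: Sequence[float], bias: Sequence[float]) -> Vector:
    """Add one bias value per neuron to the partial results."""
    if len(partial) != len(bias):
        raise ValueError("need exactly one bias value per neuron")
    return [p + b for p, b in zip(partial, bias)]


def relu(x: float) -> float:
    """Example activation function (an assumption; any scalar function works here)."""
    return max(0.0, x)


def run_layer(
    input_block: Sequence[float],
    neuron_block: Sequence[Sequence[float]],
    bias_block: Sequence[float],
    activation: Callable[[float], float] = relu,
) -> Vector:
    """One layer: input and neuron data -> temporary banks -> output bank."""
    temp_bank_1 = matrix_step(input_block, neuron_block)  # matrix results, first temporary bank
    temp_bank_2 = bias_step(temp_bank_1, bias_block)      # biased results, second temporary bank
    output_bank = [activation(v) for v in temp_bank_2]    # activated results, output bank
    return output_bank
```

For example, `run_layer([1.0, 2.0], [[0.5, -1.0], [2.0, 1.0]], [2.0, -1.0])` returns `[0.5, 3.0]`.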
  • Figure 6 is a flow diagram illustrating an example method to transfer data in accordance with a number of embodiments of the present disclosure.
  • the method described in Figure 6 can be performed by, for example, a memory system including a memory device such as memory device 120 shown in Figures 1A and 1B.
  • the method can include executing a first portion of a training or inference operation on a first memory device that is configured as part of a neural network, wherein the first portion of the training or inference operation comprises combining a first input or a first weight, or both, represented as one or more data values stored within the first memory device with another input or another weight, or both, represented as other data stored within the first memory device or received from another memory device.
  • the method can include executing a first portion of an artificial intelligence (AI) operation on a first memory device.
  • the method can include transferring, from the first memory device to a second memory device, data that is based at least in part on the inputs or weights combined at the first memory device.
  • the method can include transferring data from the first memory device to a second memory device.
  • the first memory device can transfer an output block to an input block of the second memory device.
  • the host and/or controller can format the data for storage on the second memory device and use in AI operations.
  • the method can include executing a second portion of the training or inference operation on the second memory device using the data transferred from the first memory device to the second memory device, wherein the second portion of the training or inference operation comprises combining a second input or a second weight, or both, represented as one or more data values stored within the second memory device with an additional input or an additional weight, or both, represented as additional data stored within the second memory device or received from an additional memory device.
  • the method can include executing a second portion of the AI operation on the second memory device using the data transferred from the first memory device to the second memory device.
  • the method can include transferring data between memory devices that are coupled together. For example, when the neural network is too large to be stored on a single memory device, the input, output, and/or temporary blocks can be transferred between memory devices to execute the AI operations of the neural network. The temporary and/or output block from a memory device can be transferred to another memory device so that an AI operation can continue. Data can be transferred between memory devices such that the memory devices can perform portions of an AI operation; for example, a first memory device can perform a first portion of the AI operation on a layer and a second memory device can continue performing a second portion of the AI operation on the same layer.
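The method of Figure 6 can be illustrated with two simulated devices. The sketch below is a hypothetical model, not an actual device interface. Each `MemoryDevice` holds a contiguous slice of the network's layers, and the first device executes its layers. Its output block is then copied into the input block of the second device, standing in for the transfer on bus 121, and the second device continues from the next layer. The sketch splits only at a layer boundary and does not model the split within a single layer described above. The ReLU activation and all names are assumptions.

```python
from dataclasses import dataclass, field

# One layer = (weight matrix with one row per neuron, one bias value per neuron).
Layer = tuple[list[list[float]], list[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def forward(inputs: list[float], layers: list[Layer]) -> list[float]:
    """Run inputs through the given layers: weighted sum, plus bias, then activation."""
    values = list(inputs)
    for weights, bias in layers:
        values = [
            relu(sum(w * x for w, x in zip(row, values)) + b)
            for row, b in zip(weights, bias)
        ]
    return values


@dataclass
class MemoryDevice:
    """Hypothetical model of one memory device holding part of a neural network."""
    name: str
    layers: list[Layer]
    input_block: list[float] = field(default_factory=list)
    output_block: list[float] = field(default_factory=list)

    def execute(self) -> None:
        """Execute this device's portion of the operation on its input block."""
        self.output_block = forward(self.input_block, self.layers)


def transfer(src: MemoryDevice, dst: MemoryDevice) -> None:
    """Copy the source device's output block into the destination's input block."""
    dst.input_block = list(src.output_block)


def run_split(inputs: list[float], layers: list[Layer], split: int) -> list[float]:
    """Execute layers[:split] on one device and layers[split:] on another."""
    first = MemoryDevice("120-1", layers[:split])
    second = MemoryDevice("120-3", layers[split:])
    first.input_block = list(inputs)
    first.execute()          # first portion of the operation
    transfer(first, second)  # output of the first portion becomes the second's input
    second.execute()         # second portion continues with the next layer
    return second.output_block
```

Because the second device continues the same computation in the same order, `run_split(x, layers, k)` returns the same result as `forward(x, layers)` for any split point `k`.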

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Memory System (AREA)

Abstract

The present disclosure includes apparatuses and methods related to transferring data in a memory system with an artificial intelligence (AI) mode. An apparatus can receive a command indicating that the apparatus operate in an artificial intelligence (AI) mode, a command to perform AI operations using an AI accelerator based on a status of a number of registers, and a command to transfer data between memory devices that are performing an AI operation. The memory system can transfer output data of a layer and/or neuron of an AI operation from a first memory device to a second memory device; and the second memory device can use the output data transferred to the second memory device as input data for a subsequent layer and/or neuron of the AI operation.

Description

TRANSFER DATA IN A MEMORY SYSTEM WITH ARTIFICIAL INTELLIGENCE MODE
Technical Field
[0001] The present disclosure relates generally to memory devices, and more particularly, to apparatuses and methods for transferring data in a memory system with an artificial intelligence (AI) mode.
Background
[0002] Memory devices are typically provided as internal, semiconductor, integrated circuits in computers or other electronic devices. There are many different types of memory including volatile and non-volatile memory. Volatile memory can require power to maintain its data and includes random-access memory (RAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM), among others. Non-volatile memory can provide persistent data by retaining stored data when not powered and can include NAND flash memory, NOR flash memory, read only memory (ROM), Electrically Erasable Programmable ROM (EEPROM), Erasable Programmable ROM (EPROM), and resistance variable memory such as phase change random access memory (PCRAM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM), among others.
[0003] Memory is also utilized as volatile and non-volatile data storage for a wide range of electronic applications. Non-volatile memory may be used in, for example, personal computers, portable memory sticks, digital cameras, cellular telephones, portable music players such as MP3 players, movie players, and other electronic devices. Memory cells can be arranged into arrays, with the arrays being used in memory devices.
Brief Description of the Drawings
[0004] Figure 1A is a block diagram of an apparatus in the form of a computing system including a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0005] Figure 1B is a block diagram of an apparatus in the form of a computing system including a memory system with memory devices having an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0006] Figure 2 is a block diagram of a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0007] Figures 3A and 3B are block diagrams of a number of bits in a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0008] Figure 4 is a block diagram of a number of blocks of a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0009] Figure 5 is a flow diagram illustrating an example artificial intelligence process in a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure.
[0010] Figure 6 is a flow diagram illustrating an example method to transfer data in accordance with a number of embodiments of the present disclosure.
Detailed Description
[0011] The present disclosure includes apparatuses and methods related to transferring data in a memory system with an artificial intelligence (AI) mode. An example apparatus can receive a command indicating that the apparatus operate in an artificial intelligence (AI) mode, a command to perform AI operations using an AI accelerator based on a status of a number of registers, and a command to transfer data between memory devices that are performing an AI operation. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations (e.g., logic operations, among other operations) associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
[0012] A memory device can include data stored in the arrays of memory cells that is used by the AI accelerator to perform AI operations. Input data, along with data that defines the neural network, such as neuron data, activation function data, and/or bias value data, can be stored in the memory devices, transferred between memory devices, and used to perform AI operations. Also, the memory device can include temporary blocks to store partial results of the AI operations and output blocks to store the results of the AI operations. The host can issue a read command for the output blocks, and the results in the output blocks can be sent to the host to complete performance of a command requesting that an AI operation be performed.
[0013] The host and/or a controller of a memory system can issue a command to transfer input and/or output data between memory devices performing AI operations. For example, the memory system can transfer output data of a layer and/or neuron of an AI operation from a first memory device to a second memory device; and the second memory device can use the output data transferred to the second memory device as input data for a subsequent layer and/or neuron of the AI operation. The first memory device and the second memory device performing the AI operation can include the same or different neural network data, activation function data, and/or bias data; and neural network data, activation function data, and/or bias data can be transferred between memory devices. The results of the AI operation can be reported to a controller and/or host.
[0014] Each memory device of a memory system can send input data and neuron data to the AI accelerator, and the AI accelerator can perform AI operations on the input data and neuron data. The memory device can store the results of the AI operations in temporary blocks on the memory device. The memory device can send the results from the temporary blocks and bias value data to the AI accelerator. The AI accelerator can perform AI operations on the results from the temporary blocks using the bias value data. The memory device can store the results of the AI operations in temporary blocks on the memory device. The memory device can send the results from the temporary blocks and activation function data to the AI accelerator. The AI accelerator can perform AI operations on the results from the temporary blocks and/or the activation function data. The memory device can store the results of the AI operations in output blocks on the memory device.
[0015] The AI accelerator can reduce latency and power consumption associated with AI operations when compared to AI operations that are performed on a host. AI operations performed on a host use data that is exchanged between a memory device and the host, which adds latency and power consumption to the AI operations. In contrast, AI operations performed according to embodiments of the present disclosure can be performed on a memory device using the AI accelerator and the memory arrays, so data is not transferred from the memory device while the AI operations are performed. [0016] In the following detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how a number of embodiments of the disclosure may be practiced. These embodiments are described in sufficient detail to enable those of ordinary skill in the art to practice the embodiments of this disclosure, and it is to be understood that other embodiments may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure. As used herein, the designator “N” indicates that a number of the particular feature so designated can be included with a number of embodiments of the present disclosure.
[0017] As used herein, “a number of” something can refer to one or more of such things. For example, a number of memory devices can refer to one or more memory devices. Additionally, designators such as “N”, as used herein, particularly with respect to reference numerals in the drawings, indicate that a number of the particular feature so designated can be included with a number of embodiments of the present disclosure.
[0018] The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. As will be appreciated, elements shown in the various embodiments herein can be added, exchanged, and/or eliminated so as to provide a number of additional embodiments of the present disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate various embodiments of the present disclosure and are not to be used in a limiting sense.
[0019] Figure 1A is a block diagram of an apparatus in the form of a computing system 100 including a memory device 120 in accordance with a number of embodiments of the present disclosure. As used herein, a memory device 120, memory arrays 125-1,... 125-N, memory controller 122, and/or AI accelerator 124 might also be separately considered an “apparatus.”
[0020] As illustrated in Figure 1A, host 102 can be coupled to the memory device 120. Host 102 can be a laptop computer, personal computer, digital camera, digital recording and playback device, mobile telephone, PDA, memory card reader, or interface hub, among other host systems, and can include a memory access device, e.g., a processor. One of ordinary skill in the art will appreciate that “a processor” can mean one or more processors, such as a parallel processing system, a number of coprocessors, etc.
[0021] Host 102 includes a host controller 108 to communicate with memory device 120. The host controller 108 can send commands to the memory device 120. The host controller 108 can communicate with the memory device 120, memory controller 122 on memory device 120, and/or the AI accelerator 124 on memory device 120 to perform AI operations, read data, write data, and/or erase data, among other operations. AI operations may include machine learning or neural network operations, which may include training operations or inference operations, or both. In some examples, each memory device 120 may represent a layer within a neural network or deep neural network (e.g., a network having three or more hidden layers). Alternatively, each memory device 120 may be or include nodes of a neural network, and a layer of the neural network may be composed of multiple memory devices or portions of several memory devices 120. Memory devices 120 may store weights (or models) for AI operations in memory arrays 125.
[0022] A physical host interface can provide an interface for passing control, address, data, and other signals between memory device 120 and host 102 having compatible receptors for the physical host interface. The signals can be communicated between host 102 and memory device 120 on a number of buses, such as a data bus and/or an address bus, for example.
[0023] Memory device 120 can include memory controller 122, AI accelerator 124, and memory arrays 125-1,...,125-N. Memory device 120 can be a low-power double data rate dynamic random access memory, such as an LPDDR5 device, and/or a graphics double data rate dynamic random access memory, such as a GDDR6 device, among other types of devices. Memory arrays 125-1,...,125-N can include a number of memory cells, such as volatile memory cells (e.g., DRAM memory cells, among other types of volatile memory cells) and/or non-volatile memory cells (e.g., RRAM memory cells, among other types of non-volatile memory cells). Memory device 120 can read and/or write data to memory arrays 125-1,...,125-N. Memory arrays 125-1,...,125-N can store data that is used during AI operations performed on memory device 120. Memory arrays 125-1,...,125-N can store inputs, outputs, weight matrix and bias information of a neural network, and/or activation function information used by the AI accelerator to perform AI operations on memory device 120.
[0024] The host controller 108, memory controller 122, and/or AI accelerator 124 on memory device 120 can include control circuitry, e.g., hardware, firmware, and/or software. In one or more embodiments, the host controller 108, memory controller 122, and/or AI accelerator 124 can be an application specific integrated circuit (ASIC) coupled to a printed circuit board including a physical interface. Also, memory controller 122 on memory device 120 can include registers 130. Registers 130 can be programmed to provide information for the AI accelerator to perform AI operations. Registers 130 can include any number of registers. Registers 130 can be written to and/or read by host 102, memory controller 122, and/or AI accelerator 124. Registers 130 can provide input, output, neural network, and/or activation function information for AI accelerator 124. Registers 130 can include mode register 131 to select a mode of operation for memory device 120. The AI mode of operation can be selected by writing a word to register 131, such as 0xAA and/or 0x2AA, for example, which inhibits access to the registers associated with normal operation of memory device 120 and allows access to the registers associated with AI operations. The AI mode of operation can also be selected using a signature that uses a crypto algorithm that is authenticated by a key stored in the memory device 120. Registers 130 can also be located in memory arrays 125-1,...,125-N and be accessible by memory controller 122.
[0025] AI accelerator 124 can include hardware 126 and/or software/firmware 128 to perform AI operations. Hardware 126 can include adder/multiplier 127 to perform logic operations associated with AI operations. Memory controller 122 and/or AI accelerator 124 can receive commands from host 102 to perform AI operations. Memory device 120 can perform the AI operations requested in the commands from host 102 using AI accelerator 124, data in memory arrays 125-1,...,125-N, and information in registers 130. The memory device can report information about the AI operations, such as results and/or error information, back to host 102. The AI operations performed by AI accelerator 124 can be performed without use of an external processing resource.
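How mode register 131 gates access can be illustrated with a toy register-file model. This is a minimal sketch assuming that writing one of the example words (0xAA or 0x2AA) enters AI mode and that any other value means normal operation; the class and method names are hypothetical, and the signature-based (crypto) entry path is not modeled.

```python
from typing import Dict

AI_MODE_WORDS = {0xAA, 0x2AA}  # example words named in the description


class RegisterFile:
    """Hypothetical model of registers 130 gated by mode register 131."""

    def __init__(self) -> None:
        self.mode_register = 0x00
        self._normal: Dict[int, int] = {}
        self._ai: Dict[int, int] = {}

    @property
    def ai_mode(self) -> bool:
        return self.mode_register in AI_MODE_WORDS

    def write_mode_register(self, word: int) -> None:
        self.mode_register = word

    def write_normal(self, addr: int, value: int) -> None:
        # Normal-operation registers are inhibited while in AI mode.
        if self.ai_mode:
            raise PermissionError("normal-operation registers are inhibited in AI mode")
        self._normal[addr] = value

    def write_ai(self, addr: int, value: int) -> None:
        # AI registers only become accessible after AI mode is selected.
        if not self.ai_mode:
            raise PermissionError("AI registers are only accessible in AI mode")
        self._ai[addr] = value

    def read_ai(self, addr: int) -> int:
        if not self.ai_mode:
            raise PermissionError("AI registers are only accessible in AI mode")
        return self._ai.get(addr, 0)
```

A host would first write the mode word, then program the AI registers; any attempt to touch a normal-operation register in between raises an error in this model.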
[0026] The memory arrays 125-1,... ,125-N can provide main memory for the memory system or could be used as additional memory or storage throughout the memory system. Each memory array 125-1,... ,125-N can include a number of blocks of memory cells. The blocks of memory cells can be used to store data that is used during AI operations performed by memory device 120. Memory arrays 125-1,... ,125-N can include DRAM memory cells, for example. Embodiments are not limited to a particular type of memory device. For instance, the memory device can include RAM, ROM, DRAM, SDRAM, PCRAM, RRAM, 3D XPoint, and flash memory, among others.
[0027] By way of example, memory device 120 may perform an AI operation that is or includes one or more inference steps. Memory arrays 125 may be layers of a neural network, or may each be individual nodes with memory device 120 being a layer; or memory device 120 may be a node within a larger network. Additionally or alternatively, memory arrays 125 may store data or weights, or both, to be used (e.g., summed) within a node. Each node (e.g., memory array 125) may combine an input from data read from cells of the same or a different memory array 125 with weights read from cells of memory array 125. Combinations of weights and data may, for instance, be summed within the periphery of a memory array 125 or within hardware 126 using adder/multiplier 127. In such cases, the summed result may be passed to an activation function represented or instantiated in the periphery of a memory array 125 or within hardware 126. The result may be passed to another memory device 120 or may be used within AI accelerator 124 (e.g., by software/firmware 128) to make a decision or to train a network that includes memory device 120.
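The per-node computation in [0027] (combine data with weights, sum, then apply an activation function) can be sketched as follows. This is an illustrative model, assuming each weight row stands for the contents of one memory array 125 acting as a node, and using a sigmoid as the example activation; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def node(data: Sequence[float], weights: Sequence[float]) -> float:
    """One node: multiply-accumulate (as adder/multiplier 127 might), then activation."""
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    acc = 0.0
    for d, w in zip(data, weights):
        acc += d * w
    return sigmoid(acc)


def layer(data: Sequence[float], weight_arrays: Sequence[Sequence[float]]) -> List[float]:
    """Treat each weight row as one memory array 125 acting as a node."""
    return [node(data, w) for w in weight_arrays]


print(node([1.0, 2.0], [0.5, -0.25]))  # weighted sum is 0.0, so sigmoid gives 0.5
```

The resulting list is what the description says may be passed on to another memory device 120 or consumed by software/firmware 128.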
[0028] A network that employs memory device 120 may be capable of or used for supervised or unsupervised learning. This may be combined with other learning or training regimes. In some cases, a trained network or model is imported or used with memory device 120, and the operations of memory device 120 are primarily or exclusively related to inference.
[0029] The embodiment of Figure 1A can include additional circuitry that is not illustrated so as not to obscure embodiments of the present disclosure. For example, memory device 120 can include address circuitry to latch address signals provided over I/O connections through I/O circuitry. Address signals can be received and decoded by a row decoder and a column decoder to access the memory arrays 125-1,... ,125-N. It will be appreciated by those skilled in the art that the number of address input connections can depend on the density and architecture of the memory arrays 125-1,... ,125-N.
[0030] Figure 1B is a block diagram of an apparatus in the form of a computing system including a memory system with memory devices having an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure. As used herein, memory devices 120-1, 120-2, 120-3, and 120-X, controller 105, and/or memory system 104 might also be separately considered an “apparatus.”
[0031] As illustrated in Figure 1B, host 102 can be coupled to the memory system 104. Host 102 can be a laptop computer, personal computer, digital camera, digital recording and playback device, mobile telephone, PDA, memory card reader, or interface hub, among other host systems, and can include a memory access device, e.g., a processor. One of ordinary skill in the art will appreciate that “a processor” can mean one or more processors, such as a parallel processing system, a number of coprocessors, etc.
[0032] Host 102 includes a host controller 108 to communicate with memory system 104. The host controller 108 can send commands to the memory system 104. The memory system 104 can include controller 105 and memory devices 120-1, 120-2, 120-3, and 120-X. Memory devices 120-1, 120-2, 120-3, and 120-X can each be the memory device 120 described above in association with Figure 1A and include an AI accelerator with hardware, software, and/or firmware to perform AI operations. The host controller 108 can communicate with controller 105 and/or memory devices 120-1, 120-2, 120-3, and 120-X to perform AI operations, read data, write data, and/or erase data, among other operations. A physical host interface can provide an interface for passing control, address, data, and other signals between memory system 104 and host 102 having compatible receptors for the physical host interface. The signals can be communicated between host 102 and memory system 104 on a number of buses, such as a data bus and/or an address bus, for example.
[0033] Memory system 104 can include controller 105 coupled to memory devices 120-1, 120-2, 120-3, and 120-X via bus 121. Bus 121 can be configured such that the full bandwidth of bus 121 can be consumed when operating a portion or all of the memory devices of a memory system. For example, two of the four memory devices 120-1, 120-2, 120-3, and 120-X shown in Figure 1B can be configured to operate while using the full bandwidth of bus 121. In this example, controller 105 can send a command on select line 117 that can select memory devices 120-1 and 120-3 for operation during a particular time period, such as at the same time. Controller 105 can send a command on select line 119 that can select memory devices 120-2 and 120-X for operation during a particular time period, such as at the same time. In a number of embodiments, controller 105 can be configured to send commands on select lines 117 and 119 to select any combination of the memory devices 120-1, 120-2, 120-3, and 120-X.
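The select-line scheme of [0033] can be modeled as a mapping from each select line to the devices it enables. The sketch below uses the device pairs from the Figure 1B example; the even split of bus 121 among the selected devices and the 64-bit bus width in the usage line are assumptions for illustration only, since the description states only that the full bus bandwidth can be consumed by a subset of devices.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Device groups per select line, taken from the Figure 1B example.
SELECT_LINES: Dict[int, FrozenSet[str]] = {
    117: frozenset({"120-1", "120-3"}),
    119: frozenset({"120-2", "120-X"}),
}


def selected_devices(asserted: Iterable[int]) -> FrozenSet[str]:
    """Union of the devices enabled by the asserted select lines."""
    devices: Set[str] = set()
    for line in asserted:
        devices |= SELECT_LINES[line]
    return frozenset(devices)


def bits_per_device(bus_width_bits: int, asserted: Iterable[int]) -> int:
    """Assumed even split of bus 121 among the selected devices."""
    count = len(selected_devices(asserted))
    if count == 0:
        raise ValueError("no memory device selected")
    return bus_width_bits // count


print(bits_per_device(64, [117]))  # 32: two devices share a hypothetical 64-bit bus
```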
[0034] In a number of embodiments, a command on select line 117 can be used to select memory devices 120-1 and 120-3 and a command on select line 119 can be used to select memory devices 120-2 and 120-X. The selected memory devices can be used during performance of AI operations. Data associated with the AI operation can be copied and/or transferred between the selected memory devices 120-1, 120-2, 120-3, and 120-X on bus 121. For example, a first portion of an AI operation can be performed on memory device 120-1 and an output of the first portion of the AI operation can be transferred to memory device 120-3 on bus 121. The output from a particular layer and/or neuron of an AI operation on a first memory device can be transferred to a second memory device, and the second memory device can continue the AI operation using the transferred data in the next layer and/or neuron of the AI operation. The output of the first portion of the AI operation on memory device 120-1 can be used by memory device 120-3 as an input of a second portion of the AI operation. Also, neural network data, activation function data, and/or bias data associated with an AI operation can be transferred between memory devices 120-1, 120-2, 120-3, and 120-X on bus 121.
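The layer hand-off in [0034], where the output of one device becomes the input of the next, can be sketched with a small data model. The `MemoryDevice` class, its fields, the ReLU activation, and `transfer` (standing in for a copy over bus 121) are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryDevice:
    """Hypothetical device holding one layer's neural network and bias data."""
    name: str
    weights: List[List[float]]
    bias: List[float]
    input_block: List[float] = field(default_factory=list)
    output_block: List[float] = field(default_factory=list)

    def run_layer(self) -> None:
        # Weighted sum plus bias, then ReLU, written to the output block.
        self.output_block = [
            max(0.0, sum(w * x for w, x in zip(row, self.input_block)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


def transfer(src: MemoryDevice, dst: MemoryDevice) -> None:
    """Copy src's output block into dst's input block (models bus 121)."""
    dst.input_block = list(src.output_block)


dev1 = MemoryDevice("120-1", [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], input_block=[2.0, 3.0])
dev3 = MemoryDevice("120-3", [[1.0, 2.0]], [1.0])
dev1.run_layer()       # first portion of the AI operation
transfer(dev1, dev3)   # output of 120-1 becomes input of 120-3
dev3.run_layer()       # second portion continues with the next layer
print(dev3.output_block)  # [6.0]
```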
[0035] Figure 2 is a block diagram of a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure. Registers 230 can be AI registers and include input information, output information, neural network information, and/or activation function information, among other types of information, for use by an AI accelerator, a controller, and/or memory arrays of a memory device (e.g., AI accelerator 124, memory controller 122, and/or memory arrays 125-1,...,125-N in Figure 1A). Registers can be read and/or written to based on commands from a host, an AI accelerator, and/or a controller (e.g., host 102, AI accelerator 124, memory controller 122 in Figure 1A).
[0036] Register 232-0 can define parameters associated with AI mode of the memory device. Bits in register 232-0 can start AI operations, restart AI operations, indicate content in registers is valid, clear content from registers, and/or exit from AI mode.
[0037] Registers 232-1, 232-2, 232-3, 232-4, and 232-5 can define the size of inputs used in AI operations, the number of inputs used in AI operations, and the start address and end address of the inputs used in AI operations. Registers 232-7, 232-8, 232-9, 232-10, and 232-11 can define the size of outputs of AI operations, the number of outputs in AI operations, and the start address and end address of the outputs of AI operations.
[0038] Register 232-12 can be used to enable the usage of the input banks, the neuron banks, the output banks, the bias banks, the activation functions, and the temporary banks used during AI operations.
[0039] Registers 232-13, 232-14, 232-15, 232-16, 232-17, 232-18, 232-19, 232-20, 232-21, 232-22, 232-23, 232-24, and 232-25 can be used to define the neural network used during AI operations. Registers 232-13 through 232-25 can define the size, number, and location of neurons and/or layers of the neural network used during AI operations.
[0040] Register 232-26 can enable a debug/hold mode of the AI accelerator and allow output to be observed at a layer of AI operations. Register 232-26 can indicate that an activation function should be applied during AI operations and that the AI operation can step forward (e.g., perform a next step in an AI operation). Register 232-26 can indicate that the temporary blocks, where the output of the layer is located, are valid. The data in the temporary blocks can be changed by a host and/or a controller on the memory device, such that the changed data can be used in the AI operation as the AI operation steps forward. Registers 232-27, 232-28, and 232-29 can define the layer where the debug/hold mode will stop the AI operation, change the content of the neural network, and/or observe the output of the layer.
[0041] Registers 232-30, 232-31, 232-32, and 232-33 can define the size of the temporary banks used in AI operations and the start address and end address of those temporary banks. Register 232-30 can define the start address and end address of a first temporary bank used in AI operations, and register 232-33 can define the start address and end address of a second temporary bank used in AI operations. Registers 232-31 and 232-32 can define the size of the temporary banks used in AI operations.
[0042] Registers 232-34, 232-35, 232-36, 232-37, 232-38, and 232-39 can be associated with the activation functions used in AI operations. Register 232-34 can enable usage of the activation function block, usage of the activation function for each neuron, usage of the activation function for each layer, and usage of an external activation function. Register 232-35 can define the start address and the end address of the location of the activation functions. Registers 232-36, 232-37, 232-38, and 232-39 can define the resolution of the inputs (e.g., x-axis) and outputs (e.g., y-axis) of the activation functions and/or a custom defined activation function.
[0043] Registers 232-40, 232-41, 232-42, 232-43, and 232-44 can define the size of bias values used in AI operations, the number of bias values used in AI operations, and the start address and end address of the bias values used in AI operations.
[0044] Register 232-45 can provide status information for the AI calculations and provide information for the debug/hold mode. Register 232-45 can enable debug/hold mode, indicate that the AI accelerator is performing AI operations, indicate that the full capability of the AI accelerator should be used, indicate only matrix calculations of the AI operations should be made, and/or indicate that the AI operation can proceed to the next neuron and/or layer.
[0045] Register 232-46 can provide error information regarding AI operations. Register 232-46 can indicate that there was an error in a sequence of an AI operation, that there was an error in an algorithm of an AI operation, that there was an error in a page of data that ECC was not able to correct, and/or that there was an error in a page of data that ECC was able to correct.
[0046] Register 232-47 can indicate an activation function to use in AI operations. Register 232-47 can indicate that one of a number of predefined activation functions and/or a custom activation function located in a block can be used in AI operations.
[0047] Registers 232-48, 232-49, and 232-50 can indicate the neuron and/or layer where the AI operation is executing. In the case where errors occur during the AI operations, registers 232-48, 232-49, and 232-50 can indicate the neuron and/or layer where an error occurred.
[0048] Figures 3A and 3B are block diagrams of a number of bits in a number of registers on a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure. Each register 332-0,... , 332-50 can include a number of bits, bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7, to indicate information associated with performing AI operations.
[0049] Register 332-0 can define parameters associated with the AI mode of the memory device. Bit 334-5 of register 332-0 can be a read/write bit and can indicate that an elaboration of an AI operation can restart 360 at the beginning when programmed to 1b. Bit 334-5 of register 332-0 can be reset to 0b once the AI operation has restarted. Bit 334-4 of register 332-0 can be a read/write bit and can indicate that an elaboration of an AI operation can start 361 when programmed to 1b. Bit 334-4 of register 332-0 can be reset to 0b once the AI operation has started.
[0050] Bit 334-3 of register 332-0 can be a read/write bit and can indicate that the content of the AI registers is valid 362 when programmed to 1b and invalid when programmed to 0b. Bit 334-2 of register 332-0 can be a read/write bit and can indicate that the content of the AI registers is to be cleared 363 when programmed to 1b. Bit 334-1 of register 332-0 can be a read-only bit and can indicate that the AI accelerator is in use 364 and performing AI operations when programmed to 1b. Bit 334-0 of register 332-0 can be a write-only bit and can indicate that the memory device is to exit 365 AI mode when programmed to 1b.
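The bit assignments of register 332-0 in [0049] and [0050] map naturally onto a flag type. This sketch assumes bit 334-k sits at bit position k (334-0 as the least significant bit); the figure does not state the bit ordering, and the helper names are hypothetical.

```python
from enum import IntFlag
from typing import List


class AiModeReg(IntFlag):
    """Register 332-0, assuming bit 334-k is bit position k (334-0 = LSB)."""
    EXIT_AI_MODE = 1 << 0  # 334-0, write only (365)
    BUSY = 1 << 1          # 334-1, read only: accelerator in use (364)
    CLEAR_REGS = 1 << 2    # 334-2 (363)
    REGS_VALID = 1 << 3    # 334-3 (362)
    START = 1 << 4         # 334-4, reset to 0b once started (361)
    RESTART = 1 << 5       # 334-5, reset to 0b once restarted (360)


def start_value() -> int:
    """Value a host could write to mark the AI registers valid and start."""
    return int(AiModeReg.REGS_VALID | AiModeReg.START)


def decode(value: int) -> List[str]:
    """Names of the flags set in a raw register 332-0 value."""
    return [flag.name for flag in AiModeReg if value & flag]


print(hex(start_value()), decode(start_value()))  # 0x18 ['REGS_VALID', 'START']
```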
[0051] Registers 332-1, 332-2, 332-3, 332-4, and 332-5 can define the size of inputs used in AI operations, the number of inputs used in AI operations, and the start address and end address of the inputs used in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-1 and 332-2 can define the size of the inputs 366 used in AI operations. The size of the inputs can indicate the width of the inputs in terms of number of bits and/or the type of input, such as floating point, integer, and/or double, among other types. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-3 and 332-4 can indicate the number of inputs 367 used in AI operations. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-5 can indicate a start address 368 of the blocks in memory arrays of the inputs used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-5 can indicate an end address 369 of the blocks in memory arrays of the inputs used in AI operations. If the start address 368 and the end address 369 are the same address, only one block of input is indicated for the AI operations.
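Decoding the input registers of [0051] involves two 16-bit fields split across register pairs and one register packing a start and end block address into its upper and lower nibbles. The sketch below assumes the lower-numbered register of each pair holds the high byte, that bit 334-k is bit position k, and that blocks are numbered contiguously; all three are assumptions, and the names are hypothetical. The output, neuron, temporary, bias, and activation-function address registers described below follow the same nibble layout.

```python
from typing import Dict, NamedTuple


class InputConfig(NamedTuple):
    size: int         # registers 332-1/332-2 (size 366), raw field value
    count: int        # registers 332-3/332-4 (number of inputs 367)
    start_block: int  # register 332-5, bits 334-4..334-7 (start address 368)
    end_block: int    # register 332-5, bits 334-0..334-3 (end address 369)

    @property
    def block_count(self) -> int:
        # Assumes the input blocks are numbered contiguously.
        return self.end_block - self.start_block + 1


def decode_inputs(regs: Dict[int, int]) -> InputConfig:
    """regs maps the register suffix (1 for 332-1, ...) to its 8-bit value."""
    return InputConfig(
        size=(regs[1] << 8) | regs[2],   # assumed: lower register is the high byte
        count=(regs[3] << 8) | regs[4],
        start_block=(regs[5] >> 4) & 0xF,
        end_block=regs[5] & 0xF,
    )


cfg = decode_inputs({1: 0x00, 2: 0x20, 3: 0x01, 4: 0x00, 5: 0x33})
print(cfg, cfg.block_count)  # start == end, so a single input block
```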
[0052] Registers 332-7, 332-8, 332-9, 332-10, and 332-11 can define the size of outputs of AI operations, the number of outputs in AI operations, and the start address and end address of the outputs of AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-7 and 332-8 can define the size 370 of the outputs used in AI operations. The size of the outputs can indicate the width of the outputs in terms of number of bits and/or the type of output, such as floating point, integer, and/or double, among other types. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-9 and 332-10 can indicate the number of outputs 371 used in AI operations. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-11 can indicate a start address 372 of the blocks in memory arrays of the outputs used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-11 can indicate an end address 373 of the blocks in memory arrays of the outputs used in AI operations. If the start address 372 and the end address 373 are the same address, only one block of output is indicated for the AI operations.
[0053] Register 332-12 can be used to enable the usage of the input banks, the neuron banks, the output banks, the bias banks, the activation functions, and the temporary banks used during AI operations. Bit 334-0 of register 332-12 can enable the input banks 380, bit 334-1 can enable the neural network banks 379, bit 334-2 can enable the output banks 378, bit 334-3 can enable the bias banks 377, bit 334-4 can enable the activation function banks 376, and bits 334-5 and 334-6 can enable a first temporary bank 375 and a second temporary bank 374, respectively.
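Register 332-12 is a plain enable mask, so composing it is a matter of OR-ing flags. This sketch assumes bit 334-k is bit position k; `layer_mask` is a hypothetical helper showing which banks a layer following the input, bias, and activation sequence of [0014] might enable.

```python
from enum import IntFlag


class BankEnable(IntFlag):
    """Register 332-12, assuming bit 334-k is bit position k."""
    INPUT = 1 << 0           # 334-0, input banks 380
    NEURAL_NETWORK = 1 << 1  # 334-1, neural network banks 379
    OUTPUT = 1 << 2          # 334-2, output banks 378
    BIAS = 1 << 3            # 334-3, bias banks 377
    ACTIVATION = 1 << 4      # 334-4, activation function banks 376
    TEMP_1 = 1 << 5          # 334-5, first temporary bank 375
    TEMP_2 = 1 << 6          # 334-6, second temporary bank 374


def layer_mask(with_bias: bool = True, with_activation: bool = True) -> int:
    """Illustrative set of banks one layer might enable."""
    mask = (BankEnable.INPUT | BankEnable.NEURAL_NETWORK | BankEnable.OUTPUT
            | BankEnable.TEMP_1 | BankEnable.TEMP_2)
    if with_bias:
        mask |= BankEnable.BIAS
    if with_activation:
        mask |= BankEnable.ACTIVATION
    return int(mask)


print(hex(layer_mask()), hex(layer_mask(with_bias=False)))  # 0x7f 0x77
```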
[0054] Registers 332-13, 332-14, 332-15, 332-16, 332-17, 332-18, 332-19, 332-20, 332-21, 332-22, 332-23, 332-24, and 332-25 can be used to define the neural network used during AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-13 and 332-14 can define the number of rows 381 in a matrix used in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-15 and 332-16 can define the number of columns 382 in a matrix used in AI operations.
[0055] Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-17 and 332-18 can define the size of the neurons 383 used in AI operations. The size of the neurons can indicate the width of the neurons in terms of number of bits and/or the type of input, such as floating point, integer, and/or double, among other types. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-19, 332-20, and 332-21 can indicate the number of neurons 384 of the neural network used in AI operations. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-22 can indicate a start address 385 of the blocks in memory arrays of the neurons used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-22 can indicate an end address 386 of the blocks in memory arrays of the neurons used in AI operations. If the start address 385 and the end address 386 are the same address, only one block of neurons is indicated for the AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-23, 332-24, and 332-25 can indicate the number of layers 387 of the neural network used in AI operations.
[0056] Register 332-26 can enable a debug/hold mode of the AI accelerator and allow an output to be observed at a layer of AI operations. Bit 334-0 of register 332-26 can indicate that the AI accelerator is in a debug/hold mode and that an activation function should be applied 391 during AI operations. Bit 334-1 of register 332-26 can indicate that the AI operation can step forward 390 (e.g., perform a next step in an AI operation). Bits 334-2 and 334-3 of register 332-26 can indicate that the temporary blocks, where the output of the layer is located, are valid 388 and 389. The data in the temporary blocks can be changed by a host and/or a controller on the memory device, such that the changed data can be used in the AI operation as the AI operation steps forward.
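Several neural network fields in [0054] and [0055] span two or three 8-bit registers (for example, the neuron count in 332-19 through 332-21). A small helper can concatenate them. The sketch assumes the lower-numbered register is the most significant byte, which the description does not specify; `read_field` and `network_shape` are hypothetical names.

```python
from typing import Dict, Sequence


def read_field(regs: Dict[int, int], suffixes: Sequence[int]) -> int:
    """Concatenate 8-bit registers; first suffix is the most significant byte (assumed)."""
    value = 0
    for s in suffixes:
        byte = regs[s]
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"register 332-{s} holds a non-8-bit value: {byte}")
        value = (value << 8) | byte
    return value


def network_shape(regs: Dict[int, int]) -> Dict[str, int]:
    """regs maps register suffixes (13 for 332-13, ...) to 8-bit values."""
    return {
        "rows": read_field(regs, (13, 14)),         # rows 381
        "columns": read_field(regs, (15, 16)),      # columns 382
        "neuron_size": read_field(regs, (17, 18)),  # neuron size 383
        "neurons": read_field(regs, (19, 20, 21)),  # number of neurons 384
        "layers": read_field(regs, (23, 24, 25)),   # number of layers 387
    }
```

Under this byte-order assumption, a 24-bit neuron count allows up to 16,777,215 neurons, while the two-register fields top out at 65,535.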
[0057] Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-27, 332-28, and 332-29 can define the layer where the debug/hold mode will stop 392 the AI operation and observe the output of the layer.
[0058] Registers 332-30, 332-31, 332-32, and 332-33 can define the size of temporary banks used in AI operations and the start address and end address of the temporary banks used in AI operations. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-30 can define the start address 393 of a first temporary bank used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-30 can define the end address 394 of the first temporary bank used in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-31 and 332-32 can define the size 395 of the temporary banks used in AI operations. The size of the temporary banks can indicate the width of the temporary banks in terms of number of bits and/or the type of input, such as floating point, integer, and/or double, among other types. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-33 can define the start address 396 of a second temporary bank used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-33 can define the end address 397 of the second temporary bank used in AI operations.
[0059] Registers 332-34, 332-35, 332-36, 332-37, 332-38, and 332-39 can be associated with the activation functions used in AI operations. Bit 334-0 of register 332-34 can enable usage of the activation function block 3101. Bit 334-1 of register 332-34 can enable holding the AI operation at a neuron 3100 and usage of the activation function for each neuron. Bit 334-2 of register 332-34 can enable holding the AI operation at a layer 399 and usage of the activation function for each layer. Bit 334-3 of register 332-34 can enable usage of an external activation function 398.
[0060] Bits 334-4, 334-5, 334-6, and 334-7 of register 332-35 can define the start address 3102 of activation function banks used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-35 can define the end address 3103 of activation function banks used in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-36 and 332-37 can define the resolution of the inputs (e.g., x-axis) 3104 of the activation functions. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-38 and 332-39 can define the resolution and/or the outputs (e.g., y-axis) 3105 of the activation functions for a given x-axis value of a custom activation function.
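One plausible reading of the x-axis and y-axis resolution fields in [0060] is a custom activation function stored as a lookup table: the function is sampled at a fixed number of x points and each y value is quantized to a fixed bit width. The sketch below follows that interpretation, which is an assumption rather than the disclosed format; it also assumes the function's output lies in [0, 1] and uses nearest-sample lookup.

```python
import math
from typing import Callable, List


def build_lut(fn: Callable[[float], float], x_min: float, x_max: float,
              x_points: int, y_bits: int) -> List[int]:
    """Sample fn at x_points evenly spaced x values; quantize y to y_bits."""
    if x_points < 2:
        raise ValueError("need at least two x sample points")
    step = (x_max - x_min) / (x_points - 1)
    y_max = (1 << y_bits) - 1
    # Assumes fn's output lies in [0, 1]; values outside are clamped.
    return [round(min(max(fn(x_min + i * step), 0.0), 1.0) * y_max)
            for i in range(x_points)]


def lut_lookup(lut: List[int], x: float, x_min: float, x_max: float,
               y_bits: int) -> float:
    """Nearest-sample lookup, returning the dequantized y value."""
    x = min(max(x, x_min), x_max)
    idx = round((x - x_min) / (x_max - x_min) * (len(lut) - 1))
    return lut[idx] / ((1 << y_bits) - 1)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


table = build_lut(sigmoid, -8.0, 8.0, 17, 8)  # 17 x samples, 8-bit y values
print(lut_lookup(table, 0.0, -8.0, 8.0, 8))   # close to 0.5
```

Raising either resolution trades storage in the activation function banks for accuracy of the stored curve.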
[0061] Registers 332-40, 332-41, 332-42, 332-43, and 332-44 can define the size of bias values used in AI operations, the number of bias values used in AI operations, and the start address and end address of the bias values used in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-40 and 332-41 can define the size of the bias values 3106 used in AI operations. The size of the bias values can indicate the width of the bias values in terms of number of bits and/or the type of bias values, such as floating point, integer, and/or double, among other types. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-42 and 332-43 can indicate the number of bias values 3107 used in AI operations. Bits 334-4, 334-5, 334-6, and 334-7 of register 332-44 can indicate a start address 3108 of the blocks in memory arrays of the bias values used in AI operations. Bits 334-0, 334-1, 334-2, and 334-3 of register 332-44 can indicate an end address 3109 of the blocks in memory arrays of the bias values used in AI operations. If the start address 3108 and the end address 3109 are the same address, only one block of bias values is indicated for the AI operations.
[0062] Register 332-45 can provide status information for the AI calculations and provide information for the debug/hold mode. Bit 334-0 of register 332-45 can activate the debug/hold mode 3114. Bit 334-1 of register 332-45 can indicate that the AI accelerator is busy 3113 and performing AI operations. Bit 334-2 of register 332-45 can indicate that the AI accelerator is on 3112 and/or that the full capability of the AI accelerator should be used. Bit 334-3 of register 332-45 can indicate that only matrix calculations 3111 of the AI operations should be made. Bit 334-4 of register 332-45 can indicate that the AI operation can step forward 3110 and proceed to the next neuron and/or layer.
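The debug/hold and step-forward bits of register 332-45 suggest a host loop that advances the AI operation one layer at a time. The toy model below assumes bit 334-k is bit position k and that the step-forward bit clears itself once consumed; both are assumptions, and `HeldAccelerator` is a hypothetical stand-in for the accelerator's behavior.

```python
from enum import IntFlag


class Status(IntFlag):
    """Register 332-45, assuming bit 334-k is bit position k."""
    DEBUG_HOLD = 1 << 0    # 334-0 (3114)
    BUSY = 1 << 1          # 334-1 (3113)
    ACCEL_ON = 1 << 2      # 334-2 (3112)
    MATRIX_ONLY = 1 << 3   # 334-3 (3111)
    STEP_FORWARD = 1 << 4  # 334-4 (3110)


class HeldAccelerator:
    """Toy model: in debug/hold mode, advance one layer per step-forward write."""

    def __init__(self, layers: int) -> None:
        self.layers = layers
        self.current_layer = 0
        self.status = Status.ACCEL_ON | Status.DEBUG_HOLD

    def write_status(self, value: int) -> None:
        self.status = Status(value)
        if self.status & Status.STEP_FORWARD and self.current_layer < self.layers:
            self.current_layer += 1
        # Assumption: the step-forward bit self-clears after it is consumed.
        self.status = Status(int(self.status) & ~int(Status.STEP_FORWARD))


acc = HeldAccelerator(layers=2)
acc.write_status(acc.status | Status.STEP_FORWARD)  # host requests one step
print(acc.current_layer)  # 1
```

Between steps, a host could inspect or modify the temporary blocks, as described for register 332-26, before writing the next step-forward request.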
[0063] Register 332-46 can provide error information regarding AI operations. Bit 334-3 of register 332-46 can indicate that there was an error in a sequence 3115 of an AI operation. Bit 334-2 of register 332-46 can indicate that there was an error in an algorithm 3116 of an AI operation. Bit 334-1 of register 332-46 can indicate there was an error in a page of data that ECC was not able to correct 3117. Bit 334-0 of register 332-46 can indicate there was an error in a page of data that ECC was able to correct 3118.
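Registers 332-45 and 332-46 are described as collections of single-bit flags, which map naturally onto Python's `enum.IntFlag`. The sketch below assumes that bit 334-n corresponds to the value 1 << n; the flag names and the `is_fatal` policy (treating an ECC-corrected page error alone as non-fatal) are illustrative assumptions, not part of the disclosure.

```python
from enum import IntFlag


class Status45(IntFlag):
    """Status register 332-45 (bit 334-n assumed to be 1 << n)."""
    DEBUG_HOLD = 1 << 0    # 3114: activate debug/hold mode
    BUSY = 1 << 1          # 3113: accelerator busy performing AI operations
    ACCEL_ON = 1 << 2      # 3112: accelerator on / use full capability
    MATRIX_ONLY = 1 << 3   # 3111: perform only matrix calculations
    STEP_FORWARD = 1 << 4  # 3110: step forward to next neuron and/or layer


class Error46(IntFlag):
    """Error register 332-46 (same bit-numbering assumption)."""
    ECC_CORRECTED = 1 << 0    # 3118: page error that ECC corrected
    ECC_UNCORRECTED = 1 << 1  # 3117: page error that ECC could not correct
    ALGORITHM = 1 << 2        # 3116: error in an algorithm
    SEQUENCE = 1 << 3         # 3115: error in a sequence


# Hypothetical policy: every error except a corrected ECC error is fatal.
FATAL_ERRORS = Error46.ECC_UNCORRECTED | Error46.ALGORITHM | Error46.SEQUENCE


def is_fatal(err: Error46) -> bool:
    """Return True if any error bit other than ECC_CORRECTED is set."""
    return bool(err & FATAL_ERRORS)
```

For example, `Status45(0b00110)` decodes to `BUSY | ACCEL_ON`, meaning the accelerator is on and currently performing AI operations.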
[0064] Register 332-47 can indicate an activation function to use in AI operations. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, and 334-6 of register 332-47 can indicate one of a number of predefined activation functions 3120 to be used in AI operations. Bit 334-7 of register 332-47 can indicate that a custom activation function 3119 located in a block is to be used in AI operations.
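One possible reading of register 332-47 is sketched below: bit 334-7 selects the custom activation function, and otherwise bits 334-0 through 334-6 are read as an index into a table of predefined functions. The disclosure does not say whether those seven bits form an index or one flag per function, nor which functions are predefined, so the `PREDEFINED` table and `select_activation` are hypothetical.

```python
import math
from typing import Callable, Optional

# Hypothetical table of predefined activation functions 3120; the source
# does not say which functions exist or how they are numbered.
PREDEFINED: dict[int, Callable[[float], float]] = {
    0: lambda x: x,                           # identity
    1: lambda x: max(0.0, x),                 # ReLU
    2: lambda x: 1.0 / (1.0 + math.exp(-x)),  # sigmoid
    3: math.tanh,                             # tanh
}

CUSTOM_BIT = 1 << 7  # bit 334-7: use custom activation function 3119


def select_activation(reg47: int,
                      custom: Optional[Callable[[float], float]] = None,
                      ) -> Callable[[float], float]:
    """Pick an activation function from the value of register 332-47."""
    if reg47 & CUSTOM_BIT:
        if custom is None:
            raise ValueError("custom activation selected but none loaded")
        return custom
    index = reg47 & 0x7F  # bits 334-0..334-6 read as an index (assumption)
    try:
        return PREDEFINED[index]
    except KeyError:
        raise ValueError(f"unknown predefined activation {index}") from None
```

In this reading, the custom function would be built by the host or accelerator from data in the activation function block and passed in as `custom`.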
[0065] Registers 332-48, 332-49, and 332-50 can indicate the neuron and/or layer where the AI operation is executing. Bits 334-0, 334-1, 334-2, 334-3, 334-4, 334-5, 334-6, and 334-7 of registers 332-48, 332-49, and 332-50 can indicate the address of the neuron and/or layer where the AI operation is executing. In the case where errors occur during the AI operations, registers 332-48, 332-49, and 332-50 can indicate the neuron and/or layer where an error occurred.
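Because registers 332-48, 332-49, and 332-50 together hold one address, a host reading them during debugging has to reassemble it. This sketch assumes a 24-bit address with register 332-48 as the most significant byte, and treats any of the four error bits of register 332-46 as a reason to report the location; the byte order and the helper names are assumptions.

```python
from typing import Optional


def pack_position(address: int) -> tuple[int, int, int]:
    """Split a 24-bit neuron/layer address into values for 332-48, -49, -50."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF


def unpack_position(r48: int, r49: int, r50: int) -> int:
    """Rebuild the 24-bit address from the three 8-bit registers."""
    for r in (r48, r49, r50):
        if not 0 <= r <= 0xFF:
            raise ValueError("register values must be 8-bit")
    return (r48 << 16) | (r49 << 8) | r50


def error_location(err_reg: int, r48: int, r49: int, r50: int) -> Optional[int]:
    """Return the neuron/layer address if register 332-46 reports an error."""
    if err_reg & 0x0F == 0:  # bits 334-0..334-3 of 332-46 are the error bits
        return None
    return unpack_position(r48, r49, r50)
```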
[0066] Figure 4 is a block diagram of a number of blocks of a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure. Input block 440 is a block in the memory arrays where input data is stored. Data in input block 440 can be used as the input for AI operations. The address of input block 440 can be indicated in register 5 (e.g., register 232-5 in Figure 2 and 332-5 in Figure 3A). Embodiments are not limited to one input block, as there can be a plurality of input blocks. Data in input block 440 can be sent to the memory device from the host. The data can accompany a command indicating that AI operations should be performed on the memory device using the data.
[0067] Output block 442 is a block in the memory arrays where output data from AI operations is stored. Output block 442 can be used to store the output from AI operations, which can then be sent to the host. The address of output block 442 can be indicated in register 11 (e.g., register 232-11 in Figure 2 and 332-11 in Figure 3A). Embodiments are not limited to one output block, as there can be a plurality of output blocks.
[0068] Data in output block 442 can be sent to the host upon completion and/or holding of an AI operation. Temporary blocks 444-1 and 444-2 can be blocks in the memory arrays where data is stored temporarily while AI operations are being performed. Data can be stored in temporary blocks 444-1 and 444-2 while the AI operations are iterating through the neurons and layers of the neural network used for the AI operations. The addresses of temporary blocks 444-1 and 444-2 can be indicated in registers 30 and 33 (e.g., registers 232-30 and 232-33 in Figure 2 and 332-30 and 332-33 in Figure 3B). Embodiments are not limited to two temporary blocks, as there can be a plurality of temporary blocks.
[0069] Activation function block 446 is a block in the memory arrays where the activation functions for the AI operations are stored. Activation function block 446 can store predefined activation functions and/or custom activation functions that are created by the host and/or AI accelerator. The address of activation function block 446 can be indicated in register 35 (e.g., register 232-35 in Figure 2 and 332-35 in Figure 3B). Embodiments are not limited to one activation function block, as there can be a plurality of activation function blocks.
[0070] Bias values block 448 is a block in the memory array where the bias values for the AI operations are stored. The address of bias values block 448 can be indicated in register 44 (e.g. register 232-44 in Figure 2 and 332-44 in Figure 3B). Embodiments are not limited to one bias value block as there can be a plurality of bias value blocks.
[0071] Neural network blocks 450-1, 450-2, 450-3, 450-4, 450-5, 450-6, 450-7, 450-8, 450-9, and 450-10 are blocks in the memory arrays where the neural network for the AI operations is stored. Neural network blocks 450-1, 450-2, 450-3, 450-4, 450-5, 450-6, 450-7, 450-8, 450-9, and 450-10 can store the information for the neurons and layers that are used in the AI operations. The address of neural network blocks 450-1, 450-2, 450-3, 450-4, 450-5, 450-6, 450-7, 450-8, 450-9, and 450-10 can be indicated in register 22 (e.g., register 232-22 in Figure 2 and 332-22 in Figure 3A).
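The Figure 4 layout can be summarized as a table from block role to the register that holds the block's address, using the register numbers given in [0066] through [0071]. The sketch assumes that each register directly holds a block address (register 332-44 actually packs a start and an end address, per [0061]); `BlockRole`, `block_map`, and the `check_distinct` validation are illustrative additions.

```python
from enum import Enum


class BlockRole(Enum):
    INPUT = "input"                # input block 440
    OUTPUT = "output"              # output block 442
    NEURAL_NETWORK = "neural_net"  # neural network blocks 450-1 .. 450-10
    TEMPORARY = "temporary"        # temporary blocks 444-1 and 444-2
    ACTIVATION = "activation"      # activation function block 446
    BIAS = "bias"                  # bias values block 448


# Register numbers that hold each block's address, as listed for Figure 4.
ADDRESS_REGISTERS: dict[BlockRole, tuple[int, ...]] = {
    BlockRole.INPUT: (5,),
    BlockRole.OUTPUT: (11,),
    BlockRole.NEURAL_NETWORK: (22,),
    BlockRole.TEMPORARY: (30, 33),
    BlockRole.ACTIVATION: (35,),
    BlockRole.BIAS: (44,),
}


def block_map(regs: dict[int, int]) -> dict[BlockRole, tuple[int, ...]]:
    """Read block addresses from a register file (index -> raw value).

    Assumes each listed register directly holds one block address.
    """
    return {role: tuple(regs[r] for r in indices)
            for role, indices in ADDRESS_REGISTERS.items()}


def check_distinct(mapping: dict[BlockRole, tuple[int, ...]]) -> None:
    """Raise if two roles point at the same block (hypothetical sanity check)."""
    seen: dict[int, BlockRole] = {}
    for role, addrs in mapping.items():
        for addr in addrs:
            if addr in seen:
                raise ValueError(f"{role.name} and {seen[addr].name} share block {addr}")
            seen[addr] = role
```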
[0072] Figure 5 is a flow diagram illustrating an example artificial intelligence process in a memory device with an artificial intelligence (AI) accelerator in accordance with a number of embodiments of the present disclosure. In response to starting an AI operation, an AI accelerator can write input data 540 and neural network data 550 to the input block and the neural network block, respectively. The AI accelerator can perform AI operations using input data 540 and neural network data 550. The results can be stored in temporary banks 544-1 and 544-2. The temporary banks 544-1 and 544-2 can be used to store data while performing matrix calculations, adding bias data, and/or applying activation functions during the AI operations.
[0073] An AI accelerator can receive the partial results of AI operations stored in temporary banks 544-1 and 544-2 and bias value data 548, and perform AI operations using the partial results of AI operations and bias value data 548. The results can be stored in temporary banks 544-1 and 544-2.
[0074] An AI accelerator can receive the partial results of AI operations stored in temporary banks 544-1 and 544-2 and activation function data 546 and perform AI operations using the partial results of AI operations and activation function data 546. The results can be stored in output banks 542.
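The three stages of Figure 5 (matrix calculation, bias addition, activation) can be sketched for a single fully connected layer. The layer type, the plain-list data representation, and the dictionary standing in for temporary banks 544-1 and 544-2 are assumptions for illustration; the disclosure does not specify how the accelerator performs the arithmetic.

```python
from typing import Callable, Sequence


def run_layer(inputs: Sequence[float],
              weights: Sequence[Sequence[float]],
              bias: Sequence[float],
              activation: Callable[[float], float]) -> list[float]:
    """One pass of the Figure 5 flow for a single fully connected layer."""
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row must match the input length")
    if len(bias) != len(weights):
        raise ValueError("need one bias value per neuron")

    temp: dict[str, list[float]] = {}  # stands in for temporary banks 544-1/544-2
    # Stage 1: matrix calculation on input data 540 and neural network data 550.
    temp["544-1"] = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    # Stage 2: add bias value data 548 to the partial results.
    temp["544-2"] = [p + b for p, b in zip(temp["544-1"], bias)]
    # Stage 3: apply activation function data 546; result goes to output bank 542.
    return [activation(v) for v in temp["544-2"]]
```

With inputs [1.0, 2.0], weights [[0.5, -1.0], [1.0, 1.0]], bias [0.0, -1.0], and ReLU as the activation, the pre-activation values are -1.5 and 2.0, so the output bank would receive [0.0, 2.0].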
[0075] Figure 6 is a flow diagram illustrating an example method to transfer data in accordance with a number of embodiments of the present disclosure. The method described in Figure 6 can be performed by, for example, a memory system including a memory device such as memory device 120 shown in Figures 1A and 1B.
[0076] At block 6150, the method can include executing a first portion of a training or inference operation on a first memory device that is configured as part of a neural network, wherein the first portion of the training or inference operation comprises combining a first input or a first weight, or both, represented as one or more data values stored within the first memory device with another input or another weight, or both, represented as other data stored within the first memory device or received from another memory device. The method can include executing a first portion of an artificial intelligence (AI) operation on a first memory device.
[0077] At block 6152, the method can include transferring, from the first memory device to a second memory device, data that is based at least in part on the inputs or weights combined at the first memory device. The method can include transferring data from the first memory device to a second memory device. For example, the first memory device can transfer an output block to an input block of the second memory device. The host and/or controller can format the data for storage on the second memory device and use in AI operations.
[0078] At block 6154, the method can include executing a second portion of the training or inference operation on the second memory device using the data transferred from the first memory device to the second memory device, wherein the second portion of the training or inference operation comprises combining a second input or a second weight, or both, represented as one or more data values stored within the second memory device with an additional input or an additional weight, or both, represented as additional data stored within the second memory device or received from an additional memory device. The method can include executing a second portion of the AI operation on the second memory device using the data transferred from the first memory device to the second memory device.
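The method of Figure 6 can be modeled in software to show the data movement between devices. In this sketch the network is partitioned by layers, each hypothetical `MemoryDevice` runs its layers on its input block, and `transfer` copies the output block of one device into the input block of the next, as in the example given for block 6152. The layer partitioning, the ReLU activation, and all names are assumptions; the disclosure also allows a single layer to be split across devices, which this sketch does not cover.

```python
from dataclasses import dataclass, field


def dense_relu(x: list[float], w: list[list[float]], b: list[float]) -> list[float]:
    # Combine inputs with weights, add bias, apply ReLU (activation is assumed).
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


@dataclass
class MemoryDevice:
    """Hypothetical model of one memory device holding part of a network."""
    name: str
    layers: list[tuple[list[list[float]], list[float]]]  # (weights, bias) per layer
    input_block: list[float] = field(default_factory=list)
    output_block: list[float] = field(default_factory=list)

    def execute_portion(self) -> None:
        """Run this device's layers on its input block (blocks 6150 / 6154)."""
        x = self.input_block
        for w, b in self.layers:
            x = dense_relu(x, w, b)
        self.output_block = x


def transfer(src: MemoryDevice, dst: MemoryDevice) -> None:
    """Copy the output block of src into the input block of dst (block 6152)."""
    dst.input_block = list(src.output_block)


def run_split(devices: list[MemoryDevice], data: list[float]) -> list[float]:
    """Execute each device's portion in turn, transferring between devices."""
    devices[0].input_block = list(data)
    for i, dev in enumerate(devices):
        dev.execute_portion()
        if i + 1 < len(devices):
            transfer(dev, devices[i + 1])
    return devices[-1].output_block
```

Splitting two layers across devices A and B in this way yields the same result as running both layers on a single device, which is the property the transfer step has to preserve.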
The method can include transferring data between memory devices that are coupled together. For example, when the neural network is too large to be stored on a single memory device, the input, output, and/or temporary blocks can be transferred between memory devices to execute the AI operations of the neural network. The temporary and/or output block from one memory device can be transferred to another memory device so that an AI operation can continue. Data can be transferred between memory devices so that each memory device performs a portion of an AI operation; for example, a first memory device can perform a first portion of the AI operation on a layer, and a second memory device can continue performing a second portion of the AI operation on the same layer.
[0079] Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art will appreciate that an arrangement calculated to achieve the same results can be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of various embodiments of the present disclosure. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. The scope of the various embodiments of the present disclosure includes other applications in which the above structures and methods are used. Therefore, the scope of various embodiments of the present disclosure should be determined with reference to the appended claims, along with the full range of equivalents to which such claims are entitled.
[0080] In the foregoing Detailed Description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed embodiments of the present disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

Claims

What is claimed is:
1. An apparatus, comprising: a controller; and a number of memory devices coupled to the controller, wherein each of the number of memory devices are configured as part of a neural network and include a number of memory arrays, and wherein the number of memory devices are configured to: store an input or a weight associated with the neural network, wherein the input or the weight are represented as data values stored in the number of memory devices; execute a training or inference operation on a first memory device; transfer data from the first memory device to a second memory device; and continue to execute the training or inference operation on the second memory device using the data transferred from the first memory device to the second memory device.
2. The apparatus of claim 1, wherein the data transferred from the first memory device to the second memory device is an output of the training or inference operation executed on the first memory device.
3. The apparatus of claim 1, wherein the data transferred from the first memory device to the second memory device is an input of the training or inference operation executed on the second memory device.
4. The apparatus of any one of claims 1-3, wherein the first memory device and the second memory device are selected by the controller to transfer the data on a bus shared by the number of memory devices.
5. The apparatus of any one of claims 1-3, wherein a command enables the first and second memory devices to enter an artificial intelligence (AI) mode to perform the training or inference operation.
6. The apparatus of any one of claims 1-3, wherein the first memory device is configured to transfer the data to the second memory device in response to the first memory device completing a first portion of the training or inference operation.
7. The apparatus of any one of claims 1-3, wherein the second memory device is configured to complete the training or inference operation in response to receiving the data from the first memory device.
8. A system, comprising: a controller; and a number of memory devices coupled to the controller, wherein each of the number of memory devices are configured as part of a neural network and include a number of memory arrays and wherein the number of memory devices are configured to: execute a first portion of a training or inference operation on a first memory device, wherein the first portion of the training or inference operation comprises combining a first input or a first weight, or both, represented as one or more data values stored within the first memory device with another input or another weight, or both, represented as other data stored within the first memory device or received from another memory device; transfer an output of the first portion of the training or inference operation from the first memory device to a second memory device; store the output of the first portion of the training or inference operation in the second memory device represented as one or more data values; and execute a second portion of the training or inference operation on the second memory device using the output of the first portion of the training or inference operation as an input of the second portion of the training or inference operation.
9. The system of claim 8, wherein the memory devices are configured to execute a third portion of the training or inference operation on the first memory device.
10. The system of claim 9, wherein the third portion of the training or inference operation is executed while the second portion of the training or inference operation is executed.
11. The system of claim 8, wherein the memory devices are configured to transfer an output of the second portion of the training or inference operation from the second memory device to the first memory device.
12. The system of claim 11, wherein the memory devices are configured to execute a third portion of the training or inference operation on the first memory device using the output of the second portion of the training or inference operation as an input of the third portion of the training or inference operation.
13. The system of claim 8, wherein the memory devices are configured to transfer neural network data from the first memory device to the second memory device.
14. The system of claim 8, wherein the memory devices are configured to transfer activation function data from the first memory device to the second memory device.
15. A method, comprising: executing a first portion of a training or inference operation on a first memory device that is configured as part of a neural network, wherein the first portion of the training or inference operation comprises combining a first input or a first weight, or both, represented as one or more data values stored within the first memory device with another input or another weight, or both, represented as other data stored within the first memory device or received from another memory device; transferring, from the first memory device to a second memory device, data that is based at least in part on the inputs or weights combined at the first memory device; and executing a second portion of the training or inference operation on the second memory device using the data transferred from the first memory device to the second memory device, wherein the second portion of the training or inference operation comprises combining a second input or a second weight, or both, represented as one or more data values stored within the second memory device with an additional input or an additional weight, or both, represented as additional data stored within the second memory device or received from an additional memory device.
16. The method of claim 15, wherein transferring the data from the first memory device to the second memory device includes transferring an output of the training or inference operation.
17. The method of claim 15, wherein executing the second portion of the training or inference operation includes using the data transferred from the first memory device to the second memory device as an input for the second portion of the training or inference operation.
18. The method of any one of claims 15-17, further including transferring an output of the second portion of the training or inference operation to the controller.
19. The method of any one of claims 15-17, wherein transferring data from the first memory device to the second memory device includes transferring neural network data for the training or inference operation.
20. The method of any one of claims 15-17, wherein transferring data from the first memory device to the second memory device includes transferring activation function data for the training or inference operation.

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020227009913A KR20220052358A (en) 2019-08-29 2020-08-27 Data transfer in memory system with artificial intelligence mode
CN202080060027.3A CN114303136A (en) 2019-08-29 2020-08-27 Transferring data in a memory system having an artificial intelligence mode
EP20859442.4A EP4022525A4 (en) 2019-08-29 2020-08-27 Transfer data in a memory system with artificial intelligence mode

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/554,981 2019-08-29
US16/554,981 US20210064971A1 (en) 2019-08-29 2019-08-29 Transfer data in a memory system with artificial intelligence mode

Publications (1)

Publication Number Publication Date
WO2021041644A1 true WO2021041644A1 (en) 2021-03-04

Family

ID=74679834

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/048160 WO2021041644A1 (en) 2019-08-29 2020-08-27 Transfer data in a memory system with artificial intelligence mode

Country Status (5)

Country Link
US (1) US20210064971A1 (en)
EP (1) EP4022525A4 (en)
KR (1) KR20220052358A (en)
CN (1) CN114303136A (en)
WO (1) WO2021041644A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358661A1 (en) 2012-02-23 2016-12-08 Micron Technology, Inc. Methods of operating memory
US20180218257A1 (en) * 2017-01-27 2018-08-02 Hewlett Packard Enterprise Development Lp Memory side acceleration for deep learning parameter updates
US20190056885A1 (en) 2018-10-15 2019-02-21 Amrita MATHURIYA Low synch dedicated accelerator with in-memory computation capability
US20190073259A1 (en) * 2017-09-06 2019-03-07 Western Digital Technologies, Inc. Storage of neural networks
US20190146788A1 (en) 2017-11-15 2019-05-16 Samsung Electronics Co., Ltd. Memory device performing parallel arithmetic processing and memory module including the same
US20190251034A1 (en) 2019-04-26 2019-08-15 Intel Corporation Architectural enhancements for computing systems having artificial intelligence logic disposed locally to memory

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN109196528B (en) * 2016-05-17 2022-03-18 硅存储技术公司 (Silicon Storage Technology, Inc.) Deep learning neural network classifier using non-volatile memory array
US20170344283A1 (en) * 2016-05-27 2017-11-30 Intel Corporation Data access between computing nodes
US10915791B2 (en) * 2017-12-27 2021-02-09 Intel Corporation Storing and retrieving training data for models in a data center
US11373088B2 (en) * 2017-12-30 2022-06-28 Intel Corporation Machine learning accelerator mechanism
US11775799B2 (en) * 2018-08-02 2023-10-03 Advanced Micro Devices, Inc. Runtime extension for neural network training with heterogeneous memory
US20200193282A1 (en) * 2018-12-17 2020-06-18 Spin Transfer Technologies System and Method for Training Artificial Neural Networks
CN109783157B (en) * 2018-12-29 2020-11-24 深圳云天励飞技术有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.) Method and related device for loading algorithm program

Non-Patent Citations (3)

Title
Li Bing et al., ReRAM-based accelerator for deep learning
See also references of EP4022525A4
Song Linghao et al., PipeLayer: A pipelined ReRAM-based accelerator for deep learning

Also Published As

Publication number Publication date
CN114303136A (en) 2022-04-08
EP4022525A1 (en) 2022-07-06
KR20220052358A (en) 2022-04-27
EP4022525A4 (en) 2023-08-23
US20210064971A1 (en) 2021-03-04

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20859442; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20227009913; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2020859442; Country of ref document: EP; Effective date: 20220329)