US10929058B2 - Enhanced memory device architecture for machine learning - Google Patents

Enhanced memory device architecture for machine learning

Info

Publication number
US10929058B2
Authority
US
United States
Prior art keywords
neural network
data
memory
volatile memory
computations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/363,661
Other versions
US20200310674A1 (en)
Inventor
Luiz M. Franca-Neto
Viacheslav Dubeyko
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Digital Technologies Inc
Original Assignee
Western Digital Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US16/363,661
Application filed by Western Digital Technologies Inc
Assigned to WESTERN DIGITAL TECHNOLOGIES, INC. Assignment of assignors interest (see document for details). Assignors: FRANCA-NETO, LUIZ M.; DUBEYKO, VIACHESLAV
Priority to CN201911218064.8A (CN111738430B)
Assigned to JPMORGAN CHASE BANK, N.A., AS AGENT. Security interest (see document for details). Assignor: WESTERN DIGITAL TECHNOLOGIES, INC.
Publication of US20200310674A1
Priority to US17/143,001 (US11372577B2)
Publication of US10929058B2
Application granted
Assigned to WESTERN DIGITAL TECHNOLOGIES, INC. Release of security interest at reel 052915, frame 0566. Assignor: JPMORGAN CHASE BANK, N.A.
Assigned to JPMORGAN CHASE BANK, N.A. Patent collateral agreement (A&R loan agreement). Assignor: WESTERN DIGITAL TECHNOLOGIES, INC.
Assigned to JPMORGAN CHASE BANK, N.A. Patent collateral agreement (DDTL loan agreement). Assignor: WESTERN DIGITAL TECHNOLOGIES, INC.

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C14/00 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down
    • G11C14/0009 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C14/00 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down
    • G11C14/0009 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell
    • G11C14/0018 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell whereby the nonvolatile element is an EEPROM element, e.g. a floating gate or metal-nitride-oxide-silicon [MNOS] transistor
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C14/00 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down
    • G11C14/0009 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell
    • G11C14/0036 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell and the nonvolatile element is a magnetic RAM [MRAM] element or ferromagnetic cell
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C14/00 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down
    • G11C14/0009 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell
    • G11C14/0045 Digital stores characterised by arrangements of cells having volatile and non-volatile storage properties for back-up when the power is down in which the volatile element is a DRAM cell and the nonvolatile element is a resistive RAM element, i.e. programmable resistors, e.g. formed of phase change or chalcogenide material

Definitions

  • FIG. 7A is an example 700A of repurposing the non-volatile memory for multiple neural networks according to some embodiments. The controller, or an external CPU, can set a neural network type for the non-volatile memory 702 and cause input data from the DRAM to be inputted into the non-volatile memory 702; this can be accomplished by issuing one or more commands to the smart memory device. The data can be processed through the layers of the neural network 702A, 702B, . . . 702N, and the output of the non-volatile memory can be inputted back into the non-volatile memory for processing by a subsequent layer. In this way, multiple neural networks can be used to process data in sequence. For example, at step L, the result of processing by a particular neural network can be stored in memory, such as a temporary memory or buffer (which can be part of the DRAM). A subsequent neural network can then be configured for the non-volatile memory, and at step L+2, the output that was inputted back into the non-volatile memory can be processed through that subsequent neural network.
  • FIG. 7B is an example of a process 700B for repurposing the non-volatile memory for multiple neural networks according to some embodiments. The process 700B can be implemented by a system including a controller and a smart memory device as described herein, such as any of the systems 400, 500, or 900A. The process can define the type of neural network; for example, it can identify the appropriate neural network for the desired data processing required by a host. The process can store input data in the DRAM and cause the input data to be provided to the neural network stored in the non-volatile memory, where the data can be processed, and the process can then receive the output of the neural network. At step 710, the process can determine whether another neural network is to further process the data or whether the data processing is complete; if data processing is complete, the process ends at step 722. Otherwise, the process can define the type of the next neural network, determining whether the same neural network can be rerun and/or a different neural network is needed. At step 714, the process can retrieve the stored data from the previous neural network, and at step 716 can input the saved output data from the previous neural network into the newly configured neural network. At step 718, the data can be processed through the neural network, and at step 720, the process can save the output of the neural network, for example in the DRAM. The process then continues to step 710, where it again determines whether another neural network is to further process the data or whether the data processing is complete. This loop is sketched in code below.
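As a concrete illustration of that loop, here is a minimal Python sketch. The controller object and its configure() and save() methods are hypothetical stand-ins for the device's actual commands, which the disclosure does not specify at this level:

```python
def run_pipeline(controller, input_data, tasks):
    """FIG. 7B flow: for each requested task, (re)configure the single
    non-volatile network, feed it the previous output, and save the
    result, until no further processing is needed (step 722)."""
    data = input_data
    for task in tasks:
        network = controller.configure(task)  # define the neural network type
        data = network(data)                  # process data through its layers
        controller.save(data)                 # save output, e.g. in the DRAM
    return data                               # final result for the host
```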
  • FIG. 8 is an example 800 of multiple neural networks implemented in non-volatile memory 802 according to some embodiments. A controller can configure multiple neural networks in the non-volatile memory 802; for example, a first neural network 802A and a second neural network 802L can be configured there. The output of the first neural network 802A can be inputted directly into the next neural network 802L, so the output of the first neural network 802A may not have to be stored in temporary memory before being inputted into the next neural network 802L. Alternatively, a temporary non-volatile or volatile memory buffer can be used between neural network layers to temporarily save the result of every layer; with a non-volatile buffer, neural network activity can continue even after a sudden power-off. A smart memory device can process neural networks in series, such as in the example shown in FIG. 8, in parallel, and/or in a combination thereof.
  • FIG. 9A illustrates an example 900A of the CPU communicating with a smart memory device according to some embodiments. The steps and/or functions described below can be performed by the CPU and/or a controller within the smart memory. The CPU 904 transmits data to a memory page (or another memory unit) 906 of the DRAM in step 1. The CPU 904 then determines whether the requested processing on the data involves neural network processing; if not, the CPU 904 can access and/or modify the data (for example, read and/or write data). Otherwise, the CPU 904 can send configuration parameters of the desired neural network to a non-volatile memory controller 908. The controller 908 can process the data through the layers 902A, 902B, . . . 902C of the neural network implemented in the non-volatile memory and send the output of the neural network to the memory page 906 of the DRAM (or another area of the DRAM) at step 4. The controller 908 can then indicate to the CPU 904 that the neural network operation is complete; this can be performed by setting or activating an interrupt. In other embodiments, the CPU 904 can poll the controller 908 for the status of the neural network operation, as sketched below.
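The completion signaling described above (an interrupt from the controller, or the CPU polling for status) can be modeled at a high level with a thread event. This is an illustrative host-level sketch with assumed names, not the device's actual mechanism:

```python
import threading
import time

done = threading.Event()     # stands in for the controller's interrupt line

def controller_job(run_network):
    """Controller side: process the data, then signal completion."""
    run_network()            # layers 902A, 902B, ... process the data
    done.set()               # raise the "operation complete" interrupt

def host_poll(interval=0.01, timeout=1.0):
    """Host side: periodically poll the completion status instead of
    blocking on the interrupt."""
    deadline = time.monotonic() + timeout
    while not done.is_set():
        if time.monotonic() > deadline:
            raise TimeoutError("neural network operation did not complete")
        time.sleep(interval)

threading.Thread(target=controller_job,
                 args=(lambda: time.sleep(0.05),)).start()
host_poll()                  # returns once the controller signals completion
```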
  • FIG. 9B illustrates an example 900B of a process for performing one or more neural network operations according to some embodiments. The CPU can receive data from a host CPU to store into memory, and can determine whether the request from the host CPU requires neural network computations at step 914. If not, then at step 916 the CPU can access and/or process the request directly from memory. Otherwise, the CPU can send characteristics of a neural network to a controller 918. The controller 918 can determine the corresponding neural network based on the received characteristics and, at step 922, input the data stored in memory into the neural network. The neural network engine can process the data through the neural network in step 924. The controller 918 can send the output of the neural network to the DRAM, and at step 928, the DRAM 910 can store the output data into memory for the CPU to access. In some embodiments, the memory device can process the data synchronously, and the CPU can wait for the neural network operations to complete; the CPU can optionally send an end function to stop the processing of data through the neural network during data processing, or otherwise poll the memory device. In other embodiments, the CPU does not have to wait for the neural network data processing.
  • FIG. 10 illustrates an example of a process 1000 for delegating data processing to the non-volatile memory implementing one or more neural networks according to some embodiments. The process 1000 can be implemented by a system including a controller as described herein, such as any of the systems 400, 500, or 900A. The task scheduler 1002 (which can be implemented by a processor or controller) can select a process for execution, and the CPU can execute the selected process in the allocated time slice; the task scheduler 1002 manages the delegation of tasks, such as by assigning a time slice to the CPU 1004 to perform a certain task, where the CPU activity is split between time slices. The CPU 1004 can initiate data processing in step 3 by sending a request to a controller 1012 of a smart memory device. The controller 1012 can configure a neural network 1008 to perform the neural network operation(s), receive the input data from memory 1006 (such as DRAM), process the data through the neural network, and send the output data to a DRAM memory page, as described herein. The CPU 1004 can indicate to the task scheduler 1002 that the process should be put into a sleep state (for example, because the CPU 1004 is waiting for completion of the neural network processing); the task scheduler 1002 then does not assign a time slice to the process 1010B. In some embodiments, the CPU 1004 can perform other tasks while the controller 1012 is managing the neural network processing. At step 6, the process is returned to a ready state by the task scheduler 1002. Advantageously, offloading the neural network processing from the CPU to the smart memory device can dramatically improve system performance by freeing the CPU's resources: the whole memory space can perform large data processing without affecting system performance, and power consumption can be reduced, for example, because processing-intensive neural network computations are performed by the non-volatile memory device rather than the CPU. This delegation flow is sketched below.
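The scheduler interaction of FIG. 10 can likewise be sketched in Python: the CPU starts the device-side computation, the process is removed from the ready queue and receives no time slices, and completion returns it to the ready state. The queue, the state names, and the start_on_device() helper are assumptions for illustration only:

```python
import threading

def delegate(process, run_queue, start_on_device):
    """FIG. 10 flow: offload work to the smart memory device and sleep
    the process until the controller signals completion."""
    done = start_on_device()       # step 3: request sent to controller 1012
    process["state"] = "sleeping"  # scheduler assigns it no time slices
    run_queue.remove(process)
    done.wait()                    # CPU is free for other work meanwhile
    process["state"] = "ready"     # step 6: back in the ready state
    run_queue.append(process)

# Hypothetical usage: the device "completes" after a short delay.
def start_on_device():
    done = threading.Event()
    threading.Timer(0.05, done.set).start()
    return done

proc, queue = {"state": "ready"}, []
queue.append(proc)
delegate(proc, queue, start_on_device)
print(proc["state"])   # 'ready' once the neural network work finished
```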

Abstract

Embodiments of an improved memory architecture that processes data inside the memory device are described. In some embodiments, the memory device can store layers of a neural network, such as a systolic flow engine, in non-volatile memory and/or a separate DRAM memory. A central processing unit (CPU) of a host system can delegate the execution of a neural network to the memory device. Advantageously, neural network processing in the memory device can be scalable, with the ability to process large amounts of data.

Description

INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATIONS
Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are hereby incorporated by reference under 37 CFR 1.57.
TECHNICAL FIELD
The present disclosure relates to memory device architecture, and more particularly, to improving machine learning by processing data inside the memory device.
BACKGROUND
Machine learning techniques, such as neural networks, are frequently being utilized by modern computing systems. These technologies can operate on large data sets and thus can require large amounts of storage space. However, current memory architectures do not allow for scalability of big data analysis. The present disclosure addresses these and other problems.
BRIEF DESCRIPTION OF THE DRAWINGS
The innovations described in the claims each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the claims, some prominent features of this disclosure will now be briefly described.
FIGS. 1A and 1B are examples of persistent data transferred between DRAM and persistent storage according to the prior art.
FIG. 2 is an example of analyzing data through artificial intelligence models according to the prior art.
FIG. 3 is an example of a non-volatile memory for central processing unit (CPU) and data processing unit (DPU) operations according to some embodiments.
FIG. 4 is an example system illustrating communication between a CPU and smart memory according to some embodiments.
FIG. 5 is an example system for processing data in a neural network stored in non-volatile memory according to some embodiments.
FIG. 6 is an example of data processing in layers of a neural network stored in non-volatile memory according to some embodiments.
FIG. 7A is an example of repurposing the non-volatile memory for multiple neural networks according to some embodiments.
FIG. 7B is an example of a process for repurposing the non-volatile memory for multiple neural networks according to some embodiments.
FIG. 8 is an example of multiple neural networks configured in non-volatile memory according to some embodiments.
FIG. 9A illustrates an example of the CPU and controller architecture according to some embodiments.
FIG. 9B illustrates an example of a process for performing one or more neural network operations according to some embodiments.
FIG. 10 illustrates an example of the CPU delegating data processing to the neural network according to some embodiments.
DETAILED DESCRIPTION
While certain embodiments are described, these embodiments are presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the methods and systems described herein may be made without departing from the scope of protection.
Various embodiments of this disclosure provide a memory device (or storage device) configured to perform neural network computations, the device comprising: a volatile memory; a non-volatile memory configured to store one or more layers of a neural network; and a controller configured to: store data in at least one of the volatile memory or the non-volatile memory and retrieve data from at least one of the volatile memory or the non-volatile memory in response to at least one data transfer command received from a host system; perform neural network computations in the non-volatile memory by applying one or more neural network layers to input data received from the host system; and store a result of the neural network computations in the volatile memory for retrieval by the host system.
In the memory device of the preceding paragraph or any paragraphs herein, the input data can be stored in the volatile memory.
In the memory device of the preceding paragraph or any paragraphs herein, the controller can be further configured to perform neural network computations for a plurality of neural networks and use a result of neural network computations for a first neural network as input data for a successive neural network.
In the memory device of the preceding paragraph or any paragraphs herein, the controller can be further configured to reconfigure the first neural network as the successive neural network before inputting the data into the successive network.
In the memory device of the preceding paragraph or any paragraphs herein, the controller can be a sole controller of the memory device.
In the memory device of the preceding paragraph or any paragraphs herein, the controller can be further configured to provide the result of the neural network computations to the host system asynchronously.
In the memory device of the preceding paragraph or any paragraphs herein, provision of the result asynchronously can comprise at least one of polling a state of memory pages in the non-volatile memory or issuing an interrupt.
In the memory device of the preceding paragraph or any paragraphs herein, polling can comprise periodic polling of the state of memory pages.
In the memory device of the preceding paragraph or any paragraphs herein, the result of the neural network computations can be configured to be retrieved synchronously.
In the memory device of the preceding paragraph or any paragraphs herein, the memory device can be further configured to receive a request to initiate neural network computations, the request comprising neural network configuration parameters and input data for neural network computations.
In the memory device of the preceding paragraph or any paragraphs herein, the request to initiate neural network computations can comprise a type of data processing, and the controller can be further configured to identify neural network configuration parameters based on the type of data processing.
Various embodiments of this disclosure provide a method of performing neural network computations in a memory device, the method comprising, by a controller of the memory device: storing data in at least one of a volatile memory or a non-volatile memory and retrieving data from at least one of the volatile memory or the non-volatile memory in response to at least one data transfer command received from a host system; performing neural network computations in the non-volatile memory by applying one or more neural network layers to input data received from the host system; and storing a result of the neural network computations in the volatile memory for retrieval by the host system.
The method of the preceding paragraph or any paragraphs herein can include setting a locked state of the data before inputting the data into the neural network, and setting an unlocked state of the data after making the output of the neural network available, wherein the locked state can prevent changing the data.
The method of the preceding paragraph or any paragraphs herein can include configuring the neural network to perform the data processing function on the data based on at least one of a number of nodes or a type of activation function.
The method of the preceding paragraph or any paragraphs herein can include inputting the data into the neural network by initiating back propagation on the neural network, and output of the neural network can include an adjusted weighting for one or more nodes of the neural network.
Various embodiments of this disclosure provide a data storage device configured to perform neural network computations, the data storage device comprising a volatile memory, non-volatile memory, and a sole controller configured to: store data in at least one of the volatile memory or the non-volatile memory and retrieve data from at least one of the volatile memory or the non-volatile memory in response to at least one data transfer command received from a host system; perform neural network computations in the non-volatile memory by applying one or more neural network layers to input data received from the host system and stored in the volatile memory; and store a result of the neural network computations in the volatile memory for retrieval by the host system.
In the device of the preceding paragraph or any paragraphs herein, the request to initiate neural network computations can comprise a type of data processing, and the controller can be further configured to identify neural network configuration parameters based on the type of data processing.
In the device of the preceding paragraph or any paragraphs herein, the neural network may not be directly accessible by a processor of the host system.
In the device of the preceding paragraph or any paragraphs herein, the request to perform the data processing function can comprise neural network configuration parameters and input data for the neural network computations, and the controller can be further configured to define the one or more neural network layers based on the neural network configuration parameters.
In the device of the preceding paragraph or any paragraphs herein, the request to perform the data processing function can comprise a type of data processing, and the controller can be further configured to identify neural network configuration parameters based on the type of data processing and define the one or more neural network layers based on the neural network configuration parameters.
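The paragraphs above repeatedly tie a request's type of data processing to derived neural network configuration parameters. As a concrete illustration only, here is a minimal Python sketch of such a mapping; the task names, the NetworkConfig fields, and the lookup table are assumptions invented for this example, not anything specified by the disclosure:

```python
from dataclasses import dataclass

@dataclass
class NetworkConfig:
    """Configuration parameters a controller might derive from a request."""
    nodes_per_layer: list   # width of each layer, input to output
    activation: str         # type of activation function
    task: str               # requested type of data processing

# Hypothetical lookup table: the controller identifies neural network
# configuration parameters based on the type of data processing named
# in the host's request.
CONFIG_BY_TASK = {
    "image_classification": NetworkConfig([1024, 256, 64, 10], "relu",
                                          "image_classification"),
    "anomaly_detection": NetworkConfig([128, 32, 2], "sigmoid",
                                       "anomaly_detection"),
}

def identify_config(processing_type: str) -> NetworkConfig:
    """Resolve a request's data-processing type to configuration parameters."""
    try:
        return CONFIG_BY_TASK[processing_type]
    except KeyError:
        raise ValueError(f"no preconfigured network for {processing_type!r}")
```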
Overview
Emerging non-volatile memory (NVM) technologies, such as magnetic random-access memory (MRAM), resistive random-access memory (ReRAM), and Nantero random-access memory (NRAM), can have low latency properties, providing opportunities to increase the performance of computer systems dramatically. However, traditional memory architectures are unable to take advantage of this non-volatile memory efficiently. They suffer from a critical drawback: unless data has been pre-fetched into the page cache, persistent data must be transferred from persistent storage into dynamic random-access memory (DRAM) whenever that data is processed.
FIGS. 1A and 1B are examples 100 and 150 of persistent data transferred between DRAM and persistent storage. The host 102 can include a CPU 104 and DRAM 106. The interface circuitry for the DRAM 106 communicates with the interface circuitry for the persistent storage, such as the solid state drive (SSD) 108A or a hybrid SSD 108B, for each piece of data that has to be processed. The SSD 108A can include a NAND flash memory 110A. The hybrid SSD 108B can include a NAND flash memory 110A and a non-volatile memory (NVM) 110B.
FIG. 2 is an example 200 of analyzing data through artificial intelligence models. In step 202, a host can request analysis of data. The data can be inputted into an artificial intelligence model 204, the data 206 can be processed via the artificial intelligence model, and the data 208 outputted. Then, the user 210 can receive the outputted data. While waiting to receive the output data, the memory device is typically idle, wasting time 212 and resources that could otherwise have been used to perform other operations.
Furthermore, current memory chip architectures do not allow for scalability of big data analysis. With such architectures, large amounts of data would have to be transferred back and forth between the DRAM and the persistent storage devices. As such, simply increasing the number of cores for increased data processing does not address the issues described herein. For example, the storage device may have to copy data to the host side, and the host side may have to process the data: one set of data is copied into DRAM, the CPUs process that set, and the next set of data is then copied in for processing. This creates a large performance bottleneck, cannot scale to large data processing, consumes substantial time and resources, and results in large overhead in the software stack. Furthermore, with separate CPU cores, each CPU can be dedicated to a subset of the data and may modify that subset, resulting in an inconsistent state of data across the CPUs. Moreover, increasing the size of the DRAM also comes with inefficiencies, such as an increase in power consumption. Finally, the CPU may not be able to address a DRAM over a certain size, and thus the DRAM is not scalable.
FIG. 3 is an example 300 of a non-volatile memory for CPU and DPU operations. A storage device can include a central processing unit (CPU) core 302, a data processing unit (DPU) core 304, a non-volatile memory 306, and passive storage 308. The systolic flow engine is described in more detail in the patent application titled "Systolic Neural Network Engine Capable of Forward Propagation" (U.S. patent application Ser. No. 15/981,624, filed on May 16, 2018), and in the patent application titled "Reconfigurable Systolic Neural Network Engine" (U.S. patent application Ser. No. 16/233,968, filed on Dec. 27, 2018), the disclosure of each of which is hereby incorporated by reference in its entirety.
Advantageously, non-volatile memory can enable scalability for large data processing and reduce power requirements relative to DRAM. However, introducing non-volatile memory can create new issues. The number of CPU cores cannot simply be increased, because of inefficiencies created in the task scheduler: the scheduler's activity in assigning time slices for thread execution increases, and the number of context switches increases as well. If, however, data processing can be offloaded into memory pages of the smart memory device, then the task scheduler does not need to manage the shared CPU cores. There are also cache coherence issues: data from DRAM is copied into a CPU's L1/L2 cache for data processing, so the same portion of data is placed into the L1/L2 cache of every CPU core, and if one core modifies the data, the DRAM then contains an inconsistent state of the data. As described herein, the disclosed embodiments address at least these problems.
Communication Between Processor and Smart Memory
Generally, some embodiments of the systems and methods described herein improve memory architecture by processing data inside of the memory device. FIG. 4 is an example system 400 illustrating communication between a processor or controller, such as a CPU, and an improved or enhanced (sometimes referred to as "smart") memory or memory device according to some embodiments. The smart memory device 406 can include a neural network, such as a systolic flow engine, implemented in the non-volatile memory 408; one or more processors or controllers 410, as described herein; and a volatile memory, such as the DRAM 404, or a non-volatile storage class memory such as MRAM or ReRAM. For the sake of brevity, the rest of the examples in this disclosure primarily use DRAM for illustration; note that the various disclosed embodiments are not limited to the DRAM implementation and can include or apply to any volatile or non-volatile memory used in the same manner by the CPU or other processing unit in the architecture. The DRAM 404 can communicate with an external CPU 402, such as the CPU of a host system; such communication can be performed via a suitable interface (not shown). The smart memory device 406, as the combination of the DRAM 404 and a neural network based in the non-volatile memory 408 in a single chip and/or device, can synthesize the CPU-based approach and the neural network approach to reduce and/or eliminate the drawbacks mentioned herein. Such a combination provides the opportunity for a CPU to access the data in the DRAM space, and also enables the CPU to delegate the execution of specialized processing to the neural network implemented in the non-volatile memory, which can perform it faster and more efficiently than a general purpose CPU. The CPU can initiate data processing in the neural network and thereafter continue other CPU functions; by offloading the data processing to one or more neural network engines in the memory, the CPU frees its resources to perform other tasks. Moreover, the CPU can be the gateway for data processing, ensuring data consistency. Advantageously, the smart memory device concept may be able to boost overall system performance.
With the improved memory architecture, data can be transferred from the storage device into the smart memory device, and the smart memory device can then process the data internally. Advantageously, data processing on the smart memory device can be scalable, with the ability to process large amounts of data. The smart memory device 406 can store one or more layers of a neural network, as described herein.
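To make the division of labor concrete, the following Python sketch models the command surface such a smart memory device might expose to a host: ordinary store/load commands against a DRAM-like buffer, plus a delegated neural network computation whose result lands back in that buffer. All names and the trivial placeholder network are assumptions for illustration, not the actual device interface.

```python
class SmartMemoryDevice:
    """Toy model of a smart memory device: a DRAM-like buffer plus an
    in-memory neural network engine the host can delegate work to."""

    def __init__(self):
        self.dram = {}             # address -> data; stands in for DRAM pages
        self.result_ready = False  # completion flag the host can check

    # Ordinary data-transfer commands issued by the host system.
    def store(self, addr, data):
        self.dram[addr] = data

    def load(self, addr):
        return self.dram[addr]

    # Delegated computation: read input from the DRAM buffer, run it
    # through the (here trivial) network, and write the result back.
    def run_network(self, in_addr, out_addr, config=None):
        self.result_ready = False
        x = self.dram[in_addr]
        self.dram[out_addr] = self._network(x, config)
        self.result_ready = True

    def _network(self, x, config):
        # Placeholder for layers implemented in non-volatile memory.
        return [2 * v for v in x]
```

A host would stage input with store(), invoke run_network(), and later load() the result, which is essentially the flow FIG. 5 walks through below.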
Data Processing Via Neural Network in Non-Volatile Memory
FIG. 5 is an example system 500 for processing data in a neural network implemented in non-volatile memory according to some embodiments. The DRAM 504, partitioned into memory units X1 to XD, can receive input data from the CPU 502. The DRAM can input such data into a neural network implemented in the non-volatile memory 506. The neural network can process the input data through its layers and produce output data that is stored back in the DRAM 504. The steps and/or functions described below can be performed by the CPU 502 and/or a controller within the smart memory.
In some embodiments, the non-volatile memory can configure and/or reconfigure one or more neural networks, and/or store preconfigured neural networks. The non-volatile memory can configure a neural network based on certain received parameters, such as a number of nodes, layers, weights, a desired inference operation, and/or the like.
In some embodiments, the CPU 502 (and/or a controller) can communicate with the DRAM 504 without knowledge of the underlying data processing via the neural network in the non-volatile memory. For example, the CPU 502 can use the DRAM 504 to perform a particular operation on a set of data. The CPU 502 can determine whether to perform the operation internally or to send the data to the non-volatile memory to process the data. The particular operation can be an inference (or training) operation of a neural network that may require substantial processing. The non-volatile memory 506 can receive the input data from the DRAM 504, configure the neural network to perform the inference (or training) operation, process the data through the neural network, and send (or store) the output data to the DRAM 504. The CPU 502 can subsequently retrieve the results of the inference operation from the DRAM 504. Advantageously, the CPU 502 can offload the execution of the inference operation to a separate non-volatile memory 506. Moreover, the non-volatile memory 506 can execute inference operations of the neural network in parallel or substantially in parallel with the other operations being performed in the DRAM 504.
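Building on the hypothetical SmartMemoryDevice sketch above, the host-side decision just described (process internally versus delegate to the non-volatile memory) might look as follows; the addresses and the needs_network flag are illustrative assumptions:

```python
IN_ADDR, OUT_ADDR = 0x0000, 0x1000   # arbitrary illustrative DRAM addresses

def process(cpu_local_fn, device, data, needs_network):
    """Perform the operation on the CPU, or offload it to the device."""
    if not needs_network:
        return cpu_local_fn(data)        # ordinary in-host processing
    device.store(IN_ADDR, data)          # stage input in device DRAM
    device.run_network(IN_ADDR, OUT_ADDR)
    # ...the CPU is free to do unrelated work here while the device
    # executes the inference in parallel...
    return device.load(OUT_ADDR)         # retrieve the stored result
```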
Data Processing in Layers of a Neural Network
FIG. 6 is an example 600 of data processing in layers of a neural network stored in non-volatile memory according to some embodiments. The neural network can efficiently implement specialized data processing. Artificial neural networks (or connectionist systems, or machine learning models) can learn to perform certain tasks based on training data, and such training can occur without task-specific programming. For example, a neural network can learn to identify images that contain cats by analyzing training data of example images that have been manually labeled as "cat" or "no cat." The neural network can then adjust its node weights to identify cats in other images.
The neural network engine used by the disclosed embodiments can be configured to implement any type of neural network. The neural network engine can define a neural network based on one or more factors, including (1) the number of nodes in one layer, (2) the number of hidden layers, (3) the type of activation function, and/or (4) the matrix of weights for every connection between nodes of layers. In some embodiments, the neural network can be defined based on a functionality, and the neural network engine can retrieve a predefined neural network corresponding to the desired functionality.
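These four factors map naturally onto a configuration record. A minimal sketch, assuming a Python-style host interface; the class and field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class NeuralNetworkConfig:
    """Hypothetical configuration record covering the four factors
    listed above; the field names are invented."""
    nodes_per_layer: List[int]            # (1) nodes in each layer
    hidden_layers: int                    # (2) number of hidden layers
    activation: Callable[[float], float]  # (3) type of activation function
    weights: List[List[List[float]]]      # (4) weight matrix per layer pair

# Example: a 3-2-1 network with one hidden layer and a ReLU activation.
config = NeuralNetworkConfig(
    nodes_per_layer=[3, 2, 1],
    hidden_layers=1,
    activation=lambda x: max(0.0, x),
    weights=[
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # 3 inputs -> 2 hidden nodes
        [[0.7, 0.8]],                        # 2 hidden -> 1 output node
    ],
)
```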
In some embodiments, a controller, such as the external CPU and/or a controller of the non-volatile memory, can configure the neural network, such as by defining the type of neural network for processing the data. The controller can also identify the appropriate input data. For example, the input data may include a picture that is sent into a neural network, such as a systolic flow engine, that is trained to identify people in pictures. The systolic flow engine may produce an output stream that provides an indication of whether a person was identified in the picture of the input stream.
The DRAM 602 can receive and store the input data (e.g., N bytes of input data) and push the data into the neural network. The non-volatile memory can include the layers of the neural network 604A, 604B, . . . 604N. The output of the neural network can be stored back into the DRAM 602. In some embodiments, an output of one neural network can be fed into an input of another neural network. In some embodiments, the DRAM can feed multiple neural networks in the non-volatile memory for data processing of multiple functionalities.
In some embodiments, the CPU can lock the corresponding input data as the input data is pushed into the neural network. Thus, if the neural network is still processing the input data, the CPU can wait for the neural network to complete its computations before modifying the input data. The CPU can access the data without modification, such as by performing a read operation.
In some embodiments, the CPU or DRAM's controller can copy the corresponding input data, and push the copy of the data into the neural network. In such cases, the CPU can modify the original input data while the copy of the data is being processed. The circuitry between the neural network layers can include one or more memory cells to store the outputs of a previous layer as inputs to the next layer.
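The two consistency strategies just described, locking the input in place versus snapshotting it, can be sketched as follows. This is an assumption about how such a discipline might look in software, not the device's actual protocol; all names are invented.

```python
import copy

class InputBuffer:
    """Illustrative model of the two consistency strategies above;
    the names and the locking discipline are assumptions."""
    def __init__(self, data):
        self.data = data
        self.locked = False

    def push_locked(self, engine):
        """Strategy 1: lock the input until the engine finishes, so the
        CPU may read but not modify it during processing."""
        self.locked = True
        try:
            return engine.run(self.data)
        finally:
            self.locked = False  # unlock once the output is available

    def push_copy(self, engine):
        """Strategy 2: snapshot the input; the CPU may modify the
        original while the copy flows through the network."""
        return engine.run(copy.deepcopy(self.data))

    def write(self, index, value):
        if self.locked:
            raise RuntimeError("input data is locked during neural processing")
        self.data[index] = value
```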
In some embodiments, the DRAM 602 can serve as the input layer and/or the output layer for the neural network. In other embodiments, the DRAM 602 can input the data into an input layer of a neural network and/or receive the output of an output layer of a neural network.
In some embodiments, the non-volatile memory can include all of the layers of the neural network. In other embodiments, the non-volatile memory (e.g. 408 in FIG. 4) and the DRAM 602 can each implement a subset of the layers of the neural network.
In some embodiments, a controller can control the receiving and/or sending of data to and/or from the DRAM 602. The controller can configure the non-volatile memory for a particular neural network. The controller can facilitate data processing through the neural network stored in the non-volatile memory.
In some embodiments, data can be back-propagated through the layers of the non-volatile memory for training purposes. For example, training data can be forward propagated through the neural network. Based on the output of the neural network, the controller can back propagate through each layer, increasing the weights of nodes that contributed to the desired output and decreasing the weights of nodes that did not.
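The simplified update rule just described, strengthening connections that contributed to the desired output, can be written out for a single node. This is a generic gradient-descent sketch for illustration, not the patent's in-memory circuitry:

```python
def train_step(weights, inputs, target, lr=0.1):
    """One forward/backward pass for a single linear node, standing in
    for the layer-by-layer back-propagation described above."""
    # Forward propagation: weighted sum of the inputs.
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # Backward pass: strengthen each weight in proportion to how much
    # its input contributed to the desired output (and weaken otherwise).
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, inputs=[1.0, 2.0], target=1.0)
print(weights)  # converges toward weights satisfying 1*w0 + 2*w1 ≈ 1.0
```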
Repurposing Non-Volatile Memory for Multiple Neural Networks
FIG. 7A is an example 700A of repurposing the non-volatile memory for multiple neural networks according to some embodiments. In step 1, the controller, or an external CPU, can set a neural network type for the non-volatile memory 702. Then at step 2, the controller can cause input data from the DRAM to be inputted into the non-volatile memory 702. This can be accomplished by issuing one or more commands to the smart memory device.
At step 3, the data can be processed through the layers of the neural network 702A, 702B, . . . 702N. The output of the non-volatile memory can be inputted back into the non-volatile memory for processing by a subsequent layer. In some cases, multiple neural networks can be used to process data in sequence. For example, at step L, the result of processing by a particular neural network can be stored in memory, such as a temporary memory or buffer (which can be part of the DRAM). At step L+1, a subsequent neural network can be configured in the non-volatile memory, and at step L+2, the stored output can be inputted back into the non-volatile memory and processed through the subsequent neural network.
FIG. 7B is an example of a process 700B for repurposing the non-volatile memory for multiple neural networks according to some embodiments. The process 700B can be implemented by a system including a controller and a smart memory device as described herein, such as any of the systems 400, 500, or 900A. In step 702, the process can define the type of neural network. For example, the process can identify the appropriate neural network for the desired data processing required by a host.
In step 704, the process can store input data in the DRAM and cause the input data to be provided to the neural network stored in the non-volatile memory. In step 706, the data can be processed by the neural network. In step 708, the process can receive the output of the neural network.
In step 710, the process can determine whether another neural network is to further process the data or if the data processing is complete. If data processing is complete, then the process ends at step 722.
If there are further neural network processing operations, at step 712, the process can define the type of the next neural network. The process can determine whether the same neural network should be rerun or a different neural network is needed.
In step 714, the process can retrieve the stored data from the previous neural network, and in step 716, can input the saved output data from the previous neural network into the newly configured neural network. In step 718, the data can be processed through the neural network. In step 720, the process can save the output of the neural network, for example in the DRAM. Then, the process can continue to step 710, where the process can determine whether another neural network is to further process the data or if the data processing is complete.
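Taken together, steps 702 through 720 amount to a loop that reconfigures the same block of non-volatile memory once per network. A hypothetical rendering, extending the SmartMemoryDevice sketch above with an assumed configure method:

```python
def run_pipeline(device, network_types, input_addr, scratch_addr):
    """Hypothetical rendering of process 700B: the single block of
    non-volatile memory is reconfigured for each neural network in
    turn, and each network consumes the saved output of the previous
    one. configure() is an assumed method on the device sketch above."""
    data_addr = input_addr
    for network_type in network_types:
        device.configure(network_type)           # steps 702 / 712
        device.offload(data_addr, scratch_addr)  # steps 704-706 / 714-718
        data_addr = scratch_addr                 # step 720: saved output
    return device.read(data_addr)                # step 710: processing done
```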
Multiple Neural Networks Configured in Non-Volatile Memory
FIG. 8 is an example 800 of multiple neural networks implemented in non-volatile memory 802 according to some embodiments. As described herein, a controller can configure multiple neural networks in the non-volatile memory 802. For example, a first neural network 802A and a second neural network 802L can be configured in the non-volatile memory. The output of the first neural network 802A can be inputted into the next neural network 802L. Advantageously, the output of the first neural network 802A may not have to be stored in temporary memory before being inputted into the next neural network 802L. In some embodiments, a temporary non-volatile or volatile memory buffer can be used between neural network layers to temporarily save the result of every layer. Advantageously, when such a buffer is non-volatile, neural network activity can continue even after a sudden power-off.
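When both networks are resident at once, the intermediate result can be forwarded directly rather than staged in DRAM. A sketch under that assumption (the optional checkpoint list stands in for the inter-layer buffer mentioned above; all names are hypothetical):

```python
class ChainedEngines:
    """Two resident neural networks, as in FIG. 8: the output of the
    first feeds the second directly, without a round trip to DRAM.
    The optional checkpoint list models the inter-layer buffer; when
    that buffer is non-volatile, work can resume after power loss."""
    def __init__(self, first, second, checkpoint=None):
        self.first, self.second = first, second
        self.checkpoint = checkpoint

    def run(self, data):
        intermediate = self.first.run(data)
        if self.checkpoint is not None:
            self.checkpoint.append(intermediate)  # persisted result
        return self.second.run(intermediate)
```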
In some embodiments, a smart memory device can process neural networks in series, such as the example shown in FIG. 8, in parallel, and/or a combination thereof.
Smart Memory Device Architecture
FIG. 9A illustrates an example 900A of the CPU communicating with a smart memory device according to some embodiments. The steps and/or functions described below can be performed by the CPU 904 and/or a controller within the smart memory device. In this example, the CPU 904 transmits data to a memory page (or another memory unit) 906 of the DRAM in step 1. The CPU 904 determines whether the requested processing on the data involves neural network processing. If not, then the CPU 904 can access and/or modify the data (for example, read and/or write data).
If neural network processing is requested, the CPU 904 can send configuration parameters of the desired neural network to a non-volatile memory controller 908. The controller 908 can process the data through the layers 902A, 902B, . . . 902C of the neural network implemented in the non-volatile memory and send the output of the neural network to the memory page 906 of the DRAM (or another area of DRAM) at step 4. In step 5, the controller 908 can indicate to the CPU 904 that the neural network operation is complete. This can be performed by setting or activating an interrupt. In other embodiments, the CPU 904 can poll the controller 908 for a status of the neural network operation.
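The two completion mechanisms named above, an interrupt raised by the controller or a status poll by the CPU, can be modeled with a simple event flag. A minimal sketch, assuming a software stand-in for the hardware signal:

```python
import threading

class CompletionFlag:
    """Software stand-in for the completion signaling above: the
    controller 'raises an interrupt' by setting an event, while a
    polling CPU simply checks the flag. Purely illustrative."""
    def __init__(self):
        self._done = threading.Event()

    def raise_interrupt(self):
        """Controller side (step 5): signal that output is in DRAM."""
        self._done.set()

    def wait_for_interrupt(self, timeout=None):
        """Interrupt-driven CPU: block until signaled (or time out)."""
        return self._done.wait(timeout)

    def poll(self):
        """Polling CPU: non-blocking status check."""
        return self._done.is_set()
```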
FIG. 9B illustrates an example 900B of a process for performing one or more neural network operations according to some embodiments. In step 912, the CPU can receive data from a host CPU to store into memory. The CPU can determine whether the request from the host CPU requires neural network computations at step 914. If not, then at step 916, the CPU can access and/or process the request directly from memory.
If the request requires a neural network operation, at step 920 the CPU can send characteristics of a neural network to a controller 918. The controller 918 can determine the corresponding neural network based on the received characteristics, and at step 922, input the data stored in memory into the neural network. The neural network engine can process the data through the neural network in step 924. In step 926, the controller 918 can send the output of the neural network to the DRAM, and at step 928, the DRAM 910 can store the output data into memory for the CPU to access.
In some embodiments, the memory device can process the data synchronously, in which case the CPU waits for the neural network operations to complete. During data processing, the CPU can optionally send an end function to stop the processing of data through the neural network. In asynchronous embodiments, the CPU can instead poll the memory device for completion. Advantageously, with asynchronous processing the CPU does not have to wait for the neural network data processing to finish.
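The synchronous and asynchronous modes, along with the optional end function, might look as follows in software. This is a sketch under assumed names; in this sketch the cancel flag is checked only before processing begins:

```python
import threading

def process_sync(engine, data):
    """Synchronous mode: the CPU simply waits for the result."""
    return engine.run(data)

def process_async(engine, data, on_complete):
    """Asynchronous mode: start neural processing and return at once so
    the CPU can do other work; the hypothetical on_complete callback
    delivers the result. The returned event models the optional 'end
    function' and is checked only before processing begins."""
    cancel = threading.Event()

    def worker():
        if not cancel.is_set():
            on_complete(engine.run(data))

    threading.Thread(target=worker).start()
    return cancel  # CPU may set this to abandon the request
```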
CPU Delegation of Processing to the Non-Volatile Memory
FIG. 10 illustrates an example of a process 1000 for delegating data processing to the non-volatile memory implementing one or more neural networks according to some embodiments. The process 1000 can be implemented by a system including a controller as described herein, such as any of the systems 400, 500, or 900A. In step 1, the task scheduler 1002 (which can be implemented by a processor or controller) can select a process for execution, and the CPU can execute the selected process in the allocated time slice.
In step 2, the task scheduler 1002 can manage the delegation of tasks, such as by assigning a time slice to the CPU 1004 to perform a certain task, where the CPU activity is split between time slices. The CPU 1004 can initiate data processing in step 3 by sending the request to a controller 1012 of the smart memory device. The controller 1012 can configure a neural network 1008 to perform the neural network operation(s), receive the input data from memory 1006 (such as DRAM), process the data through the neural network, and send the output data to a DRAM memory page, as described herein.
In some embodiments, while the data is being processed by the neural network, the CPU 1004 can indicate to the task scheduler 1002 to put the process into a sleep state (for example, because the CPU 1004 is waiting for completion of the neural network processing). The task scheduler 1002 then does not assign a time slice to the process 1010B. In some embodiments, the CPU 1004 can perform other tasks while the controller 1012 is managing the neural network processing.
After the neural network processing finishes in step 5, the task scheduler 1002 can return the process to a ready state in step 6. Advantageously, offloading the neural network processing from the CPU to the smart memory device can dramatically improve system performance by freeing the CPU's resources. In addition, the whole memory space can perform large-scale data processing without affecting system performance. Also, power consumption can be reduced, for example, because processing-intensive neural network computations are performed by the non-volatile memory device rather than the CPU.
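The scheduling behavior of FIG. 10 can be captured in a toy round-robin scheduler. A minimal sketch with invented names; a real operating system scheduler is far more elaborate:

```python
from collections import deque

class TaskScheduler:
    """Toy round-robin scheduler mirroring FIG. 10: a process waiting
    on the smart memory device sleeps and receives no time slices,
    then returns to the ready queue when the device finishes."""
    def __init__(self):
        self.ready = deque()
        self.sleeping = set()

    def add(self, pid):
        self.ready.append(pid)

    def sleep(self, pid):
        """The process enters a sleep state while it waits on the
        neural network processing in the smart memory device."""
        self.sleeping.add(pid)
        if pid in self.ready:
            self.ready.remove(pid)

    def wake(self, pid):
        """Step 6: processing finished; the process is ready again."""
        if pid in self.sleeping:
            self.sleeping.remove(pid)
            self.ready.append(pid)

    def next_time_slice(self):
        """Assign the next time slice round-robin among ready processes;
        sleeping processes are never offered a slice."""
        if not self.ready:
            return None
        pid = self.ready.popleft()
        self.ready.append(pid)
        return pid
```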
Other Variations
Any of the embodiments disclosed herein can be used with any of the concepts disclosed in co-pending U.S. Patent Application No. 16/363,744, filed on Mar. 25, 2019, and titled “ENHANCED STORAGE DEVICE MEMORY ARCHITECTURE FOR MACHINE LEARNING”, which is hereby incorporated by reference in its entirety.
Those skilled in the art will appreciate that in some embodiments additional system components can be utilized, and disclosed system components can be combined or omitted. Although some embodiments describe particular types of data processing, the disclosed systems and methods can be used with any type of data, and any suitable error correction schemes can be used. The actual steps taken in the disclosed processes may differ from those shown in the figures. Depending on the embodiment, certain of the steps described above may be removed and others may be added. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the protection. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the protection. For example, the systems and methods disclosed herein can be applied to hard disk drives, hybrid hard drives, and the like. In addition, other forms of storage (such as, DRAM or SRAM, battery backed-up volatile DRAM or SRAM devices, EPROM, EEPROM memory, etc.) may additionally or alternatively be used. As another example, the various components illustrated in the figures may be implemented as software and/or firmware on a processor, ASIC/FPGA, or dedicated hardware. Also, the features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of this disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, references to “a method” or “an embodiment” throughout are not intended to mean the same method or same embodiment, unless the context clearly indicates otherwise.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the various embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of this disclosure. The example embodiments were chosen and described in order to best explain the principles of this disclosure and the practical application, and to enable others of ordinary skill in the art to understand this disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Although the present disclosure provides certain preferred embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.

Claims (18)

What is claimed is:
1. A memory device configured to perform neural network computations, the device comprising:
a volatile memory;
a non-volatile memory configured to store one or more layers of a neural network; and
a controller configured to:
store data in at least one of the volatile memory or the non-volatile memory and retrieve data from at least one of the volatile memory or the non-volatile memory in response to at least one data transfer command received from a host system;
perform neural network computations in the non-volatile memory by applying one or more neural network layers to input data received from the host system;
store a result of the neural network computations in the volatile memory; and
provide the result of the neural network computations to the host system asynchronously before completion of neural network computations for all neural network layers stored in the non-volatile memory.
2. The device of claim 1, wherein the input data is stored in the volatile memory.
3. The device of claim 1, wherein the controller is further configured to perform neural network computations for a plurality of neural networks and use a result of neural network computations for a first neural network as input data for a successive neural network.
4. The device of claim 3, wherein the controller is further configured to reconfigure the first neural network as the successive neural network before inputting the data into the successive network.
5. The device of claim 1, wherein the controller is a sole controller of the memory device.
6. The device of claim 1, wherein provision of the result asynchronously comprises at least one of polling a state of memory pages in the non-volatile memory or issuing an interrupt.
7. The device of claim 6, wherein the polling comprises periodic polling of the state of memory pages.
8. The device of claim 1, wherein the memory device is further configured to receive a request to initiate neural network computations, the request comprising neural network configuration parameters and input data for neural network computations.
9. The device of claim 8, wherein the request to initiate neural network computations comprises a type of data processing, and wherein the controller is further configured to identify neural network configuration parameters based on the type of data processing.
10. A method of performing neural network computations in a memory device, the method comprising:
by a controller of the memory device:
storing data in at least one of a volatile memory of the memory device or a non-volatile memory of the memory device and retrieving data from at least one of the volatile memory or the non-volatile memory in response to at least one data transfer command received from a host system;
performing neural network computations in the non-volatile memory by applying one or more neural network layers of a neural network to input data received from the host system; and
storing a result of the neural network computations in the volatile memory for retrieval by the host system, wherein the result of the neural network computations is configured to be retrieved synchronously following completion of neural network computations for all neural network layers stored in the non-volatile memory.
11. The method of claim 10, further comprising, by the controller, setting a locked state of the data before inputting the data into the neural network, and setting an unlocked state of the data after making the output of the neural network available, wherein the locked state prevents changing the data.
12. The method of claim 10, further comprising configuring the neural network based on at least one of a number of nodes or a type of activation function.
13. The method of claim 10, further comprising inputting the data into the neural network by initiating back propagation on the neural network, and wherein output of the neural network includes an adjusted weighting for one or more nodes of the neural network.
14. A data storage device comprising:
a first memory;
a second memory; and
a controller configured to:
store data in at least one of the first memory or the second memory and retrieve data from at least one of the first memory or the second memory in response to at least one data transfer command received from a host system;
perform neural network computations for a plurality of neural networks in the second memory by applying neural network layers to input data received from the host system and stored in the first memory, wherein a first result of neural network computations for a first neural network is used as input data for a successive neural network; and
store a result of the neural network computations in the first memory for retrieval by the host system.
15. The device of claim 14, wherein the device is configured to receive a request to initiate neural network computations comprising a type of data processing, and wherein the controller is further configured to identify neural network configuration parameters based on the type of data processing.
16. The device of claim 14, wherein the plurality of neural networks is not directly accessible by a processor of the host system.
17. The device of claim 14, wherein the device is configured to receive neural network configuration parameters and input data for the neural network computations, and wherein the controller is further configured to define one or more neural network layers based on the neural network configuration parameters.
18. The device of claim 14, wherein the device is configured to receive a request to perform a data processing function comprising a type of data processing, and wherein the controller is further configured to identify neural network configuration parameters based on the type of data processing and define one or more neural network layers based on the neural network configuration parameters.
US16/363,661 2019-03-25 2019-03-25 Enhanced memory device architecture for machine learning Active 2039-04-02 US10929058B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/363,661 US10929058B2 (en) 2019-03-25 2019-03-25 Enhanced memory device architecture for machine learning
CN201911218064.8A CN111738430B (en) 2019-03-25 2019-12-03 Enhanced memory device architecture for machine learning
US17/143,001 US11372577B2 (en) 2019-03-25 2021-01-06 Enhanced memory device architecture for machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/363,661 US10929058B2 (en) 2019-03-25 2019-03-25 Enhanced memory device architecture for machine learning

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/143,001 Continuation US11372577B2 (en) 2019-03-25 2021-01-06 Enhanced memory device architecture for machine learning

Publications (2)

Publication Number Publication Date
US20200310674A1 US20200310674A1 (en) 2020-10-01
US10929058B2 true US10929058B2 (en) 2021-02-23

Family

ID=72606070

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/363,661 Active 2039-04-02 US10929058B2 (en) 2019-03-25 2019-03-25 Enhanced memory device architecture for machine learning
US17/143,001 Active US11372577B2 (en) 2019-03-25 2021-01-06 Enhanced memory device architecture for machine learning

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/143,001 Active US11372577B2 (en) 2019-03-25 2021-01-06 Enhanced memory device architecture for machine learning

Country Status (2)

Country Link
US (2) US10929058B2 (en)
CN (1) CN111738430B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210357739A1 (en) * 2020-05-14 2021-11-18 Micron Technology, Inc. Memory device to train neural networks
WO2022116051A1 (en) * 2020-12-02 2022-06-09 Alibaba Group Holding Limited Neural network near memory processing
CN116170802B (en) * 2023-04-26 2023-07-07 浙江鹏信信息科技股份有限公司 Internet of things communication method, system and computer readable storage medium

Citations (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3602186A (en) 1970-08-06 1971-08-31 Charles H Popenoe Opti-mechanical stress-strain indicator
KR790000473B1 (en) 1972-02-16 1979-05-20 Gyrfalcon Inc Opti-mechanical stress-strain indicator
US5519811A (en) 1991-10-17 1996-05-21 Kawasaki Steel Corporation Neural network, processor, and pattern recognition apparatus
US5627943A (en) 1993-02-17 1997-05-06 Kawasaki Steel Corporation Neural network processor including systolic array of two-dimensional layers
US20030004907A1 (en) 2001-05-31 2003-01-02 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, pattern recognition system, and image input system
US20040156547A1 (en) 2003-01-17 2004-08-12 Parimics, Inc. Method and apparatus for image processing
US20120257506A1 (en) 2009-12-22 2012-10-11 Cuneyt Bazlamacci Systolic array architecture for fast ip lookup
US20140270494A1 (en) 2013-03-15 2014-09-18 Sri International Computer vision as a service
US20160142731A1 (en) 2009-02-19 2016-05-19 Sony Corporation Image processing apparatus and method
US20160342893A1 (en) 2015-05-21 2016-11-24 Google Inc. Rotating data for neural network computations
WO2017006512A1 (en) 2015-07-08 2017-01-12 株式会社デンソー Arithmetic processing device
US9721203B1 (en) 2016-11-10 2017-08-01 Google Inc. Performing kernel striding in hardware
US20180005115A1 (en) 2016-06-29 2018-01-04 International Business Machines Corporation Accelerated neural network training using a pipelined resistive processing unit architecture
US20180101748A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology Inc. Hierarchical Category Classification Scheme Using Multiple Sets of Fully-Connected Networks With A CNN Based Integrated Circuit As Feature Extractor
US20180101747A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology, Inc. Data Structure For CNN Based Digital Integrated Circuit For Extracting Features Out Of An Input Image
US20180101743A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology, Inc. Digital Integrated Circuit For Extracting Features Out Of An Input Image Based On Cellular Neural Networks
US9959500B1 (en) 2017-04-21 2018-05-01 Gyrfalcon Technology Inc. Embedded spin transfer torque memory for cellular neural network based processing unit
US20180157940A1 (en) 2016-10-10 2018-06-07 Gyrfalcon Technology Inc. Convolution Layers Used Directly For Feature Extraction With A CNN Based Integrated Circuit
US20180174031A1 (en) 2016-10-10 2018-06-21 Gyrfalcon Technology Inc. Implementation Of ResNet In A CNN Based Digital Integrated Circuit
US20180189595A1 (en) 2016-10-10 2018-07-05 Gyrfalcon Technology Inc. Implementation Of MobileNet In A CNN Based Digital Integrated Circuit
US20180247113A1 (en) 2016-10-10 2018-08-30 Gyrfalcon Technology Inc. Image Classification Systems Based On CNN Based IC and Light-Weight Classifier
US20180268234A1 (en) 2016-10-10 2018-09-20 Gyrfalcon Technology Inc. Object Detection And Recognition Apparatus Based On CNN Based Integrated Circuits
US10083171B1 (en) 2017-08-03 2018-09-25 Gyrfalcon Technology Inc. Natural language processing using a CNN based integrated circuit
US20180285713A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Buffer Memory Architecture For A CNN Based Processing Unit And Creation Methods Thereof
US20180285720A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US20180285005A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Embedded Memory Subsystems For A CNN Based Processing Unit And Methods of Making
US20180285006A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Mlc based magnetic random access memory used in cnn based digital ic for ai
US20180285714A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Fabrication methods of memory subsystem used in cnn based digital ic for ai
US20180285723A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US20180285722A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US10102453B1 (en) 2017-08-03 2018-10-16 Gyrfalcon Technology Inc. Natural language processing via a two-dimensional symbol having multiple ideograms contained therein
US20200073726A1 (en) * 2018-08-29 2020-03-05 International Business Machines Corporation Learning-based thermal estimation in multicore architecture
US20200134462A1 (en) * 2018-10-25 2020-04-30 International Business Machines Corporation Perform destages of tracks with holes in a storage system by training a machine learning module
US20200133531A1 (en) * 2018-10-31 2020-04-30 Western Digital Technologies, Inc. Transferring computational operations to controllers of data storage devices

Family Cites Families (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2703010B2 (en) 1988-12-23 1998-01-26 株式会社日立製作所 Neural net signal processing processor
US5138695A (en) 1989-10-10 1992-08-11 Hnc, Inc. Systolic array image processing system
WO1991018349A1 (en) 1990-05-22 1991-11-28 International Business Machines Corporation Scalable flow virtual learning neurocomputer
US5226092A (en) 1991-06-28 1993-07-06 Digital Equipment Corporation Method and apparatus for learning in a neural network
US5659781A (en) 1994-06-29 1997-08-19 Larson; Noble G. Bidirectional systolic ring network
US5799134A (en) 1995-03-13 1998-08-25 Industrial Technology Research Institute One dimensional systolic array architecture for neural network
US5812993A (en) 1996-03-07 1998-09-22 Technion Research And Development Foundation Ltd. Digital hardware architecture for realizing neural network
US8442927B2 (en) 2009-07-30 2013-05-14 Nec Laboratories America, Inc. Dynamically configurable, multi-ported co-processor for convolutional neural networks
US8392683B1 (en) 2009-11-30 2013-03-05 Micron Technology, Inc. Dynamic range unlock or lock memory device and method to operate the same
US8824603B1 (en) 2013-03-01 2014-09-02 Futurewei Technologies, Inc. Bi-directional ring-bus architecture for CORDIC-based matrix inversion
US20140289445A1 (en) 2013-03-22 2014-09-25 Antony Savich Hardware accelerator system and method
US9390368B2 (en) 2013-10-21 2016-07-12 International Business Machines Corporation Coupling parallel event-driven computation with serial computation
US9978014B2 (en) 2013-12-18 2018-05-22 Intel Corporation Reconfigurable processing unit
US10331997B2 (en) * 2014-05-07 2019-06-25 Seagate Technology Llc Adaptive configuration of a neural network device
KR101572932B1 (en) 2014-07-11 2015-11-30 현대자동차주식회사 Method and apparatus for controlling an origination call in vehicle using voice recognition function
US10049322B2 (en) 2015-05-21 2018-08-14 Google Llc Prefetching weights for use in a neural network processor
US10192162B2 (en) 2015-05-21 2019-01-29 Google Llc Vector computation unit in a neural network processor
US10083395B2 (en) 2015-05-21 2018-09-25 Google Llc Batch processing in a neural network processor
US9747546B2 (en) 2015-05-21 2017-08-29 Google Inc. Neural network processor
US10438117B1 (en) 2015-05-21 2019-10-08 Google Llc Computing convolutions using a neural network processor
US10445650B2 (en) 2015-11-23 2019-10-15 Microsoft Technology Licensing, Llc Training and operating multi-layer computational models
GB201718756D0 (en) 2017-11-13 2017-12-27 Cambridge Bio-Augmentation Systems Ltd Neural interface
US10817802B2 (en) 2016-05-07 2020-10-27 Intel Corporation Apparatus for hardware accelerated machine learning
EP3459019A4 (en) * 2016-05-17 2020-02-19 Silicon Storage Technology, Inc. Deep learning neural network classifier using non-volatile memory array
WO2017206156A1 (en) 2016-06-03 2017-12-07 Intel Corporation Look-up convolutional layer in convolutional neural network
US9715656B1 (en) 2016-09-12 2017-07-25 International Business Machines Corporation Killing asymmetric resistive processing units for neural network training
CN106485317A (en) * 2016-09-26 2017-03-08 上海新储集成电路有限公司 A kind of neutral net accelerator and the implementation method of neural network model
KR101997975B1 (en) * 2016-12-01 2019-07-08 한국과학기술원 Spiking neural network system for dynamic control of flexible, stable and hybrid memory storage
US10528321B2 (en) 2016-12-07 2020-01-07 Microsoft Technology Licensing, Llc Block floating point for neural network implementations
US10037490B2 (en) 2016-12-13 2018-07-31 Google Llc Performing average pooling in hardware
US10359953B2 (en) 2016-12-16 2019-07-23 Western Digital Technologies, Inc. Method and apparatus for offloading data processing to hybrid storage devices
US11010431B2 (en) * 2016-12-30 2021-05-18 Samsung Electronics Co., Ltd. Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD
US10922607B2 (en) 2016-12-30 2021-02-16 Intel Corporation Event driven and time hopping neural network
US10521488B1 (en) 2016-12-30 2019-12-31 X Development Llc Dynamic partitioning
US10402527B2 (en) 2017-01-04 2019-09-03 Stmicroelectronics S.R.L. Reconfigurable interconnect
US10909447B2 (en) 2017-03-09 2021-02-02 Google Llc Transposing neural network matrices in hardware
US10585621B2 (en) 2017-04-21 2020-03-10 Intel Corporation Statically-schedulable feed and drain structure for systolic array architecture
US10824938B2 (en) 2017-04-24 2020-11-03 Intel Corporation Specialized fixed function hardware for efficient convolution
US10838910B2 (en) 2017-04-27 2020-11-17 Falcon Computing Systems and methods for systolic array design from a high-level program
TW202024961A (en) 2017-05-17 2020-07-01 美商谷歌有限責任公司 Low latency matrix multiply unit
US10019668B1 (en) 2017-05-19 2018-07-10 Google Llc Scheduling neural network processing
US9928460B1 (en) 2017-06-16 2018-03-27 Google Llc Neural network accelerator tile architecture with three-dimensional stacking
US10790828B1 (en) 2017-07-21 2020-09-29 X Development Llc Application specific integrated circuit accelerators
US20190042918A1 (en) 2017-08-01 2019-02-07 Wave Computing, Inc. Remote usage of machine learned layers by a second machine learning construct
US10552251B2 (en) 2017-09-06 2020-02-04 Western Digital Technologies, Inc. Storage of neural networks
WO2019075267A1 (en) 2017-10-11 2019-04-18 Google Llc Self-gating activation neural network layers
US20190114548A1 (en) 2017-10-17 2019-04-18 Xilinx, Inc. Static block scheduling in massively parallel software defined hardware systems
US11386644B2 (en) 2017-10-17 2022-07-12 Xilinx, Inc. Image preprocessing for generalized image processing
US10360214B2 (en) 2017-10-19 2019-07-23 Pure Storage, Inc. Ensuring reproducibility in an artificial intelligence infrastructure
US10936942B2 (en) 2017-11-21 2021-03-02 Google Llc Apparatus and mechanism for processing neural network tasks using a single chip package with multiple identical dies
US10846621B2 (en) 2017-12-12 2020-11-24 Amazon Technologies, Inc. Fast context switching for computational networks
CN108038542B (en) * 2017-12-27 2022-01-07 上海闪易半导体有限公司 Storage module, module and data processing method based on neural network
US10685446B2 (en) 2018-01-12 2020-06-16 Intel Corporation Method and system of recurrent semantic segmentation for image processing
US10459876B2 (en) 2018-01-31 2019-10-29 Amazon Technologies, Inc. Performing concurrent operations in a processing element
US11494582B2 (en) 2018-02-08 2022-11-08 Western Digital Technologies, Inc. Configurable neural network engine of tensor arrays and memory cells
US11551064B2 (en) 2018-02-08 2023-01-10 Western Digital Technologies, Inc. Systolic neural network engine capable of forward propagation
US10963394B2 (en) * 2018-04-16 2021-03-30 Samsung Electronics Co., Ltd. System and method for optimizing performance of a solid-state drive using a deep neural network
US10459849B1 (en) 2018-08-31 2019-10-29 Sas Institute Inc. Scheduling operations in an access-controlled region of memory
US20200127685A1 (en) 2018-10-19 2020-04-23 Nyquist Semiconductor Limited Systems and methods for a hybrid non-volatile storage system
US11562214B2 (en) 2019-03-14 2023-01-24 Baidu Usa Llc Methods for improving AI engine MAC utilization
US11783176B2 (en) 2019-03-25 2023-10-10 Western Digital Technologies, Inc. Enhanced storage device memory architecture for machine learning

Patent Citations (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3602186A (en) 1970-08-06 1971-08-31 Charles H Popenoe Opti-mechanical stress-strain indicator
BE771045A (en) 1970-08-06 1971-12-16 Gyrfalcon Inc OPTICO-MECHANICAL STRESS INDICATOR
FR2104032A5 (en) 1970-08-06 1972-04-14 Gyrfalcon Inc
AU3190271A (en) 1970-08-06 1973-02-08 Gyrfalcon, Inc Opti mechanical stress strain indicator
GB1316899A (en) 1970-08-06 1973-05-16 Gyrfalcon Inc Opti-mechanical stress-strain indicator
CA930619A (en) 1970-08-06 1973-07-24 Gyrfalcon Opti-mechanical stress-strain indicator
SE361090B (en) 1970-08-06 1973-10-15 Gyrfalcon Inc
IL37434A (en) 1970-08-06 1974-01-14 Gyrfalcon Inc Opti-mechanical stress-strain indicator
ES196704Y (en) 1970-08-06 1975-08-01 Gyrfalcon Inc. OPTICAL-MECHANICAL INDICATOR OF TENSION OR STRESS.
DE2139302C3 (en) 1970-08-06 1978-10-26 Gyrfalcon Inc. Camp Hill, Pa. (V.St.A.) Optical-mechanical tension or strain indicator
KR790000473B1 (en) 1972-02-16 1979-05-20 Gyrfalcon Inc Opti-mechanical stress-strain indicator
US5519811A (en) 1991-10-17 1996-05-21 Kawasaki Steel Corporation Neural network, processor, and pattern recognition apparatus
US5627943A (en) 1993-02-17 1997-05-06 Kawasaki Steel Corporation Neural network processor including systolic array of two-dimensional layers
US20080270335A1 (en) 2001-05-31 2008-10-30 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, and pattern recognition system
US20030004907A1 (en) 2001-05-31 2003-01-02 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, pattern recognition system, and image input system
US7743004B2 (en) 2001-05-31 2010-06-22 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, and pattern recognition system
US7085749B2 (en) 2001-05-31 2006-08-01 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, pattern recognition system, and image input system
US20070011120A1 (en) 2001-05-31 2007-01-11 Canon Kabushiki Kaisha Pulse signal circuit, parallel processing circuit, pattern recognition system, and image input system
US7437339B2 (en) 2001-05-31 2008-10-14 Canon Kabuhsiki Kaisha Pulse signal circuit, parallel processing circuit, pattern recognition system, and image input system
US20040156547A1 (en) 2003-01-17 2004-08-12 Parimics, Inc. Method and apparatus for image processing
US7489834B2 (en) 2003-01-17 2009-02-10 Parimics, Inc. Method and apparatus for image processing
US20040156546A1 (en) 2003-01-17 2004-08-12 Parimics, Inc. Method and apparatus for image processing
US7564996B2 (en) 2003-01-17 2009-07-21 Parimics, Inc. Method and apparatus for image processing
US20160142731A1 (en) 2009-02-19 2016-05-19 Sony Corporation Image processing apparatus and method
US20120257506A1 (en) 2009-12-22 2012-10-11 Cuneyt Bazlamacci Systolic array architecture for fast ip lookup
US8724624B2 (en) 2009-12-22 2014-05-13 Cuneyt Bazlamacci Systolic array architecture for fast IP lookup
US20140270494A1 (en) 2013-03-15 2014-09-18 Sri International Computer vision as a service
US20170103318A1 (en) 2015-05-21 2017-04-13 Google Inc. Rotating data for neural network computations
US20180107921A1 (en) 2015-05-21 2018-04-19 Google Llc Rotating data for neural network computations
US20160342893A1 (en) 2015-05-21 2016-11-24 Google Inc. Rotating data for neural network computations
US9747548B2 (en) 2015-05-21 2017-08-29 Google Inc. Rotating data for neural network computations
US9805303B2 (en) 2015-05-21 2017-10-31 Google Inc. Rotating data for neural network computations
WO2017006512A1 (en) 2015-07-08 2017-01-12 株式会社デンソー Arithmetic processing device
US20180005115A1 (en) 2016-06-29 2018-01-04 International Business Machines Corporation Accelerated neural network training using a pipelined resistive processing unit architecture
US20180189595A1 (en) 2016-10-10 2018-07-05 Gyrfalcon Technology Inc. Implementation Of MobileNet In A CNN Based Digital Integrated Circuit
US20180268234A1 (en) 2016-10-10 2018-09-20 Gyrfalcon Technology Inc. Object Detection And Recognition Apparatus Based On CNN Based Integrated Circuits
US20180101743A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology, Inc. Digital Integrated Circuit For Extracting Features Out Of An Input Image Based On Cellular Neural Networks
US20180101748A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology Inc. Hierarchical Category Classification Scheme Using Multiple Sets of Fully-Connected Networks With A CNN Based Integrated Circuit As Feature Extractor
US20180101747A1 (en) 2016-10-10 2018-04-12 Gyrfalcon Technology, Inc. Data Structure For CNN Based Digital Integrated Circuit For Extracting Features Out Of An Input Image
US20180157940A1 (en) 2016-10-10 2018-06-07 Gyrfalcon Technology Inc. Convolution Layers Used Directly For Feature Extraction With A CNN Based Integrated Circuit
US20180174031A1 (en) 2016-10-10 2018-06-21 Gyrfalcon Technology Inc. Implementation Of ResNet In A CNN Based Digital Integrated Circuit
US10043095B2 (en) 2016-10-10 2018-08-07 Gyrfalcon Technology, Inc. Data structure for CNN based digital integrated circuit for extracting features out of an input image
US20180247113A1 (en) 2016-10-10 2018-08-30 Gyrfalcon Technology Inc. Image Classification Systems Based On CNN Based IC and Light-Weight Classifier
US20180129936A1 (en) 2016-11-10 2018-05-10 Google Inc. Performing kernel striding in hardware
US9721203B1 (en) 2016-11-10 2017-08-01 Google Inc. Performing kernel striding in hardware
US20180285713A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Buffer Memory Architecture For A CNN Based Processing Unit And Creation Methods Thereof
US20180285723A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US20180285722A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US20180285720A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Memory subsystem in cnn based digital ic for artificial intelligence
US20180285005A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Embedded Memory Subsystems For A CNN Based Processing Unit And Methods of Making
US20180285006A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Mlc based magnetic random access memory used in cnn based digital ic for ai
US20180285714A1 (en) 2017-04-03 2018-10-04 Gyrfalcon Technology Inc. Fabrication methods of memory subsystem used in cnn based digital ic for ai
US9959500B1 (en) 2017-04-21 2018-05-01 Gyrfalcon Technology Inc. Embedded spin transfer torque memory for cellular neural network based processing unit
US20180309050A1 (en) 2017-04-21 2018-10-25 Gyrfalcon Technology Inc. Process of fabricating embedded spin transfer torque memory for cellular neural network based processing unit
US10083171B1 (en) 2017-08-03 2018-09-25 Gyrfalcon Technology Inc. Natural language processing using a CNN based integrated circuit
US10102453B1 (en) 2017-08-03 2018-10-16 Gyrfalcon Technology Inc. Natural language processing via a two-dimensional symbol having multiple ideograms contained therein
US20200073726A1 (en) * 2018-08-29 2020-03-05 International Business Machines Corporation Learning-based thermal estimation in multicore architecture
US20200134462A1 (en) * 2018-10-25 2020-04-30 International Business Machines Corporation Perform destages of tracks with holes in a storage system by training a machine learning module
US20200133531A1 (en) * 2018-10-31 2020-04-30 Western Digital Technologies, Inc. Transferring computational operations to controllers of data storage devices

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
International Search Report and Written Opinion from International Application No. PCT/US2018/066593, dated Mar. 29, 2019, 11 pages.
International Search Report and Written Opinion from International Application No. PCT/US2018/066917, dated Mar. 29, 2019, 11 pages.
Mahapatra et al.; "Mapping of Neural Network Models onto Systolic Arrays", Journal of Parallel and Distributed Computing, vol. 60, Issue 6, Jun. 2000, pp. 677-689; available at: https://www.sciencedirect.com/science/article/abs/pii/S0743731500916344.
Pending U.S. Appl. No. 15/981,679, filed May 16, 2018, entitled "Systolic Neural Network Engine With Crossover Connection Optimization", Luiz M. Franca-Neto.
Pending U.S. Appl. No. 16/363,744, filed Mar. 25, 2019, entitled "Enhanced Storage Device Memory Architecture for Machine Learning", Luiz M. Franca-Neto.
U.S. Appl. No. 15/981,624, filed May 16, 2018, Franca-Neto.
U.S. Appl. No. 15/981,664, filed May 16, 2018, Franca-Neto.
U.S. Appl. No. 15/981,711, filed May 16, 2018, Franca-Neto.
U.S. Appl. No. 15/981,719, filed May 16, 2018, Franca-Neto.
U.S. Appl. No. 15/981,735, filed May 16, 2018, Franca-Neto.
U.S. Appl. No. 16/233,876, filed Dec. 27, 2018, Franca-Neto.
U.S. Appl. No. 16/233,968, filed Dec. 27, 2018, Franca-Neto.
U.S. Appl. No. 16/234,166, filed Dec. 27, 2018, Franca-Neto.
U.S. Appl. No. 16/234,184, filed Dec. 27, 2018, Franca-Neto.

Also Published As

Publication number Publication date
US20210124524A1 (en) 2021-04-29
US11372577B2 (en) 2022-06-28
CN111738430A (en) 2020-10-02
US20200310674A1 (en) 2020-10-01
CN111738430B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US11372577B2 (en) Enhanced memory device architecture for machine learning
US20210042219A1 (en) Apparatuses and methods for memory address translation during block migration
US11113198B2 (en) Timed data transfer between a host system and a memory sub-system
US11573742B2 (en) Dynamic data placement for collision avoidance among concurrent write streams
US11669272B2 (en) Predictive data transfer based on availability of media units in memory sub-systems
US11269552B2 (en) Multi-pass data programming in a memory sub-system having multiple dies and planes
US11782841B2 (en) Management of programming mode transitions to accommodate a constant size of data transfer between a host system and a memory sub-system
US11740812B2 (en) Data storage device idle time processing
US11783176B2 (en) Enhanced storage device memory architecture for machine learning
JP7381429B2 (en) Storage system and method for accelerating hierarchical sorting around storage
US20180196611A1 (en) Highly scalable computational active ssd storage device
US10289306B1 (en) Data storage system with core-affined thread processing of data movement requests
US11782643B2 (en) Partial execution of a write command from a host system
US20220374348A1 (en) Hardware Acceleration
US20230325117A1 (en) Speculative command processing interface in storage systems
US20230185739A1 (en) Efficient and concurrent model execution
JP2024030903A (en) memory system
TW202347118A (en) Hardware accelerated database sorting in solid state storage drives

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FRANCA-NETO, LUIZ M.;DUBEYKO, VIACHESLAV;SIGNING DATES FROM 20190320 TO 20190423;REEL/FRAME:049005/0649

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS AGENT, ILLINOIS

Free format text: SECURITY INTEREST;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:052915/0566

Effective date: 20200113

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST AT REEL 052915 FRAME 0566;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:059127/0001

Effective date: 20220203

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS

Free format text: PATENT COLLATERAL AGREEMENT - A&R LOAN AGREEMENT;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:064715/0001

Effective date: 20230818

Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS

Free format text: PATENT COLLATERAL AGREEMENT - DDTL LOAN AGREEMENT;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:067045/0156

Effective date: 20230818