EP3756186A2 - Non-volatile memory chip with deep learning neural network - Google Patents

Non-volatile memory chip with deep learning neural network

Info

Publication number
EP3756186A2
Authority
EP
European Patent Office
Prior art keywords
neural network
elements
nvm
nand
die
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP19888248.2A
Other languages
German (de)
English (en)
Other versions
EP3756186A4 (fr)
Inventor
Rami Rom
Ofir Pele
Alexander Bazarsky
Tomer Tzvi ELIASH
Ran ZAMIR
Karin Inbar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Digital Technologies Inc
Original Assignee
Western Digital Technologies Inc
Priority claimed from US16/212,596 external-priority patent/US11133059B2/en
Priority claimed from US16/212,586 external-priority patent/US20200184335A1/en
Application filed by Western Digital Technologies Inc
Priority to EP20178429.5A (patent/EP3789925A1/fr)
Publication of EP3756186A2 (patent/EP3756186A2/fr)
Publication of EP3756186A4 (patent/EP3756186A4/fr)
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the disclosure relates, in some embodiments, to non-volatile memory (NVM) arrays and to data storage controllers for use therewith. More specifically, but not exclusively, the disclosure relates to methods and apparatus for implementing deep learning neural networks within an NVM die under the control of a data storage controller.
  • NVM non-volatile memory
  • Deep learning (which also may be referred to as deep structured learning or hierarchical learning) relates to machine learning methods based on learning data representations or architectures, such as deep neural networks (DNNs), rather than to task-specific procedures or algorithms. Deep learning is applied to such fields as speech recognition, computer vision, and self-driving vehicles. Deep learning may be accomplished by, or facilitated by, deep learning accelerators (DLAs), e.g., microprocessor devices designed to accelerate the generation of useful neural networks to implement deep learning.
  • DLAs deep learning accelerators
  • One embodiment of the disclosure provides an apparatus that includes: a die comprising non-volatile memory (NVM) elements formed in the die and configured to store neural network synaptic weight values within a plurality of word lines; and a plurality of neural network processing components formed in the die and configured to access the synaptic weight values in parallel from the word lines and perform neural network operations in parallel using the synaptic weight values.
  • Another embodiment of the disclosure provides a method including: storing neural network synaptic weight values for a neural network within a plurality of word lines of NVM elements of a die; sensing a plurality of the neural network synaptic weight values in parallel from the word lines of the NVM elements; and performing neural network operations in parallel using the sensed neural network synaptic weight values, wherein the neural network operations are performed in parallel by a plurality of neural network processing components formed within the die.
  • Yet another embodiment of the disclosure provides an apparatus that includes: means for storing neural network synaptic weight values for a neural network within non-volatile memory (NVM) elements of a die of the apparatus, where the synaptic weight values are within the NVM elements of the die within a plurality of word lines; means for accessing synaptic weight values in parallel from the word lines using synaptic weight value access components formed within the die; means for inputting neural network input data; and means for performing neural network operations in parallel using the neural network input data and the synaptic weight values accessed by the means for accessing, wherein the means for performing the neural network operations in parallel comprises a plurality of neural network processing components formed within the die.
  • a neural network processing component formed under the die and coupled to the NVM elements, the neural network processing component configured to perform neural network operations using neural network data stored in the NVM elements; and an on-chip copy with update component configured to perform an on-chip copy of at least some updated synaptic weights stored in the NVM elements.
  • Yet another embodiment of the disclosure provides a method for neural network processing using an apparatus including NAND NVM elements.
  • the method includes: sensing synaptic weights of a neural network stored within the NAND NVM elements; performing a neural network operation on the sensed synaptic weights, wherein the neural network operation modifies at least some of the synaptic weights; and performing a NAND-based on-chip copy and update within the apparatus to save the modified synaptic weights within the NAND NVM elements.
  • Yet another embodiment of the disclosure provides an apparatus that includes: an NVM array comprising a die with an on-chip copy with update component; a processor configured to generate a first mapping table that maps neural-network-weight units to corresponding virtual locations within the memory array, where a virtual location of the virtual locations is represented by a virtual block identifier corresponding to a physical location in the memory array, generate a second mapping table that maps the virtual block identifier to a physical block identifier, convert a neural-network-weight unit to a virtual block identifier using the first table, and convert the virtual block identifier to a physical block identifier using the second table; and an output component configured to send the physical block identifier to the die of the NVM array for processing in connection with the on-chip copy component of the die.
  • Still another embodiment of the disclosure provides a method for use by a controller of an apparatus that includes a memory array of NVM elements and an on-chip copy with update component.
  • the method includes: generating a first mapping table that maps neural-network-weight units to corresponding virtual locations within the memory array, where a virtual location of the virtual locations is represented by a virtual block identifier corresponding to a physical location in the memory array; generating a second mapping table that maps the virtual block identifier to a physical block identifier; converting a neural-network-weight unit to a virtual block identifier using the first table; converting the virtual block identifier to a physical block identifier using the second table; and sending the physical block identifier to the memory array for processing in connection with the on-chip copy with update component of the memory array.
  • FIG. 1 shows a schematic block diagram configuration for an exemplary solid state device (SSD) having one or more non-volatile memory (NVM) array dies, where the dies have under-the-array or next-to-the-array deep learning accelerator (DLA) components.
  • SSD solid state device
  • DLA deep learning accelerator
  • FIG. 2 illustrates an example of an NVM die having under-the-array or next-to-the-array components configured for neural network processing.
  • FIG. 3 illustrates another example of an NVM die having under-the-array or next-to-the-array components configured for neural network processing.
  • FIG. 4 illustrates an example of a NAND block for storing synaptic weights in word lines that can be sensed in parallel by under-the-array or next-to-the-array die components.
  • FIG. 5 illustrates a flow chart of an exemplary method according to aspects of the present disclosure for performing neural accelerator operations.
  • FIG. 6 illustrates a flow chart of exemplary feedforward operations.
  • FIG. 7 illustrates a flow chart of exemplary backpropagation operations.
  • FIG. 8 illustrates a flow chart that summarizes exemplary NAND-based on-chip copy with update operations.
  • FIG. 9 illustrates a flow chart of exemplary NAND-based on-chip copy with update operations for use in updating synaptic weights.
  • FIG. 10 illustrates exemplary first and second flash translation layer (FTL) mapping tables for use within a controller of an NVM die that stores synaptic weights.
  • FTL flash translation layer
  • FIG. 11 illustrates a flow chart of exemplary FTL processing performed by a controller that uses first and second FTL mapping tables.
  • FIG. 12 illustrates a flow chart that summarizes exemplary neural network operations performed by an NVM die.
  • FIG. 13 illustrates a flow chart of exemplary feedforward neural network operations performed by an NVM die using under-the-array or next-to-the-array circuit components.
  • FIG. 14 illustrates a flow chart of additional exemplary feedforward neural network operations performed by an NVM die.
  • FIG. 15 illustrates a flow chart of exemplary backpropagation neural network operations performed by an NVM die that uses an off-chip read-modify-write to update synaptic weights.
  • FIG. 16 illustrates a flow chart of exemplary backpropagation neural network operations performed by an NVM die that uses a NAND-based on-chip copy to update synaptic weights.
  • FIG. 17 illustrates a flow chart of exemplary mapping table operations performed by a controller that uses first and second mapping tables.
  • FIG. 18 illustrates a schematic block diagram configuration for an exemplary NVM apparatus such as a NAND die.
  • FIG. 19 illustrates a schematic block diagram configuration for an exemplary data storage apparatus such as an SSD having a controller and a NAND die.
  • data storage devices or apparatus for controlling the NVM arrays such as a controller of a data storage device (such as an SSD), and in particular to NAND flash memory storage devices (herein "NANDs").
  • NAND NAND flash memory storage devices
  • a NAND is a type of non-volatile storage technology that does not require power to retain data. It exploits negative-AND, i.e. NAND, logic.
  • an SSD having one or more NAND dies will be used below in the description of various embodiments. It is understood that at least some aspects described herein may be applicable to other forms of data storage devices as well. For example, at least some aspects described herein may be applicable to phase-change memory (PCM) arrays, magnetoresistive random access memory (MRAM) arrays and resistive random access memory (ReRAM) arrays.
  • PCM phase-change memory
  • MRAM magnetoresistive random access memory
  • ReRAM resistive random access memory
  • deep learning may be accomplished by, or facilitated by, deep learning accelerators (DLAs), e.g., microprocessor devices designed to accelerate the generation of deep neural networks (DNNs) to implement deep learning.
  • DNNs deep neural networks
  • methods and apparatus are disclosed for implementing DLAs or other neural network components within the die of an NVM using, for example, under-the-array circuit components.
  • DLA NAND arrays or DLA NAND architectures.
  • synaptic weight values are stored vertically within a die (such as within a 3D flash NAND array) in blocks so that synaptic values that belong to different neurons can be sensed and processed in parallel.
  • a DNN is an example of an artificial neural network that has multiple layers between input and output layers.
  • a DNN operates to determine a mathematical computation or manipulation to convert the input into the output, which might be a linear or non-linear computation.
  • the DNN may work through its layers by calculating a probability of each output.
  • Each mathematical manipulation may be considered a layer.
  • Networks that have many layers are referred to as having "deep" layers, hence the term DNN.
  • the DNN might be configured to identify a person within an input image by processing the bits of the input image to identify the person, i.e., the output of the DNN is a value that identifies the particular person.
  • DNNs are often configured as feedforward networks, in which data flows from an input layer to an output layer in one direction.
  • the DNN may generate a map of virtual “neurons” and assign initial numerical values or “weights” to connections between the neurons.
  • the weights and inputs are multiplied to return output values between, e.g., 0 and 1.
  • the weights may be adjusted in an attempt to improve the accuracy with which the network relates its input to a known output (for example, to correctly identify an input image).
  • a feedforward computation for a single neuron activation in a DNN is given by Equation 1 below, in which multiply-accumulate (MAC) operations using synaptic weights are summed and an activation function is then calculated; the activation function is often a maximum function (such as the rectified linear activation function computed by a rectified linear unit (RLU or ReLU)) or a sigmoid function. That is, in some examples, the feedforward computation involves a sum over the weights (w) multiplied by the input values (a) to each neuron in the network, plus a bias value (b), the result of which is then applied to a sigmoid activation function (σ) to yield the next value in the network.
  • Equation 1: $a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)$
  • in Equation 1, $w^l_{jk}$ denotes the weight for a connection from the $k$th neuron (or node) in the $(l-1)$th layer of the neural network to the $j$th neuron in the $l$th layer.
  • in Equation 1, the sum is over all neurons $k$ in the $(l-1)$th layer. That is, for each neuron in layer $l$, each incoming weight $w$ is multiplied by the corresponding activation value of a neuron in the previous layer, and the results of this intermediate computation are summed together.
  • the zeroth layer of the neural network may be referred to as the input layer.
  • the first layer of the neural network may be referred to as the first hidden layer.
  • the final layer of the neural network may be referred to as the output layer.
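Equation 1 for one layer, $a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)$, can be sketched in plain Python as below. The function names are illustrative assumptions, either a sigmoid or a rectified linear unit can be passed as the activation, and the sketch models only the arithmetic, not the on-die circuits.

```python
import math
from typing import Callable, List


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one choice for the activation function sigma in Equation 1."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def layer_forward(w: List[List[float]], a_prev: List[float], b: List[float],
                  activation: Callable[[float], float] = sigmoid) -> List[float]:
    """Return a^l for one layer: a^l_j = activation(sum_k w[j][k] * a_prev[k] + b[j]).

    w[j][k] is the weight from neuron k of layer l-1 to neuron j of layer l.
    """
    if len(w) != len(b):
        raise ValueError("need one bias per neuron")
    out = []
    for row, b_j in zip(w, b):
        if len(row) != len(a_prev):
            raise ValueError("each weight row must match the previous layer's size")
        # Multiply-accumulate (MAC) over every neuron k of layer l-1, then add the bias.
        z_j = sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j
        out.append(activation(z_j))
    return out
```

For example, `layer_forward([[1.0, -1.0], [2.0, 0.5]], [3.0, 1.0], [0.0, -1.0], activation=relu)` computes the two weighted sums 2.0 and 5.5, which ReLU passes through unchanged.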
  • DLA learning schemes may be based on solving backpropagation equations to update the network weights (w).
  • Exemplary backpropagation equations are based on weighted sums using calculated δ terms (expressed in matrix and vector form) for the output-layer and so-called hidden-layer neurons in the DNN (i.e., the intermediate layers between the input layer and the output layer), and training values are employed.
  • Error values δ may be defined based on the cost function C and the weighted input values z:
  • $\delta^l_j \equiv \partial C / \partial z^l_j$, where $\delta^l_j$ is the error of neuron $j$ in layer $l$ and $z^l_j$ is the weighted input to neuron $j$ in layer $l$. Note that the error $\delta^l_j$ is equal to the rate of change of $C$ with respect to the bias $b^l_j$ of the $j$th neuron of the $l$th layer, i.e., $\partial C / \partial b^l_j = \delta^l_j$, where $\delta$ is evaluated at the same neuron as the bias $b$.
  • the superscript T in Eq. (6) indicates a matrix transpose.
  • σ′ in Eq. (6) denotes the derivative of the sigmoid function σ.
  • ⊙ denotes the Hadamard product, i.e., the elementwise product of two vectors.
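For reference, the standard textbook backpropagation equations in the notation of Equation 1 are shown below, with $L$ denoting the output layer and $\eta$ an assumed learning rate for a plain gradient-descent weight update. This is a sketch of the usual formulation, not a reproduction of the patent's Equations 5-8, whose exact form and numbering are not recoverable from this text; the second line is the equation that uses the transpose $T$, the derivative $\sigma'$ and the Hadamard product $\odot$.

```latex
\begin{aligned}
\delta^L &= \nabla_a C \odot \sigma'(z^L) \\
\delta^l &= \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \\
\frac{\partial C}{\partial b^l_j} &= \delta^l_j \\
\frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k \, \delta^l_j \\
w^l_{jk} &\leftarrow w^l_{jk} - \eta \, \frac{\partial C}{\partial w^l_{jk}}
\end{aligned}
```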
  • the desired outputs y(x), sometimes called “learning labels” or “learning targets” of a supervised learning scheme in the literature, may be provided by the user/host device to the DLA NAND.
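A minimal pure-Python sketch of backpropagation for a small fully connected network follows. It assumes sigmoid activations and the quadratic cost $C = \tfrac{1}{2}\sum_j (a^L_j - y_j)^2$, so that $\nabla_a C = a^L - y$, with $y = y(x)$ the learning targets. The function names (`forward`, `backprop`, `sgd_step`) and the learning rate `eta` are illustrative assumptions, not taken from the disclosure, and the code models the arithmetic only, not the under-the-array circuits.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(weights: List[Matrix], biases: List[Vector],
            x: Vector) -> Tuple[List[Vector], List[Vector]]:
    """Feedforward pass per Equation 1; returns the weighted inputs z and activations a of every layer."""
    activations, zs = [list(x)], []
    for w, b in zip(weights, biases):
        a_prev = activations[-1]
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j for row, b_j in zip(w, b)]
        zs.append(z)
        activations.append([sigmoid(v) for v in z])
    return zs, activations


def backprop(weights: List[Matrix], biases: List[Vector],
             x: Vector, y: Vector) -> Tuple[List[Matrix], List[Vector]]:
    """Gradients of the assumed quadratic cost with respect to every weight and bias."""
    zs, activations = forward(weights, biases, x)
    # Output-layer error: delta^L = (a^L - y) ⊙ sigma'(z^L).
    delta = [(a - t) * sigmoid_prime(z) for a, t, z in zip(activations[-1], y, zs[-1])]
    grad_w: List[Matrix] = [[] for _ in weights]
    grad_b: List[Vector] = [[] for _ in biases]
    for l in range(len(weights) - 1, -1, -1):
        # dC/db_j = delta_j and dC/dw_jk = a_k (previous layer) * delta_j.
        grad_b[l] = list(delta)
        grad_w[l] = [[a_k * d_j for a_k in activations[l]] for d_j in delta]
        if l > 0:
            # Hidden-layer error: (w^T delta) ⊙ sigma'(z) for the layer below.
            w = weights[l]
            delta = [
                sum(w[j][k] * delta[j] for j in range(len(delta))) * sigmoid_prime(zs[l - 1][k])
                for k in range(len(zs[l - 1]))
            ]
    return grad_w, grad_b


def sgd_step(weights: List[Matrix], biases: List[Vector],
             grad_w: List[Matrix], grad_b: List[Vector], eta: float):
    """Gradient-descent update: w <- w - eta * dC/dw and b <- b - eta * dC/db."""
    new_w = [[[w_jk - eta * g_jk for w_jk, g_jk in zip(row, g_row)] for row, g_row in zip(w, g)]
             for w, g in zip(weights, grad_w)]
    new_b = [[b_j - eta * g_j for b_j, g_j in zip(b, g)] for b, g in zip(biases, grad_b)]
    return new_w, new_b
```

A quick way to check such a sketch is to compare each analytic gradient with a central finite difference of the cost; a small `eta` step should then lower the cost for the same training example.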
  • Some aspects disclosed herein relate to configuring under-the-array (or next-to-the-array) components of a NAND die to implement feedforward neural network operations and computations.
  • the main examples discussed are under-the-array examples, but the on-die logic/circuit can also be implemented, in at least some examples, as next-to-the-array logic/circuit. That is, the disclosure herein is not limited to under-the-array circuitry.
  • Other aspects relate to configuring the under-the-array components to implement backpropagation operations and computations.
  • Still other aspects relate to using a NAND-based on-chip copy function to update synaptic weights during backpropagation operations.
  • a controller, e.g., an SSD controller
  • the SSD controller is provided with flash translation layer (FTL) tables configured for efficient use with the types of neural network data stored in the NVM die, such as FTL tables configured for use with synaptic weights whose values may change but whose overall structure typically does not change.
  • a high-performance DNN system includes flash NAND dies with under-the-array circuitry to perform computations based on data and weights stored in NAND data blocks.
  • the aforementioned feedforward MAC operations, e.g., the weighted sum of Eq. 1, are implemented by a NAND die for a very large number of neuron cells in parallel (e.g., ~4000 cells per die plane) with no need to transfer the stored weight data to the NAND controller or to a host device.
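The parallelism described above can be illustrated with a hypothetical storage layout: if word line $k$ of a block holds the $k$th weight $w_{jk}$ of every neuron $j$ (one neuron per bit-line column), a single sense of word line $k$ returns that weight for all neurons at once, and per-bit-line circuits can accumulate $w_{jk} a_k$ in parallel. The class and function names below are assumptions, and the sketch ignores multi-bit cell encoding, sense amplifiers and plane structure.

```python
from typing import List


class WeightBlock:
    """Hypothetical model of a NAND block holding synaptic weights.

    Word line k stores w[j][k] for every neuron j, one neuron per bit-line
    column, so one sense of word line k yields the k-th weight of all neurons.
    """

    def __init__(self, w: List[List[float]]):
        num_neurons, num_inputs = len(w), len(w[0])
        # Transpose so that word_lines[k][j] == w[j][k].
        self.word_lines = [[w[j][k] for j in range(num_neurons)] for k in range(num_inputs)]
        self.sense_count = 0

    def sense(self, k: int) -> List[float]:
        """Return every value on word line k (one simulated sense operation)."""
        self.sense_count += 1
        return self.word_lines[k]


def parallel_mac(block: WeightBlock, a_prev: List[float]) -> List[float]:
    """Accumulate sum_k w[j][k] * a_prev[k] for all neurons j, one word line per step."""
    if len(a_prev) != len(block.word_lines):
        raise ValueError("one input activation per word line is required")
    acc = [0.0] * len(block.word_lines[0])  # one accumulator per bit line (neuron)
    for k, a_k in enumerate(a_prev):
        row = block.sense(k)  # the k-th weight of every neuron, in a single read
        for j, w_jk in enumerate(row):  # on the die, these updates would run in parallel
            acc[j] += w_jk * a_k
    return acc
```

In this model the number of sense operations grows with the number of inputs (word lines), not with the number of neurons, which is what makes the per-plane parallelism described above possible.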
  • the aforementioned backpropagation operations also may be implemented by the NAND die without the need to transfer adjusted weight data to the NAND controller or to the host device. That is, in some examples, the learning backpropagation equations used for training the DLA of the NAND die are performed by under-the-array components of the NAND die. In some examples, the synaptic weights stored within NAND blocks are updated using an off-chip read-modify-write operation, where the read-modify-write utilizes an external component such as a dynamic RAM (DRAM). In other examples, a NAND-based on-chip copy operation is used to update the synaptic weights.
  • DRAM dynamic RAM
  • the on-chip copy involves self-folding three single-level cell (SLC) pages into a single triple-level cell (TLC) word line (WL) having upper, middle and lower pages; e.g., a weight-adapting on-chip copy operation is disclosed. That is, the on-chip copy operation is generalized or modified herein to include logic and/or mathematical operations (e.g., the backpropagation equations above) before the data is folded and written back to a WL. In other examples, other multi-level cells (MLCs) such as quad-level cells (QLCs) may be used.
  • MLCs multi-level cells
  • QLCs quad-level cells
  • the weight-adapting on-chip copy operation may be, e.g., SLC to SLC; SLC to MLC, TLC, or QLC; MLC to MLC; TLC to TLC; and/or QLC to QLC.
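The weight-adapting on-chip copy can be sketched as a toy bit-level model: three SLC pages are passed through an update function (standing in for the on-die backpropagation logic) and then folded into one TLC word line whose lower, middle and upper pages are the updated SLC pages. The 8-bit weight codes, the plain binary packing of cell states, and all function names are assumptions made for illustration only.

```python
from typing import Callable, List

Bits = List[int]


def encode(codes: List[int], width: int = 8) -> Bits:
    """Serialize unsigned weight codes into a page of bits, most significant bit first."""
    return [(c >> (width - 1 - i)) & 1 for c in codes for i in range(width)]


def decode(bits: Bits, width: int = 8) -> List[int]:
    """Inverse of encode()."""
    codes = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        codes.append(value)
    return codes


def fold_with_update(lower: Bits, middle: Bits, upper: Bits,
                     update: Callable[[int, Bits], Bits]) -> List[int]:
    """Update three SLC pages, then fold them into one TLC word line.

    update(page_index, bits) stands in for the on-die weight-update logic
    (0 = lower, 1 = middle, 2 = upper). Cell i stores the 3-bit state
    (upper[i], middle[i], lower[i]) with plain binary packing; real devices
    map states to threshold-voltage levels with their own coding.
    """
    lo, mid, up = (update(i, page) for i, page in enumerate((lower, middle, upper)))
    if not len(lo) == len(mid) == len(up):
        raise ValueError("all three pages must have the same length")
    return [(up_bit << 2) | (mid_bit << 1) | lo_bit
            for lo_bit, mid_bit, up_bit in zip(lo, mid, up)]


def read_page(tlc_cells: List[int], which: int) -> Bits:
    """Read one logical page (0 = lower, 1 = middle, 2 = upper) back from a TLC word line."""
    return [(cell >> which) & 1 for cell in tlc_cells]


def make_weight_update(deltas: List[List[int]]) -> Callable[[int, Bits], Bits]:
    """Hypothetical update hook: add per-weight deltas to 8-bit codes, saturating at 0..255."""
    def update(page_index: int, page: Bits) -> Bits:
        codes = decode(page)
        return encode([min(255, max(0, c + d)) for c, d in zip(codes, deltas[page_index])])
    return update
```

The point of the model is the ordering: the update is applied while the data is in transit, so the folded TLC word line already holds the adapted weights and no separate read-modify-write pass through the controller is needed.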
  • the learning backpropagation equations and the read-modify-write operations may be performed by a storage device controller with partial (or without any) NAND die assistance.
  • the FTL components operate to maintain control tables that associate the host data to the relevant neural network weights that the NAND die should use when reading the data, as well as information regarding the location of the weights in the NAND die (that is, the Physical Block Address (PBA)).
  • PBA Physical Block Address
  • the association between host data and weights can conform to a certain ratio, e.g., 32K of weights for each full SLC host block.
  • the weights are stored in the NAND die in separate blocks, which allows the NAND to perform certain maintenance operations on these blocks separately.
  • the FTL of the storage device controller is configured to support the DLA learning process by allocating a new target block in the NAND die for each source block or a new MLC target block for several source SLC blocks in case of SLC to MLC copy.
  • the generalized weight-adapting on-chip copy operation, i.e., the on-chip copy with update operation
  • the FTL component releases the source blocks and updates the physical block address (PBA) of the weights that were copied.
  • the NAND die receives the updated PBAs for the neural network weights from the controller (e.g., as part of the command).
  • a first FTL table maps a neural-network-weight unit to a virtual location represented by a "virtual-block-ID" (along with, in some examples, a page-in-block identifier).
  • the virtual-block-ID corresponds to a physical location in the NAND die but identifies the physical location using a block-ID that is logical.
  • a second FTL table maps the virtual-block-ID to a “physical-block-ID.”
  • the FTL components of the controller need not search for “weight units” that were copied (by, e.g., scanning FTL tables, reading the headers in the block, or maintaining a reverse table).
  • the FTL components need not update each “weight unit” separately but rather may just update a single entry in the second FTL table (which maps the virtual-block-ID to a new physical-block-ID), which simplifies the FTL and reduces overhead.
  • the weight units may include or correspond to or be otherwise related to the synaptic weights stored in the NAND die.
  • the virtual-block-ID represents a block at the size of an SLC block, and each MLC block is associated with several virtual-block-IDs, each mapping a relative portion of the block.
  • the second FTL table maps a virtual-block-ID to a physical SLC block or to a portion of an MLC block.
  • garbage collection, compaction operations, wear leveling and other flash management operations may be required for the NAND blocks that store the synaptic weights, as each “weight” unit is associated with a separate host-data portion, which may be invalidated or updated separately.
  • the use of the two FTL tables, i.e., a “weight”-to-virtual-location (with “virtual-block-ID”) table and a “virtual-block-ID”-to-“physical-block-ID” table, may be quite beneficial for the NAND die array, since the basic maintenance operation of the DLA is done at full-block granularity, for which NAND flash management operations can be minimized or reduced. This simplifies the system by allowing independent updates by the NAND and also provides higher performance to the host.
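The two-table scheme can be sketched as follows. Table 1 maps each weight unit to a virtual block ID and a page; table 2 maps each virtual block ID to a physical block, with an optional portion index for the case where several SLC-sized virtual blocks share one MLC or TLC block. The class, method and key names are hypothetical; the sketch only shows that an on-chip copy changes one table-2 entry per copied block while table 1 stays untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class WeightFTL:
    """Hypothetical two-table FTL for blocks that hold synaptic weights.

    Table 1 (weight_to_virtual): weight unit -> (virtual_block_id, page_in_block)
    Table 2 (virtual_to_physical): virtual_block_id -> (physical_block_id, portion)
    portion is 0 for an SLC block, or selects the slice of an MLC/TLC block
    that this virtual block occupies.
    """
    weight_to_virtual: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    virtual_to_physical: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def map_weight(self, unit: str, vblock: int, page: int) -> None:
        self.weight_to_virtual[unit] = (vblock, page)

    def map_block(self, vblock: int, pblock: int, portion: int = 0) -> None:
        self.virtual_to_physical[vblock] = (pblock, portion)

    def lookup(self, unit: str) -> Tuple[int, int, int]:
        """Weight unit -> (physical_block_id, portion, page_in_block) via both tables."""
        vblock, page = self.weight_to_virtual[unit]
        pblock, portion = self.virtual_to_physical[vblock]
        return pblock, portion, page

    def on_chip_copy_done(self, vblock: int, new_pblock: int,
                          new_portion: int = 0) -> Tuple[int, int]:
        """Repoint one table-2 entry after the die copies a block; return the old location to release.

        Table 1 is not modified: every weight unit in the block follows automatically.
        """
        old = self.virtual_to_physical[vblock]
        self.virtual_to_physical[vblock] = (new_pblock, new_portion)
        return old
```

For example, folding three SLC virtual blocks into one TLC block takes three `on_chip_copy_done` calls with portions 0, 1 and 2, regardless of how many weight units each block holds.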
  • An advantage of at least some of the exemplary DLA NAND architectures and systems described herein is that only the final result of a DLA procedure is transferred back to the controller, thus avoiding the time needed to transfer all 64 WLs (for example) of a NAND block.
  • the DLA NAND dies described herein differ from graphics processing units (GPUs) in that a GPU transfers calculated data from its NVM to a volatile RAM/DRAM, whereas the DLA computations described in various examples herein are done by the NAND dies.
  • the DLA NAND die includes under-the-array logic for performing the logic and/or mathematical operations, storing temporary results, performing the back propagation computations, generalized on-chip copies, and other on-chip operations.
  • a DLA NAND architecture is disclosed that offloads DLA computations from host devices or other devices and instead performs DLA computations for DNN processing in memory using synaptic weights and other DNN data.
  • FIG. 1 is a block diagram of a system 100 including an exemplary SSD having an NVM with under-the-array deep learning accelerator (DLA) components in accordance with aspects of the disclosure.
  • the system 100 includes a host 102 and an SSD 104 coupled to the host 102.
  • the host 102 provides commands to the SSD 104 for transferring data between the host 102 and the SSD 104.
  • the host 102 may provide a write command to the SSD 104 for writing data to the SSD 104 or a read command to the SSD 104 for reading data from the SSD 104.
  • the host 102 may be any system or device having a need for data storage or retrieval and a compatible interface for communicating with the SSD 104.
  • the host 102 may be a computing device, a personal computer, a portable computer, a workstation, a server, a personal digital assistant, a digital camera, or a digital phone, as merely a few examples.
  • the host 102 may be a system or device having a need for neural network processing, such as speech recognition, computer vision, and self-driving vehicles.
  • the host 102 may be a component of a self-driving system of a vehicle.
  • the SSD 104 includes a host interface 106, a controller 108, a memory 110 (such as a random access memory (RAM)), an NVM interface 112 (which may be referred to as a flash interface), and an NVM 114, such as one or more NAND dies.
  • the host interface 106 is coupled to the controller 108 and facilitates communication between the host 102 and the controller 108.
  • the controller 108 is coupled to the memory 110 as well as to the NVM 114 via the NVM interface 112.
  • the host interface 106 may be any suitable communication interface, such as an Integrated Drive Electronics (IDE) interface, a Universal Serial Bus (USB) interface, a Serial Peripheral (SP) interface, an Advanced Technology Attachment (ATA) or Serial Advanced Technology Attachment (SATA) interface, a Small Computer System Interface (SCSI), an IEEE 1394 (Firewire) interface, or the like.
  • the host 102 includes the SSD 104.
  • the SSD 104 is remote from the host 102 or is contained in a remote computing system communicatively coupled with the host 102.
  • the host 102 may communicate with the SSD 104 through a wireless communication link.
  • the controller 108 controls operation of the SSD 104.
  • the controller 108 receives commands from the host 102 through the host interface 106 and performs the commands to transfer data between the host 102 and the NVM 114.
  • the controller 108 may manage reading from and writing to memory 110 for performing the various functions effected by the controller and to maintain and manage cached information stored in memory 110.
  • the controller 108 may include any type of processing device, such as a microprocessor, a microcontroller, an embedded controller, a logic circuit, software, firmware, or the like, for controlling operation of the SSD 104. In some aspects, some or all of the functions described herein as being performed by the controller 108 may instead be performed by another element of the SSD 104.
  • the SSD 104 may include a microprocessor, a microcontroller, an embedded controller, a logic circuit, software, firmware, or any kind of processing device, for performing one or more of the functions described herein as being performed by the controller 108. According to other aspects, one or more of the functions described herein as being performed by the controller 108 are instead performed by the host 102. In still further aspects, some or all of the functions described herein as being performed by the controller 108 may instead be performed by another element such as a controller in a hybrid drive including both non-volatile memory elements and magnetic storage elements.
  • the memory 110 may be any suitable memory, computing device, or system capable of storing data.
  • the memory 110 may be ordinary RAM, DRAM, double data rate (DDR) DRAM, static RAM (SRAM), synchronous dynamic RAM (SDRAM), a flash storage, an erasable programmable read-only memory (EPROM), an electrically erasable programmable ROM (EEPROM), or the like.
  • the controller 108 uses the memory 110, or a portion thereof, to store data during the transfer of data between the host 102 and the NVM 114.
  • the memory 110 or a portion of the memory 110 may be a cache memory.
  • the NVM 114 receives data from the controller 108 via the NVM interface 112 and stores the data.
  • the NVM 114 may be any suitable type of non-volatile memory, such as a NAND-type flash memory or the like.
  • the controller 108 may include hardware, firmware, software, or any combinations thereof that provide a deep learning neural network controller 116 for use with the NVM array 114.
  • the neural network controller 116 may be configured with FTL components (not shown in FIG. 1) that include first and second tables configured as discussed above to work efficiently with DNN array data stored in the NVM array 114.
  • although FIG. 1 shows an example SSD, and an SSD is generally used as an illustrative example throughout the description, the various disclosed embodiments are not necessarily limited to an SSD application/implementation.
  • the disclosed NVM die and associated processing components can be implemented as part of a package that includes other processing circuitry and/or components.
  • a processor may include, or otherwise be coupled with, embedded NVM and associated circuitry and/or components for deep learning that are described herein.
  • the processor could, as one example, off-load certain deep learning tasks to the NVM and associated circuitry and/or components.
  • the controller 108 may be a controller in another type of device and still include the neural network controller 116 and perform some or all of the functions described herein.
  • FIG. 2 illustrates a block diagram of an exemplary NVM die 200 that includes NVM storage array components 202 and under-the-array or next-to-the-array (or other extra array) processing components 204 (processing components 204).
  • the NVM array components 202 include NVM storage 206 configured for storing neural network synaptic weights and NVM storage 208 configured for storing other data such as neural network bias values, training values, etc. Note that the data stored in NVM storage 208 may include non-neural network related data.
  • the NVM processing components 204 include feedforward components 210 configured to perform feedforward neural network operations, such as computing values in accordance with Equation 1, above.
  • the feedforward components 210 include: a set of multiplication circuits 212 configured to operate in parallel to compute the products of synaptic weights and activation values (as in, e.g., Equation 1); a set of summation circuits 214 configured to operate in parallel to sum such products (as in, e.g., Equation 1); a set of bias addition circuits 216 configured to operate in parallel to add bias values to the sums (as in, e.g., Equation 1); and a set of RLU/sigmoid function circuits 218, configured to operate in parallel to compute RLU or sigmoid functions of the resulting values (as in, e.g., Equation 1).
  • the RLU function is currently more common than the sigmoid in deep neural networks.
  • in FIG. 2, only four instances of each of the aforementioned feedforward circuits are shown; however, far more circuits can be configured in parallel, e.g., with separate circuits provided for each of the N layers of a neural network.
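As a software sketch of the FIG. 2 feedforward datapath, assume Equation 1 (not reproduced in this excerpt) has the common form a_j = f(Σ_k w_jk a_k + b_j), with f the RLU or sigmoid function. The function names are hypothetical, and each list comprehension stands in for one bank of parallel circuits (212, 214, 216, 218) that would run concurrently on the die:

```python
import math
from typing import Callable, List


def relu(z: float) -> float:
    """Rectified linear unit (RLU in the text)."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def feedforward_layer(
    weights: List[List[float]],   # weights[j][k]: synapse from neuron k to neuron j
    activations: List[float],     # activations a_k of the previous layer
    biases: List[float],          # bias b_j for each neuron j
    act: Callable[[float], float] = relu,
) -> List[float]:
    # Multiplication circuits 212: products w_jk * a_k for every neuron j
    products = [[w * a for w, a in zip(row, activations)] for row in weights]
    # Summation circuits 214: one sum per neuron
    sums = [sum(p) for p in products]
    # Bias addition circuits 216
    z = [s + b for s, b in zip(sums, biases)]
    # RLU/sigmoid function circuits 218
    return [act(v) for v in z]


print(feedforward_layer([[1.0, 2.0], [-1.0, 0.5]], [1.0, 1.0], [0.0, -1.0]))  # [3.0, 0.0]
```

In software the stages run one after another; on the die each stage is a set of circuits operating in parallel.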
  • the NVM processing components 204 also include backpropagation components 220 configured to perform backpropagation neural network operations, such as to compute values in accordance with Equations 5-8, above.
  • the backpropagation components 220 include: a set of weight update determination circuits 222 configured to operate in parallel to compute updates to the synaptic weights (as in, e.g., Equations 5-8) and a set of synaptic weight update circuits 224 configured to operate in parallel to update the synaptic weights stored in NVM storage 206 using the updates computed by circuits 222.
  • in some examples, the update exploits one or more on-chip copy with update circuits 226.
  • the feedforward operations and backpropagation operations may be performed iteratively or sequentially using the various weight and bias values of a neural network stored in the NVM array 202, as well as activation values or training values input from an SSD.
  • default values for the synaptic weights and biases may be input and stored in the NVM array 202.
  • a set of weights and biases are already stored for use.
  • a current set of synaptic weights w for the neurons of the first layer of the neural network are sensed from NVM storage 206.
  • the multiplication circuits 212 and the summation circuits 214 may include various components arranged in parallel to multiply individual synaptic weights w with the corresponding activation values a and then sum the results for all of the neurons of the network.
  • Bias values b are sensed from NVM storage 208 and added to the output of the summation circuit 214 using the bias addition circuits 216.
  • the sigmoid function (or RLU) for each result is then computed using the sigmoid/RLU function circuits 218 to yield resulting activation values (e.g. the activation a of a j-th neuron in the next layer).
  • the weight update determination circuits 222 then perform the computations of Equations 5-8, above, to generate updates to the synaptic weights.
  • the updates are applied to the stored synaptic weights of NVM storage 206 by update circuits 224.
  • the synaptic weight update circuits 224 exploit an off-chip read-modify-write operation to store the updated synaptic weights within the NVM storage 206.
  • the off-chip read-modify-write operation may be performed in conjunction with a separate component such as a DRAM of the SSD controller.
  • the NAND-based on-chip copy with update circuit 226 performs the weight update operation, without the need for an external component to perform the update.
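Equations 5-8 are not reproduced in this excerpt. Assuming they are the standard backpropagation equations with a quadratic cost and a sigmoid activation, the work of circuits 222 for the output layer reduces to the sketch below. The function name and the learning rate `lr` are hypothetical, and hidden layers (which propagate the error term backward through the next layer's weights) are omitted for brevity:

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def output_layer_update(
    weights: List[List[float]],  # weights[j][k] feeding output neuron j
    biases: List[float],
    a_prev: List[float],         # activations of the previous layer
    target: List[float],         # desired (known) output y
    lr: float = 0.1,             # learning rate (assumed hyperparameter)
) -> Tuple[List[List[float]], List[float]]:
    # Forward pass for this layer
    z = [sum(w * a for w, a in zip(row, a_prev)) + b for row, b in zip(weights, biases)]
    a = [sigmoid(v) for v in z]
    # Error term delta_j = (a_j - y_j) * sigma'(z_j), where sigma'(z) = a * (1 - a)
    delta = [(aj - yj) * aj * (1.0 - aj) for aj, yj in zip(a, target)]
    # Gradient step: dC/dw_jk = a_prev_k * delta_j and dC/db_j = delta_j
    new_w = [[w - lr * ak * dj for w, ak in zip(row, a_prev)]
             for row, dj in zip(weights, delta)]
    new_b = [b - lr * dj for b, dj in zip(biases, delta)]
    return new_w, new_b


print(output_layer_update([[0.0]], [0.0], [1.0], [1.0], lr=1.0))  # ([[0.125]], [0.125])
```

The returned values are what circuits 224 would then store, by either of the two write-back paths just described.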
  • FIG. 3 illustrates selected components of an NVM die 300 that highlights particular exemplary feedforward components.
  • the die 300 includes an NVM array 302 and under-the-array or next-to-the-array components 304 (processing components 304).
  • the under-the-array or next-to-the-array components may also be generally regarded as, or referred to as, extra-array components in the sense that they are formed, mounted, or positioned outside of the array, or may be referred to as ancillary components, auxiliary components, non-array components, non-top-of-the-array components, or non-embedded-in-the-array components.
  • the die 300 is shown coupled to an input data latch or register 305.
  • the data latch is an under-the-array component of the die.
  • the latch might be a separate component, such as volatile memory register.
  • Latch 305 is shown separately from the die for the sake of generality.
  • latch 305 is configured to store sixty-four (64) entries (X1-XM).
  • Die 300 also includes a set of N NVM blocks, labeled 306_1 through 306_N. These may be used to store synaptic weights for each of the N layers of a neural network, where N may be, for example, 1000. That is, in some examples, 1000 such NAND blocks are provided on the die. Other values, such as bias values, may be stored elsewhere, such as within a set of user data blocks 309.
  • the die 300 includes a corresponding under-the-array multiplexer (MUX) 308.
  • the processing components 304 additionally include: a set of MAC circuits, labeled 312_1 through 312_N; a sense latch 316; and an accumulator latch 318.
  • the N MAC units are configured in this example to perform the aforementioned multiply-accumulate computations and to add the bias value and compute the sigmoid/RLU functions (so that separate bias and sigmoid/RLU components are not needed).
  • a current set of synaptic weights w for the neurons of the first layer of the neural network are sensed from the first NAND block 306_1 into the sense latch 316 (e.g. a first WL of data is read from the NAND die) and an initial set of input values (which may be the aforementioned activation values a for the neurons of the zeroth or input layer) are input from the controller 108 of FIG. 1 into latch 305. (Alternatively, such input values a may be obtained from data blocks 309, if already stored therein.)
  • the set of MAC components 312 operate in parallel to perform the operations of Equation 1 to yield a resulting activation value (e.g.
  • These operations and computations may utilize sense latch 316 and accumulator latch 318. Intermediate values may be stored, as needed, in latch 305 or in other storage elements, not shown. For example, the result of the feedforward operations for the first layer may be stored in sense latch 316 with the values from each layer accumulated in latch 318. These operations proceed layer by layer until each of the layers has been processed and the final result is stored in accumulator latch 318.
  • the values of the accumulator latch 318 may be output to a separate device, such as the SSD controller that is controlling the NVM (using an output component not shown).
  • an individual synaptic weight is represented by four (4) bytes, and so four thousand (4000) synaptic weights may be stored in a 16 KB NAND page.
  • a NAND sense operation typically takes about 50 microseconds, so if there are 4000 weight values in a page and thirty-two planes (on sixteen dies) of the storage device are operated in parallel, 3.56 million MACs per second per SSD may be achieved.
  • an array of SSDs may be used to multiply the computing power of the overall system, with 3.56 million MACs per second per SSD.
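The throughput arithmetic can be checked with a few lines of Python. The sketch assumes that each plane senses one page of weights per 50 µs sense operation, that all 32 planes run in parallel, and that every sensed weight feeds exactly one MAC. The helper name is hypothetical. Under exactly these assumptions the product comes to about 2.56 × 10^9 MAC/s, so the 3.56 million figure quoted above presumably rests on additional constraints that this excerpt does not state:

```python
def macs_per_second(weights_per_page: int, planes: int, sense_time_s: float) -> float:
    """Peak MAC rate if every weight sensed in parallel is used in one MAC."""
    weights_per_sense = weights_per_page * planes  # weights delivered per sense operation
    return weights_per_sense / sense_time_s


rate = macs_per_second(weights_per_page=4000, planes=32, sense_time_s=50e-6)
print(f"{rate:.3e} MAC/s per SSD")  # 2.560e+09 MAC/s per SSD
```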
  • the examples of FIGS. 5 and 6, discussed below, instead use one (or several) configurable MUX(es), updated for each block and layer currently computed, and likewise for the MAC units. That is, rather than having N MUXes and N MACs, the die includes, for example, M MUXes and M MACs, where M < N. (This is indicated in FIG. 3.) In some examples, it may not be feasible to implement all N MAC units for the entire network in parallel, as doing so might cost too much and consume too much power.
  • FIG. 3 primarily illustrates an example with one MAC and one MUX for each of the N layers, but it also indicates that there can be fewer MACs and MUXes, configured as just described, e.g. M such components with M < N. Note also that in some examples the number of MACs may differ from the number of MUXes. For example, there might be one MUX and M MACs.
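One way to picture M < N is time-multiplexing: a layer's neurons are processed in passes of at most M, with the MUX reconfigured before each pass to route the next group of weight rows to the M MAC units. This is a hypothetical software sketch of that idea; the excerpt does not specify how the MUX configuration is actually encoded:

```python
from typing import List, Tuple


def layer_with_m_macs(
    weights: List[List[float]],  # weights[j]: synaptic weights feeding neuron j
    activations: List[float],
    biases: List[float],
    m: int,                      # number of physical MAC units (M < N)
) -> Tuple[List[float], int]:
    outputs = [0.0] * len(weights)
    passes = 0
    for start in range(0, len(weights), m):
        # MUX configuration for this pass: neurons start .. start+m-1 go to the M MACs
        for j in range(start, min(start + m, len(weights))):
            acc = sum(w * a for w, a in zip(weights[j], activations))  # one MAC unit
            outputs[j] = max(0.0, acc + biases[j])                     # bias + RLU
        passes += 1
    return outputs, passes


print(layer_with_m_macs([[1.0], [2.0], [3.0], [4.0], [5.0]], [1.0], [0.0] * 5, m=2))
# ([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Fewer MAC units trade latency (ceil(K/M) passes for a layer of K neurons) for die area and power, which is the cost and power concern raised above.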
  • FIG. 4 illustrates an exemplary NAND block 400, which has data stored in sixty-four WLs, 402_0-402_63.
  • FIG. 4 also illustrates an input latch 404 that has one entry per WL 402_0-402_63.
  • Each WL of the NAND block 400 stores N weight values W.
  • the weights of WL 0 are denoted $W_{0,0}$ through $W_{0,N-1}$;
  • the weights of WL 1 are denoted $W_{1,0}$ through $W_{1,N-1}$; and so on.
  • the indices are, of course, arbitrary and different indices may be used.
  • the latch 404 might have more or fewer entries and might store, e.g., N entries, rather than only sixty-four.
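The layout arithmetic for a weight page can be sketched with Python's `struct` module. The text specifies only 4-byte weights and 16 KB pages. Encoding each weight as a little-endian IEEE-754 float32, storing one page per WL, and padding unused bytes with the erased value 0xFF are assumptions made for illustration, and the helper names are hypothetical:

```python
import struct
from typing import List

PAGE_BYTES = 16 * 1024                        # 16 KB NAND page (from the text)
WEIGHT_BYTES = 4                              # one synaptic weight = 4 bytes (from the text)
SLOTS_PER_PAGE = PAGE_BYTES // WEIGHT_BYTES   # 4096 slots


def pack_wordline(weights: List[float]) -> bytes:
    """Serialize one WL's weights as little-endian float32, padded to a full page."""
    if len(weights) > SLOTS_PER_PAGE:
        raise ValueError("too many weights for one page")
    data = struct.pack(f"<{len(weights)}f", *weights)
    return data + b"\xff" * (PAGE_BYTES - len(data))  # unprogrammed bytes left at 0xFF


def unpack_wordline(page: bytes, count: int) -> List[float]:
    """Recover the first `count` weights of a sensed page."""
    return list(struct.unpack_from(f"<{count}f", page))


page = pack_wordline([0.5, -1.25])
print(len(page), unpack_wordline(page, 2))  # 16384 [0.5, -1.25]
```

Under this layout, weight $W_{wl,i}$ sits at byte offset 4i of the page for word line wl, and the 4096 available slots comfortably hold the 4000 weights mentioned earlier.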
  • weight values w may be sensed from a WL (and stored in the sense latch 316 shown in FIG. 3), then multiplied by the activation values a stored in the input latch and summed (using MACs 312_1-312_N).
  • bias values b may be sensed during the feedforward computations and added to the summed MAC output values, with the results applied to the sigmoid function to compute a next set of activation values.
  • for backpropagation, weight values w may be sensed from the WLs and updated using the aforementioned backpropagation components, which compute values using the formulae of Equations 5-8, above, based on the desired (e.g. known) output value.
  • the updated weight values may be saved in the NVM using off-chip read-modify-write or NAND-based on-chip copy (with the updated values stored in a different NAND block of the NVM array).
  • FIG. 5 illustrates a method 500 according to aspects of the present disclosure, which summarizes aspects of DLA processing and components employed to implement a method for feedforward computations where, for example, fewer than N MUXes and N MACs can be used for a neural network with N layers.
  • data is input from a NAND array using an interface latch, such as a sense latch.
  • DNN data in the latch is multiplexed in accordance with a MUXing configuration specified by a learning network configuration 506.
  • the MUXing configuration may, for example, specify the manner with which values sensed from the NAND blocks are routed to various MACs to enable feedforward processing.
  • the MUXing configuration at block 506 may define full or partial connectivity between layers, where in some cases not all neuron outputs of a previous layer are connected to neurons of the next layer.
  • the learning network configuration may also specify a current set of synaptic weights, bias values, etc. That is, the learning network configuration may be representative of the current configuration of the DNN.
  • the multiplexed data is applied to neuron accelerator components (e.g., a set of MACs, bias adders, RLU or sigmoid function circuits, etc.) along with synaptic weights of the network configuration 506 for the layer.
  • the output of the accelerator components (such as, e.g., feedforward activation values for the next layer of the network) is stored in another interface latch, e.g. an accumulator latch.
  • the operations of blocks/components 502, 504, 508 and 510 may be repeated for each layer of the DNN, with the final output returned to the NAND array for storage or output to the SSD controller and then to a host device, such as a self-driving vehicle control system.
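Partial connectivity of the kind a MUXing configuration can express may be modeled as a boolean mask: the MUX forwards a previous-layer output to a neuron's MAC only where the mask is set. A hypothetical sketch:

```python
from typing import List


def route_and_mac(
    sensed_weights: List[List[float]],  # sensed_weights[j][k]: weight from output k to neuron j
    prev_outputs: List[float],          # outputs of the previous layer
    connectivity: List[List[bool]],     # connectivity[j][k]: True if output k feeds neuron j
) -> List[float]:
    results = []
    for j, row in enumerate(sensed_weights):
        # Only connected inputs are routed to the MAC serving neuron j
        results.append(sum(w * a for w, a, c in zip(row, prev_outputs, connectivity[j]) if c))
    return results


w = [[1.0, 1.0], [1.0, 1.0]]
print(route_and_mac(w, [2.0, 3.0], [[True, True], [False, True]]))  # [5.0, 3.0]
```

A fully connected layer corresponds to an all-True mask.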
  • FIG. 6 illustrates a method 600 according to aspects of the present disclosure, which summarizes various aspects of feedforward DLA processing for an example where the input data (e.g. activation values) are read from a NAND user block.
  • data is read from the NAND user data block and stored in an under-the-array input data latch.
  • a set of weights are read (sensed) from a weight block word line (WL) n.
  • the sensed weights and the data from the input data register are multiply-accumulated and stored in an under-the-array accumulator in parallel for all neurons in a current layer. If, at decision block 608, the current iteration is not the last WL, processing returns to read another WL of weights at block 604 and another set of MAC operations are performed at block 606 for the same layer. The procedure repeats for each of the WLs of synaptic weight data, then advances to the next layer.
  • if the current layer is not the last layer, processing returns to read a WL of weights at block 604 for the next layer, and another set of MAC operations is performed at block 606 for that next layer.
  • the procedure repeats for each of the WLs of synaptic weight data of that next layer.
  • the final output is transferred to the SSD controller.
  • the final output may be an indicator that identifies the object, or a set of values that the SSD (or the host) can then use to identify the object.
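The loop structure of method 600 can be sketched as follows. The sketch assumes the FIG. 4 arrangement in which word line n of a layer's weight block holds the weights fanning out from input n to every neuron of that layer (one input-latch entry per WL), so each sensed WL updates all accumulators at once. It also applies the bias and RLU at the end of each layer, as the MAC units of FIG. 3 do. All names are hypothetical:

```python
from typing import List


def feedforward_die(
    input_latch: List[float],                 # data read from the user block (block 602)
    weight_blocks: List[List[List[float]]],   # weight_blocks[layer][wl][j]
    biases: List[List[float]],                # biases[layer][j]
) -> List[float]:
    for layer, block in enumerate(weight_blocks):
        n_neurons = len(block[0])
        accumulator = [0.0] * n_neurons       # under-the-array accumulator
        for wl, sensed in enumerate(block):   # block 604: sense WL n into the sense latch
            x = input_latch[wl]               # input-latch entry paired with this WL
            for j in range(n_neurons):        # block 606: MAC for all neurons (parallel on-die)
                accumulator[j] += x * sensed[j]
        # Last WL reached (decision block 608): add bias, apply RLU, feed the next layer
        input_latch = [max(0.0, a + b) for a, b in zip(accumulator, biases[layer])]
    return input_latch                        # final output transferred to the SSD controller


print(feedforward_die([1.0, 1.0], [[[1.0, 2.0], [3.0, 4.0]]], [[0.0, -10.0]]))  # [4.0, 0.0]
```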
  • FIG. 7 illustrates a method 700 according to aspects of the present disclosure, which summarizes aspects of backpropagation DLA processing for an example where the input data (e.g. training values) are read from a NAND user block.
  • training data is read from the NAND user data block and stored in an under-the-array input data latch.
  • a set of weight updates is calculated according to backpropagation equations per layer using under-the-array logic.
  • the weights for a current WL are read into an under-the-array latch, the weights are updated with the calculated updates, and the updated weights are stored in a new NAND block using either an off-chip read-modify-write operation (e.g. a read-modify-write using a separate DRAM) or a NAND-based on-chip copy with update. If, at decision block 708, the current iteration was not the last WL, processing returns to blocks 704 and 706 to update weights for the next WL of the same layer. The procedure repeats for each of the WLs, then advances to the next layer. If, at decision block 710, the current layer is not the last layer, processing returns to blocks 704 and 706 for the next layer. Once all layers have been processed, a final output may be transferred to the SSD controller and then to a host at block 712. The final output might be a value indicating a final trained output result.
Updating Synaptic Weights using NAND-based On-Chip Copy with Update Operation
  • FIG. 8 summarizes NAND-based on-chip copy and update procedures 800 for use with NVM-based neural network operations.
  • a data storage apparatus senses neural network data of a neural network (e.g. synaptic weights) stored within NAND NVM elements of the data storage apparatus (such as from a set of NAND storage elements).
  • the synaptic weights may be read or sensed, for example, by under-the-array components of a NAND die, as explained above.
  • the data storage apparatus performs a neural network operation on the sensed neural network data, wherein the neural network operation modifies at least some of the neural network synaptic weight data.
  • the neural network operation may be, for example, a backpropagation operation performed on synaptic weights stored in a set of NAND elements.
  • the data storage apparatus performs a NAND-based on-chip copy and update operation to save the modified neural network data within the NAND NVM elements.
  • the NAND-based on-chip copy with update may use under-the-array circuit components (as shown in FIG. 2, discussed above).
  • NAND-based on-chip copy and update (also called “NAND-based on-chip copy with update” or “weight-adapting on-chip copy”) is a type of read-modify-write operation for updating values stored in a NAND array, in which the read-modify-write is implemented without an off-chip component such as a DRAM.
  • a NAND die may be configured with a fixed number of blocks that run in SLC mode, while others run in TLC mode. When data is moved from the SLC to the TLC portion, the transfer is performed internally in the die, using the on-chip copy.
  • an SLC to TLC transfer is performed like a wear-leveling operation by using the NAND interface (e.g., Toggle or ONFI) and an off-chip DRAM to move the data.
  • Overhead can be reduced using NAND-based on-chip copy with update because the copy is done within the die, using volatile latches in the die to temporarily store the three pages. Since an SLC block often holds exactly one third as much data as a TLC block, three SLC blocks may be folded into one TLC block. Note that NAND-based on-chip copy with update need not always employ TLC. In some cases, other SLC or MLC blocks might be used or, as noted below, in some examples, SLC to SLC on-chip copy and update may be performed.
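The folding idea can be illustrated at the bit level: each TLC cell stores three bits, one from each of a lower, middle and upper page, so three SLC pages (one bit per cell) fill one TLC word line. The sketch below uses a plain binary mapping of the three bits to a cell state purely for illustration; real devices typically use Gray-coded state assignments, and the function names are hypothetical:

```python
from typing import List, Tuple


def fold_slc_pages(lower: List[int], middle: List[int], upper: List[int]) -> List[int]:
    """Combine three SLC pages (one bit per cell) into per-cell 3-bit TLC values."""
    if not (len(lower) == len(middle) == len(upper)):
        raise ValueError("pages must be the same length")
    # Plain binary state mapping for illustration only
    return [(u << 2) | (m << 1) | l for l, m, u in zip(lower, middle, upper)]


def unfold_tlc(cells: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split per-cell 3-bit values back into lower, middle and upper pages."""
    return (
        [c & 1 for c in cells],
        [(c >> 1) & 1 for c in cells],
        [(c >> 2) & 1 for c in cells],
    )


cells = fold_slc_pages([1, 0], [0, 1], [1, 1])
print(cells, unfold_tlc(cells))  # [5, 6] ([1, 0], [0, 1], [1, 1])
```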
  • in NAND-based on-chip copy and update, the die first reads weights from a first NAND block into a latch, modifies the weights in the latch according to a neural network backpropagation learning scheme, and then writes the updated weights from the latch to a new, previously erased physical block. The weight update is performed for the full block, and flash management tables are updated accordingly.
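A toy model makes the copy-with-update sequence concrete: sense the source block into die latches, modify the weights there, program them into a previously erased block, and report the new physical block so flash management tables can be updated. The class and method names are hypothetical, and real dies move data page by page through their latches rather than a whole block at once:

```python
from typing import Callable, Dict, List, Set


class NandDieModel:
    """Toy die: each physical block holds a list of synaptic weights."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: Dict[int, List[float]] = {}
        self.erased: Set[int] = set(range(num_blocks))  # blocks ready to program

    def program(self, block_id: int, data: List[float]) -> None:
        # NAND cannot overwrite in place: only erased blocks may be programmed
        if block_id not in self.erased:
            raise RuntimeError(f"block {block_id} is not erased")
        self.erased.discard(block_id)
        self.blocks[block_id] = list(data)

    def copy_with_update(
        self, src: int, update: Callable[[List[float]], List[float]]
    ) -> int:
        if not self.erased:
            raise RuntimeError("no erased block available")
        latch = list(self.blocks[src])  # sense the source block into the die's latches
        latch = update(latch)           # apply the backpropagation update in the latch
        dst = min(self.erased)          # choose a previously erased destination block
        self.program(dst, latch)        # program on-die, with no DRAM round trip
        return dst                      # reported so flash tables can be remapped


die = NandDieModel(4)
die.program(0, [1.0, 2.0])
dst = die.copy_with_update(0, lambda ws: [w - 0.5 for w in ws])
print(dst, die.blocks[dst])  # 1 [0.5, 1.5]
```

The source block keeps its old contents until it is erased, which is why the flash management tables must be pointed at the new block.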
  • FIG. 9 illustrates an exemplary NAND-based on-chip copy and update procedure 900 for use with backpropagation neural network operations.
  • synaptic weights are read from a first set of NAND elements, where the first set of NAND elements are SLC elements or TLC elements. The synaptic weights may be read or sensed, as noted, by under-the-array components of the NVM die.
  • backpropagation weight updates to the synaptic weights are determined by, for example, the above-described backpropagation components or circuits that compute values in accordance with Equations 5-8, above.
  • the synaptic weights are updated using the weight updates by, for example, replacing synaptic weights maintained in a latch with updated values.
  • an on-chip copy with update circuit performs an on-chip copy to store the updated synaptic weights in a second set of NAND elements, where the second set of NAND elements are SLC, MLC, TLC or QLC elements, and where the on-chip copy is SLC to SLC, SLC to MLC/TLC/QLC, MLC to MLC, TLC to TLC and/or QLC to QLC.
  • a first FTL table maps a neural network weight unit to a virtual-block-ID, which corresponds to a physical location in the NAND die but identifies that location using a logical block-ID.
  • the virtual-block-ID may also have a corresponding page-in-block identifier.
  • a second FTL table maps the virtual-block-ID to a physical-block-ID.
  • FIG. 10 illustrates exemplary first and second FTL mapping tables.
  • a first (or primary) FTL mapping table 1002 includes a set of entries, each of which includes a host neural network weight unit 1004 and a corresponding virtual-block-ID 1006.
  • a second (or secondary) FTL mapping table 1008 includes a set of entries, each of which includes one of the virtual-block-IDs 1006 and a corresponding physical-block-ID 1020.
  • an input host neural network weight unit 1012 is applied to the first FTL mapping table 1002 to output a particular virtual-block-ID 1014, which is applied to the second FTL mapping table 1008 to output a corresponding particular physical-block-ID 1016.
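The two-step translation of FIG. 10 amounts to two dictionary lookups. In this hypothetical sketch the first table also carries the optional page-in-block identifier mentioned above; the example table entries are arbitrary:

```python
from typing import Dict, Tuple


def translate(
    weight_unit: int,
    first_table: Dict[int, Tuple[int, int]],  # weight unit -> (virtual-block-ID, page-in-block)
    second_table: Dict[int, int],             # virtual-block-ID -> physical-block-ID
) -> Tuple[int, int]:
    virtual_block_id, page = first_table[weight_unit]   # first FTL mapping table 1002
    physical_block_id = second_table[virtual_block_id]  # second FTL mapping table 1008
    return physical_block_id, page


first = {7: (100, 3)}
second = {100: 42}
print(translate(7, first, second))  # (42, 3)
```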
  • FIG. 11 provides an example 1100 of processing performed by an SSD controller (or similar apparatus) equipped with the FTL tables of FIG. 10 for use with an NVM array that uses NAND-based on-chip copy w/update to update synaptic weights for backpropagation.
  • the controller generates a first FTL mapping table that maps neural network weight units to corresponding virtual locations within a NVM NAND array, where a virtual location is represented, for example, by a virtual-block-ID corresponding to a physical location in the NVM NAND array.
  • the controller generates a second FTL mapping table that maps virtual location block identifiers (e.g. virtual-block-IDs) to corresponding physical location block identifiers (e.g.
  • the controller converts a neural network weight unit received from a host (coupled to the controller) to a virtual block identifier (e.g. a virtual-block-ID) using the first FTL mapping table.
  • the controller converts the virtual block identifier (e.g. the virtual-block-ID) to a corresponding physical block identifier (e.g. the physical-block-ID) using the second FTL mapping table.
  • the controller sends the physical block identifier (e.g.
  • the controller receives an on-chip copy command completion response from the NVM array providing the physical block identifiers (e.g. physical-block-ID) of synaptic weights (or other neural network data) within the NVM array that have been updated using the on-chip copy w/update feature.
  • the controller applies the physical block identifiers (e.g. physical-block-IDs) received from the NVM array to the second FTL mapping table, updating the entries that map virtual block identifiers to physical block identifiers.
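The benefit of the indirection shows up when a copy completes: because weight units map to virtual blocks, relocating a whole block of updated weights touches only one entry of the second table, and the first table is left alone. The sketch assumes the controller remembers which virtual block each on-chip copy command targeted; the class and method names are hypothetical:

```python
from typing import Dict


class TwoLevelFtl:
    """Hypothetical controller-side model of the FIG. 10 tables."""

    def __init__(self) -> None:
        self.first: Dict[int, int] = {}   # weight unit -> virtual-block-ID
        self.second: Dict[int, int] = {}  # virtual-block-ID -> physical-block-ID

    def resolve(self, weight_unit: int) -> int:
        return self.second[self.first[weight_unit]]

    def on_copy_complete(self, virtual_block_id: int, new_physical_block_id: int) -> int:
        """Repoint one virtual block after an on-chip copy with update completes."""
        old_physical_block_id = self.second[virtual_block_id]
        self.second[virtual_block_id] = new_physical_block_id
        return old_physical_block_id  # the stale block can later be erased and reused


ftl = TwoLevelFtl()
ftl.first.update({1: 10, 2: 10, 3: 11})
ftl.second.update({10: 100, 11: 101})
old = ftl.on_copy_complete(10, 205)
print(old, ftl.resolve(1), ftl.resolve(2), ftl.resolve(3))  # 100 205 205 101
```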
  • FIG. 12 illustrates a process 1200 in accordance with some aspects of the disclosure.
  • the process 1200 may take place within any suitable apparatus or device capable of performing the operations, such as a NAND die of an NVM array.
  • the apparatus senses a plurality of neural network synaptic weight values in parallel from the word lines of the NVM elements.
  • the apparatus performs neural network operations in parallel using the sensed neural network synaptic weight values, wherein the neural network operations are performed in parallel by a plurality of neural network processing components formed within the die.
  • the neural network processing components may include one or more circuits formed under-the-array or next-to-the-array within a NAND die.
  • the synaptic weight values may be stored vertically on separate or different word lines (such as within a 3D NAND).
  • the neural network processing components may include, e.g., a set of MAC circuits that operate in parallel.
  • neural network operations may include feedforward operations or backpropagation operations and may exploit various other types of additional neural network data such as activation values, bias values, etc.
  • FIG. 13 illustrates a process 1300 in accordance with other aspects of the disclosure.
  • the process 1300 may take place within any suitable apparatus or device capable of performing the operations, such as a NAND die of an NVM array.
  • the apparatus maintains synaptic weights within a separate NAND data block of the NVM array.
  • the apparatus transfers the neural network data from the user data blocks to an input latch coupled to the die.
  • the apparatus senses the synaptic weights from a set of word lines.
  • the apparatus performs a set of MAC operations in parallel using a set of MAC components formed under-the-array or next-to-the-array within the die of the NVM array, where each of the set of MAC operations is performed using a portion of the neural network data and corresponding synaptic weights.
  • the apparatus accumulates the results of the set of MAC operations in an accumulator latch within the die. Examples were described above.
  • FIG. 14 illustrates a feedforward process 1400 in accordance with still other aspects of the disclosure.
  • the process 1400 may take place within any suitable apparatus or device capable of performing the operations, such as a NAND die of an NVM array.
  • the apparatus inputs neural network data (such as activation values for a neural network with L layers and N neurons) from an external device or from NAND memory blocks.
  • the apparatus stores the neural network input data in an under-the-array input data latch.
  • for each of the L layers of the neural network, and for each of N word lines of synaptic weights, the apparatus senses the synaptic weights from the n-th word line of the l-th layer, multiply-accumulates the neural network input data and the synaptic weights corresponding to the n-th word line of the l-th layer, and stores the results in an accumulator in parallel with other results from the l-th layer.
  • the apparatus outputs the final value of the accumulator to, for example, an SSD controller for forwarding to a host device.
  • FIG. 15 illustrates a backpropagation process 1500 in accordance with still other aspects of the disclosure that employs an off-chip read-modify-write.
  • the process 1500 may take place within any suitable apparatus or device capable of performing the operations, such as a NAND die of an NVM array.
  • the apparatus inputs training data for a neural network with L layers and N neurons from an external device or reads the data from NAND elements of the die.
  • the apparatus stores the training data in an under-the-array data latch.
  • the apparatus determines backpropagation weight updates for an l-th layer of the neural network, senses the synaptic weights from an n-th word line of the l-th layer from a first data block of the NAND, updates the synaptic weights corresponding to the n-th word line of the l-th layer, and stores the updated synaptic weights in a second (different) data block of the NAND using an off-chip read-modify-write operation, i.e. a read-modify-write that employs a device external to the chip, such as a DRAM, to facilitate the read-modify-write.
  • FIG. 16 illustrates a backpropagation process 1600 in accordance with still other aspects of the disclosure that employs an on-chip copy with update.
  • the process 1600 may take place within any suitable apparatus or device capable of performing the operations, such as a NAND die of an NVM array.
  • the apparatus reads training data for a neural network with L layers and N neurons from a first set of NAND elements of a die of an NVM.
  • the apparatus stores the training data in an under-the-array data latch.
  • the apparatus determines backpropagation weight updates for an l-th layer of the neural network, senses the synaptic weights from an n-th word line of the l-th layer from a second set of NAND elements of the die of the NVM, updates the synaptic weights corresponding to the n-th word line of the l-th layer, and performs an on-chip copy with update to store the updated synaptic weights in a third data block of the set of NAND elements of the die of the NVM.
  • FIG. 17 illustrates a process 1700 in accordance with still other aspects of the disclosure.
  • the process 1700 may take place within any suitable apparatus or device capable of performing the operations, such as the SSD controller for use with an NVM array having one or more NAND dies equipped with on-chip copy with update.
  • the apparatus generates a second mapping table that maps a virtual location block identifier to a physical location block identifier.
  • the apparatus converts a neural network weight unit received from a host to a virtual location block identifier using the first table.
  • the apparatus converts the virtual location block identifier to a physical location block identifier using the second table.
  • the apparatus sends the physical location block identifier(s) to the memory array for processing in connection with the on-chip copy with update component of the memory array.
  • FIG. 18 illustrates an embodiment of an apparatus 1800 configured according to one or more aspects of the disclosure.
  • the apparatus 1800, or components thereof could embody or be implemented within a NAND die or some other type of NVM device that supports data storage.
  • the apparatus 1800, or components thereof, could be a component of a processor, a controller, a computing device, a personal computer, a portable device, a workstation, a server, a personal digital assistant, a digital camera, a digital phone, an entertainment device, a medical device, a self-driving vehicle control device, or any other electronic device that stores, processes or uses neural network data.
  • the apparatus 1800 includes a communication interface 1802, a physical memory array (e.g., NAND blocks) 1804, a set of under-the-array (UA) registers and/or latches 1806, and a set of under-the-array or next-to-the-array processing circuits 1810 (e.g., at least one UA processor and/or other suitable UA circuitry). These components can be coupled to and/or placed in electrical communication with one another via suitable components, represented generally by the connection lines in FIG. 18. Although not shown, other circuits such as timing sources, peripherals, voltage regulators, and power management circuits may be provided; these are well known in the art and therefore will not be described any further.
  • the communication interface 1802 provides a means for communicating with other apparatuses over a transmission medium.
  • the communication interface 1802 includes circuitry and/or programming (e.g., a program) adapted to facilitate the communication of information bi-directionally with respect to one or more devices in a system.
  • the communication interface 1802 may be configured for wire-based communication.
  • the communication interface 1802 could be a bus interface, a send/receive interface, or some other type of signal interface including circuitry for outputting and/or obtaining signals (e.g., outputting signals from and/or receiving signals into an SSD).
  • the communication interface 1802 serves as one example of a means for receiving and/or a means for transmitting.
  • the physical memory array 1804 may represent one or more NAND blocks.
  • the physical memory array 1804 may be used for storing data, such as synaptic weights, that is manipulated by the UA circuits 1810 or some other component of the apparatus 1800.
  • the physical memory array 1804 may be coupled to the UA circuits 1810 (via, e.g., registers/latches 1806) such that the UA circuits 1810 can read or sense information from, and write or program information to, the physical memory array 1804 (via, e.g., registers/latches 1806). That is, the physical memory array 1804 can be coupled to the UA circuits 1810 so that the physical memory array 1804 is accessible by the UA circuits 1810.
  • the UA registers/latches 1806 may include one or more of: an input latch 1812; a sensing latch 1814; an accumulator latch 1816; and one or more other latches or registers 1818.
  • the input latch might be separate from the NAND die.
  • the UA circuits 1810 are arranged or configured to obtain, process and/or send data, control data access and storage, issue or respond to commands, and control other desired operations.
  • the UA circuits 1810 may be implemented as one or more processors, one or more controllers, and/or other structures configured to perform functions.
  • the UA circuits 1810 may be adapted to perform any or all of the under-the-array features, processes, functions, operations and/or routines described herein.
  • the UA circuits 1810 may be configured to perform any of the steps, functions, and/or processes described with respect to FIGS. 2-9 and 12-16.
  • the term "adapted" in relation to the processing circuit 1810 may refer to the UA circuits 1810 being one or more of configured, employed, implemented, and/or programmed to perform a particular process, function, operation and/or routine according to various features described herein.
  • the UA circuits 1810 may include a specialized processor, such as an application specific integrated circuit (ASIC) that serves as a means for (e.g., structure for) carrying out any one of the operations described in conjunction with FIGS. 2-9 and 12-16.
  • the UA circuits 1810 serve as one example of a means for processing.
  • the UA circuits 1810 may provide and/or incorporate, at least in part, the functionality described above for the UA components 204 of FIG. 2.
  • the processing circuit 1810 may include one or more of: circuit/modules 1820 configured to perform feedforward operations in parallel; circuit/modules 1822 configured to perform backpropagation operations in parallel; a circuit/module 1824 configured to input neural network input (e.g. activation) data; a circuit/module 1826 configured to input neural network training data (e.g.
  • circuit/modules 1828 configured to determine weight updates via backpropagation in parallel; circuit/modules 1830 configured to apply weight updates in parallel to weights stored in the physical memory array 1804; a circuit/module 1832 configured to perform an on-chip copy with update; a circuit/module 1834 configured to generate on-chip copy completion responses for sending to an SSD controller (so that, for example, the SSD controller can update FTL tables or the like); and a circuit/module 1836 configured to perform off-chip read-modify-write operations (in conjunction with an external device such as an SSD controller). It is noted that in some examples on-chip copy and off-chip read-modify-write (in conjunction with an external device) might not both be provided.
  • the processing circuit 1810 may also include a circuit/module 1838 configured to sense neural network data (such as synaptic weights) stored vertically on different or separate word lines within the NAND NVM elements. Still further, the processing circuit 1810 may include a circuit/module 1839 for configuring MUX and/or MAC connectivity.
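The modules above name the feedforward, backpropagation, and weight-update operations but not their arithmetic. As a minimal sketch, assuming one fully connected layer with sigmoid activations, a squared-error loss, and single-sample gradient descent (choices made here for illustration, not taken from the disclosure), each output neuron is one multiply-accumulate (MAC) chain, and every weight update depends only on that neuron's error term and one input value, which is why such updates lend themselves to parallel computation. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # weights[j][i]: synapse from input i to output neuron j


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Feedforward pass: one multiply-accumulate chain per output neuron."""
    outputs = []
    for w_row, b in zip(weights, bias):
        acc = b
        for w, xi in zip(w_row, x):
            acc += w * xi  # MAC step
        outputs.append(sigmoid(acc))
    return outputs


def backprop_update(weights: Matrix, bias: Vector, x: Vector,
                    target: Vector, lr: float) -> Vector:
    """One gradient-descent step on the loss 0.5 * sum((y - t) ** 2).

    Assumption: single layer, sigmoid activation, one training sample.
    Each weight's update depends only on its neuron's delta and one input,
    so the updates are independent and could be applied in parallel.
    Weights and biases are modified in place; the pre-update output is returned.
    """
    y = feedforward(weights, bias, x)
    for j, (yj, tj) in enumerate(zip(y, target)):
        delta = (yj - tj) * yj * (1.0 - yj)  # dLoss/dz_j through the sigmoid
        for i, xi in enumerate(x):
            weights[j][i] -= lr * delta * xi
        bias[j] -= lr * delta
    return y
```

Calling `backprop_update` repeatedly on one input/target pair moves the outputs of `feedforward` toward the target. On a die, the inner loops would correspond to MAC units operating on weights sensed from the array rather than to Python lists.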
  • the physical memory array 1804 may include one or more of: blocks 1840 for storing user input data; blocks 1842 for storing training data; blocks 1844 for storing synaptic weights; blocks 1846 for storing bias values; and blocks 1848 for storing other user data and/or system data (e.g. data pertaining to the overall control of operations of the NAND die).
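NAND cells cannot be rewritten in place, so updated weights must be programmed into an erased block. The following is a minimal sketch of on-chip copy with update under simplifying assumptions: the die is modeled as a dict from physical block address (PBA) to a list of floating-point weights, and an update is a list of per-weight deltas. The die senses the source block into its latches, adds the deltas, programs the result into a free block, and returns a completion carrying both addresses so that an SSD controller could update its FTL tables. `NandDie`, `CopyCompletion`, and `on_chip_copy_with_update` are hypothetical names; a real die works on pages, applies ECC, and stores weights in a fixed-point format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CopyCompletion:
    """Completion sent to the SSD controller so it can update its FTL tables."""
    source_pba: int
    dest_pba: int


@dataclass
class NandDie:
    # Assumption: one block is modeled as a flat list of weights.
    blocks: Dict[int, List[float]] = field(default_factory=dict)
    free_blocks: List[int] = field(default_factory=list)  # erased PBAs

    def on_chip_copy_with_update(self, source_pba: int,
                                 deltas: List[float]) -> CopyCompletion:
        """Sense a weight block, add per-weight deltas, program a free block."""
        latch = list(self.blocks[source_pba])  # sense into on-die latches
        if len(deltas) != len(latch):
            raise ValueError("one delta per stored weight is required")
        if not self.free_blocks:
            raise RuntimeError("no erased block available")
        updated = [w + d for w, d in zip(latch, deltas)]
        dest_pba = self.free_blocks.pop(0)
        self.blocks[dest_pba] = updated  # program the erased destination block
        # The source block is left as-is (stale); erasure is out of scope here.
        return CopyCompletion(source_pba=source_pba, dest_pba=dest_pba)
```

In this model, the off-chip read-modify-write alternative would perform the same addition in the SSD controller, moving the full block across the die interface in both directions, whereas the on-chip version keeps the weights on the die and only reports the new address.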
  • means may be provided for performing the functions illustrated in FIG. 18 and/or other functions illustrated or described herein.
  • the means may include one or more of: means, such as circuit/module 1820, for performing feedforward operations; means, such as circuit/module 1822, for performing backpropagation operations; means, such as circuit/module/component 1824, for inputting neural network input (e.g. activation) data; means, such as circuit/module 1826, for inputting neural network training data (e.g.
  • the means may include one or more of: means, such as input latch 1812, for inputting neural network input data; means, such as NAND blocks 1844, for storing synaptic weights for a neural network within NVM elements of a die; and means, such as UA components 204 of FIG. 2, for performing a neural network operation using the neural network input data and the synaptic weights, wherein the neural network operation is performed, at least in part, by a neural network processing component formed within the die.
  • means, such as the UA circuits 1810, are provided for performing a neural network operation using the neural network data, wherein the neural network operation is performed, at least in part, by a neural network processing component formed within the die of the data storage apparatus.
  • the means may include: means, such as NAND block 400 of FIG. 4, for storing neural network synaptic weight values for a neural network within a plurality of word lines of the NVM elements; means, such as latch 316 of FIG. 3, for sensing a plurality of the neural network synaptic weight values in parallel from the word lines of the NVM elements; and means, such as MACs 312 of FIG.
  • the means may include: means, such as word lines 402 of FIG. 4, for storing neural network synaptic weight values for a neural network within NVM elements of a die of the apparatus, where the synaptic weight values are stored within the NVM elements of the die within a plurality of word lines; means, such as sense latch 316 of FIG. 3, for accessing synaptic weight values in parallel from the word lines using synaptic weight value access components (e.g. the sense latch 316) formed within the die; means, such as input latch 305 of FIG.
  • the means may include: means, such as circuits 1820 and 1822, for performing a neural network operation on the sensed neural network data, wherein the neural network operation modifies at least some of the neural network data; means, such as feedforward components 210 of FIG. 2, for performing feedforward neural network operations in parallel; and means, such as backpropagation components 220 of FIG.
  • the NVM elements may be NAND elements and the means for storing the neural network synaptic weight values may operate to store the synaptic weight values vertically on separate word lines in the NAND elements in the die, as already described.
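The text does not define the exact layout behind storing weights "vertically on separate word lines." One plausible reading, assumed here, is a bit-plane layout: bit k of every weight in a group of columns is programmed on word line k, so sensing one word line yields one bit of many weights in parallel. Because each unsigned weight satisfies w_i = Σ_k 2^k b_{k,i}, a dot product can then be accumulated one word line at a time as Σ_k 2^k Σ_i b_{k,i} x_i. The sketch uses unsigned integer weights; `to_word_lines`, `sense_weights`, and `bit_serial_dot` are hypothetical names.

```python
from typing import List


def to_word_lines(weights: List[int], bits: int) -> List[List[int]]:
    """Assumed layout: word_lines[k][i] holds bit k of weights[i]."""
    for w in weights:
        if not 0 <= w < (1 << bits):
            raise ValueError(f"weight {w} does not fit in {bits} unsigned bits")
    return [[(w >> k) & 1 for w in weights] for k in range(bits)]


def sense_weights(word_lines: List[List[int]]) -> List[int]:
    """Rebuild the weights by sensing each word line and shifting bits into place."""
    weights = [0] * len(word_lines[0])
    for k, plane in enumerate(word_lines):
        for i, bit in enumerate(plane):
            weights[i] |= bit << k
    return weights


def bit_serial_dot(word_lines: List[List[int]], x: List[int]) -> int:
    """Dot product of the stored weights with x, one word line per pass."""
    total = 0
    for k, plane in enumerate(word_lines):
        partial = sum(b * xi for b, xi in zip(plane, x))  # parallel MAC over one plane
        total += partial << k  # scale by 2**k (also valid for negative partials)
    return total
```

With this layout, reading all weights of a column group costs one sense per bit of precision rather than one per weight, which is the kind of parallelism the sensing means above describe.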
  • FIG. 19 illustrates an embodiment of an apparatus 1900 configured according to one or more other aspects of the disclosure.
  • the apparatus 1900, or components thereof, could embody or be implemented within a processor, a controller, an SSD controller, a host device, or some other type of device that processes data or controls data storage.
  • the apparatus 1900, or components thereof, could embody or be implemented within a computing device, a personal computer, a portable device, a workstation, a server, a personal digital assistant, a digital camera, a digital phone, an entertainment device, a medical device, a self-driving vehicle control device, or any other electronic device that stores neural network data.
  • the apparatus 1900 includes a communication interface 1902, a storage medium 1904, a memory array (e.g., an NVM memory circuit) 1908, and a processing circuit 1910 (e.g., at least one processor and/or other suitable circuitry). These components can be coupled to and/or placed in electrical communication with one another via a signaling bus or other suitable component, represented generally by the connection lines in FIG. 19.
  • the signaling bus may include any number of interconnecting buses and bridges depending on the specific application of the processing circuit 1910 and the overall design constraints.
  • the signaling bus links together various circuits such that each of the communication interface 1902, the storage medium 1904, and the memory array 1908 is coupled to and/or in electrical communication with the processing circuit 1910.
  • the signaling bus may also link various other circuits (not shown) such as timing sources, peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further.
  • the communication interface 1902 provides a means for communicating with other apparatuses over a transmission medium.
  • the communication interface 1902 includes circuitry and/or programming (e.g., a program) adapted to facilitate the communication of information bi-directionally with respect to one or more devices in a system.
  • the communication interface 1902 may be configured for wire-based communication.
  • the communication interface 1902 could be a bus interface, a send/receive interface, or some other type of signal interface including drivers, buffers, or other circuitry for outputting and/or obtaining signals (e.g., outputting signal from and/or receiving signals into an integrated circuit).
  • the communication interface 1902 serves as one example of a means for receiving and/or a means for transmitting.
  • the memory array 1908 may represent one or more memory devices such as a NAND die. In some implementations, the memory array 1908 and the storage medium 1904 are implemented as a common memory component. The memory array 1908 may be used for storing data that is manipulated by the processing circuit 1910 or some other component of the apparatus 1900.
  • the storage medium 1904 may represent one or more computer-readable, machine-readable, and/or processor-readable devices for storing programming, such as processor executable code or instructions (e.g., software, firmware), electronic data, databases, or other digital information.
  • the storage medium 1904 may also be used for storing data that is manipulated by the processing circuit 1910 when executing programming.
  • the storage medium 1904 may be any available media that can be accessed by a general purpose or special purpose processor, including portable or fixed storage devices, optical storage devices, and various other mediums capable of storing, containing or carrying programming.
  • the storage medium 1904 may include a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk (e.g., a compact disc (CD) or a digital versatile disc (DVD)), a smart card, a flash memory device (e.g., a card, a stick, or a key drive), a RAM, ROM, PROM, EPROM, an EEPROM, ReRAM, a register, a removable disk, and any other suitable medium for storing software and/or instructions that may be accessed and read by a computer.
  • the storage medium 1904 may be embodied in an article of manufacture (e.g., a computer program product).
  • a computer program product may include a computer-readable medium in packaging materials.
  • the storage medium 1904 may be a non-transitory (e.g., tangible) storage medium.
  • the storage medium 1904 may be a non-transitory computer-readable medium storing computer-executable code, including code to perform operations as described herein.
  • the storage medium 1904 may be coupled to the processing circuit 1910 such that the processing circuit 1910 can read information from, and write information to, the storage medium 1904. That is, the storage medium 1904 can be coupled to the processing circuit 1910 so that the storage medium 1904 is at least accessible by the processing circuit 1910, including examples where at least one storage medium is integral to the processing circuit 1910 and/or examples where at least one storage medium is separate from the processing circuit 1910 (e.g., resident in the apparatus 1900, external to the apparatus 1900, distributed across multiple entities, etc.).
  • Programming stored by the storage medium 1904 when executed by the processing circuit 1910, causes the processing circuit 1910 to perform one or more of the various functions and/or process operations described herein.
  • the storage medium 1904 may include instructions configured for regulating operations at one or more hardware blocks of the processing circuit 1910, as well as for utilizing the communication interface 1902 for wireless communication using the respective communication protocols.
  • the processing circuit 1910 is generally adapted for processing, including the execution of such programming stored on the storage medium 1904.
  • code or “programming” shall be construed broadly to include without limitation instructions, instruction sets, data, code, code segments, program code, programs, programming, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the processing circuit 1910 is arranged to obtain, process and/or send data, control data access and storage, issue commands, and control other desired operations.
  • the processing circuit 1910 may include circuitry configured to implement desired programming provided by appropriate media in at least one example.
  • the processing circuit 1910 may be implemented as one or more processors, one or more controllers, and/or other structure configured to execute executable programming.
  • Examples of the processing circuit 1910 may include a general purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
  • a general purpose processor may include a microprocessor, as well as any conventional processor, controller, microcontroller, or state machine.
  • the processing circuit 1910 may also be implemented as a combination of computing components, such as a combination of a controller and a microprocessor, a number of microprocessors, one or more microprocessors in conjunction with an ASIC and a microprocessor, or any other number of varying configurations. These examples of the processing circuit 1910 are for illustration and other suitable configurations within the scope of the disclosure are also contemplated.
  • the processing circuit 1910 may be adapted to perform any or all of the features, processes, functions, operations and/or routines for any or all of the controller apparatuses described herein.
  • the processing circuit 1910 may be configured to perform any of the steps, functions, and/or processes described with respect to FIGS. 1 and 10-11.
  • the term "adapted" in relation to the processing circuit 1910 may refer to the processing circuit 1910 being one or more of configured, employed, implemented, and/or programmed to perform a particular process, function, operation and/or routine according to various features described herein.
  • the processing circuit 1910 may be a specialized processor, such as an ASIC that serves as a means for (e.g., structure for) carrying out any one of the operations described in conjunction with FIGS. 1 and 10-11.
  • the processing circuit 1910 serves as one example of a means for processing.
  • the processing circuit 1910 may provide and/or incorporate, at least in part, the functionality described above for the controller 108 of FIG. 1.
  • the processing circuit 1910 may include one or more of: a circuit/module 1920 for storing neural network input data in an NVM (such as a NAND die); a circuit/module 1922 for storing neural network training data in an NVM (such as a NAND die); a circuit/module 1924 for receiving and processing neural network output data (e.g. from a NAND die); a circuit/module 1926 for generating and maintaining a first FTL mapping table (such as the first table 1002 of FIG. 10); a circuit/module 1928 for generating and maintaining a second FTL mapping table (such as the second table 1008 of FIG. 10);
  • a circuit/module 1930 for converting neural network weight units to logical-block-IDs using the first table;
  • a circuit/module 1932 for converting virtual-block-IDs to physical-block-IDs using the second table; and
  • a circuit/module 1934 for applying PBAs received from the NVM to the second FTL table to update values in the second FTL table.
  • a program stored by the storage medium 1904 when executed by the processing circuit 1910, causes the processing circuit 1910 to perform one or more of the various functions and/or process operations described herein.
  • the program may cause the processing circuit 1910 to perform and/or control the various functions, steps, and/or processes described herein with respect to FIGS. 1-18, including operations performed by a NAND die. As shown in FIG.
  • the storage medium 1904 may include one or more of: code 1940 for storing neural network input data in the NVM (such as in a NAND die); code 1942 for storing neural network training data in the NVM (such as in a NAND die); code 1944 for receiving and processing neural network output data (such as from a NAND die); code 1946 for generating and maintaining a first FTL mapping table; code 1948 for generating and maintaining a second FTL mapping table; code 1950 for converting neural network weight units to virtual-block-IDs using the first FTL mapping table; code 1952 for converting virtual-block-IDs to physical-block-IDs using the second FTL mapping table; and code 1954 for applying PBAs received from the NVM (such as from a NAND die) to the second FTL table to update values in the second FTL table.
  • means may be provided for performing the functions illustrated in FIG. 19 and/or other functions illustrated or described herein.
  • the means may include one or more of: means, such as circuit/module 1920, for storing neural network input data in an NVM (such as a NAND die); means, such as circuit/module 1922, for storing neural network training data in an NVM (such as a NAND die); means, such as circuit/module 1924, for receiving and processing neural network output data (e.g. from a NAND die); means, such as circuit/module 1926, for generating and maintaining a first FTL mapping table (such as the first table 1002 of FIG. 10); means, such as circuit/module 1928, for generating and maintaining a second FTL mapping table (such as the second table 1008 of FIG. 10); means, such as circuit/module 1930, for converting neural network weight units to logical-block-IDs using the first table; means, such as circuit/module 1932, for converting virtual-block-IDs to physical-block-IDs using the second table; and means, such as circuit/module 1934, for applying PBAs received from the NVM to the second FTL table to update values in the second FTL table.
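The two FTL tables amount to a two-level lookup. A minimal sketch, assuming both tables are in-memory dictionaries (the disclosure does not specify a table format): the first table maps a neural network weight unit, such as one layer's weights, to an intermediate virtual block ID, and the second maps that ID to a physical block ID. A PBA reported by the NVM, for example in an on-chip copy completion, is written only into the second table, so the first table is untouched when weights move. `NeuralNetworkFtl` and its methods are hypothetical names.

```python
from typing import Dict


class NeuralNetworkFtl:
    """Hypothetical two-table FTL for neural network weight data."""

    def __init__(self) -> None:
        self.first_table: Dict[str, int] = {}   # weight unit -> virtual block ID
        self.second_table: Dict[int, int] = {}  # virtual block ID -> physical block ID

    def map_weight_unit(self, unit: str, virtual_id: int, physical_id: int) -> None:
        self.first_table[unit] = virtual_id
        self.second_table[virtual_id] = physical_id

    def to_virtual(self, unit: str) -> int:
        return self.first_table[unit]

    def to_physical(self, virtual_id: int) -> int:
        return self.second_table[virtual_id]

    def resolve(self, unit: str) -> int:
        """Weight unit -> physical block ID through both tables."""
        return self.to_physical(self.to_virtual(unit))

    def apply_pba(self, virtual_id: int, new_physical_id: int) -> None:
        """Record a PBA reported by the NVM, e.g. after an on-chip copy."""
        if virtual_id not in self.second_table:
            raise KeyError(virtual_id)
        self.second_table[virtual_id] = new_physical_id
```

Keeping the indirection means that relocating a weight block costs one update to the second table, however many weight units map to that virtual block ID.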
  • NAND flash memory such as 3D NAND flash memory.
  • Semiconductor memory devices include volatile memory devices, such as DRAM or SRAM devices; NVM devices, such as ReRAM, EEPROM, flash memory (which can also be considered a subset of EEPROM), ferroelectric random access memory (FRAM), and MRAM; and other semiconductor elements capable of storing information.
  • Each type of memory device may have different configurations.
  • flash memory devices may be configured in a NAND or a NOR configuration.
  • some features described herein are specific to NAND-based devices, such as the NAND-based on-chip copy with update.
  • the memory devices can be formed from passive and/or active elements, in any combinations.
  • passive semiconductor memory elements include ReRAM device elements, which in some embodiments include a resistivity switching storage element, such as an anti-fuse, phase change material, etc., and optionally a steering element, such as a diode, etc.
  • active semiconductor memory elements include EEPROM and flash memory device elements, which in some embodiments include elements containing a charge storage region, such as a floating gate, conductive nanoparticles, or a charge storage dielectric material.
  • Multiple memory elements may be configured so that they are connected in series or so that each element is individually accessible.
  • flash memory devices in a NAND configuration typically contain memory elements connected in series.
  • a NAND memory array may be configured so that the array is composed of multiple strings of memory in which a string is composed of multiple memory elements sharing a single bit line and accessed as a group.
  • memory elements may be configured so that each element is individually accessible, e.g., a NOR memory array.
  • NAND and NOR memory configurations are exemplary, and memory elements may be otherwise configured.
  • the semiconductor memory elements located within and/or over a substrate may be arranged in two or three dimensions, such as a two dimensional memory structure or a three dimensional memory structure.
  • the semiconductor memory elements are arranged in a single plane or a single memory device level.
  • memory elements are arranged in a plane (e.g., in an x-y direction plane) which extends substantially parallel to a major surface of a substrate that supports the memory elements.
  • the substrate may be a wafer over or in which the layer of memory elements is formed, or it may be a carrier substrate which is attached to the memory elements after they are formed.
  • the substrate may include a semiconductor such as silicon.
  • the memory elements may be arranged in the single memory device level in an ordered array, such as in a plurality of rows and/or columns. However, the memory elements may be arrayed in non-regular or non-orthogonal configurations.
  • the memory elements may each have two or more electrodes or contact lines, such as bit lines and word lines.
  • a three dimensional memory array is arranged so that memory elements occupy multiple planes or multiple memory device levels, thereby forming a structure in three dimensions (i.e., in the x, y and z directions, where the z direction is substantially perpendicular and the x and y directions are substantially parallel to the major surface of the substrate).
  • a three dimensional memory structure may be vertically arranged as a stack of multiple two dimensional memory device levels.
  • a three dimensional memory array may be arranged as multiple vertical columns (e.g., columns extending substantially perpendicular to the major surface of the substrate, i.e., in the z direction), with each column having multiple memory elements.
  • the columns may be arranged in a two dimensional configuration, e.g., in an x-y plane, resulting in a three dimensional arrangement of memory elements with elements on multiple vertically stacked memory planes.
  • Other configurations of memory elements in three dimensions can also constitute a three dimensional memory array.
  • the memory elements may be coupled together to form a NAND string within a single horizontal (e.g., x-y) memory device level.
  • the memory elements may be coupled together to form a vertical NAND string that traverses across multiple horizontal memory device levels.
  • Other three dimensional configurations can be envisioned wherein some NAND strings contain memory elements in a single memory level while other strings contain memory elements which span through multiple memory levels.
  • Three dimensional memory arrays may also be designed in a NOR configuration and in a ReRAM configuration.
  • one or more memory device levels are formed above a single substrate.
  • the monolithic three dimensional memory array may also have one or more memory layers at least partially within the single substrate.
  • the substrate may include a semiconductor such as silicon.
  • the layers constituting each memory device level of the array are typically formed on the layers of the underlying memory device levels of the array.
  • layers of adjacent memory device levels of a monolithic three dimensional memory array may be shared or have intervening layers between memory device levels.
  • two dimensional arrays may be formed separately and then packaged together to form a non-monolithic memory device having multiple layers of memory.
  • non-monolithic stacked memories can be constructed by forming memory levels on separate substrates and then stacking the memory levels atop each other. The substrates may be thinned or removed from the memory device levels before stacking, but as the memory device levels are initially formed over separate substrates, the resulting memory arrays are not monolithic three dimensional memory arrays.
  • multiple two dimensional memory arrays or three dimensional memory arrays may be formed on separate chips and then packaged together to form a stacked-chip memory device.
  • Associated circuitry is typically required for operation of the memory elements and for communication with the memory elements.
  • memory devices may have circuitry used for controlling and driving memory elements to accomplish functions such as programming and reading.
  • This associated circuitry may be on the same substrate as the memory elements and/or on a separate substrate.
  • a controller for memory read-write operations may be located on a separate controller chip and/or on the same substrate as the memory elements.
  • the subject matter described herein may be implemented in hardware, software, firmware, or any combination thereof.
  • the terms “function,” “module,” and the like as used herein may refer to hardware, which may also include software and/or firmware components, for implementing the feature being described.
  • the subject matter described herein may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by a computer (e.g., a processor) control the computer to perform the functionality described herein.
  • Examples of computer readable media suitable for implementing the subject matter described herein include non-transitory computer-readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits.
  • a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
  • any reference to an element herein using a designation such as “first,” “second,” and so forth does not generally limit the quantity or order of those elements. Rather, these designations may be used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be used there or that the first element must precede the second element in some manner. Also, unless stated otherwise a set of elements may include one or more elements. In addition, terminology of the form “at least one of A, B, or C" or "A,
  • A and B, or A and C, or A and B and C, or 2A, or 2B, or 2C, or 2A and B, and so on.
  • “at least one of: A, B, or C” is intended to cover A, B, C, A-B, A-C, B-C, and A-B-C, as well as multiples of the same members (e.g., any lists that include AA, BB, or CC).
  • “at least one of: A, B, and C” is intended to cover A, B, C, A-B, A-C, B-C, and A-B-C, as well as multiples of the same members.
  • a phrase referring to a list of items linked with “and/or” refers to any combination of the items.
  • “A and/or B” is intended to cover A alone, B alone, or A and B together.
  • “A, B and/or C” is intended to cover A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together.
  • determining encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining, and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Also, “determining” may include resolving, selecting, choosing, establishing, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Semiconductor Memories (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Error Detection And Correction (AREA)
  • Retry When Errors Occur (AREA)
  • Complex Calculations (AREA)

Abstract

According to exemplary embodiments, methods and apparatus are provided for implementing a deep learning accelerator (DLA) or other neural network components within the die of a non-volatile memory (NVM) apparatus using, for example, under-the-array circuit components within the die. Some aspects relate to configuring the under-the-array components to implement feedforward DLA operations. Other aspects relate to backpropagation operations. Still other aspects relate to using NAND-based on-chip copy with an update function to facilitate updating the synaptic weights of a neural network stored on a die. Further aspects relate to configuring a solid state device (SSD) controller for use with the NVM. In some aspects, the SSD controller includes flash translation layer (FTL) tables configured specifically for use with neural network data stored in the NVM.
EP19888248.2A 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network Pending EP3756186A4 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20178429.5A EP3789925A1 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/212,596 US11133059B2 (en) 2018-12-06 2018-12-06 Non-volatile memory die with deep learning neural network
US16/212,586 US20200184335A1 (en) 2018-12-06 2018-12-06 Non-volatile memory die with deep learning neural network
PCT/US2019/050105 WO2020117348A2 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network

Related Child Applications (2)

Application Number Title Priority Date Filing Date
EP20178429.5A Division-Into EP3789925A1 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network
EP20178429.5A Division EP3789925A1 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network

Publications (2)

Publication Number Publication Date
EP3756186A2 true EP3756186A2 (fr) 2020-12-30
EP3756186A4 EP3756186A4 (fr) 2021-06-02

Family

ID=70975628

Family Applications (2)

Application Number Title Priority Date Filing Date
EP19888248.2A Pending EP3756186A4 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network
EP20178429.5A Pending EP3789925A1 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP20178429.5A Pending EP3789925A1 (fr) 2018-12-06 2019-09-06 Non-volatile memory die with deep learning neural network

Country Status (3)

Country Link
EP (2) EP3756186A4 (fr)
CN (2) CN112154460A (fr)
WO (1) WO2020117348A2 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182965B (zh) * 2020-09-25 2024-02-02 苏州融睿纳米复材科技有限公司 Method for designing infrared phase-change materials based on deep learning
WO2022132287A1 (fr) * 2020-12-15 2022-06-23 Microchip Technology Inc. Method and apparatus for performing a neural network operation
CN112598122B (zh) * 2020-12-23 2023-09-05 北方工业大学 Convolutional neural network accelerator based on variable-resistance random access memory
CN112669893B (zh) * 2020-12-30 2022-08-16 杭州海康存储科技有限公司 Method, system, apparatus and device for determining a read voltage to be used
US11514992B2 (en) 2021-02-25 2022-11-29 Microchip Technology Inc. Method and apparatus for reading a flash memory device
US11934696B2 (en) 2021-05-18 2024-03-19 Microchip Technology Inc. Machine learning assisted quality of service (QoS) for solid state drives
US11699493B2 (en) 2021-05-24 2023-07-11 Microchip Technology Inc. Method and apparatus for performing a read of a flash memory using predicted retention-and-read-disturb-compensated threshold voltage shift offset values
US11514994B1 (en) 2021-05-28 2022-11-29 Microchip Technology Inc. Method and apparatus for outlier management
DE112022002131T5 (de) 2021-09-28 2024-04-11 Microchip Technology Inc. LDPC decoding with trapped-block management
CN114915496B (zh) * 2022-07-11 2023-01-10 广州番禺职业技术学院 Network intrusion detection method and apparatus based on time weights and a deep neural network

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
US9753695B2 (en) * 2012-09-04 2017-09-05 Analog Devices Global Datapath circuit for digital signal processors
FR3025344B1 (fr) * 2014-08-28 2017-11-24 Commissariat Energie Atomique Convolutional neural network
US9678832B2 (en) * 2014-09-18 2017-06-13 Sandisk Technologies Llc Storage module and method for on-chip copy gather
US9778863B2 (en) * 2014-09-30 2017-10-03 Sandisk Technologies Llc System and method for folding partial blocks into multi-level cell memory blocks
CN105488565A (zh) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Computing device and method of an accelerator chip for accelerating deep neural network algorithms
US10460237B2 (en) * 2015-11-30 2019-10-29 International Business Machines Corporation Neuron-centric local learning rate for artificial neural networks to increase performance, learning rate margin, and reduce power consumption
WO2017162129A1 (fr) * 2016-03-21 2017-09-28 成都海存艾匹科技有限公司 Integrated neuroprocessor comprising a three-dimensional memory array
US10387303B2 (en) * 2016-08-16 2019-08-20 Western Digital Technologies, Inc. Non-volatile storage system with compute engine to accelerate big data applications
US11501130B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US9646243B1 (en) * 2016-09-12 2017-05-09 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
US10846595B2 (en) * 2016-12-20 2020-11-24 Intel Corporation Rapid competitive learning techniques for neural networks
US10909449B2 (en) * 2017-04-14 2021-02-02 Samsung Electronics Co., Ltd. Monolithic multi-bit weight cell for neuromorphic computing
US10699778B2 (en) * 2017-04-28 2020-06-30 Arizona Board Of Regents On Behalf Of Arizona State University Static random access memory (SRAM) cell and related SRAM array for deep neural network and machine learning applications
KR102473579B1 (ko) * 2017-05-11 2022-12-01 포항공과대학교 산학협력단 Weight element and operating method thereof
US10127494B1 (en) * 2017-08-02 2018-11-13 Google Llc Neural network crossbar stack
CN108053848A (zh) * 2018-01-02 2018-05-18 清华大学 Circuit structure and neural network chip

Also Published As

Publication number Publication date
WO2020117348A2 (fr) 2020-06-11
CN112154460A (zh) 2020-12-29
EP3789925A1 (fr) 2021-03-10
EP3756186A4 (fr) 2021-06-02
CN117669663A (zh) 2024-03-08
WO2020117348A3 (fr) 2020-12-10

Similar Documents

Publication Publication Date Title
US11705191B2 (en) Non-volatile memory die with deep learning neural network
EP3789925A1 (fr) Non-volatile memory die with deep learning neural network
US20200311512A1 (en) Realization of binary neural networks in nand memory arrays
US11170290B2 (en) Realization of neural networks with ternary inputs and binary weights in NAND memory arrays
US11568200B2 (en) Accelerating sparse matrix multiplication in storage class memory-based convolutional neural network inference
US11328204B2 (en) Realization of binary neural networks in NAND memory arrays
US20200184335A1 (en) Non-volatile memory die with deep learning neural network
US11507843B2 (en) Separate storage and control of static and dynamic neural network data within a non-volatile memory array
US20200401850A1 (en) Non-volatile memory die with on-chip data augmentation components for use with machine learning
US11625586B2 (en) Realization of neural networks with ternary inputs and ternary weights in NAND memory arrays
US20160179399A1 (en) System and Method for Selecting Blocks for Garbage Collection Based on Block Health
WO2021126294A1 (fr) Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine
US11662904B2 (en) Non-volatile memory with on-chip principal component analysis for generating low dimensional outputs for machine learning
US20120290897A1 (en) Data storage system having multi-bit memory device and on-chip buffer program method thereof
US11397885B2 (en) Vertical mapping and computing for deep neural networks in non-volatile memory
CN110751276A (zh) Realization of neural networks with ternary inputs and binary weights in NAND memory arrays
US11251812B2 (en) Encoding and decoding of hamming distance-based binary representations of numbers
US11556311B2 (en) Reconfigurable input precision in-memory computing
Liu et al. ERA-BS: Boosting the efficiency of ReRAM-based PIM accelerator with fine-grained bit-level sparsity
US20230418600A1 (en) Non-volatile memory die with latch-based multiply-accumulate components
US20230418738A1 (en) Memory device with latch-based neural network weight parity detection and trimming
US11663471B2 (en) Compute-in-memory deep neural network inference engine using low-rank approximation technique
US11507835B2 (en) Neural network data updates using in-place bit-addressable writes within storage class memory
TW202341150A (zh) Memory system and operating method of memory array

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20200604

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

R17D Deferred search report published (corrected)

Effective date: 20201210

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Free format text: PREVIOUS MAIN CLASS: G11C0011560000

Ipc: G06N0003040000

A4 Supplementary search report drawn up and despatched

Effective date: 20210503

RIC1 Information provided on ipc code assigned before grant

Ipc: G06N 3/04 20060101AFI20210426BHEP

Ipc: G06N 3/063 20060101ALI20210426BHEP

Ipc: G06N 3/08 20060101ALI20210426BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20230117