US20190311018A1 - Systems and methods for efficient matrix multiplication - Google Patents
Systems and methods for efficient matrix multiplication
- Publication number
- US20190311018A1 (U.S. application Ser. No. 16/376,169)
- Authority
- US
- United States
- Prior art keywords
- electrodes
- analog
- input
- matrix
- voltages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06G—ANALOGUE COMPUTERS
- G06G7/00—Devices in which the computing operation is performed by varying electric or magnetic quantities
- G06G7/12—Arrangements for performing computing operations, e.g. operational amplifiers
- G06G7/16—Arrangements for performing computing operations, e.g. operational amplifiers for multiplication or division
- G06G7/163—Arrangements for performing computing operations, e.g. operational amplifiers for multiplication or division using a variable impedance controlled by one of the input signals, variable amplification or transfer function
-
- G06N3/0635—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/065—Analogue means
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/54—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0004—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements comprising amorphous/crystalline phase transition cells
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0023—Address circuits or decoders
- G11C13/0026—Bit-line or column circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0023—Address circuits or decoders
- G11C13/0028—Word-line or row circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/003—Cell access
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/004—Reading or sensing circuits or methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0069—Writing or programming circuits or methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0069—Writing or programming circuits or methods
- G11C2013/0073—Write using bi-directional cell biasing
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/10—Resistive cells; Technology aspects
- G11C2213/18—Memory cell being a nanowire having RADIAL composition
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/10—Resistive cells; Technology aspects
- G11C2213/19—Memory cell comprising at least a nanowire and only two terminals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/70—Resistive array aspects
- G11C2213/77—Array wherein the memory element being directly connected to the bit lines and word lines without any access device being used
Definitions
- a system of sparse vector-matrix multiplication includes: a silicon substrate; a circuit layer formed in or on the substrate; a plurality of electrodes formed on the circuit layer; and a mesh formed randomly on the plurality of electrodes, wherein the circuit layer is configured to: receive a plurality of digital input signals; convert the plurality of digital input signals to a plurality of analog input signals; write the plurality of analog input signals on an input set of the plurality of electrodes; read from an output set of the plurality of electrodes a plurality of analog output signals, convert the plurality of analog output signals to a plurality of digital output signals, and output the plurality of digital output signals.
- the mesh includes coaxial nanowires having a metal core wrapped in two-terminal non-volatile memory (NVM) material.
- the non-volatile memory material includes a voltage-controlled resistance.
- the circuit layer includes: an input register configured to receive the plurality of the digital input signals; one or more digital to analog converters configured to convert the plurality of digital input signals to a plurality of analog input signals; one or more analog to digital converters configured to convert the plurality of analog output signals to the plurality of digital output signals; and an output register configured to receive and store the plurality of digital output signals.
- the circuit layer further includes a column driver and a row driver configured to selectively provide biasing voltages and/or training voltages to the plurality of the electrodes.
- the plurality of analog input signals include voltages and the plurality of analog output signals include currents, or vice versa.
- the plurality of analog input signals include voltages, the plurality of analog output signals include currents, and the circuit layer further includes: a plurality of amplifiers coupled to the plurality of electrodes, wherein amplifiers coupled to the input set of the plurality of electrodes are configured as sample-and-hold (SAH) amplifiers and configured to write the plurality of analog input signals to the input set, and amplifiers coupled to the output set of the plurality of the electrodes are configured as current-sensing amplifiers and configured to read the plurality of analog output signals.
- the plurality of electrodes include neurons in a neural network layer.
- the plurality of electrodes and the randomly formed mesh include a matrix of conductances.
- the matrix of conductances is tunable using one or more of temperature-driven phase-change memory mechanisms, unipolar resistive switching, and bipolar memristive mechanisms.
- a method of sparse vector-matrix multiplication includes: providing a plurality of electrodes on a silicon substrate; forming a layer of randomly arranged coaxial nanowires on the plurality of electrodes; receiving a plurality of digital input signals; converting the plurality of digital input signals to a plurality of analog input signals; writing the plurality of analog input signals on an input set of the plurality of electrodes; reading from an output set of the plurality of electrodes a plurality of analog output signals; converting the plurality of analog output signals to a plurality of digital output signals; and outputting the plurality of digital output signals.
- the coaxial nanowires include a metal core wrapped in two-terminal non-volatile memory (NVM) material.
- the NVM material includes one or more of a voltage-controlled resistance, memristor, phase-change material (PCM), and resistive random-access-memory (ReRAM) material.
- the method further includes: selectively providing biasing voltages to the plurality of the electrodes to enable writing voltages into or reading currents from the plurality of the electrodes.
- voltage-controlled resistances are formed at intersections of the plurality of the electrodes and the randomly arranged coaxial nanowires and the method further comprises selectively providing training voltages to the plurality of the electrodes to adjust the voltage-controlled resistances.
- the method further includes receiving a training signal indicating which electrodes in the plurality of the electrodes are to be applied the training voltages.
- the plurality of the electrodes include neurons in a neural network layer.
- the plurality of the electrodes and the layer of randomly arranged coaxial nanowires form a matrix of conductances and the conductances are tuned by performing gradient descent.
- the plurality of analog input signals include voltages and the plurality of analog output signals include currents.
- the input and output sets each comprise half of the electrodes of the plurality of electrodes.
- FIG. 1 illustrates a diagram of a matrix in a dot-product engine used to perform vector-matrix multiplication.
- FIG. 2 illustrates a diagram of a coaxial nanowire, which can be utilized in building high efficiency computing hardware.
- FIG. 3 illustrates a diagram of a sparse vector-matrix multiplication (SVMM) engine according to an embodiment.
- FIG. 4 illustrates a diagram of an embodiment of the circuit layer and the electrodes of the embodiment of FIG. 3 .
- FIG. 5 illustrates a flow chart of a method of sparse vector-matrix multiplication according to an embodiment.
- processor can refer to various microprocessors, controllers, and/or hardware and software optimized for loading and executing software programming instructions or processors including graphics processing units (GPUs) optimized for handling high volume matrix data related to image processing.
- Conductance refers to the degree by which a component conducts electricity. Conductance can be calculated as the ratio of the current that flows through the component to the potential difference present across the component. Conductance is the reciprocal of the resistance and is measured in siemens.
- Dense in the context of matrix multiplication engines described herein can refer to engines where there is an electrical connection or path from each input to each output node of the matrix multiplication engine.
- One example of hardware specialized for performing vector-matrix multiplication is a dot-product-engine based on a crossbar architecture.
- FIG. 1 illustrates a diagram of a matrix 20 in a dot-product engine used to perform vector-matrix multiplication.
- Matrix 20 utilizes a crossbar array architecture and includes horizontal input voltage lines intersecting vertical output current lines.
- the input/output voltage/current lines can be neurons in a neural network layer, when matrix 20 is used to perform vector-matrix multiplication in the context of neural networks.
- the input voltage lines and output current lines are made of conductive metal material.
- a material made of non-volatile memory (NVM) 21 connects the input voltage lines to the intersecting output current lines.
- this is achieved via lithographically patterning electrode lines (horizontal and vertical lines) to sandwich an NVM-type material 21 .
- the vector of input voltages is applied on the input voltage lines.
- the output current at each column is determined by the sum of currents from each intersection of that column with the input voltage lines, obtained by applying Kirchhoff's current law (KCL) at each intersection.
- the matrix 20 is partially formed by NVM material 21 whose resistances are controllable by applying an appropriate voltage. Therefore, a matrix of parameter values (e.g., matrix of weights in a layer of a neural network) can be constructed in the matrix 20 by adjusting the intersection resistances to match the matrix of parameter values of a desired computation.
- the dot-product engine utilizing matrix 20 can be characterized as a dense array structure, where each input and output are connected.
- the chip area required to implement the matrix 20 scales quadratically relative to the number of input and output neurons it provides.
- input/output neurons and chip area needed to implement the matrix 20 scale at different rates. While input/output neurons scale linearly (on the edges of matrix 20 ), the chip area needed to implement vector-matrix multiplication of those additional neurons grows quadratically (in the area of the matrix 20 ).
- Recent discoveries of new materials have the potential to revolutionize computing hardware and approaches to designing hardware tasked with executing software.
- As computing hardware designed only based on traditional silicon material approaches its physical limit of performance, designs based on new material, alone or in combination with silicon-based circuits, promise greater efficiency in hardware.
- nanoscale materials, such as nanowires and materials with desirable mechanical or electrical properties, promise advancements and improvements in computing methods and devices, including hardware customized and optimized for performing matrix multiplication.
- FIG. 2 illustrates a diagram of a coaxial nanowire 10 , which as will be described can be utilized in building high efficiency computing hardware.
- the coaxial nanowire 10 includes a metal core 12 wrapped in a two-terminal non-volatile memory (NVM) material 14 .
- the coaxial nanowire 10 touches two metal electrodes 16 and 18 .
- the NVM material is a two-terminal device, whose resistance is controlled by voltages applied above or below some threshold voltages across the two terminals. For example, when the electrode 16 applies a voltage above a positive threshold voltage (SET-voltage) to the NVM material 14 , the NVM material 14 may undergo dielectric breakdown and one or more conductive filaments are formed through it, thereby lowering its electrical resistance and increasing its conductivity. Subsequently, the electrical connection between the electrodes 16 and 18 can be strengthened via the now more-conductive NVM material 14 and the metal core 12 .
- when the electrode 16 applies a voltage below a negative voltage threshold (RESET-voltage) to the NVM material 14, the dielectric breakdown process is reversed, the filaments dissolve away, and the electrical resistance of the NVM material 14 reverts to its original value or some other higher resistance, thereby weakening the electrical connection between the electrodes 16 and 18.
- for voltages above the SET-voltage, the NVM material 14 is transformed to a low resistance state (LRS), and for voltages below the RESET-voltage, the NVM material 14 is transformed to a high resistance state (HRS).
- coaxial nanowire 10 forms a memory device at the intersection of its contact with an electrode. The resistance at the interface is dependent upon the previously applied voltage (whether the previous voltage was above the SET-voltage or below the RESET-voltage).
- Examples of NVM material 14 include memristors, phase-change material (PCM), resistive random-access-memory (ReRAM) material, or any other material whose resistance is voltage-controlled, including any material which retains a resistance in response to an applied voltage with respect to one or more threshold voltages.
- Matrix multiplication is used in many modern computing tasks, such as artificial intelligence (AI), machine learning, neural network, neural network training, various transforms (e.g., Discrete Fourier Transform), and others.
- the non-volatile and controllable memory properties of the shell of coaxial nanowire 10 can be exploited to make hardware that can efficiently perform matrix multiplication.
- a form of matrix multiplication used in modern computing tasks is digital vector-matrix multiplication, where a vector of input values is multiplied by a matrix of parameter values.
- the multiplication yields an output vector.
- the coaxial nanowire 10 can be used to construct the matrix of parameter values.
- Parameter values can be any parameter values used in various computing tasks, for example weights in a layer of neural network.
- a vector of input values can be multiplied by a matrix of weights producing an output vector of the layer.
- the analog output vector can be converted to digital output values and outputted.
- an alternative engine for performing vector-matrix multiplication can use electrodes, distributed over a chip area, sparsely connected with coaxial nanowires 10 , where electrodes can distribute input and output nodes over the chip area where they exist as opposed to only the lateral edges of a crossbar array as is the case in the dot-product engine of FIG. 1 .
- a network of distributed electrodes sparsely-connected with coaxial nanowires 10 can construct a conductance matrix, which can be used as a parameter matrix in desired computations.
- a subset of the electrodes can be used to feed a vector of input voltages and the complementary subset of the electrodes can be probed to read a vector of output currents.
- the output vector of currents is the result of vector-matrix multiplication of the vector of input voltages with the matrix of conductances according to Ohm's law.
- FIG. 3 illustrates a diagram of a sparse vector-matrix multiplication (SVMM) engine 22 .
- the SVMM engine 22 includes a silicon substrate 24 , control circuitry within a circuit layer 26 , for example, a complementary metal-oxide-semiconductor (CMOS) layer, a grid of electrodes 28 and a randomly formed mesh 30 of coaxial nanowires 10 deposited on top of the grid 28 .
- Mesh 30 is placed above or formed on top of the electrode grid 28 , providing physical contact between the mesh 30 and the top of electrode grid 28 .
- the electrodes of the grid 28 can be grown through the mesh 30 as pillars of metal.
- the coaxial nanowires 10 deposited randomly on top of the electrodes of the grid 28 can provide electrical connections between the electrodes that they contact. Consequently, the coaxial nanowires 10 sparsely connect the electrodes of the grid 28 .
- the strength of the electrical connections between the electrodes can be modulated based on increasing or decreasing the resistances of the coaxial nanowires 10 .
- the circuitry in the circuit layer 26 can be used to apply a SET-voltage or a RESET-voltage to some or all of the coaxial nanowires 10 in the mesh 30 via electrodes in the grid 28 .
- the electrical resistances of the coaxial nanowires 10 in mesh 30 can increase or decrease depending on the voltages they receive via the electrodes in the grid 28 , thereby strengthening or weakening the electrical connections between the electrodes of the grid 28 .
- the coaxial nanowires 10 in mesh 30 are randomly formed, they can create random electrical connections between the electrodes in the grid 28 via the NVM-type material and the metal cores of the nanowires 10 .
- the electrodes of the grid 28 are sparsely connected via the coaxial nanowires 10 of mesh 30 .
- the grid 28 sparsely connected with the mesh 30 forms a sparsely connected matrix of conductances, which can be used for vector-matrix multiplication.
- a vector of input voltages can be applied to a subset of the electrodes in the grid 28 (the input electrodes) and the remainder of the electrodes (the output electrodes) can be used to read an output vector of currents.
- the output vector of currents can represent the output of a vector-matrix multiplication of the vector of input voltages with sparsely connected matrix of conductances formed by the grid 28 according to Ohm's law.
- the resistances formed at the intersection of the electrodes of the grid 28 and the mesh 30 can be adjusted by tuning or fitting to known sets of input/output pairs until a useful matrix of conductances is formed.
- the matrix of conductances formed by the SVMM engine 22 is made of unknown or random resistances, formed by random connections between electrodes of the grid 28 via coaxial nanowires 10 of mesh 30.
- the conductances can be adjusted by applying a combination of SET-voltages and/or RESET-voltages to the electrodes of the grid 28 and observing the outputs.
- Various fitting techniques and algorithms may be used to determine the direction by which the electrode-mesh interface resistances should be adjusted.
- the interface resistances can be adjusted through a variety of means, including using voltage pulses at the electrodes of the grid 28 to switch or nudge the resistances according to temperature-driven phase-change memory mechanisms, unipolar resistive switching, or bipolar memristive mechanisms. These techniques can be used to tune the values of the conductance matrix to a task, for instance as a content-addressable memory (CAM), a neural network layer, or as a more general memory interconnect. Examples of algorithms, which can be used in connection with the SVMM engine 22 , can be found in International Patent Application No.
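- As a rough illustration of such a tuning loop (a hypothetical sketch, not the patented algorithm or the cited application), the following Python snippet abstracts the electrode-mesh interfaces as a plain conductance matrix and nudges each conductance up or down toward a target input/output behavior, standing in for SET/RESET training pulses; the sizes, pulse step, and conductance ranges are assumptions.

```python
import numpy as np

def tune_conductances(v_in, i_target, g, pulse=1e-5, epochs=500):
    """Nudge conductances so that the outputs v_in @ g approach i_target.

    Each update moves a conductance by a fixed amount in the direction that
    reduces the observed output error, standing in for a SET or RESET
    training pulse applied at one electrode-mesh interface.
    """
    for _ in range(epochs):
        error = v_in @ g - i_target          # observed minus desired output currents
        grad = v_in.T @ error                # direction each conductance should move
        g = np.clip(g - pulse * np.sign(grad), 0.0, None)  # conductances stay non-negative
    return g

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 1.0, size=(8, 4))           # 8 training input vectors on 4 input electrodes
g_target = rng.uniform(0.0, 1e-3, size=(4, 3))   # hidden "correct" conductances (siemens)
i_desired = v @ g_target                         # desired output currents on 3 output electrodes

g_fitted = tune_conductances(v, i_desired, rng.uniform(0.0, 1e-3, size=(4, 3)))
print("max residual current error:", np.max(np.abs(v @ g_fitted - i_desired)))
```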
- the shape, number and geometry of the grid 28 can be modified based on implementation.
- the electrodes need not be in a grid format.
- Various design and implementation considerations may dictate an alternative geometry of the SVMM engine 22 without departing from the spirit of the described technology.
- FIG. 4 illustrates a diagram of an embodiment of the circuit layer 26 and the electrodes of the grid 28 .
- Circuit layer 26 can be implemented as a CMOS layer and can include components such as an input register 32 , an output register 34 , a column driver 36 , a row driver 38 , one or more digital to analog converters (DACs) 40 , one or more analog to digital converters (ADCs) 42 , amplifiers 46 , switches 44 and other components and circuitry as may be used to implement the functionality of the SVMM engine 22 in the circuit layer 26 .
- Electrodes of the grid 28 are shown for illustration purposes, but in some embodiments, the electrodes of the grid 28 are metal pillars grown above the circuit layer 26 and may not be a part of the circuit layer 26 .
- Mesh 30 while not shown in FIG. 4 , is built above the electrodes of the grid 28 and provides random electrical connections between those electrodes as described in relation to FIG. 3 .
- Electrodes of the grid 28 can be connected to the column driver 36 and row driver 38 .
- the column and row drivers 36 and 38 include circuitry (e.g., logic gates, high and low power supply rails, etc.) to provide various voltages to the electrodes of the grid 28 .
- the row and column drivers 36 and 38 can provide one or more bias voltages in the range above the RESET-voltage and below the SET-voltage to enable writing voltages and/or reading currents to or from one or more electrodes of the grid 28 .
- Column and row drivers 36 and 38 can receive a training signal with respect to one or more electrodes of the grid 28 .
- the column and/or row drivers 36 and 38 can provide a training voltage pulse above the SET-voltage or below the RESET-voltage to adjust the resistances at the electrode-mesh interfaces. If the training signal for one or more electrodes within the grid 28 is OFF, the column and/or row drivers 36 and 38 would not apply voltages above the SET-voltage or voltages below the RESET-voltage.
- the SVMM engine 22 operates at a virtual ground (mid-supply) and when the training signal is ON, a train of pulse voltages is sent to a transistor gate that connects one or more electrodes of the grid 28 to the high power supply rail (Vdd) or to the low power supply rail (ground).
- the column or row drivers 36 and 38 can receive one or more control signals indicating in which direction (e.g., high or low) the resistances at interfaces of the electrodes of the grid 28 and mesh 30 should be moved.
- the circuit layer 26 can be designed to enable addressing each electrode of the grid 28 individually or it can be designed to address multiple electrodes of the grid 28 in parallel for efficiency purposes and to save on-chip area consumed by the circuit layer 26 and components therein.
- the SVMM engine 22 can receive digital input signals (e.g., at predetermined intervals, intermittently or at random) from a variety of sources and depending on the application in which the SVMM engine 22 is used.
- Digital input signals can include sensor input data, mathematical image parameters representing physical phenomena, artificial intelligence input data, training input data, still photographs, frames of video images, intervals of speech and any other input signal for the purposes of vector-matrix multiplication.
- One or more DACs 40 can be used to convert the digital input signals to analog voltages that can be sourced on the electrodes of the grid 28 .
- One or more ADCs 42 can convert the analog output currents to digital signals, which can be outputted in the output register 34 and transmitted off-chip for further processing or other tasks.
- the SVMM engine 22 can be configured such that each electrode of the grid 28 can be an input or an output node, as opposed to devices where only the edge nodes can be input or output nodes. For efficiency purposes, multiple electrodes of the grid 28 can be set in parallel as input electrodes and the remaining electrodes can be read as output electrodes.
- the circuit layer 26 can be designed with switches 44 , which can connect an electrode of the grid 28 to a DAC 40 making that electrode available to receive an input signal. The switch 44 can be toggled to connect an electrode of the grid 28 to an ADC 42 , making the electrode available as an output electrode.
- multiple electrodes of the grid 28 can be used as input electrodes (by for example, appropriately positioning the switches 44 to DACs 40 or via other techniques known to persons of ordinary skill in the art).
- the remaining electrodes of the grid 28 can be used as output electrodes (e.g., the remaining columns).
- the size of input and output electrode sets can be permanent (e.g., by permanent connections in lieu of the switches 44 ) or be flexible (e.g., by an input/output selector signal controlling the switches 44 individually or in batches) or a combination of permanent and flexible.
- the circuit layer 26 is configured with the columns A and B used for input signals and columns C and D used for reading output signals.
- Each electrode of the grid 28 is connected to an amplifier 46 in the circuit layer 26 .
- the amplifier 46 can be configured as a buffer or sample-and-hold (SAH) amplifier if its corresponding electrode is to be an input node.
- the amplifier 46 can also be configured as a transimpedance amplifier to sense a current output if its corresponding electrode is to be an output node.
- a controller 48 can configure the amplifiers 46 as input or output amplifiers, individually or in batches. Controller 48 can also coordinate other functions of the circuits in the circuit layer 26 , such as controlling the switches 44 via input/output selector signal, configuring the amplifiers 46 , and various functionality related to column and row drivers 36 and 38 as described above. The controller 48 can also manage the timing of various operations and components of the SVMM engine 22 , such as the timing of feeding input vectors from input register 32 into the input electrodes of the grid 28 , the timing of reading output currents from the output electrodes of the grid 28 and the timing of other functions and components. The controller 48 can include circuitry such as short-term and/or long-term memory, storage, one or more processors, clock signal and other components to perform its function.
- one or more electrodes can be in training mode, where the resistance at the electrode-mesh interface is to be adjusted.
- the term training is used because in some applications, such as neural network training, the interface resistances can be adjusted based on training algorithms in neural networks (e.g., gradient descent) to construct an effective matrix of conductances.
- the individual resistances at electrode-mesh interfaces may not be known, nonetheless using training algorithms and observing input/output pairs, the resistances can be adjusted up and down until an effective matrix of conductances is formed by the resistances of the collection of electrode-mesh interfaces.
- one or more electrodes can be in WRITE mode, where an appropriate amount of voltage bias is applied from the column and/or row drivers 36 and 38 , and an input voltage value is sourced at the one or more electrodes via one or more corresponding DACs 40 .
- one or more electrodes can be in READ mode, where an appropriate amount of voltage bias is applied from the column and/or row drivers 36 and 38 , and an output current is read from the one or more electrodes via one or more corresponding ADCs 42 .
- Columns A and B are assigned as input electrodes and columns C and D are assigned as output electrodes.
- the amplifiers 46 in columns A and B are configured as SAH amplifiers and the amplifiers 46 in columns C and D are configured as transimpedance amplifiers capable of sensing current.
- Switches 44 in columns A and B connect amplifiers 46 in Columns A and B to DACs 40 .
- the Switches 44 in columns C and D connect the amplifiers 46 in columns C and D to the ADCs 42 of columns C and D.
- Column and row drivers 36 and 38 provide appropriate biasing voltages to the electrodes to enable the electrodes in columns A and B for WRITE mode and enable electrodes in columns C and D for READ mode.
- One or more DACs 40 convert a first vector of digital input signals received in input register 32 to analog signals and place them on the amplifiers 46 in column A.
- the amplifiers 46 in column A configured as SAH amplifiers, hold the input voltages on the electrodes in column A.
- one or more DACs 40 convert a second vector of digital input signals received in input register 32 to analog signals and place them on the amplifiers 46 in column B.
- the amplifiers 46 in column B configured as SAH amplifiers, hold the input voltages on the electrodes in column B. The process can continue if additional columns are used for input and until input columns are fed.
- the controller 48 can begin scanning and reading output currents at the ADCs 42 in columns C and D and outputting the result into output register 34 .
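- The write-then-scan sequence described above can be summarized as a control-flow sketch (hypothetical; the grid size, voltage and current scales, and the ideal linear current model are assumptions, not circuit behavior): DACs place voltages column by column, sample-and-hold amplifiers keep them on the input electrodes, and the controller then digitizes the output-column currents one electrode at a time.

```python
import numpy as np

rng = np.random.default_rng(1)
ROWS = 4                                        # electrodes per column (illustrative)
INPUT_COLS, OUTPUT_COLS = ("A", "B"), ("C", "D")

# Assumed effective conductances between every input electrode and every output electrode.
g = rng.uniform(0.0, 1e-3, size=(ROWS * len(INPUT_COLS), ROWS * len(OUTPUT_COLS)))

held = np.zeros(ROWS * len(INPUT_COLS))         # sample-and-hold amplifiers, one per input electrode

# WRITE phase: DACs place one column of input voltages at a time; SAH amplifiers hold them.
for c, _col in enumerate(INPUT_COLS):
    codes = rng.integers(0, 256, size=ROWS)              # 8-bit input codes for this column
    held[c * ROWS:(c + 1) * ROWS] = codes / 255.0        # DAC: code -> voltage (0..1 V assumed)

# READ phase: the controller scans the output electrodes one at a time and digitizes each current.
digital_out = []
for c, _col in enumerate(OUTPUT_COLS):
    for r in range(ROWS):
        i_out = held @ g[:, c * ROWS + r]                 # current into this output electrode (KCL sum)
        code = int(round(float(i_out) / 1e-3 * 255))      # ADC with an assumed 1 mA full scale
        digital_out.append(min(255, code))
print(digital_out)
```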
- Vdd is above the SET-voltage and ground voltage is below the RESET-voltage.
- the SVMM engine 22 and its circuit layer 26 can be configured where individual electrodes can be used as input/output nodes.
- half of the electrodes in the grid 28 can be used as input electrodes and half of the electrodes in the grid 28 can be used as output electrodes.
- Other divisions of electrodes as inputs or outputs are also possible depending on the application.
- the controller 48 can scan through the output nodes and read the first output current (I1) via an amplifier 46 configured as a current-sensing amplifier and use an ADC 42 to convert the output current I1 to a digital output and update the output register 34. Then the next output current I2 is read from the second output electrode via the amplifier 46 of the second output electrode, configured as a current-sensing amplifier. The process continues until the output current of the last output electrode is read and the output register 34 is updated. The nodes (or neurons if the SVMM engine 22 is used in neural networks) are traversed once.
- the SVMM engine 22 can be effectively used in many applications, where the matrix of conductances of the SVMM engine 22 can be adjusted in various directions in a manner that optimizes a desirable function, for example, an activation function in a neural network and in other contexts.
- Other computational tasks can also utilize the SVMM engine 22. Examples include various image processing tasks, where images are matrix data representing physical phenomena, such as speech, weather, temperature, or other data structures as may exist in financial data, radar data, sensor data, still photographs, video image frames and other applications.
- the SVMM engine 22, utilizing a sparse network of conductances, offers a substantial performance advantage compared to devices using a dense network of conductances.
- Sparse networks are similar to the way the human brain functions. In sparse networks, not every information node is connected to all other information nodes; only some information nodes are connected. The sparsity is believed to enable superior computational ability with economical use of area and resources compared to more expensive networks, such as dense networks, where all information nodes are connected. Additionally, dot-product engines utilizing dense networks have proven expensive and complicated to design and operate due to the need to precisely control the resistances of the conductance matrix and difficulties in electrical or mechanical control of nanoscale material.
- the sparse matrix of conductances of SVMM engine 22 offers a higher performance to area ratio compared to devices utilizing dense networks for matrix multiplication.
- input and output nodes (or neurons in the context of neural networks) in dense networks exist only in the lateral edges of the network. This allows the SVMM engine 22 a quadratic scaling advantage compared to devices using dense networks. As one increases the number of neurons, the SVMM engine 22 can maintain the same density of neurons, while devices using dense networks start to lose density.
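- The scaling argument can be made concrete with back-of-the-envelope numbers (illustrative only, assuming one unit of chip area per crossbar cell or electrode site): a dense crossbar exposes its 2n neurons on the edges of an n x n cell array, so neurons per unit area fall as 1/n, while a grid of electrodes keeps a fixed number of electrodes per unit area no matter how many are added.

```python
# Illustrative neuron-density comparison (one unit of area per crossbar cell or electrode site).
for n in (16, 64, 256, 1024):
    dense_neurons, dense_area = 2 * n, n * n      # n input + n output lines around an n x n cell array
    grid_neurons, grid_area = 2 * n, 2 * n        # 2n electrodes in a grid, one site per electrode
    print(f"n={n:5d}  dense neurons/area={dense_neurons / dense_area:.4f}  "
          f"grid neurons/area={grid_neurons / grid_area:.4f}")
```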
- the SVMM engine 22 can be implemented as part of a computer system, a graphics processing unit (GPU), a hardware accelerator or as part of an artificial intelligence processor, a machine learning processor, a memory interconnect or other devices where matrix operations are used.
- FIG. 5 illustrates a flow chart of a method 50 of sparse vector-matrix multiplication according to an embodiment.
- the method can be implemented in hardware using embodiments of FIGS. 3 and 4 .
- the method 50 starts at the step 52 .
- the method continues to the step 54 by providing a plurality of electrodes on a silicon substrate.
- the method then moves to the step 56 by forming a layer of randomly arranged coaxial nanowires on the plurality of electrodes.
- the method then moves to the step 58 by receiving a plurality of digital input signals.
- the method moves to the step 60 by converting the plurality of digital input signals to a plurality of analog input signals.
- the method then moves to the step 62 by writing the plurality of analog input signals on an input set of the plurality of electrodes.
- the method then moves to the step 64 by reading from an output set of the plurality of electrodes a plurality of analog output signals.
- the method then moves to the step 66 by converting the plurality of analog output signals to a plurality of digital output signals.
- the method then moves to the step 68 by outputting the plurality of digital output signals.
- the method ends at the step 70 .
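- Steps 58 through 68 amount to a digital-analog-digital signal chain. The sketch below (an idealized model, not the patented circuit; the reference voltage, full-scale current, bit width and matrix values are assumptions) follows those steps for one input vector: quantize digital codes to voltages, multiply by a fixed conductance matrix via Ohm's law, and quantize the resulting currents back to digital codes.

```python
import numpy as np

def svmm_forward(digital_in, g, v_ref=1.0, i_ref=1e-2, bits=8):
    """Idealized pass through steps 58-68 of method 50: DAC, analog multiply, ADC."""
    full_scale = 2 ** bits - 1
    v = (digital_in / full_scale) * v_ref                         # step 60: codes -> input voltages
    i = v @ g                                                     # steps 62-64: write voltages, read currents (I = V*G)
    codes = np.round(np.clip(i / i_ref, 0.0, 1.0) * full_scale)   # step 66: currents -> output codes
    return codes.astype(int)                                      # step 68: digital output signals

rng = np.random.default_rng(2)
g = rng.uniform(0.0, 1e-3, size=(6, 6))      # assumed conductances, 6 input x 6 output electrodes
print(svmm_forward(rng.integers(0, 256, size=6), g))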
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Neurology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Power Engineering (AREA)
- Semiconductor Memories (AREA)
- Complex Calculations (AREA)
Abstract
Description
- This application claims the benefit of priority of U.S. Provisional Application No. 62/653,194 filed on Apr. 5, 2018 entitled “Analog Processor for Sparse Vector-Matrix Multiplication,” content of which is incorporated herein by reference in its entirety and should be considered a part of this specification.
- This invention relates generally to computer hardware, and in particular to accelerators designed for performing efficient matrix operations in fields such as artificial intelligence and memory devices.
- Matrix operations are used in a variety of modern computing tasks. Many physical phenomena can be represented by one or more matrices of numerical values and processed in modern computers. For example, still photographs, video image frames, sensor output data, an interval of speech, financial transaction data, autonomous driving sensor data, and many other physical objects or parameters can be represented by one or more matrices of numerical values suitable for processing, manipulation and operation in modern computers. While general-purpose computing hardware can be used to perform matrix operations, the characteristics of matrix data and matrix operations can make them good candidates for designing hardware customized to more efficiently process matrix workloads and matrix operations compared to general-purpose computers. One form of matrix operation frequently used in modern computing tasks is digital vector-matrix multiplication.
- In conventional digital vector-matrix multiplication, a vector of input values is provided along with a matrix of parameter values. The multiplication of the two results in a single vector output. If the input and output vectors both have size n, the computational complexity of this operation scales as O(n²).
- An alternative style of vector-matrix multiplication with crossbar arrays and associated hardware uses the concept of in-memory computing and is limited only by the speed at which data can be loaded into the array. This results in O(n) scaling in computation time. These architectures are commonly known as dot-product engines (DPEs). However, in order to maintain this O(n) scaling, DPEs require O(n²) spatial resources on-chip for storing the parameter values in dense crossbar matrices. Thus, deploying existing DPEs, in some cases, requires an undesirable trade-off between computation time efficiency and on-chip area.
- Consequently, there is a need for devices, and methods of operating them, that allow vector-matrix multiplication which also scales as O(n) in computation time and spatial resources, as opposed to scaling quadratically in chip area, as may be the case in traditional DPEs.
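- To make the scaling contrast concrete, the sketch below simply counts operations: a conventional digital vector-matrix multiply performs n x n multiply-accumulates, while a DPE-style in-memory engine is bounded by loading n inputs and reading n outputs. This is only an illustration of the O(n²) versus O(n) argument, not a performance model of any particular device.

```python
def digital_vmm_ops(n):
    """Multiply-accumulate count for a conventional digital n x n vector-matrix multiply."""
    ops = 0
    for _ in range(n):            # one output element per matrix column
        for _ in range(n):        # one multiply-accumulate per matrix row
            ops += 1
    return ops                    # n * n, i.e. O(n^2)

def in_memory_vmm_steps(n):
    """I/O step count for a DPE-style engine: load n inputs, read n outputs; the multiply happens in-array."""
    return n + n                  # O(n)

for n in (8, 64, 512):
    print(n, digital_vmm_ops(n), in_memory_vmm_steps(n))
```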
- In one aspect of the invention, a system of sparse vector-matrix multiplication is disclosed. The system includes: a silicon substrate; a circuit layer formed in or on the substrate; a plurality of electrodes formed on the circuit layer; and a mesh formed randomly on the plurality of electrodes, wherein the circuit layer is configured to: receive a plurality of digital input signals; convert the plurality of digital input signals to a plurality of analog input signals; write the plurality of analog input signals on an input set of the plurality of electrodes; read from an output set of the plurality of electrodes a plurality of analog output signals, convert the plurality of analog output signals to a plurality of digital output signals, and output the plurality of digital output signals.
- In some embodiments, the mesh includes coaxial nanowires having a metal core wrapped in two-terminal non-volatile memory (NVM) material.
- In one embodiment, the non-volatile memory material includes a voltage-controlled resistance.
- In another embodiment, the circuit layer includes: an input register configured to receive the plurality of the digital input signals; one or more digital to analog converters configured to convert the plurality of digital input signals to a plurality of analog input signals; one or more analog to digital converters configured to convert the plurality of analog output signals to the plurality of digital output signals; and an output register configured to receive and store the plurality of digital output signals.
- In one embodiment, the circuit layer further includes a column driver and a row driver configured to selectively provide biasing voltages and/or training voltages to the plurality of the electrodes.
- In another embodiment, the plurality of analog input signals include voltages and the plurality of analog output signals include currents, or vice versa.
- In some embodiments, the plurality of analog input signals include voltages, the plurality of analog output signals include currents, and the circuit layer further includes: a plurality of amplifiers coupled to the plurality of electrodes, wherein amplifiers coupled to the input set of the plurality of electrodes are configured as a sample-and-hold (SAH) amplifier and configured to write the plurality of analog input signals to the input set, and amplifiers coupled to the output set of the plurality of the electrodes are configured as current-sensing amplifiers and configured to read the plurality of analog output signals.
- In one embodiment, the plurality of electrodes include neurons in a neural network layer.
- In another embodiment, the plurality of electrodes and the randomly formed mesh include a matrix of conductances.
- In some embodiments, the matrix of conductances is tunable using one or more of temperature-driven phase-change memory mechanisms, unipolar resistive switching, and bipolar memristive mechanisms.
- In another aspect of the invention, a method of sparse vector-matrix multiplication is disclosed. The method includes: providing a plurality of electrodes on a silicon substrate; forming a layer of randomly arranged coaxial nanowires on the plurality of electrodes; receiving a plurality of digital input signals; converting the plurality of digital input signals to a plurality of analog input signals; writing the plurality of analog input signals on an input set of the plurality of electrodes; reading from an output set of the plurality of electrodes a plurality of analog output signals; converting the plurality of analog output signals to a plurality of digital output signals; and outputting the plurality of digital output signals.
- In one embodiment, the coaxial nanowires include a metal core wrapped in two-terminal non-volatile memory (NVM) material.
- In another embodiment, the NVM material includes one or more of a voltage-controlled resistance, memristor, phase-change material (PCM), and resistive random-access-memory (ReRAM) material.
- In some embodiments, the method further includes: selectively providing biasing voltages to the plurality of the electrodes to enable writing voltages into or reading currents from the plurality of the electrodes.
- In one embodiment, voltage-controlled resistances are formed at intersections of the plurality of the electrodes and the randomly arranged coaxial nanowires and the method further comprises selectively providing training voltages to the plurality of the electrodes to adjust the voltage-controlled resistances.
- In some embodiments, the method further includes receiving a training signal indicating which electrodes in the plurality of the electrodes are to be applied the training voltages.
- In one embodiment, the plurality of the electrodes include neurons in a neural network layer.
- In another embodiment, the plurality of the electrodes and the layer of randomly arranged coaxial nanowires form a matrix of conductances and the conductances are tuned by performing gradient descent.
- In some embodiments, the plurality of analog input signals include voltages and the plurality of analog output signals include currents.
- In one embodiment, the input and output sets each comprise half of the electrodes of the plurality of electrodes.
- These drawings and the associated description herein are provided to illustrate specific embodiments of the invention and are not intended to be limiting.
- FIG. 1 illustrates a diagram of a matrix in a dot-product engine used to perform vector-matrix multiplication.
- FIG. 2 illustrates a diagram of a coaxial nanowire, which can be utilized in building high efficiency computing hardware.
- FIG. 3 illustrates a diagram of a sparse vector-matrix multiplication (SVMM) engine according to an embodiment.
- FIG. 4 illustrates a diagram of an embodiment of the circuit layer and the electrodes of the embodiment of FIG. 3.
- FIG. 5 illustrates a flow chart of a method of sparse vector-matrix multiplication according to an embodiment.
- The following detailed description of certain embodiments presents various descriptions of specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings where like reference numerals may indicate identical or functionally similar elements.
- Unless defined otherwise, all terms used herein have the same meaning as are commonly understood by one of skill in the art to which this invention belongs. All patents, patent applications and publications referred to throughout the disclosure herein are incorporated by reference in their entirety. In the event that there is a plurality of definitions for a term herein, those in this section prevail.
- Definitions
- The term “about” as used herein refers to the ranges of specific measurements or magnitudes disclosed. For example, the phrase “about 10” means that the number stated may vary as much as 1%, 3%, 5%, 7%, 10%, 15% or 20%. Therefore, at the variation range of 20% the phrase “about 10” means a range from 8 to 12.
- When the terms “one”, “a” or “an” are used in the disclosure, they mean “at least one” or “one or more”, unless otherwise indicated.
- The term “processor” can refer to various microprocessors, controllers, and/or hardware and software optimized for loading and executing software programming instructions or processors including graphics processing units (GPUs) optimized for handling high volume matrix data related to image processing.
- The term “conductance” refers to the degree by which a component conducts electricity. Conductance can be calculated as the ratio of the current that flows through the component to the potential difference present across the component. Conductance is the reciprocal of the resistance and is measured in siemens.
- The term “dense” in the context of matrix multiplication engines described herein can refer to engines where there is an electrical connection or path from each input to each output node of the matrix multiplication engine.
- The term “sparse” in the context of matrix multiplication engines described herein can refer to an engine where not all possible or available connections are made between input and output nodes of the matrix multiplication engine.
- Dense Matrix Multiplication Engine
- One example of hardware specialized for performing vector-matrix multiplication is a dot-product-engine based on a crossbar architecture.
- FIG. 1 illustrates a diagram of a matrix 20 in a dot-product engine used to perform vector-matrix multiplication. Matrix 20 utilizes a crossbar array architecture and includes horizontal input voltage lines intersecting vertical output current lines. As an example, the input/output voltage/current lines can be neurons in a neural network layer, when matrix 20 is used to perform vector-matrix multiplication in the context of neural networks. The input voltage lines and output current lines are made of conductive metal material. At intersections of input voltage lines and output current lines, a material made of non-volatile memory (NVM) 21 connects the input voltage lines to the intersecting output current lines. In some implementations, this is achieved via lithographically patterning electrode lines (horizontal and vertical lines) to sandwich an NVM-type material 21. The vector of input voltages is applied on the input voltage lines. The output current at each column is determined by the sum of currents from each intersection of that column with the input voltage lines, obtained by applying Kirchhoff's current law (KCL) at each intersection. The matrix 20 is partially formed by NVM material 21 whose resistances are controllable by applying an appropriate voltage. Therefore, a matrix of parameter values (e.g., a matrix of weights in a layer of a neural network) can be constructed in the matrix 20 by adjusting the intersection resistances to match the matrix of parameter values of a desired computation.
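- As a numerical illustration of the crossbar computation just described (example voltages and conductances are assumed, not taken from the patent), each output column current is the Kirchhoff sum of V·G contributions from every row crossing that column:

```python
# Dense crossbar dot product: I_j = sum_i V_i * G_ij (Ohm's law per cell, KCL per output column).
voltages = [0.2, 0.5, 0.1]               # input voltages on three row lines (volts)
conductances = [                         # programmed NVM conductances, rows x columns (siemens)
    [1e-3, 2e-3],
    [5e-4, 1e-3],
    [2e-3, 4e-4],
]

column_currents = []
for j in range(len(conductances[0])):    # one output current per column line
    i_j = sum(voltages[i] * conductances[i][j] for i in range(len(voltages)))
    column_currents.append(i_j)

print(column_currents)                   # approximately [0.00065, 0.00094] amperes
```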
- Performance of Dense Matrix Multiplication Engine
- The dot-product engine utilizing matrix 20 can be characterized as a dense array structure, where each input and output are connected. Notably, the chip area required to implement the matrix 20 scales quadratically relative to the number of input and output neurons it provides. In other words, input/output neurons and the chip area needed to implement the matrix 20 scale at different rates. While input/output neurons scale linearly (on the edges of matrix 20), the chip area needed to implement vector-matrix multiplication of those additional neurons grows quadratically (in the area of the matrix 20).
- Coaxial Nanowires
- Recent discoveries of new materials have the potential to revolutionize computing hardware and approaches to designing hardware tasked with executing software. As computing hardware designed only based on traditional silicon material approaches its physical limit of performance, designs based on new material, alone or in combination with silicon-based circuits, promise greater efficiency in hardware. The discovery and ability to build nanoscale materials, such as nanowires and materials with desirable mechanical or electrical properties, promise advancements and improvements in computing methods and devices, including hardware customized and optimized for performing matrix multiplication.
- FIG. 2 illustrates a diagram of a coaxial nanowire 10, which as will be described can be utilized in building high efficiency computing hardware. The coaxial nanowire 10 includes a metal core 12 wrapped in a two-terminal non-volatile memory (NVM) material 14. The coaxial nanowire 10 touches two metal electrodes 16 and 18.
electrode 16 applies a voltage above a positive threshold voltage (SET-voltage) to theNVM material 14, theNVM material 14 may undergo dielectric breakdown and one or more conductive filaments are formed through it, thereby lowering its electrical resistance and increasing its conductivity. Subsequently, the electrical connection between theelectrodes conductive NVM material 14 and themetal core 12. When theelectrode 16 applies a voltage below a negative voltage threshold (RESET-voltage) to theNVM material 14, the dielectric breakdown process is reversed, the filaments dissolve away and the electrical resistance of theNVM material 14 reverts to its original value or some other lower resistance, thereby weakening the electrical connection between theelectrodes NVM material 14 is transformed to low resistance state (LRS) and for voltages below the RESET-voltage, theNVM material 14 is transformed to high resistance state (HRS). In other words,coaxial nanowire 10 forms a memory device at the intersection of its contact with an electrode. The resistance at the interface is dependent upon previously applied voltage (if the previous voltage was above the SET-voltage or below the RESET voltage). - Examples of
NVM material 14 includes, memristors, phase-change material (PCM), resistive random-access-memory (ReRAM) material, or any other material whose resistance is voltage-controlled, including any material which retains a resistance in response to an applied voltage with respect to one or more threshold voltages. - Sparse Vector-Matrix Multiplication Engine
- An application of the
coaxial nanowire 10 can be explored in the context of hardware designed to perform matrix multiplication. Matrix multiplication is used in many modern computing tasks, such as artificial intelligence (AI), machine learning, neural network, neural network training, various transforms (e.g., Discrete Fourier Transform), and others. The non-volatile and controllable memory properties of the shell ofcoaxial nanowire 10 can be exploited to make hardware that can efficiently perform matrix multiplication. A form of matrix multiplication used in modern computing tasks is digital vector-matrix multiplication, where a vector of input values is multiplied by a matrix of parameter values. The multiplication yields an output vector. Thecoaxial nanowire 10 can be used to construct the matrix of parameter values. Parameter values can be any parameter values used in various computing tasks, for example weights in a layer of neural network. A vector of input values can be multiplied by a matrix of weights producing an output vector of the layer. - To design computing hardware that can perform vector-matrix multiplication, one can represent the digital input values with a vector of analog input voltages and a matrix of conductances can represent the matrix of parameter values. The Ohm's law equation (I=VG, where I is current, V is voltage and G is conductance) can be used to obtain an analog output vector of currents representing the output of multiplication of the input values by the matrix of parameter values. The analog output vector can be converted to digital output values and outputted.
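Written out with explicit indices (the symbols here are ours, not the patent's), the analog mapping described in this paragraph is:

```latex
\begin{aligned}
V_i &= \mathrm{DAC}(x_i) && \text{digital input $x_i$ mapped to an analog voltage,}\\
I_j &= \sum_i V_i\, G_{ij} && \text{Ohm's law ($I = VG$) with currents summed at each output,}\\
y_j &= \mathrm{ADC}(I_j) && \text{analog output current converted back to a digital value.}
\end{aligned}
```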
- Compared to the dot-product engine of FIG. 1, an alternative engine for performing vector-matrix multiplication can use electrodes that are distributed over a chip area and sparsely connected with coaxial nanowires 10. Such electrodes can place input and output nodes throughout the chip area where they exist, as opposed to only along the lateral edges of a crossbar array, as is the case in the dot-product engine of FIG. 1. A network of distributed electrodes sparsely connected with coaxial nanowires 10 can construct a conductance matrix, which can be used as a parameter matrix in desired computations. A subset of the electrodes can be used to feed a vector of input voltages, and the complementary subset of the electrodes can be probed to read a vector of output currents. The output vector of currents is the result of the vector-matrix multiplication of the vector of input voltages with the matrix of conductances according to Ohm's law.
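One way to picture this arrangement is as a graph: electrodes are nodes, and every nanowire that happens to bridge two electrodes contributes a conductance between them. The sketch below builds such a random sparse conductance matrix and reads the currents at the output electrodes under the simplifying assumption that the output electrodes are held at (virtual) ground, so each output current is just the sum of the input voltages weighted by the direct input-to-output conductances. The construction and names are illustrative assumptions, not the patent's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
num_electrodes = 16

# Random sparse symmetric conductance matrix: entry (a, b) is nonzero only if
# some nanowire happens to bridge electrodes a and b.
connected = rng.random((num_electrodes, num_electrodes)) < 0.15
connected = np.triu(connected, k=1)
G = np.where(connected, rng.uniform(1e-6, 1e-4, size=connected.shape), 0.0)
G = G + G.T  # connections are undirected

# Any subset of electrodes can serve as inputs; the complement is read out.
inputs = np.arange(0, 8)
outputs = np.arange(8, 16)

V_in = rng.uniform(-0.3, 0.3, size=inputs.size)  # input voltage vector

# Simplified readout: with the output electrodes held at virtual ground, the
# current into output j is sum_i V_in[i] * G[inputs[i], outputs[j]].
I_out = V_in @ G[np.ix_(inputs, outputs)]
print(I_out)
```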
- FIG. 3 illustrates a diagram of a sparse vector-matrix multiplication (SVMM) engine 22. The SVMM engine 22 includes a silicon substrate 24, control circuitry within a circuit layer 26 (for example, a complementary metal-oxide-semiconductor (CMOS) layer), a grid of electrodes 28, and a randomly formed mesh 30 of coaxial nanowires 10 deposited on top of the grid 28. Mesh 30 is placed above or formed on top of the electrode grid 28, providing physical contact between the mesh 30 and the top of the electrode grid 28. Alternatively, the electrodes of the grid 28 can be grown through the mesh 30 as pillars of metal. The coaxial nanowires 10 deposited randomly on top of the electrodes of the grid 28 can provide electrical connections between the electrodes that they contact. Consequently, the coaxial nanowires 10 sparsely connect the electrodes of the grid 28. The strength of the electrical connections between the electrodes can be modulated by increasing or decreasing the resistances of the coaxial nanowires 10.
- In a training mode of the SVMM engine 22, the circuitry in the circuit layer 26 can be used to apply a SET-voltage or a RESET-voltage to some or all of the coaxial nanowires 10 in the mesh 30 via the electrodes in the grid 28. The electrical resistances of the coaxial nanowires 10 in mesh 30 can increase or decrease depending on the voltages they receive via the electrodes in the grid 28, thereby strengthening or weakening the electrical connections between the electrodes of the grid 28. Because the coaxial nanowires 10 in mesh 30 are randomly formed, they can create random electrical connections between the electrodes in the grid 28 via the NVM-type material and the metal cores of the nanowires 10. Thus, the electrodes of the grid 28 are sparsely connected via the coaxial nanowires 10 of mesh 30.
- The grid 28, sparsely connected with the mesh 30, forms a sparsely connected matrix of conductances, which can be used for vector-matrix multiplication. A vector of input voltages can be applied to a subset of the electrodes in the grid 28 (the input electrodes), and the remainder of the electrodes (the output electrodes) can be used to read an output vector of currents. In this arrangement, the output vector of currents can represent the output of a vector-matrix multiplication of the vector of input voltages with the sparsely connected matrix of conductances formed by the grid 28, according to Ohm's law.
- In various applications, the resistances formed at the intersections of the electrodes of the
grid 28 and the mesh 30 can be adjusted by tuning or fitting them to known sets of input/output pairs until a useful matrix of conductances is formed. In other words, although the matrix of conductances formed by the SVMM engine 22 is made of unknown or random resistances, formed by random connections between the electrodes of the grid 28 via the coaxial nanowires 10 of mesh 30, the conductances can be adjusted by applying a combination of SET-voltages and/or RESET-voltages to the electrodes of the grid 28 and observing the outputs. Various fitting techniques and algorithms may be used to determine the direction in which the electrode-mesh interface resistances should be adjusted.
- The interface resistances, corresponding to the matrix of conductances formed by
grid 28 and mesh 30, can be adjusted through a variety of means, including using voltage pulses at the electrodes of the grid 28 to switch or nudge the resistances according to temperature-driven phase-change memory mechanisms, unipolar resistive switching, or bipolar memristive mechanisms. These techniques can be used to tune the values of the conductance matrix to a task, for instance as a content-addressable memory (CAM), a neural network layer, or a more general memory interconnect. Examples of algorithms which can be used in connection with the SVMM engine 22 can be found in International Patent Application No. PCT/US2018/033669, filed on May 21, 2018 and titled "DEEP LEARNING IN BIPARTITE MEMRISTIVE NETWORKS." In one embodiment, gradient descent learning can be used to tune the conductance matrix of the SVMM engine 22.
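The fitting procedure described above can be pictured as a loop that nudges whichever conductances it can reach until the engine's measured outputs match known targets. The sketch below uses a simple accept-if-improved perturbation rule on a simulated conductance matrix; it is a stand-in of our own devising, not the gradient-descent or other algorithms the patent references.

```python
import numpy as np

rng = np.random.default_rng(2)

def measure(G, V_in):
    """Simulated read-out: output currents for a given conductance matrix."""
    return V_in @ G

# Toy problem: adjust an 8x8 block of input-to-output conductances so that a
# known input vector produces a known target output (one input/output pair).
G = rng.uniform(1e-6, 1e-4, size=(8, 8))
V_in = rng.uniform(-0.3, 0.3, size=8)
I_target = rng.uniform(-1e-5, 1e-5, size=8)

for step in range(2000):
    error = measure(G, V_in) - I_target
    # Pick one conductance at random and nudge it in the direction that should
    # reduce the error (a stand-in for applying SET/RESET training pulses).
    i, j = rng.integers(8), rng.integers(8)
    G_trial = G.copy()
    G_trial[i, j] = max(G[i, j] - 1e-6 * np.sign(error[j] * V_in[i]), 0.0)
    if np.sum((measure(G_trial, V_in) - I_target) ** 2) < np.sum(error ** 2):
        G = G_trial  # keep the nudge only if the fit improved

print(np.max(np.abs(measure(G, V_in) - I_target)))  # remaining error
```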
- The shape, number, and geometry of the grid 28 can be modified based on the implementation. In some embodiments, the electrodes need not be in a grid format. Various design and implementation considerations may dictate an alternative geometry of the SVMM engine 22 without departing from the spirit of the described technology.
- FIG. 4 illustrates a diagram of an embodiment of the circuit layer 26 and the electrodes of the grid 28. Circuit layer 26 can be implemented as a CMOS layer and can include components such as an input register 32, an output register 34, a column driver 36, a row driver 38, one or more digital-to-analog converters (DACs) 40, one or more analog-to-digital converters (ADCs) 42, amplifiers 46, switches 44, and other components and circuitry as may be used to implement the functionality of the SVMM engine 22 in the circuit layer 26. Electrodes of the grid 28 are shown for illustration purposes, but in some embodiments the electrodes of the grid 28 are metal pillars grown above the circuit layer 26 and may not be a part of the circuit layer 26. Mesh 30, while not shown in FIG. 4, is built above the electrodes of the grid 28 and provides random electrical connections between those electrodes, as described in relation to FIG. 3.
- Electrodes of the
grid 28 can be connected to the column driver 36 and row driver 38. The column and row drivers can apply voltages to the electrodes of the grid 28. In one embodiment, the row and column drivers receive a training signal for the grid 28. If the training signal is ON, the column and/or row drivers apply training (SET- or RESET-) voltages to the electrodes of the grid 28; if the training signal is OFF, the column and/or row drivers do not drive training voltages onto the electrodes.
- In some embodiments, the
SVMM engine 22 operates at a virtual ground (mid-supply), and when the training signal is ON, a train of voltage pulses is sent to a transistor gate that connects one or more electrodes of the grid 28 to the high power supply rail (Vdd) or to the low power supply rail (ground). The column or row drivers can thereby set the direction in which the interface resistances between the grid 28 and the mesh 30 should be moved.
- The
circuit layer 26 can be designed to enable addressing each electrode of the grid 28 individually, or it can be designed to address multiple electrodes of the grid 28 in parallel for efficiency purposes and to save the on-chip area consumed by the circuit layer 26 and the components therein.
- The
SVMM engine 22 can receive digital input signals (e.g., at predetermined intervals, intermittently, or at random) from a variety of sources, depending on the application in which the SVMM engine 22 is used. Digital input signals can include sensor input data, mathematical image parameters representing physical phenomena, artificial intelligence input data, training input data, still photographs, frames of video images, intervals of speech, and any other input signal suitable for vector-matrix multiplication.
- One or
more DACs 40 can be used to convert the digital input signals to analog voltages that can be sourced onto the electrodes of the grid 28. One or more ADCs 42 can convert the analog output currents to digital signals, which can be written to the output register 34 and transmitted off-chip for further processing or other tasks.
- The
SVMM engine 22 can be configured such that each electrode of the grid 28 can be an input or an output node, as opposed to devices where only the edge nodes can be input or output nodes. For efficiency purposes, multiple electrodes of the grid 28 can be set in parallel as input electrodes and the remaining electrodes can be read as output electrodes. The circuit layer 26 can be designed with switches 44, which can connect an electrode of the grid 28 to a DAC 40, making that electrode available to receive an input signal. The switch 44 can be toggled to connect an electrode of the grid 28 to an ADC 42, making the electrode available as an output electrode. For efficiency purposes, multiple electrodes of the grid 28 (e.g., one or more columns of them) can be used as input electrodes (for example, by appropriately positioning the switches 44 toward the DACs 40, or via other techniques known to persons of ordinary skill in the art). The remaining electrodes of the grid 28 can be used as output electrodes (e.g., the remaining columns). In some embodiments, the division into input and output electrode sets can be permanent (e.g., by permanent connections in lieu of the switches 44), flexible (e.g., by an input/output selector signal controlling the switches 44 individually or in batches), or a combination of permanent and flexible.
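A software analogue of the switch arrangement just described is a per-electrode role table that routes each electrode either to a DAC (input) or to an ADC (output). The enum and function names below are illustrative assumptions, not identifiers from the patent.

```python
from enum import Enum

class Role(Enum):
    INPUT = "dac"    # switch 44 routes the electrode to a DAC 40
    OUTPUT = "adc"   # switch 44 routes the electrode to an ADC 42

def configure_columns(num_rows, num_cols, input_cols):
    """Assign whole columns as inputs (e.g., columns A and B) or outputs."""
    roles = {}
    for col in range(num_cols):
        role = Role.INPUT if col in input_cols else Role.OUTPUT
        for row in range(num_rows):
            roles[(row, col)] = role
    return roles

roles = configure_columns(num_rows=4, num_cols=4, input_cols={0, 1})
print(sum(r is Role.INPUT for r in roles.values()), "input electrodes")
print(sum(r is Role.OUTPUT for r in roles.values()), "output electrodes")
```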
- For illustration purposes and as an example, the circuit layer 26 is configured with columns A and B used for input signals and columns C and D used for reading output signals. Each electrode of the grid 28 is connected to an amplifier 46 in the circuit layer 26. The amplifier 46 can be configured as a buffer or sample-and-hold (SAH) amplifier if its corresponding electrode is to be an input node. The amplifier 46 can also be configured as a transimpedance amplifier to sense a current output if its corresponding electrode is to be an output node.
- In one embodiment, a
controller 48 can configure the amplifiers 46 as input or output amplifiers, individually or in batches. Controller 48 can also coordinate other functions of the circuits in the circuit layer 26, such as controlling the switches 44 via the input/output selector signal, configuring the amplifiers 46, and various functionality related to the column and row drivers. The controller 48 can also manage the timing of various operations and components of the SVMM engine 22, such as the timing of feeding input vectors from the input register 32 into the input electrodes of the grid 28, the timing of reading output currents from the output electrodes of the grid 28, and the timing of other functions and components. The controller 48 can include circuitry such as short-term and/or long-term memory, storage, one or more processors, a clock signal, and other components to perform its function.
- Modes of Operations on the Electrodes
- Various operations may be performed on the electrodes of the grid 28 depending on the operation mode. In one mode, one or more electrodes can be in training mode, where the resistance at the electrode-mesh interface is to be adjusted. The term "training" is used because in some applications, such as neural network training, the interface resistances can be adjusted based on training algorithms in neural networks (e.g., gradient descent) to construct an effective matrix of conductances. During such operations, the individual resistances at the electrode-mesh interfaces may not be known; nonetheless, using training algorithms and observing input/output pairs, the resistances can be adjusted up and down until an effective matrix of conductances is formed by the resistances of the collection of electrode-mesh interfaces.
- In another mode, one or more electrodes can be in WRITE mode, where an appropriate amount of voltage bias is applied from the column and/or
row drivers or from the corresponding DACs 40.
- In another mode, one or more electrodes can be in READ mode, where an appropriate amount of voltage bias is applied from the column and/or row drivers and the resulting output currents are read via the corresponding ADCs 42.
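The three modes can be summarized as a small dispatch over per-electrode state: training nudges the interface resistance with SET/RESET pulses, WRITE places a DAC voltage on the electrode, and READ senses the electrode current into an ADC. The sketch below is an illustrative abstraction; the mode names mirror the text, but the function and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Electrode:
    mode: str           # "TRAIN", "WRITE", or "READ"
    voltage: float = 0.0
    current: float = 0.0

def service_electrode(e: Electrode, dac_value=0.0, train_pulse=0.0):
    """One controller step for a single electrode, depending on its mode."""
    if e.mode == "TRAIN":
        # Apply a SET/RESET pulse chosen by the training algorithm to move
        # the electrode-mesh interface resistance up or down.
        return ("pulse", train_pulse)
    if e.mode == "WRITE":
        # Drive the DAC output voltage onto the electrode (held by a SAH amp).
        e.voltage = dac_value
        return ("wrote", e.voltage)
    if e.mode == "READ":
        # Sense the electrode current with a transimpedance amp and digitize it.
        return ("read", e.current)
    raise ValueError(f"unknown mode: {e.mode}")

print(service_electrode(Electrode(mode="WRITE"), dac_value=0.25))
```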
- Example Operation of SVMM Engine 22
- An example and configuration of the SVMM engine 22 is now described in relation to FIG. 4. Columns A and B are assigned as input electrodes and columns C and D are assigned as output electrodes. The amplifiers 46 in columns A and B are configured as SAH amplifiers, and the amplifiers 46 in columns C and D are configured as transimpedance amplifiers capable of sensing current. Switches 44 in columns A and B connect the amplifiers 46 in columns A and B to the DACs 40. The switches 44 in columns C and D connect the amplifiers 46 in columns C and D to the ADCs 42 of columns C and D. The column and row drivers are configured accordingly.
- One or
more DACs 40 convert a first vector of digital input signals received in the input register 32 to analog signals and place them on the amplifiers 46 in column A. The amplifiers 46 in column A, configured as SAH amplifiers, hold the input voltages on the electrodes in column A. Next, one or more DACs 40 convert a second vector of digital input signals received in the input register 32 to analog signals and place them on the amplifiers 46 in column B. The amplifiers 46 in column B, configured as SAH amplifiers, hold the input voltages on the electrodes in column B. The process can continue if additional columns are used for input, until all input columns are fed.
- When input voltages are written into the electrodes in columns A and B, the
controller 48 can begin scanning and reading output currents at the ADCs 42 in columns C and D and outputting the result into the output register 34.
- The process above can be repeated, with the addition of a step involving the column and row drivers.
- The
SVMM engine 22 and its circuit layer 26 can be configured so that individual electrodes can be used as input/output nodes. As an example, half of the electrodes in the grid 28 can be used as input electrodes and half of the electrodes in the grid 28 can be used as output electrodes. Other divisions of the electrodes into inputs and outputs are also possible depending on the application. A DAC 40 writes an input voltage (V1) on an input electrode through an amplifier 46, configured as a buffer or SAH amplifier, so V1 is held on the electrode. Then, the DAC 40 writes the next input voltage (V2) at the next input electrode via the amplifier 46 of the next input electrode. The process continues until the last input electrode receives an input voltage (Vm), where m equals half the number of electrodes in the grid 28. By this point, the multiplication has already taken place: the voltages have been multiplied by the electrode-mesh interface conductances to yield the output currents at the output electrodes via Ohm's law.
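The write-then-read sequence can be mimicked in a few lines: hold a voltage on each input electrode in turn, then scan the output electrodes one at a time. As before, the conductance matrix, sizes, and conversion ranges are made-up stand-ins for the physical electrode-mesh interfaces.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 8                                      # number of input electrodes
n = 8                                      # number of output electrodes
G = rng.uniform(1e-6, 1e-4, size=(m, n))   # input-to-output conductances

digital_inputs = rng.integers(0, 256, size=m)   # values from the input register
held_voltages = np.zeros(m)

# Phase 1: DACs write V1..Vm onto the input electrodes; SAH amplifiers hold them.
for i, code in enumerate(digital_inputs):
    held_voltages[i] = (code / 255.0) * 0.5      # assumed 0..0.5 V DAC range

# Phase 2: the controller scans the output electrodes, reading I1..In in turn.
output_register = []
for j in range(n):
    current = float(held_voltages @ G[:, j])     # Ohm's law plus summation
    output_register.append(round(current / 1e-7))  # assumed ADC step of 0.1 uA
print(output_register)
```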
- When writing of the voltages on the input electrodes is concluded, the controller 48 can scan through the output nodes and read the first output current (I1) via an amplifier 46 configured as a current-sensing amplifier, use an ADC 42 to convert the output current I1 to a digital output, and update the output register 34. Then the next output current (I2) is read from the second output electrode via the amplifier 46 of the second output electrode, configured as a current-sensing amplifier. The process continues until the output current of the last output electrode is read and the output register 34 is updated. The nodes (or neurons, if the SVMM engine 22 is used in neural networks) are traversed once.
- Applications of SVMM and Analysis of Performance
- The SVMM engine 22 can be effectively used in many applications, where the matrix of conductances of the SVMM engine 22 can be adjusted in various directions in a manner that optimizes a desirable function, for example an activation function in a neural network, and in other contexts. Other computational tasks can also utilize the SVMM engine 22. Examples include various image processing tasks, where the images are matrix data representing physical phenomena such as speech, weather, or temperature, or other data structures as may exist in financial data, radar data, sensor data, still photographs, video image frames, and other applications.
- The
SVMM engine 22, utilizing a sparse network of conductances, offers a substantial performance advantage compared to devices using a dense network of conductances. Sparse networks are similar to the way the human brain functions: not every information node is connected to all other information nodes; only some information nodes are connected. This sparsity is believed to enable superior computational ability with economical use of area and resources compared to more expensive networks, such as dense networks, where all information nodes are connected. Additionally, dot-product engines utilizing dense networks have proven expensive and complicated to design and operate, due to the need to precisely control the resistances of the conductance matrix and the difficulty of electrical or mechanical control of nanoscale materials.
- Additionally, the sparse matrix of conductances of the
SVMM engine 22 offers a higher performance-to-area ratio compared to devices utilizing dense networks for matrix multiplication. In the sparse network of the SVMM engine 22, input and output nodes (or neurons, in the context of neural networks) can exist anywhere over the area of the conductance matrix of the engine. By contrast, input and output nodes (or neurons) in dense networks exist only at the lateral edges of the network. This gives the SVMM engine 22 a quadratic scaling advantage compared to devices using dense networks: as the number of neurons increases, the SVMM engine 22 can maintain the same density of neurons, while devices using dense networks start to lose density.
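As a rough back-of-the-envelope comparison (our notation, not the patent's): a crossbar exposing N inputs and N outputs along its edges occupies an area proportional to N², so its neuron density falls as the array grows, whereas electrodes distributed over the whole area keep the neuron count proportional to the area itself:

```latex
\text{dense crossbar: } \frac{\text{neurons}}{\text{area}} \propto \frac{2N}{N^{2}} = \frac{2}{N},
\qquad
\text{distributed electrodes: } \frac{\text{neurons}}{\text{area}} \propto \frac{N}{N} = \text{constant}.
```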
- In terms of hardware, the SVMM engine 22 can be implemented as part of a computer system, a graphics processing unit (GPU), a hardware accelerator, an artificial intelligence processor, a machine learning processor, a memory interconnect, or other devices where matrix operations are used.
- FIG. 5 illustrates a flow chart of a method 50 of sparse vector-matrix multiplication according to an embodiment. The method can be implemented in hardware using the embodiments of FIGS. 3 and 4. The method 50 starts at step 52. The method continues to step 54 by providing a plurality of electrodes on a silicon substrate. The method then moves to step 56 by forming a layer of randomly arranged coaxial nanowires on the plurality of electrodes. The method then moves to step 58 by receiving a plurality of digital input signals. The method moves to step 60 by converting the plurality of digital input signals to a plurality of analog input signals. The method then moves to step 62 by writing the plurality of analog input signals on an input set of the plurality of electrodes. The method then moves to step 64 by reading, from an output set of the plurality of electrodes, a plurality of analog output signals. The method then moves to step 66 by converting the plurality of analog output signals to a plurality of digital output signals. The method then moves to step 68 by outputting the plurality of digital output signals. The method ends at step 70.
- Persons of ordinary skill in the art can appreciate that electrical properties can be defined in terms of a multitude of electrical parameters. For example, while the systems and methods described above may have been illustrated and explained in terms of conductances, voltages, and currents, these parameters are related, and a person of ordinary skill in the art can readily explain or design the same described systems, or perform the same methods, in terms of resistances instead of conductances, currents instead of voltages, and voltages instead of currents. For example, because current and voltage are related by Ohm's law, currents instead of voltages can be used to write into an electrode, and voltages can be read at the output.
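For readers who think in code, here is the same flow as a single function: it assumes the sparse conductance matrix has already been formed (steps 54-56) and then walks through receiving digital inputs, converting, multiplying, and converting back (steps 58-68). All sizes, ranges, and names are illustrative assumptions.

```python
import numpy as np

def svmm_method_50(digital_inputs, G, input_set, output_set,
                   dac_full_scale=0.5, adc_step=1e-7):
    """Sketch of method 50: digital in -> analog multiply -> digital out."""
    digital_inputs = np.asarray(digital_inputs, dtype=float)

    # Step 60: convert the digital input signals to analog input voltages.
    voltages = (digital_inputs / digital_inputs.max()) * dac_full_scale

    # Steps 62-64: write the voltages onto the input electrodes and read the
    # analog output currents at the output electrodes (Ohm's law + summation).
    currents = voltages @ G[np.ix_(input_set, output_set)]

    # Steps 66-68: convert the analog output currents to digital outputs.
    return np.round(currents / adc_step).astype(int)

rng = np.random.default_rng(4)
G = rng.uniform(0, 1e-4, size=(16, 16))          # stands in for steps 54-56
outputs = svmm_method_50(rng.integers(1, 256, size=8), G,
                         input_set=range(8), output_set=range(8, 16))
print(outputs)
```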
Claims (20)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/376,169 US10430493B1 (en) | 2018-04-05 | 2019-04-05 | Systems and methods for efficient matrix multiplication |
US16/543,426 US10990651B2 (en) | 2018-04-05 | 2019-08-16 | Systems and methods for efficient matrix multiplication |
US17/217,776 US20210216610A1 (en) | 2018-04-05 | 2021-03-30 | Systems and methods for efficient matrix multiplication |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862653194P | 2018-04-05 | 2018-04-05 | |
US16/376,169 US10430493B1 (en) | 2018-04-05 | 2019-04-05 | Systems and methods for efficient matrix multiplication |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/543,426 Continuation US10990651B2 (en) | 2018-04-05 | 2019-08-16 | Systems and methods for efficient matrix multiplication |
Publications (2)
Publication Number | Publication Date |
---|---|
US10430493B1 US10430493B1 (en) | 2019-10-01 |
US20190311018A1 true US20190311018A1 (en) | 2019-10-10 |
Family
ID=68063847
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/376,169 Active US10430493B1 (en) | 2018-04-05 | 2019-04-05 | Systems and methods for efficient matrix multiplication |
US16/543,426 Active US10990651B2 (en) | 2018-04-05 | 2019-08-16 | Systems and methods for efficient matrix multiplication |
US17/217,776 Pending US20210216610A1 (en) | 2018-04-05 | 2021-03-30 | Systems and methods for efficient matrix multiplication |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/543,426 Active US10990651B2 (en) | 2018-04-05 | 2019-08-16 | Systems and methods for efficient matrix multiplication |
US17/217,776 Pending US20210216610A1 (en) | 2018-04-05 | 2021-03-30 | Systems and methods for efficient matrix multiplication |
Country Status (5)
Country | Link |
---|---|
US (3) | US10430493B1 (en) |
EP (1) | EP3776271A4 (en) |
JP (1) | JP7130766B2 (en) |
KR (1) | KR102449941B1 (en) |
WO (1) | WO2019195660A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021167798A1 (en) * | 2020-02-18 | 2021-08-26 | Rain Neuromorphics Inc. | Memristive device |
WO2021262730A1 (en) * | 2020-06-25 | 2021-12-30 | Rain Neuromorphics Inc. | Lithographic memristive array |
US11216728B2 (en) * | 2018-06-25 | 2022-01-04 | Postech Academy-Industry Foundation | Weight matrix circuit and weight matrix input circuit |
US20230007220A1 (en) * | 2018-06-07 | 2023-01-05 | Micron Technology, Inc. | Apparatus and method for image signal processing |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3090431A1 (en) | 2018-02-26 | 2019-08-29 | Orpyx Medical Technologies Inc. | Resistance measurement array |
EP3776271A4 (en) * | 2018-04-05 | 2022-01-19 | Rain Neuromorphics Inc. | Systems and methods for efficient matrix multiplication |
US11132423B2 (en) * | 2018-10-31 | 2021-09-28 | Hewlett Packard Enterprise Development Lp | Partition matrices into sub-matrices that include nonzero elements |
US12008475B2 (en) * | 2018-11-14 | 2024-06-11 | Nvidia Corporation | Transposed sparse matrix multiply by dense matrix for neural network training |
US11184446B2 (en) | 2018-12-05 | 2021-11-23 | Micron Technology, Inc. | Methods and apparatus for incentivizing participation in fog networks |
US20200194501A1 (en) * | 2018-12-13 | 2020-06-18 | Tetramem Inc. | Implementing phase change material-based selectors in a crossbar array |
US11256778B2 (en) | 2019-02-14 | 2022-02-22 | Micron Technology, Inc. | Methods and apparatus for checking the results of characterized memory searches |
US11327551B2 (en) | 2019-02-14 | 2022-05-10 | Micron Technology, Inc. | Methods and apparatus for characterizing memory devices |
US12118056B2 (en) * | 2019-05-03 | 2024-10-15 | Micron Technology, Inc. | Methods and apparatus for performing matrix transformations within a memory array |
JP7062617B2 (en) * | 2019-06-26 | 2022-05-06 | 株式会社東芝 | Arithmetic logic unit and arithmetic method |
US10867655B1 (en) | 2019-07-08 | 2020-12-15 | Micron Technology, Inc. | Methods and apparatus for dynamically adjusting performance of partitioned memory |
US11449577B2 (en) | 2019-11-20 | 2022-09-20 | Micron Technology, Inc. | Methods and apparatus for performing video processing matrix operations within a memory array |
CN111026700B (en) * | 2019-11-21 | 2022-02-01 | 清华大学 | Memory computing architecture for realizing acceleration and acceleration method thereof |
US11853385B2 (en) | 2019-12-05 | 2023-12-26 | Micron Technology, Inc. | Methods and apparatus for performing diversity matrix operations within a memory array |
KR20210071471A (en) * | 2019-12-06 | 2021-06-16 | 삼성전자주식회사 | Apparatus and method for performing matrix multiplication operation of neural network |
CN113094791B (en) * | 2021-04-13 | 2024-02-20 | 笔天科技(广州)有限公司 | Building data analysis processing method based on matrix operation |
CN115424646A (en) * | 2022-11-07 | 2022-12-02 | 上海亿铸智能科技有限公司 | Memory and computation integrated sparse sensing sensitive amplifier and method for memristor array |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR0185757B1 (en) | 1994-02-14 | 1999-05-15 | 정호선 | Learning method of choas circular neural net |
JPH09185596A (en) | 1996-01-08 | 1997-07-15 | Ricoh Co Ltd | Coupling coefficient updating method in pulse density type signal processing network |
JP2003263624A (en) * | 2002-03-07 | 2003-09-19 | Matsushita Electric Ind Co Ltd | Learning arithmetic circuit for neural network device |
US7392230B2 (en) | 2002-03-12 | 2008-06-24 | Knowmtech, Llc | Physical neural network liquid state machine utilizing nanotechnology |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
WO2008042900A2 (en) | 2006-10-02 | 2008-04-10 | University Of Florida Research Foundation, Inc. | Pulse-based feature extraction for neural recordings |
WO2009134291A2 (en) * | 2008-01-21 | 2009-11-05 | President And Fellows Of Harvard College | Nanoscale wire-based memory devices |
US20120036919A1 (en) * | 2009-04-15 | 2012-02-16 | Kamins Theodore I | Nanowire sensor having a nanowire and electrically conductive film |
US8050078B2 (en) | 2009-10-27 | 2011-11-01 | Hewlett-Packard Development Company, L.P. | Nanowire-based memristor devices |
US8433665B2 (en) | 2010-07-07 | 2013-04-30 | Qualcomm Incorporated | Methods and systems for three-memristor synapse with STDP and dopamine signaling |
KR20140071813A (en) | 2012-12-04 | 2014-06-12 | 삼성전자주식회사 | Resistive Random Access Memory Device formed on Fiber and Manufacturing Method of the same |
BR112016029682A2 (en) * | 2014-06-19 | 2018-07-10 | The Univ Of Florida Research Foundation Inc | neural networks of memristive nanofibers. |
US10198691B2 (en) | 2014-06-19 | 2019-02-05 | University Of Florida Research Foundation, Inc. | Memristive nanofiber neural networks |
JP6594945B2 (en) * | 2014-07-07 | 2019-10-23 | プロヴェナンス アセット グループ エルエルシー | Detection device and manufacturing method thereof |
US9934463B2 (en) * | 2015-05-15 | 2018-04-03 | Arizona Board Of Regents On Behalf Of Arizona State University | Neuromorphic computational system(s) using resistive synaptic devices |
CN108780492B (en) * | 2016-02-08 | 2021-12-14 | 斯佩罗设备公司 | Analog coprocessor |
EP3262571B1 (en) * | 2016-03-11 | 2022-03-02 | Hewlett Packard Enterprise Development LP | Hardware accelerators for calculating node values of neural networks |
US10762426B2 (en) | 2016-08-12 | 2020-09-01 | Beijing Deephi Intelligent Technology Co., Ltd. | Multi-iteration compression for deep neural networks |
US10171084B2 (en) * | 2017-04-24 | 2019-01-01 | The Regents Of The University Of Michigan | Sparse coding with Memristor networks |
US20180336470A1 (en) | 2017-05-22 | 2018-11-22 | University Of Florida Research Foundation, Inc. | Deep learning in bipartite memristive networks |
US11538989B2 (en) * | 2017-07-31 | 2022-12-27 | University Of Central Florida Research Foundation, Inc. | 3-D crossbar architecture for fast energy-efficient in-memory computing of graph transitive closure |
EP3776271A4 (en) * | 2018-04-05 | 2022-01-19 | Rain Neuromorphics Inc. | Systems and methods for efficient matrix multiplication |
- 2019-04-05 EP EP19781189.6A patent/EP3776271A4/en active Pending
- 2019-04-05 KR KR1020207027083A patent/KR102449941B1/en active IP Right Grant
- 2019-04-05 JP JP2020551826A patent/JP7130766B2/en active Active
- 2019-04-05 WO PCT/US2019/025961 patent/WO2019195660A1/en active Application Filing
- 2019-04-05 US US16/376,169 patent/US10430493B1/en active Active
- 2019-08-16 US US16/543,426 patent/US10990651B2/en active Active
- 2021-03-30 US US17/217,776 patent/US20210216610A1/en active Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230007220A1 (en) * | 2018-06-07 | 2023-01-05 | Micron Technology, Inc. | Apparatus and method for image signal processing |
US11991488B2 (en) * | 2018-06-07 | 2024-05-21 | Lodestar Licensing Group Llc | Apparatus and method for image signal processing |
US11216728B2 (en) * | 2018-06-25 | 2022-01-04 | Postech Academy-Industry Foundation | Weight matrix circuit and weight matrix input circuit |
WO2021167798A1 (en) * | 2020-02-18 | 2021-08-26 | Rain Neuromorphics Inc. | Memristive device |
US11450712B2 (en) | 2020-02-18 | 2022-09-20 | Rain Neuromorphics Inc. | Memristive device |
US12069869B2 (en) | 2020-02-18 | 2024-08-20 | Rain Neuromorphics Inc. | Memristive device |
WO2021262730A1 (en) * | 2020-06-25 | 2021-12-30 | Rain Neuromorphics Inc. | Lithographic memristive array |
US11599781B2 (en) | 2020-06-25 | 2023-03-07 | Rain Neuromorphics Inc. | Lithographic memristive array |
JP7525656B2 (en) | 2020-06-25 | 2024-07-30 | レイン・ニューロモーフィックス・インコーポレーテッド | Lithographic Memristive Arrays |
Also Published As
Publication number | Publication date |
---|---|
US10990651B2 (en) | 2021-04-27 |
JP2021518615A (en) | 2021-08-02 |
US10430493B1 (en) | 2019-10-01 |
WO2019195660A1 (en) | 2019-10-10 |
JP7130766B2 (en) | 2022-09-05 |
KR102449941B1 (en) | 2022-10-06 |
EP3776271A1 (en) | 2021-02-17 |
US20210216610A1 (en) | 2021-07-15 |
US20200042572A1 (en) | 2020-02-06 |
EP3776271A4 (en) | 2022-01-19 |
KR20200124705A (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10430493B1 (en) | Systems and methods for efficient matrix multiplication | |
CN110352436B (en) | Resistance processing unit with hysteresis update for neural network training | |
US9466362B2 (en) | Resistive cross-point architecture for robust data representation with arbitrary precision | |
CN111433792B (en) | Counter-based resistance processing unit of programmable resettable artificial neural network | |
Sebastian et al. | Tutorial: Brain-inspired computing using phase-change memory devices | |
US20240170060A1 (en) | Data processing method based on memristor array and electronic apparatus | |
US11157810B2 (en) | Resistive processing unit architecture with separate weight update and inference circuitry | |
US20190122105A1 (en) | Training of artificial neural networks | |
US20180285721A1 (en) | Neuromorphic device including a synapse having a variable resistor and a transistor connected in parallel with each other | |
US11650751B2 (en) | Adiabatic annealing scheme and system for edge computing | |
Musisi-Nkambwe et al. | The viability of analog-based accelerators for neuromorphic computing: a survey | |
CN108154225B (en) | Neural network chip using analog computation | |
Wei et al. | Emerging Memory-Based Chip Development for Neuromorphic Computing: Status, Challenges, and Perspectives | |
KR20170117861A (en) | Neural Network Systems | |
Le et al. | CIMulator: a comprehensive simulation platform for computing-in-memory circuit macros with low bit-width and real memory materials | |
Zhou et al. | Synchronous Unsupervised STDP Learning with Stochastic STT-MRAM Switching | |
CN114143412B (en) | Image processing method and image processing apparatus | |
Chen | Design of resistive synaptic devices and array architectures for neuromorphic computing | |
Tu et al. | A novel programming circuit for memristors | |
CN114761973A (en) | Capacitive processing unit | |
Busygin et al. | Memory Device Based on Memristor-Diode Crossbar and Control Cmos Logic for Spiking Neural Network Hardware |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
AS | Assignment |
Owner name: RAIN NEUROMORPHICS INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KENDALL, JACK;REEL/FRAME:049767/0779 Effective date: 20190716 |
|
AS | Assignment |
Owner name: RAIN NEUROMORPHICS INC., CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INVENTOR'S NAME PREVIOUSLY RECORDED ON REEL 049767 FRAME 0779. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:KENDALL, JACK DAVID;REEL/FRAME:049963/0278 Effective date: 20190730 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |