CN117813653A - Output circuit of analog neural memory for deep learning artificial neural network - Google Patents


Info

Publication number: CN117813653A
Application number: CN202180100959.0A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Inventors: H·V·特兰; T·乌
Current and original assignee: Silicon Storage Technology Inc
Priority claimed from U.S. patent application No. 17/521,772 (published as US 2023/0049032 A1)
Application filed by Silicon Storage Technology Inc


Abstract

Various embodiments of an output circuit for an analog neural memory in a deep learning artificial neural network are disclosed. In some embodiments, a common mode circuit is used with differential cells W+ and W-, which together store a weight W. The common mode circuit can utilize a current source, a variable resistor, or a transistor as part of the structure for introducing the common mode voltage bias.

Description

Output circuit of analog neural memory for deep learning artificial neural network
Priority statement
The present application claims priority from U.S. provisional patent application No. 63/228,529, entitled "Output Circuitry for Analog Neural Memory in a Deep Learning Artificial Neural Network," filed in August 2021, and from U.S. patent application No. 17/521,772, entitled "Output Circuitry for Analog Neural Memory in a Deep Learning Artificial Neural Network," filed in November 2021.
Technical Field
Various embodiments of an output circuit for an analog neural memory in a deep learning artificial neural network are disclosed.
Background
Artificial neural networks model biological neural networks (the central nervous system of animals, particularly the brain), and are used to estimate or approximate functions that may depend on a large number of inputs and are generally unknown. Artificial neural networks typically include interconnected "neuron" layers that exchange messages with each other.
Fig. 1 shows an artificial neural network, wherein circles represent inputs or layers of neurons. The connections (called synapses) are indicated by arrows and have numerical weights that can be adjusted empirically. This allows the neural network to adapt to inputs and to learn. Typically, a neural network includes multiple layers of inputs. There are typically one or more intermediate layers of neurons, and an output layer of neurons that provide the output of the neural network. Neurons at each level make decisions based on data received from synapses, either individually or collectively.
One of the major challenges in developing artificial neural networks for high-performance information processing is the lack of adequate hardware technology. In practice, practical neural networks rely on a very large number of synapses to achieve high connectivity between neurons, i.e., very high computational parallelism. In principle, such complexity could be achieved with digital supercomputers or clusters of dedicated graphics processing units. However, in addition to their high cost, these approaches are also very energy inefficient compared to biological networks, which consume far less energy primarily because they perform low-precision analog computation. CMOS analog circuits have been used for artificial neural networks, but most CMOS-implemented synapses are too bulky given the large number of neurons and synapses required.
Applicant previously disclosed an artificial (analog) neural network that utilizes one or more non-volatile memory arrays as synapses in U.S. patent application No. 15/594,439, which is incorporated herein by reference. The non-volatile memory array operates as an analog neural memory. The neural network device includes a first plurality of synapses configured to receive a first plurality of inputs and generate a first plurality of outputs therefrom, and a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, wherein each of the memory cells includes: a source region and a drain region formed in the semiconductor substrate in spaced apart relation, wherein the channel region extends between the source region and the drain region; a floating gate disposed over and insulated from the first portion of the channel region; and a non-floating gate disposed over and insulated from the second portion of the channel region. Each memory cell of the plurality of memory cells is configured to store a weight value corresponding to a plurality of electrons on the floating gate. The plurality of memory cells are configured to multiply the first plurality of inputs by the stored weight values to generate a first plurality of outputs.
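The multiply-and-add behavior attributed to the memory array can be sketched numerically. The following is an illustrative model only (the array size, weight values, and input levels are invented for the example, not taken from the patent): stored weights form a matrix, inputs form a vector, and each output line accumulates a weighted sum.

```python
import numpy as np

# Hypothetical 4x3 array: each entry models one cell's stored weight value.
weights = np.array([
    [0.2, 0.7, 0.1],
    [0.5, 0.3, 0.9],
    [0.8, 0.4, 0.6],
    [0.1, 0.6, 0.2],
])

# One input per row of the array (e.g., levels applied on input lines).
inputs = np.array([1.0, 0.5, 0.25, 0.0])

# Each output line sums (input * weight) over every cell in its column,
# which is exactly a vector-matrix multiplication.
outputs = inputs @ weights
```

In this idealized view, the "first plurality of outputs" is simply `inputs @ weights`; the physical array performs the same computation with cell currents summed on shared output lines.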
Nonvolatile memory cell
Nonvolatile memory is well known. For example, U.S. Patent 5,029,130 ("the '130 patent"), which is incorporated herein by reference, discloses an array of split gate non-volatile memory cells, which are a type of flash memory cell. Such a memory cell 210 is shown in FIG. 2. Each memory cell 210 includes a source region 14 and a drain region 16 formed in a semiconductor substrate 12, with a channel region 18 therebetween. A floating gate 20 is formed over and insulated from (and controls the conductivity of) a first portion of the channel region 18, and over a portion of the source region 14. A word line terminal 22 (which is typically coupled to a word line) has a first portion disposed over and insulated from (and controls the conductivity of) a second portion of the channel region 18, and a second portion that extends up and over the floating gate 20. The floating gate 20 and word line terminal 22 are insulated from the substrate 12 by a gate oxide. Bit line 24 is coupled to the drain region 16.
The memory cell 210 is erased (where electrons are removed from the floating gate) by placing a high positive voltage on the word line terminal 22, which causes electrons on the floating gate 20 to tunnel through the intermediate insulator from the floating gate 20 to the word line terminal 22 via Fowler-Nordheim (FN) tunneling.
The memory cell 210 is programmed by source side injection (SSI) with hot electrons (where electrons are placed on the floating gate) by placing a positive voltage on the word line terminal 22 and a positive voltage on the source region 14. Electrons will flow from the drain region 16 to the source region 14. When the electrons reach the gap between the word line terminal 22 and the floating gate 20, they will accelerate and become heated. Some of the heated electrons will be injected through the gate oxide onto the floating gate 20 due to the electrostatic attraction from the floating gate 20.
The memory cell 210 is read by placing a positive read voltage on the drain region 16 and the word line terminal 22 (which turns on the portion of the channel region 18 under the word line terminal). If the floating gate 20 is positively charged (i.e., electrons are erased), the portion of the channel region 18 under the floating gate 20 is also turned on and current will flow through the channel region 18, which is sensed as an erased or "1" state. If the floating gate 20 is negatively charged (i.e., programmed by electrons), the portion of the channel region under the floating gate 20 is mostly or completely turned off and no (or little) current will flow through the channel region 18, which is sensed as a programmed state or "0" state.
Table 1 shows typical voltage and current ranges that may be applied to the terminals of memory cell 210 for performing read, erase, and program operations:
table 1: operation of flash memory cell 210 of FIG. 3
WL BL SL
Reading 2-3V 0.6-2V 0V
Erasing About 11-13V 0V 0V
Programming 1-2V 10.5-3μA 9-10V
Other split gate memory cell configurations, i.e., other types of flash memory cells, are also known. For example, FIG. 3 shows a four-gate memory cell 310 that includes a source region 14, a drain region 16, a floating gate 20 over a first portion of a channel region 18, a select gate 22 (typically coupled to a word line, WL) over a second portion of the channel region 18, a control gate 28 over the floating gate 20, and an erase gate 30 over the source region 14. This configuration is described in U.S. Patent 6,747,310, which is incorporated herein by reference for all purposes. Here, all gates except the floating gate 20 are non-floating gates, meaning that they are electrically connected or connectable to a voltage source. Programming is performed by heated electrons from the channel region 18 injecting themselves onto the floating gate 20. Erasing is performed by electrons tunneling from the floating gate 20 to the erase gate 30.
Table 2 shows typical voltage and current ranges that may be applied to terminals of memory cell 310 for performing read, erase, and program operations:
Table 2: operation of flash memory cell 310 of FIG. 3
Operation | WL/SG | BL | CG | EG | SL
Read | 1.0-2V | 0.6-2V | 0-2.6V | 0-2.6V | 0V
Erase | -0.5V/0V | 0V | 0V/-8V | 8-12V | 0V
Program | 1V | 0.1-1μA | 8-11V | 4.5-9V | 4.5-5V
FIG. 4 shows a three-gate memory cell 410, which is another type of flash memory cell. Memory cell 410 is identical to memory cell 310 of FIG. 3 except that memory cell 410 does not have a separate control gate. The erase operation (whereby erasing occurs through use of the erase gate) and the read operation are similar to those of FIG. 3 except that no control gate bias is applied. The programming operation is also performed without the control gate bias, and as a result a higher voltage must be applied to the source line during the programming operation to compensate for the absence of the control gate bias.
Table 3 shows typical voltage and current ranges that may be applied to the terminals of memory cell 410 for performing read, erase, and program operations:
table 3: operation of flash memory cell 410 of FIG. 4
WL/SG BL EG SL
Reading 0.7-2.2V 0.6-2V 0-2.6V 0V
Erasing -0.5V/0V 0V 11.5V 0V
Programming 1V 0.2-3μA 4.5V 7-9V
FIG. 5 shows a stacked gate memory cell 510, which is another type of flash memory cell. Memory cell 510 is similar to memory cell 210 of FIG. 2, except that the floating gate 20 extends over the entire channel region 18, and the control gate 22 (which here will be coupled to a word line) extends over the floating gate 20, separated by an insulating layer (not shown). Erasing occurs by FN tunneling of electrons from the floating gate to the substrate; programming occurs by channel hot electron (CHE) injection in the region between the channel 18 and the drain region 16, with electrons flowing from the source region 14 toward the drain region 16; and the read operation is similar to that of memory cell 210, but with a higher control gate voltage.
Table 4 shows typical voltage ranges that may be applied to terminals of memory cell 510 and substrate 12 for performing read, erase, and program operations:
table 4: operation of flash memory cell 510 of FIG. 5
Operation | CG | BL | SL | Substrate
Read | 2-5V | 0.6-2V | 0V | 0V
Erase | -8 to -10V/0V | FLT | FLT | 8-10V/15-20V
Program | 8-12V | 3-5V | 0V | 0V
The methods and apparatus described herein may be applied to other non-volatile memory technologies such as, but not limited to, FINFET split gate flash or stacked gate flash memory, NAND flash memory, SONOS (silicon-oxide-nitride-oxide-silicon, charge trapped in nitride), MONOS (metal-oxide-nitride-oxide-silicon, metal charge trapped in nitride), ReRAM (resistive RAM), PCM (phase change memory), MRAM (magnetic RAM), FeRAM (ferroelectric RAM), CT (charge trap) memory, CN (carbon nanotube) memory, OTP (bi-level or multi-level one-time programmable) memory, and CeRAM (correlated electron RAM), among others.
To utilize a memory array comprising one of the above types of non-volatile memory cells in an artificial neural network, two modifications have been made. First, the circuitry is configured such that each memory cell can be programmed, erased, and read individually without adversely affecting the memory states of other memory cells in the array, as explained further below. Second, continuous (analog) programming of the memory cells is provided.
In particular, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully erased state to a fully programmed state independently and with minimal disturbance to other memory cells. In another embodiment, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be changed continuously from a fully programmed state to a fully erased state and vice versa, independently and with minimal disturbance to other memory cells. This means that the cell storage device is analog, or at least can store one of many discrete values (such as 16 or 64 different values), which allows very accurate and individual tuning of all cells in the memory array, and which makes the memory array ideal for storing and fine tuning the synaptic weights of the neural network.
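As a rough illustration of a cell that stores one of many discrete values, the hypothetical helper below (`tune_cell` is an invented name, and the linear 16-level mapping is an assumption made purely for this sketch, not the patent's tuning algorithm) quantizes a target synaptic weight to the nearest of 16 evenly spaced cell states:

```python
import numpy as np

def tune_cell(target_weight, levels=16, w_min=0.0, w_max=1.0):
    """Quantize a target weight to the nearest of `levels` discrete cell
    states -- a simplified stand-in for tuning the floating-gate charge."""
    states = np.linspace(w_min, w_max, levels)
    return states[np.argmin(np.abs(states - target_weight))]

stored = tune_cell(0.51)  # snaps to the nearest of 16 states in [0, 1]
```

With 16 levels on [0, 1] the states are multiples of 1/15, so a target of 0.51 lands on 8/15. A 64-level cell would simply use `levels=64`, giving correspondingly finer weight resolution.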
Neural network employing non-volatile memory cell array
Figure 6 conceptually illustrates a non-limiting example of a neural network utilizing a non-volatile memory array, according to an embodiment of the present invention. This example uses a non-volatile memory array neural network for facial recognition applications, but any other suitable application may also be implemented using a non-volatile memory array-based neural network.
For this example, S0 is an input layer that is a 32×32 pixel RGB image with 5-bit precision (i.e., three 32×32 pixel arrays for each color R, G and B, respectively, each pixel being 5-bit precision). The synapse CB1 from input layer S0 to layer C1 applies a different set of weights in some cases, shared weights in other cases, and scans the input image with a 3 x 3 pixel overlap filter (kernel), shifting the filter by 1 pixel (or more than 1 pixel as indicated by the model). Specifically, values of 9 pixels in the 3 x 3 portion of the image (i.e., referred to as filters or kernels) are provided to synapse CB1, where these 9 input values are multiplied by appropriate weights, and after summing the outputs of the multiplications, a single output value is determined and provided by the first synapse of CB1 for use in generating a pixel of one of the feature maps of layer C1. The 3 x 3 filter is then shifted one pixel to the right within the input layer S0 (i.e. adding the column of three pixels to the right and freeing the column of three pixels to the left), thereby providing the 9 pixel values in the newly located filter to the synapse CB1 where they are multiplied by the same weight and the second single output value is determined by the associated synapse. This process continues until the 3 x 3 filter scans all three colors and all bits (precision values) over the entire 32 x 32 pixel image of the input layer S0. The process is then repeated using different sets of weights to generate different feature maps for layer C1 until all feature maps for layer C1 are calculated.
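The filter-scan procedure described above can be sketched as a plain convolution under the stated 3×3 kernel and 1-pixel shift. The all-ones image and uniform kernel below are invented purely for illustration; each filter placement multiplies 9 pixel values by their weights and sums them into one output pixel, exactly as described for synapse CB1.

```python
import numpy as np

def filter_scan(image, kernel, stride=1):
    """Slide `kernel` across `image`; each placement multiplies the covered
    pixels by the kernel weights and sums them to one output value."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.ones((32, 32))        # stand-in for one 32x32 color plane
kernel = np.full((3, 3), 1/9.0)  # illustrative shared 3x3 weights
fmap = filter_scan(image, kernel)
```

Scanning a 32×32 plane with a 3×3 filter at stride 1 yields a 30×30 output, matching the feature-map size stated for layer C1.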
At layer C1, in this example, there are 16 feature maps, each feature map having 30x30 pixels. Each pixel is a new feature pixel extracted from the product of the input and kernel, so each feature map is a two-dimensional array, so in this example, layer C1 is made up of a 16-layer two-dimensional array (bearing in mind that the layers and arrays referenced herein are logical relationships, not necessarily physical relationships, i.e., the array need not be oriented to a physical two-dimensional array). Each of the 16 feature maps in layer C1 is generated from one of sixteen different sets of synaptic weights applied to the filter scan. The C1 feature map may all relate to different aspects of the same image feature, such as boundary recognition. For example, a first mapping (generated using a first set of weights, shared for all scans used to generate the first mapping) may identify rounded edges, a second mapping (generated using a second set of weights different from the first set of weights) may identify rectangular edges, or aspect ratios of certain features, and so on.
Before going from layer C1 to layer S1, an activation function P1 (pooling) is applied that pools values from consecutive, non-overlapping 2×2 regions in each feature map. The purpose of the pooling function P1 is to average out the nearby locations (or a max function can alternatively be used), for example to reduce the dependence on edge locations and to reduce the data size before going to the next stage. At layer S1, there are 16 15×15 feature maps (i.e., sixteen different arrays of 15×15 pixels each). The synapses CB2 going from layer S1 to layer C2 scan the maps in layer S1 with 4×4 filters, with a filter shift of 1 pixel. At layer C2, there are 22 12×12 feature maps. Before going from layer C2 to layer S2, an activation function P2 (pooling) is applied that pools values from consecutive, non-overlapping 2×2 regions in each feature map. At layer S2, there are 22 6×6 feature maps. An activation function (pooling) is applied at the synapses CB3 going from layer S2 to layer C3, where every neuron in layer C3 connects to every map in layer S2 via a respective synapse of CB3. At layer C3, there are 64 neurons. The synapses CB4 going from layer C3 to the output layer S3 fully connect C3 to S3, i.e., every neuron in layer C3 is connected to every neuron in layer S3. The output at S3 includes 10 neurons, where the highest output neuron determines the class. This output could, for example, be indicative of an identification or classification of the contents of the original image.
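The layer sizes quoted above (30×30, 15×15, 12×12, 6×6) follow from two small size formulas; the helper names below are invented, and the snippet is just a check of the stated dimensions:

```python
def conv_out(n, k, stride=1):
    # Output width of scanning an n-wide input with a k-wide filter.
    return (n - k) // stride + 1

def pool_out(n, p):
    # Output width after pooling non-overlapping p x p regions.
    return n // p

s0 = 32
c1 = conv_out(s0, 3)  # 3x3 filter, shift 1  -> 30 (layer C1 maps)
s1 = pool_out(c1, 2)  # 2x2 pooling          -> 15 (layer S1 maps)
c2 = conv_out(s1, 4)  # 4x4 filter, shift 1  -> 12 (layer C2 maps)
s2 = pool_out(c2, 2)  # 2x2 pooling          -> 6  (layer S2 maps)
```

Each stage reproduces the pixel dimensions given in the text for layers C1, S1, C2, and S2.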
The synapses of each layer are implemented using an array or a portion of an array of non-volatile memory cells.
Fig. 7 is a block diagram of an array that may be used for this purpose. Vector-matrix multiplication (VMM) array 32 includes nonvolatile memory cells and serves as synapses between one layer and the next (such as CB1, CB2, CB3, and CB4 in fig. 6). Specifically, VMM array 32 includes nonvolatile memory cell array 33, erase gate and word gate decoder 34, control gate decoder 35, bit line decoder 36, and source line decoder 37, which decode the respective inputs of nonvolatile memory cell array 33. Inputs to VMM array 32 may come from erase gate and word gate decoder 34 or from control gate decoder 35. In this example, the source line decoder 37 also decodes the output of the nonvolatile memory cell array 33. Alternatively, the bit line decoder 36 may decode the output of the nonvolatile memory cell array 33.
The nonvolatile memory cell array 33 serves two purposes. First, it stores the weights that will be used by VMM array 32. Second, the nonvolatile memory cell array 33 effectively multiplies the inputs by the weights stored in the nonvolatile memory cell array 33, and each output line (source line or bit line) adds them up to produce the output, which will be the input to the next layer or the input to a final layer. By performing the multiplication and addition function, the nonvolatile memory cell array 33 negates the need for separate multiplication and addition logic circuits and is also power efficient due to its in-situ memory computation.
The outputs of the non-volatile memory cell array 33 are provided to a differential summer (such as a summing op-amp or summing current mirror) 38 that sums the outputs of the non-volatile memory cell array 33 to create a single value for the convolution. The differential summer 38 is arranged for performing a summation of positive and negative weights.
The summed output values of the differential summer 38 are then supplied to an activation function block 39, which modifies the output. The activation function block 39 may provide sigmoid, tanh, or ReLU functions. The modified output values of the activation function block 39 become elements of a feature map of the next layer (e.g., layer C1 in FIG. 6) and are then applied to the next synapse to produce the next feature map layer or final layer. Therefore, in this example, the nonvolatile memory cell array 33 constitutes a plurality of synapses (which receive their inputs from the prior layer of neurons or from an input layer such as an image database), and the summing op-amp 38 and activation function block 39 constitute a plurality of neurons.
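A minimal numeric sketch of the differential-summation and activation stages described above. The current values and the choice of ReLU are invented for illustration; the actual blocks 38 and 39 are analog circuits, and the subtraction models how positive-weight and negative-weight column outputs combine into a signed neuron input.

```python
import numpy as np

def differential_sum(i_pos, i_neg):
    """Model of differential summation: positive-weight column current
    minus negative-weight column current gives a signed neuron input."""
    return i_pos - i_neg

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative column currents for two neurons.
neuron_in = differential_sum(np.array([1.2, 0.4]), np.array([0.5, 0.9]))
neuron_out = relu(neuron_in)  # tanh or sigmoid could be used instead
```

The second neuron's negative differential sum is clipped to zero by the ReLU, while a sigmoid would instead map it smoothly into (0, 1).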
The input to VMM array 32 (WLx, EGx, CGx and optionally BLx and SLx) in fig. 7 may be an analog level, a binary level, or a digital bit (in which case a DAC is provided to convert the digital bit to the appropriate input analog level), and the output may be an analog level, a binary level, or a digital bit (in which case an output ADC is provided to convert the output analog level to a digital bit).
FIG. 8 is a block diagram illustrating the use of multiple layers of VMM arrays 32 (here labeled VMM arrays 32a, 32b, 32c, 32d, and 32e). As shown in FIG. 8, the input (denoted Inputx) is converted from digital to analog by a digital-to-analog converter 31 and provided to input VMM array 32a. The converted analog inputs could be voltages or currents. The input D/A conversion for the first layer could be done by using a function or a LUT (look-up table) that maps the inputs Inputx to appropriate analog levels for the matrix multiplier of input VMM array 32a. The input conversion could also be done by an analog-to-analog (A/A) converter to convert an external analog input into a mapped analog input to input VMM array 32a.
The output produced by input VMM array 32a is provided as input to the next VMM array (hidden level 1) 32b, which in turn generates output provided as input to the next VMM array (hidden level 2) 32c, and so on. Each layer of VMM array 32 serves as a distinct layer of synapses and neurons of a Convolutional Neural Network (CNN). Each VMM array 32a, 32b, 32c, 32d, and 32e may be an independent physical non-volatile memory array, or multiple VMM arrays may utilize different portions of the same non-volatile memory array, or multiple VMM arrays may utilize overlapping portions of the same physical non-volatile memory array. The example shown in FIG. 8 contains five layers (32a, 32b, 32c, 32d, 32e): one input layer (32a), two hidden layers (32b, 32c), and two fully connected layers (32d, 32e). Those of ordinary skill in the art will appreciate that this is merely exemplary and that a system instead could comprise more than two hidden layers and more than two fully connected layers.
Vector-matrix multiplication (VMM) array
Fig. 9 illustrates a neuronal VMM array 900 that is particularly suited for use with the memory cell 310 shown in fig. 3 and that serves as a synapse and component for neurons between an input layer and the next layer. VMM array 900 includes a memory array 901 of non-volatile memory cells and a reference array 902 of non-volatile reference memory cells (at the top of the array). Alternatively, another reference array may be placed at the bottom.
In VMM array 900, control gate lines (such as control gate line 903) extend in a vertical direction (thus reference array 902 is orthogonal to control gate line 903 in the row direction) and erase gate lines (such as erase gate line 904) extend in a horizontal direction. Here, inputs to VMM array 900 are provided on control gate lines (CG 0, CG1, CG2, CG 3), and outputs of VMM array 900 appear on source lines (SL 0, SL 1). In one embodiment, only even rows are used, and in another embodiment, only odd rows are used. The currents placed on the respective source lines (SL 0, SL1, respectively) perform a summation function of all currents from the memory cells connected to that particular source line.
As described herein for neural networks, the non-volatile memory cells of VMM array 900 (i.e., memory cells 310 of VMM array 900) are preferably configured to operate in a subthreshold region.
Biasing the non-volatile reference memory cells and non-volatile memory cells described herein in weak inversion (subthreshold region):
Ids = Io * e^((Vg-Vth)/(n*Vt)) = w * Io * e^(Vg/(n*Vt))
where w = e^(-Vth/(n*Vt))
where Ids is the drain-to-source current; Vg is the gate voltage on the memory cell; Vth is the threshold voltage of the memory cell; Vt is the thermal voltage = k*T/q, where k is the Boltzmann constant, T is the temperature in Kelvin, and q is the electronic charge; n is the slope factor = 1 + (Cdep/Cox), where Cdep is the capacitance of the depletion layer and Cox is the capacitance of the gate oxide layer; and Io is the memory cell current at a gate voltage equal to the threshold voltage. Io is proportional to (Wt/L)*u*Cox*(n-1)*Vt^2, where u is the carrier mobility and Wt and L are the width and length, respectively, of the memory cell.
For an I-to-V logarithmic converter that converts an input current to an input voltage using memory cells (such as reference memory cells or peripheral memory cells) or transistors:
Vg = n*Vt*log[Ids/(wp*Io)]
where wp is the w of the reference memory cell or the peripheral memory cell.
For a memory array used as a vector matrix multiplier VMM array with current inputs, the output current is:
Iout = wa * Io * e^(Vg/(n*Vt)), i.e.:
Iout = (wa/wp) * Iin = W * Iin
W = e^((Vthp-Vtha)/(n*Vt))
Here wa = w of each memory cell in the memory array.
Vthp is the effective threshold voltage of the peripheral memory cell and Vtha is the effective threshold voltage of the main (data) memory cell. Note that the threshold voltage of a transistor is a function of the substrate body bias voltage, and the substrate body bias voltage, denoted Vsb, can be modulated to compensate for various conditions such as temperature. The threshold voltage Vth can be expressed as:
Vth = Vth0 + γ(SQRT(|Vsb - 2*φF|) - SQRT(|2*φF|))
where Vth0 is the threshold voltage with zero substrate bias, φF is the surface potential, and γ is the body effect parameter.
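The subthreshold relations above can be checked numerically: converting an input current to a gate voltage through the log converter of a reference cell with weight wp, then applying that voltage to an array cell with weight wa, reproduces Iout = (wa/wp)*Iin. The device constants below are illustrative assumptions, not the patent's values.

```python
import numpy as np

n, Vt, Io = 1.5, 0.026, 1e-12  # assumed slope factor, thermal voltage, Io

def log_converter_vg(i_in, wp):
    # I-to-V log converter: Vg = n*Vt*log[Iin/(wp*Io)]
    return n * Vt * np.log(i_in / (wp * Io))

def cell_current(vg, w):
    # Subthreshold cell model: Ids = w*Io*e^(Vg/(n*Vt))
    return w * Io * np.exp(vg / (n * Vt))

wp, wa = 0.5, 0.2   # reference-cell and array-cell weights (assumed)
i_in = 1e-9         # 1 nA input current
vg = log_converter_vg(i_in, wp)
i_out = cell_current(vg, wa)
```

Because the exponential of the cell model cancels the logarithm of the converter, `i_out` equals (wa/wp)*Iin = 0.4 nA here, independent of Io and Vt, which is the weight relationship W = wa/wp stated above.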
A word line or control gate may be used as an input to a memory cell for an input voltage.
Alternatively, the flash memory cells of the VMM array described herein may be configured to operate in a linear region:
Ids = β*(Vgs-Vth)*Vds; β = u*Cox*Wt/L
W = α(Vgs-Vth)
meaning that the weight W in the linear region is proportional to (Vgs-Vth)
Word lines or control gates or bit lines or source lines may be used as inputs to memory cells operating in the linear region. Bit lines or source lines may be used as the outputs of the memory cells.
For an I-V linear converter, memory cells (e.g., reference memory cells or peripheral memory cells) or transistors operating in the linear region may be used to linearly convert an input/output current to an input/output voltage.
Alternatively, the memory cells of the VMM array described herein may be configured to operate in the saturation region:
Ids = 1/2*β*(Vgs-Vth)^2; β = u*Cox*Wt/L
W = α(Vgs-Vth)^2, meaning the weight W is proportional to (Vgs-Vth)^2
The word line, control gate, or erase gate may be used as an input to a memory cell operating in the saturation region. The bit line or source line may be used as an output of the output neuron.
Alternatively, the memory cells of the VMM arrays described herein may be operated in any one of the regions (subthreshold, linear, or saturation), or a combination thereof, for each layer or for multiple layers of the neural network.
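The three operating regions discussed above can be combined into one piecewise sketch using the document's current equations. The region boundaries and the device constants below are textbook-style assumptions for illustration, not values from the patent.

```python
import math

def ids(vgs, vds, vth=0.7, beta=1e-4, n=1.5, vt=0.026, io=1e-12):
    """Piecewise cell-current model over the three regions above
    (illustrative constants; real device behavior is more gradual)."""
    if vgs < vth:                         # subthreshold (weak inversion)
        return io * math.exp((vgs - vth) / (n * vt))
    if vds < vgs - vth:                   # linear region
        return beta * (vgs - vth) * vds
    return 0.5 * beta * (vgs - vth) ** 2  # saturation region
```

For example, with Vth = 0.7 V: Vgs = 1.7 V and Vds = 0.5 V falls in the linear region, Vgs = 1.2 V and Vds = 1.0 V falls in saturation, and Vgs = 0.5 V gives an exponentially small subthreshold current.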
Other embodiments of VMM array 32 of fig. 7 are described in U.S. patent No. 10,748,630, which is incorporated herein by reference. As described herein, a source line or bit line may be used as a neuron output (current summing output).
Fig. 10 shows a neuronal VMM array 1000 that is particularly suited for use with the memory cell 210 shown in fig. 2 and that serves as a synapse between an input layer and the next layer. VMM array 1000 includes a memory array 1003 of non-volatile memory cells, a reference array 1001 of first non-volatile reference memory cells, and a reference array 1002 of second non-volatile reference memory cells. The reference arrays 1001 and 1002 arranged in the column direction of the array are used to convert current inputs flowing into the terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs WL0, WL1, WL2, and WL3. In effect, the first non-volatile reference memory cell and the second non-volatile reference memory cell are diode connected by multiplexer 1014 (only partially shown) with a current input flowing into them. The reference cell is tuned (e.g., programmed) to a target reference level. The target reference level is provided by a reference microarray matrix (not shown).
The memory array 1003 serves two purposes. First, it stores the weights that will be used by VMM array 1000 on its respective memory cells. Second, the memory array 1003 effectively multiplies the inputs (i.e., the current inputs provided on terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1001 and 1002 convert into input voltages supplied to word lines WL0, WL1, WL2, and WL3) by the weights stored in the memory array 1003, and then adds all the results (memory cell currents) to produce outputs on the respective bit lines (BL0-BLN), which will be the inputs of the next layer or the inputs of the final layer. By performing the multiplication and addition function, the memory array 1003 negates the need for separate multiplication and addition logic circuits and is also power efficient. Here, the voltage inputs are provided on the word lines (WL0, WL1, WL2, and WL3), and the outputs appear on the respective bit lines (BL0-BLN) during a read (inference) operation. The current placed on each of the bit lines BL0-BLN performs a summing function of the currents from all non-volatile memory cells connected to that particular bit line.
Table 5 shows the operating voltages and currents for VMM array 1000. Columns in the table indicate voltages placed on the word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate read, erase, and program operations.
Table 5: operation of VMM array 1000 of fig. 10
Operation | WL | WL-unselected | BL | BL-unselected | SL | SL-unselected
Read | 1-3.5V | -0.5V/0V | 0.6-2V (Ineuron) | 0.6V-2V/0V | 0V | 0V
Erase | ~5-13V | 0V | 0V | 0V | 0V | 0V
Program | 1-2V | -0.5V/0V | 0.1-3μA | Vinh ~2.5V | 4-10V | 0-1V/FLT
FIG. 11 illustrates a neuron VMM array 1100 that is particularly suited for use with the memory cell 210 shown in FIG. 2 and that serves as the synapses and parts of neurons between an input layer and the next layer. VMM array 1100 comprises a memory array 1103 of non-volatile memory cells, a reference array 1101 of first non-volatile reference memory cells, and a reference array 1102 of second non-volatile reference memory cells. Reference arrays 1101 and 1102 run in the row direction of the VMM array 1100. The VMM array is similar to VMM array 1000 except that in VMM array 1100 the word lines run in the vertical direction. Here, the inputs are provided on the word lines (WLA0, WLB0, WLA1, WLB1, WLA2, WLB2, WLA3, WLB3), and the outputs appear on the source lines (SL0, SL1) during a read operation. The current placed on each source line performs a summing function of all currents from the memory cells connected to that particular source line.
Table 6 shows the operating voltages and currents for VMM array 1100. Columns in the table indicate voltages placed on the word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate read, erase, and program operations.
Table 6: Operation of VMM array 1100 of fig. 11

             WL        WL-unselected   BL        BL-unselected   SL                  SL-unselected
  Read       1-3.5V    -0.5V/0V        0.6-2V    0.6V-2V/0V      ~0.3-1V (Ineuron)   0V
  Erase      ~5-13V    0V              0V        0V              0V                  SL-inhibit (~4-8V)
  Program    1-2V      -0.5V/0V        0.1-3uA   Vinh ~2.5V      4-10V               0-1V/FLT
Fig. 12 illustrates a neuronal VMM array 1200 that is particularly suited for use with the memory cell 310 shown in fig. 3 and that serves as a synapse and component for neurons between an input layer and a next layer. VMM array 1200 includes a memory array 1203 of non-volatile memory cells, a reference array 1201 of first non-volatile reference memory cells, and a reference array 1202 of second non-volatile reference memory cells. Reference arrays 1201 and 1202 serve to convert the current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs CG0, CG1, CG2, and CG3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexer 1212 (only partially shown), with the current inputs flowing into them through BLR0, BLR1, BLR2, and BLR3. Multiplexers 1212 each include a respective multiplexer 1205 and a cascoding transistor 1204 to ensure that the voltage on the bit line (such as BLR0) of each of the first and second non-volatile reference memory cells is constant during a read operation. The reference cells are tuned to target reference levels.
The memory array 1203 serves two purposes. First, it stores the weights that will be used by VMM array 1200. Second, memory array 1203 effectively multiplies the inputs (the current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1201 and 1202 convert into input voltages applied to the control gates CG0, CG1, CG2, and CG3) by the weights stored in the memory array, and then adds all the results (cell currents) to produce outputs that appear on BL0-BLN and will be the inputs to the next layer or the inputs to the final layer. By performing the multiplication and addition functions, the memory array eliminates the need for separate multiplication and addition logic circuits and is also power efficient. Here, the inputs are provided on the control gate lines (CG0, CG1, CG2, and CG3), and the outputs appear on the bit lines (BL0-BLN) during a read operation. The current on each bit line performs a summation function of all the currents from the memory cells connected to that particular bit line.
VMM array 1200 enables unidirectional tuning of non-volatile memory cells in memory array 1203. That is, each nonvolatile memory cell is erased and then partially programmed until the desired charge on the floating gate is reached. If too much charge is placed on the floating gate (such that the wrong value is stored in the cell), the cell is erased and the sequence of partial programming operations resumes. As shown, two rows sharing the same erase gate (such as EG0 or EG 1) are erased together (which is referred to as a page erase), and thereafter, each cell is partially programmed until the desired charge on the floating gate is reached.
Table 7 shows the operating voltages and currents for VMM array 1200. Columns in the table indicate voltages placed on word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, control gates for selected cells, control gates for unselected cells in the same sector as the selected cells, control gates for unselected cells in a different sector than the selected cells, erase gates for unselected cells, source lines for selected cells, source lines for unselected cells. The rows indicate read, erase, and program operations.
Table 7: operation of VMM array 1200 of fig. 12
Fig. 13 illustrates a neuronal VMM array 1300 that is particularly suited for use with the memory cell 310 shown in fig. 3 and that serves as a synapse and component for neurons between an input layer and the next layer. VMM array 1300 includes a memory array 1303 of non-volatile memory cells, a reference array 1301 of first non-volatile reference memory cells, and a reference array 1302 of second non-volatile reference memory cells. EG lines EGR0, EG0, EG1, and EGR1 extend vertically, while CG lines CG0, CG1, CG2, and CG3 and WL lines WL0, WL1, WL2, and WL3 extend horizontally. VMM array 1300 is similar to VMM array 1200, except that VMM array 1300 implements bidirectional tuning, where each individual cell can be completely erased, partially programmed, and partially erased as needed to achieve the desired amount of charge on the floating gate, owing to the use of separate EG lines. As shown, reference arrays 1301 and 1302 convert the input currents in terminals BLR0, BLR1, BLR2, and BLR3 into control gate voltages CG0, CG1, CG2, and CG3 to be applied to the memory cells in the row direction (through the action of the diode-connected reference cells via multiplexer 1314). The current outputs (neurons) are on bit lines BL0-BLN, where each bit line sums all the current from the non-volatile memory cells connected to that particular bit line.
Table 8 shows the operating voltages and currents for VMM array 1300. Columns in the table indicate voltages placed on word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, control gates for selected cells, control gates for unselected cells in the same sector as the selected cells, control gates for unselected cells in a different sector than the selected cells, erase gates for unselected cells, source lines for selected cells, source lines for unselected cells. The rows indicate read, erase, and program operations.
Table 8: operation of VMM array 1300 of fig. 13
Fig. 22 shows a neuronal VMM array 2200 that is particularly suited for use with the memory cell 210 shown in fig. 2 and that serves as a synapse and component for neurons between an input layer and the next layer. In VMM array 2200, inputs INPUT0, ..., INPUTN are received on bit lines BL0, ..., BLN, respectively, and outputs OUTPUT1, OUTPUT2, OUTPUT3, and OUTPUT4 are generated on source lines SL0, SL1, SL2, and SL3, respectively.

Fig. 23 illustrates a neuronal VMM array 2300 that is particularly suited for use with the memory cell 210 shown in fig. 2 and that serves as a synapse and component for neurons between an input layer and a next layer. In this example, inputs INPUT0, INPUT1, INPUT2, and INPUT3 are received on source lines SL0, SL1, SL2, and SL3, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

Fig. 24 shows a neuronal VMM array 2400 that is particularly suited for use with the memory cell 210 shown in fig. 2 and that serves as a synapse and component for neurons between an input layer and a next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

Fig. 25 illustrates a neuronal VMM array 2500 that is particularly suited for use with the memory cell 310 shown in fig. 3 and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

Fig. 26 illustrates a neuronal VMM array 2600 that is particularly suited for use with the memory cell 410 shown in fig. 4 and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0, ..., INPUTN are received on vertical control gate lines CG0, ..., CGN, respectively, and outputs OUTPUT1 and OUTPUT2 are generated on source lines SL0 and SL1.

Fig. 27 illustrates a neuronal VMM array 2700 that is particularly suited for use with the memory cell 410 shown in fig. 4 and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0 to INPUTN are received on the gates of bit line control gates 2701-1, 2701-2, ..., 2701-(N-1), and 2701-N, respectively, which are coupled to bit lines BL0 to BLN, respectively. Exemplary outputs OUTPUT1 and OUTPUT2 are generated on source lines SL0 and SL1.

Fig. 28 illustrates a neuronal VMM array 2800 that is particularly suited for use with the memory cell 310 shown in fig. 3, the memory cell 510 shown in fig. 5, and the memory cell 710 shown in fig. 7, and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN, respectively.

Fig. 29 illustrates a neuronal VMM array 2900 that is particularly suited for use with the memory cell 310 shown in fig. 3, the memory cell 510 shown in fig. 5, and the memory cell 710 shown in fig. 7, and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0 to INPUTM are received on control gate lines CG0 to CGM. Outputs OUTPUT0, ..., OUTPUTN are generated on vertical source lines SL0, ..., SLN, respectively, where each source line SLi is coupled to the source lines of all the memory cells in column i.

Fig. 30 shows a neuronal VMM array 3000 that is particularly suited for use with the memory cell 310 shown in fig. 3, the memory cell 510 shown in fig. 5, and the memory cell 710 shown in fig. 7, and that serves as a synapse and component for neurons between an input layer and the next layer. In this example, inputs INPUT0 to INPUTM are received on control gate lines CG0 to CGM. Outputs OUTPUT0, ..., OUTPUTN are generated on vertical bit lines BL0, ..., BLN, respectively, where each bit line BLi is coupled to the bit lines of all the memory cells in column i.
Long Short-Term Memory
The prior art includes a concept known as long short-term memory (LSTM). LSTM units are often used in neural networks. An LSTM allows a neural network to remember information over predetermined arbitrary time intervals and to use that information in subsequent operations. A conventional LSTM unit comprises a cell, an input gate, an output gate, and a forget gate. The three gates regulate the flow of information into and out of the cell and the time interval over which information is remembered in the LSTM. VMMs are particularly useful in LSTM units.
Fig. 14 shows an exemplary LSTM 1400. LSTM 1400 in this example comprises units 1401, 1402, 1403, and 1404. Unit 1401 receives input vector x0 and generates output vector h0 and cell state vector c0. Unit 1402 receives input vector x1, the output vector (hidden state) h0 from unit 1401, and cell state c0 from unit 1401, and generates output vector h1 and cell state vector c1. Unit 1403 receives input vector x2, the output vector (hidden state) h1 from unit 1402, and cell state c1 from unit 1402, and generates output vector h2 and cell state vector c2. Unit 1404 receives input vector x3, the output vector (hidden state) h2 from unit 1403, and cell state c2 from unit 1403, and generates output vector h3. Additional units can be used; an LSTM with four units is merely an example.
Fig. 15 shows an exemplary implementation of an LSTM cell 1500 that may be used for cells 1401, 1402, 1403, and 1404 in fig. 14. The LSTM unit 1500 receives the input vector x (t), the cell state vector c (t-1) from the previous cell, and the output vector h (t-1) from the previous cell, and generates the cell state vector c (t) and the output vector h (t).
LSTM unit 1500 includes sigmoid function devices 1501, 1502, and 1503, each applying a number between 0 and 1 to control how much each component in the input vector is allowed to pass through to the output vector. LSTM unit 1500 further includes tanh devices 1504 and 1505 for applying a hyperbolic tangent function to the input vectors, multiplier devices 1506, 1507, and 1508 for multiplying the two vectors together, and an adding device 1509 for adding the two vectors together. The output vector h (t) may be provided to the next LSTM cell in the system or it may be accessed for other purposes.
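The cell of fig. 15 can be expressed as a short numpy sketch. The weight and bias names below are illustrative (in the disclosed systems the weight multiplications would be performed by VMM arrays, and the sigmoid/tanh functions by activation function blocks); the gate assignments follow the standard LSTM formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One step of an LSTM cell as in fig. 15: three sigmoid gates (i, f, o),
    a tanh candidate u, elementwise multipliers, and an adder."""
    z = np.concatenate([x_t, h_prev])   # the cell sees x(t) and h(t-1)
    i = sigmoid(W["i"] @ z + b["i"])    # input gate  (one of sigmoid devices 1501-1503)
    f = sigmoid(W["f"] @ z + b["f"])    # forget gate (one of sigmoid devices 1501-1503)
    o = sigmoid(W["o"] @ z + b["o"])    # output gate (one of sigmoid devices 1501-1503)
    u = np.tanh(W["u"] @ z + b["u"])    # candidate (tanh device 1504)
    c_t = f * c_prev + i * u            # multipliers and adding device 1509
    h_t = o * np.tanh(c_t)              # second tanh and output multiplier
    return h_t, c_t

# with all-zero weights every gate outputs 0.5, so c(t) = 0.5 * c(t-1)
W = {k: np.zeros((2, 4)) for k in "ifou"}
b = {k: np.zeros(2) for k in "ifou"}
h1, c1 = lstm_cell_step(np.zeros(2), np.zeros(2), np.array([1.0, -2.0]), W, b)
```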
Fig. 16 shows an LSTM unit 1600, which is an example of an implementation of LSTM unit 1500. For the reader's convenience, the same numbering as for LSTM unit 1500 is used in LSTM unit 1600. Sigmoid function devices 1501, 1502, and 1503 and tanh device 1504 each comprise multiple VMM arrays 1601 and an activation function block 1602. Thus, it can be seen that VMM arrays are particularly useful in LSTM units used in certain neural network systems. Multiplier devices 1506, 1507, and 1508 and adding device 1509 are implemented in digital or analog form. Activation function block 1602 can be implemented in digital or analog form.
An alternative form of LSTM unit 1600 (and another example of an implementation of LSTM unit 1500) is shown in fig. 17. In fig. 17, sigmoid function devices 1501, 1502, and 1503 and tanh device 1504 share the same physical hardware (VMM array 1701 and activation function block 1702) in a time-multiplexed fashion. LSTM unit 1700 further includes multiplier device 1703, which multiplies two vectors together; adder device 1708, which adds two vectors together; tanh device 1505 (which comprises activation function block 1702); register 1707, which stores the value i(t) when it is output from sigmoid function block 1702; register 1704, which stores the value f(t)*c(t-1) when it is output from multiplier device 1703 through multiplexer 1710; register 1705, which stores the value i(t)*u(t) when it is output from multiplier device 1703 through multiplexer 1710; register 1706, which stores the value o(t)*c~(t) when it is output from multiplier device 1703 through multiplexer 1710; and multiplexer 1709.
LSTM unit 1600 contains multiple sets of VMM arrays 1601 and corresponding activation function blocks 1602, whereas LSTM unit 1700 contains only one set of VMM array 1701 and activation function block 1702, which are used to represent multiple layers in the embodiment of LSTM unit 1700. LSTM unit 1700 will require less space than LSTM unit 1600, because LSTM unit 1700 requires only 1/4 as much space for VMMs and activation function blocks compared to LSTM unit 1600.
It will also be appreciated that an LSTM unit will typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside the VMM arrays, such as a summer and activation function block and high voltage generation block. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The embodiments described below therefore reduce the circuitry required outside of the VMM arrays themselves.
Gated Recurrent Units
An analog VMM implementation can be used in a gated recurrent unit (GRU) system. GRUs are a gating mechanism in recurrent neural networks. GRUs are similar to LSTMs, except that GRU cells generally contain fewer components than LSTM cells.
Fig. 18 illustrates an exemplary GRU 1800. GRU 1800 in this example comprises units 1801, 1802, 1803, and 1804. Unit 1801 receives input vector x0 and generates output vector h0. Unit 1802 receives input vector x1 and output vector h0 from unit 1801 and generates output vector h1. Unit 1803 receives input vector x2 and output vector (hidden state) h1 from unit 1802 and generates output vector h2. Unit 1804 receives input vector x3 and output vector (hidden state) h2 from unit 1803 and generates output vector h3. Additional units can be used; a GRU with four units is merely an example.
Fig. 19 illustrates an exemplary implementation of a GRU unit 1900 that may be used with units 1801, 1802, 1803, and 1804 of fig. 18. The GRU unit 1900 receives the input vector x (t) and the output vector h (t-1) from the previous GRU unit, and generates the output vector h (t). The GRU unit 1900 includes sigmoid function devices 1901 and 1902, each of which applies a number between 0 and 1 to components from an output vector h (t-1) and an input vector x (t). The GRU unit 1900 further comprises a tanh device 1903 for applying a hyperbolic tangent function to the input vectors, a plurality of multiplier devices 1904, 1905 and 1906 for multiplying the two vectors together, an adding device 1907 for adding the two vectors together, and a complementary device 1908 for subtracting the input from 1 to generate an output.
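The devices of fig. 19 map onto the standard GRU equations. The sketch below is illustrative (weight names are assumptions, and gate conventions vary in the literature; this version uses h(t) = (1-z(t))*h~(t) + z(t)*h(t-1)), with the weight multiplications standing in for the VMM arrays:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_step(x_t, h_prev, W, b):
    """One step of a GRU cell as in fig. 19: two sigmoid gates (r, z),
    a tanh candidate, multipliers, an adder, and a (1 - z) complement."""
    concat = np.concatenate([x_t, h_prev])
    r = sigmoid(W["r"] @ concat + b["r"])   # reset gate  (a sigmoid device such as 1901)
    z = sigmoid(W["z"] @ concat + b["z"])   # update gate (a sigmoid device such as 1902)
    # candidate state from x(t) and the reset-gated h(t-1) (tanh device 1903)
    h_cand = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]) + b["h"])
    # complementary device (1 - z), multipliers, and adder combine the two paths
    return (1.0 - z) * h_cand + z * h_prev

W = {k: np.zeros((2, 4)) for k in "rzh"}
b = {k: np.zeros(2) for k in "rzh"}
h1 = gru_cell_step(np.zeros(2), np.array([1.0, -2.0]), W, b)  # zero weights: h1 = 0.5*h_prev
```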
Fig. 20 shows a GRU unit 2000, which is an example of an implementation of GRU unit 1900. For the reader's convenience, the same numbering as for GRU unit 1900 is used in GRU unit 2000. As shown in fig. 20, sigmoid function devices 1901 and 1902 and tanh device 1903 each comprise multiple VMM arrays 2001 and an activation function block 2002. Thus, it can be seen that VMM arrays are particularly useful in GRU units used in certain neural network systems. Multiplier devices 1904, 1905, and 1906, adding device 1907, and complementary device 1908 are implemented in digital or analog form. Activation function block 2002 can be implemented in digital or analog form.
An alternative form of GRU unit 2000 (and another example of an implementation of GRU unit 1900) is shown in fig. 21. In fig. 21, GRU unit 2100 utilizes VMM array 2101 and activation function block 2102, which, when configured as a sigmoid function, applies a number between 0 and 1 to control how much of each component in the input vector is allowed through to the output vector. In fig. 21, sigmoid function devices 1901 and 1902 and tanh device 1903 share the same physical hardware (VMM array 2101 and activation function block 2102) in a time-multiplexed fashion. GRU unit 2100 further includes multiplier device 2103, which multiplies two vectors together; adder device 2105, which adds two vectors together; complementary device 2109, which subtracts an input from 1 to generate an output; multiplexer 2104; register 2106, which holds the value h(t-1)*r(t) when it is output from multiplier device 2103 through multiplexer 2104; register 2107, which holds the value h(t-1)*z(t) when it is output from multiplier device 2103 through multiplexer 2104; and register 2108, which holds the value h~(t)*(1-z(t)) when it is output from multiplier device 2103 through multiplexer 2104.
GRU unit 2000 contains multiple sets of VMM arrays 2001 and activation function blocks 2002, whereas GRU unit 2100 contains only one set of VMM array 2101 and activation function block 2102, which are used to represent multiple layers in the embodiment of GRU unit 2100. GRU unit 2100 will require less space than GRU unit 2000, because GRU unit 2100 requires only 1/3 as much space for VMMs and activation function blocks compared to GRU unit 2000.
It will also be appreciated that a GRU system will typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside the VMM arrays, such as a summer and activation function block and high voltage generation block. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The embodiments described below therefore reduce the circuitry required outside of the VMM arrays themselves.
The input to the VMM array may be an analog level, a binary level, a pulse, a time modulated pulse, or a digital bit (in which case a DAC is required to convert the digital bit to an appropriate input analog level), and the output may be an analog level, a binary level, a timing pulse, a pulse, or a digital bit (in which case an output ADC is required to convert the output analog level to a digital bit).
Generally, for each memory cell in a VMM array, each weight W may be implemented by a single memory cell, by a differential cell, or by two hybrid memory cells (the average of two cells). In the differential cell case, two memory cells are needed to implement a weight W as a differential weight (W = W+ - W-). In the two hybrid memory cells case, two memory cells are needed to implement a weight W as the average of the two cells.
Fig. 31 shows a VMM system 3100. In some embodiments, the weights W stored in the VMM array are stored as differential pairs W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). In VMM system 3100, half of the bit lines are designated as W+ lines, that is, bit lines connected to memory cells that will store positive weights W+, and the other half of the bit lines are designated as W- lines, that is, bit lines connected to memory cells implementing negative weights W-. The W- lines are interspersed among the W+ lines in an alternating fashion. The subtraction operation is performed by summation circuits that receive current from a W+ line and a W- line, such as summation circuits 3101 and 3102. The output of a W+ line and the output of a W- line are combined together to give effectively W = (W+) - (W-) for each (W+, W-) cell pair for all (W+, W-) line pairs. Although described above with respect to W- lines interspersed among W+ lines in an alternating fashion, in other embodiments the W+ lines and W- lines may be located anywhere in the array.
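Because memory cells can only store non-negative quantities, a signed weight matrix must be split into the two non-negative halves described above. A minimal sketch of the encoding and of the summation circuit's subtraction (names and values illustrative):

```python
import numpy as np

def encode_differential(W):
    """Split a signed weight matrix into non-negative W+ and W- arrays
    such that W = W+ - W-  (each half is stored on its own bit lines)."""
    w_plus = np.maximum(W, 0.0)
    w_minus = np.maximum(-W, 0.0)
    return w_plus, w_minus

def differential_read(w_plus, w_minus, inputs):
    """Summation circuit: subtract the W- bit-line current from the W+ one."""
    return inputs @ w_plus - inputs @ w_minus

W = np.array([[0.5, -0.25], [-0.1, 0.3]])
wp, wm = encode_differential(W)
x = np.array([1.0, 2.0])
print(differential_read(wp, wm, x))  # equals x @ W, signs recovered
```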
Fig. 32 shows another embodiment. In the VMM system 3210, a positive weight w+ is implemented in the first array 3211 and a negative weight W-is implemented in the second array 3212, the second array 3212 being separate from the first array, and the resulting weights being appropriately combined together by the summing circuit 3213.
Fig. 33 shows VMM system 3300, in which the weights W stored in the VMM arrays are stored as differential pairs W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). VMM system 3300 includes array 3301 and array 3302. Half of the bit lines in each of arrays 3301 and 3302 are designated as W+ lines, that is, bit lines connected to memory cells that will store positive weights W+, and the other half of the bit lines in each of arrays 3301 and 3302 are designated as W- lines, that is, bit lines connected to memory cells implementing negative weights W-. The W- lines are interspersed among the W+ lines in an alternating fashion. The subtraction operation is performed by summation circuits that receive current from a W+ line and a W- line, such as summation circuits 3303, 3304, 3305, and 3306. The outputs of the W+ lines and the W- lines of each array 3301, 3302 are respectively combined together to give effectively W = (W+) - (W-) for each (W+, W-) cell pair for all (W+, W-) line pairs. In addition, the W values from arrays 3301 and 3302 can be further combined through summation circuits 3307 and 3308, such that each W value is the result of a W value from array 3301 minus a W value from array 3302, meaning that the end result from summation circuits 3307 and 3308 is a differential value of two differential values.
Each non-volatile memory cell used in an analog neural memory system must be erased and programmed to hold a very specific and precise amount of charge (i.e., number of electrons) in the floating gate. For example, each floating gate must hold one of N different values, where N is the number of different weights that can be indicated by each cell. Examples of N include 16, 32, 64, 128, and 256.
Similarly, a read operation must be able to accurately discern N different levels.
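The N-level storage requirement amounts to quantizing each target weight onto a grid of N charge levels (N = 16 corresponds to 4 bits per cell, N = 256 to 8 bits). A small quantizer model illustrates this; the weight range here is an assumption for the example:

```python
def quantize_weight(w, n_levels=16, w_min=0.0, w_max=1.0):
    """Snap a target weight to the nearest of N discrete levels, as a cell
    holding one of N distinct charge amounts would represent it."""
    step = (w_max - w_min) / (n_levels - 1)
    idx = min(n_levels - 1, max(0, round((w - w_min) / step)))
    return w_min + idx * step

print(quantize_weight(0.41, n_levels=16))   # snapped to the nearest of 16 levels
print(quantize_weight(0.41, n_levels=256))  # a finer grid leaves less error
```

Both program and read circuitry must resolve the same grid: a program operation that lands between levels, or a read that cannot distinguish adjacent levels, corrupts the stored weight.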
There is a need for improved output blocks in VMM systems that are able to quickly and accurately receive outputs from an array and discern the values represented by those outputs.
Disclosure of Invention
Various embodiments of an output circuit for an analog neural memory in a deep learning artificial neural network are disclosed.
Drawings
Fig. 1 is a schematic diagram showing an artificial neural network.
Fig. 2 illustrates a split gate flash memory cell of the prior art.
Fig. 3 shows another prior art split gate flash memory cell.
Fig. 4 shows another prior art split gate flash memory cell.
Fig. 5 shows another prior art split gate flash memory cell.
FIG. 6 is a schematic diagram illustrating different levels of an exemplary artificial neural network utilizing one or more non-volatile memory arrays.
Fig. 7 is a block diagram illustrating a vector-matrix multiplication system.
FIG. 8 is a block diagram illustrating an exemplary artificial neural network utilizing one or more vector-matrix multiplication systems.
Fig. 9 shows another embodiment of a vector-matrix multiplication system.
Fig. 10 shows another embodiment of a vector-matrix multiplication system.
Fig. 11 shows another embodiment of a vector-matrix multiplication system.
Fig. 12 shows another embodiment of a vector-matrix multiplication system.
Fig. 13 shows another embodiment of a vector-matrix multiplication system.
Fig. 14 illustrates a prior art long and short term memory system.
FIG. 15 illustrates exemplary cells for use in a long and short term memory system.
Fig. 16 illustrates one embodiment of the exemplary unit of fig. 15.
Fig. 17 shows another embodiment of the exemplary unit of fig. 15.
Fig. 18 shows a prior art gating recursive unit system.
Fig. 19 shows an exemplary cell for use in a gated recursive cell system.
Fig. 20 shows one embodiment of the exemplary unit of fig. 19.
Fig. 21 shows another embodiment of the exemplary unit of fig. 19.
Fig. 22 shows another embodiment of a vector-matrix multiplication system.
Fig. 23 shows another embodiment of a vector-matrix multiplication system.
Fig. 24 shows another embodiment of a vector-matrix multiplication system.
Fig. 25 shows another embodiment of a vector-matrix multiplication system.
Fig. 26 shows another embodiment of a vector-matrix multiplication system.
Fig. 27 shows another embodiment of a vector-matrix multiplication system.
Fig. 28 shows another embodiment of a vector-matrix multiplication system.
Fig. 29 shows another embodiment of a vector-matrix multiplication system.
Fig. 30 shows another embodiment of a vector-matrix multiplication system.
Fig. 31 shows another embodiment of a vector-matrix multiplication system.
Fig. 32 shows another embodiment of a vector-matrix multiplication system.
Fig. 33 shows another embodiment of a vector-matrix multiplication system.
Fig. 34 shows another embodiment of a vector-matrix multiplication system.
Fig. 35A, 35B, 35C, 35D, 35E, and 35F show embodiments of output blocks.
Fig. 36 shows another embodiment of an output block.
Fig. 37A and 37B illustrate another embodiment of an output block.
Fig. 38A and 38B illustrate another embodiment of an output block.
Fig. 39 shows a variable resistor replica.
Fig. 40 shows one embodiment of a current-to-voltage converter.
Fig. 41 shows a differential output amplifier.
Fig. 42 shows an offset calibration method.
Fig. 43 shows another offset calibration method.
Detailed Description
The artificial neural network of the present invention utilizes a combination of CMOS technology and a non-volatile memory array.
VMM system overview
Fig. 34 shows a block diagram of VMM system 3400. VMM system 3400 includes VMM array 3401, row decoder 3402, high voltage decoder 3403, column decoder 3404, bit line drivers 3405, input circuit 3406, output circuit 3407, control logic 3408, and bias generator 3409. VMM system 3400 further includes high voltage generation block 3410, which comprises charge pump 3411, charge pump regulator 3412, and high voltage analog precision level generator 3413. VMM system 3400 further includes (program/erase, i.e., weight tuning) algorithm controller 3414, analog circuitry 3415, control engine 3416 (which may include special functions such as arithmetic functions, activation functions, embedded microcontroller logic, etc., without limitation), and test control logic 3417. The systems and methods described below can be implemented in VMM system 3400.
The input circuit 3406 may include circuitry such as a DAC (digital-to-analog converter), DPC (digital-to-pulse converter, digital-to-time modulated pulse converter), AAC (analog-to-analog converter, such as a current-to-voltage converter, logarithmic converter), PAC (pulse-to-analog level converter), or any other type of converter. The input circuit 3406 may be capable of implementing a normalized, linear or nonlinear up/down scaling function or an arithmetic function. The input circuit 3406 is capable of implementing a temperature compensation function for the input level. The input circuit 3406 may be capable of implementing an activation function, such as ReLU or sigmoid. The output circuit 3407 can include circuitry such as an ADC (analog-to-digital converter for converting the neuron analog output to digital bits), AAC (analog-to-analog converter such as a current-to-voltage converter, a logarithmic converter), APC (analog-to-pulse converter, analog-to-time modulated pulse converter), or any other type of converter.
The output circuit 3407 may be capable of implementing an activation function, such as a rectified linear activation function (ReLU) or sigmoid. The output circuit 3407 can implement statistical normalization, regularization, up/down scaling/gain functions, statistical rounding, or arithmetic functions (e.g., addition, subtraction, division, multiplication, shifting, logarithm) on the neuron outputs. The output circuit 3407 is capable of implementing a temperature compensation function on the neuron output or the array output (such as the bit line output) in order to keep the power consumption of the array approximately constant or to improve the accuracy of the array (neuron) output, such as by keeping the IV slope approximately the same.
Fig. 35A shows an output block 3500. Output block 3500 includes: current-to-voltage converters (ITVs, with differential inputs and differential outputs) 3501-1 through 3501-i, where i is the number of bit line W+ and W- pairs received by output block 3500; a multiplexer 3502; sample-and-hold circuits 3503-1 through 3503-i; a channel multiplexer 3504; and an analog-to-digital converter (ADC) 3505. Output block 3500 receives differential weight outputs W+ and W- from the bit line pairs in the array and ultimately generates, from ADC 3505 (an ADC with differential inputs), a digital output DOUTx representing the output of one of the bit line pairs (i.e., a pair of W+ and W- lines).
Current-to-voltage (ITV) converters 3501-1 through 3501-i each receive analog bit line current signals BLw+ and BLw-, which are the bit line outputs generated in response to the inputs and the stored W+ and W- weights, respectively, and convert them into respective differential voltages ITVO+ and ITVO-.
The differential voltages ITVO+ and ITVO- are then received by multiplexer 3502, which time-multiplexes the outputs from current-to-voltage converters 3501-1 through 3501-i to sample-and-hold (S/H) circuits 3503-1 through 3503-k, where k may be the same as or different from i.
The S/H circuits 3503-1 through 3503-k each sample the differential voltage they receive and hold it as a differential output.
Channel multiplexer 3504 receives a control signal to select one of the bit line W+ and W- channels (i.e., one of the bit line pairs) and outputs the differential voltage held by the corresponding sample-and-hold circuit 3503 to ADC 3505, which converts the analog differential voltage output by that sample-and-hold circuit 3503 into a set of digital bits DOUTx. A single S/H 3503 may be shared across multiple ITV converters 3501. ADC 3505 can serve multiple ITV converters in a time-multiplexed fashion. Each S/H 3503 may be a capacitor alone or a capacitor followed by a buffer (e.g., an operational amplifier).
The ITV converter 3501 can include output current neuron circuits 3700, 3750, 3800, or 3820 of figs. 37A, 37B, 38A, and 38B, respectively, in combination with the current-to-voltage converter 4000 of fig. 40. In this case, the inputs to ITV converter 3501 are two current inputs (such as BLW+ and BLW- in figs. 35A-35E, 37A, 37B, 38A, or 38B), and the outputs of the ITV converter are differential outputs (such as VOP and VON in fig. 40, or ITVO+ and ITVO- in figs. 35A-35D).
The ADC 3505 may have a hybrid ADC architecture, meaning that it uses more than one ADC architecture to perform the conversion. For example, if DOUTx is an 8-bit output, ADC 3505 may include one ADC sub-architecture for generating bits B7-B4 and another ADC sub-architecture for generating bits B3-B0 from differential inputs ITVSH+ and ITVSH-. That is, ADC 3505 may comprise a plurality of ADC sub-architectures.
Alternatively, one ADC sub-architecture may be shared among all channels while another ADC sub-architecture is not shared among all channels.
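As a rough illustration of how a hybrid ADC might split an 8-bit conversion between two sub-architectures, consider the following sketch. The coarse/fine split, full-scale voltage, and function name are hypothetical; the actual sub-architectures used (e.g., SAR, slope, flash) are not specified here.

```python
# Illustrative sketch (not the patented circuit) of a hybrid two-step 8-bit
# conversion: a coarse sub-conversion resolves bits B7-B4, and a fine
# sub-conversion resolves bits B3-B0 from the residue. The full-scale
# voltage and the 4+4 bit split are assumptions for the example.

def hybrid_adc(v_in, v_fs=1.0):
    """Convert v_in in [0, v_fs) to 8 bits using two 4-bit sub-conversions."""
    coarse = min(int(v_in / v_fs * 16), 15)      # B7-B4: pick a coarse bin
    residue = v_in - coarse * (v_fs / 16)        # voltage left inside the bin
    fine = min(int(residue / (v_fs / 256)), 15)  # B3-B0: refine the residue
    return (coarse << 4) | fine

code = hybrid_adc(0.5)  # mid-scale input -> code 128
```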
In another embodiment, channel multiplexer 3504 and ADC 3505 may be eliminated, and the output may instead be the analog differential voltage from S/H circuit 3503, which may be buffered by an operational amplifier. Such analog voltage outputs may be used, for example, in fully analog neural networks (i.e., neural networks that do not require digital outputs or digital inputs for the neural memory array).
Fig. 35B shows an output block 3550. The output block 3550 includes: current-to-voltage converters (ITV) 3551-1 through 3551-i, where i is the number of bit line W+ and W- pairs received by output block 3550; a multiplexer 3552; differential-to-single-ended converters (Diff-to-S converters) 3553-1 through 3553-k; sample and hold circuits 3554-1 through 3554-k (where k may be the same as or different from i); a channel multiplexer 3555; and an analog-to-digital converter (ADC) 3556. The Diff-to-S converters 3553 convert the differential ITV outputs (i.e., ITVOMX+ and ITVOMX-) provided from multiplexer 3552 into a single-ended output ITVSOMX+. The single-ended output ITVSOMX+ is then input to S/H circuit 3554, channel multiplexer 3555, and ADC 3556.
Fig. 35C shows an output block 3560. The output block 3560 includes: current-to-voltage converters (ITV) 3561-1 through 3561-i, where i is the number of bit line W+ and W- pairs received by output block 3560; and differential-input analog-to-digital converters (ADCs) 3566-1 through 3566-i.
Fig. 35D shows an output block 3570. The output block 3570 includes: current-to-voltage converters (ITV) 3571-1 through 3571-i, where i is the number of bit line W+ and W- pairs received by output block 3570; and single-input analog-to-digital converters (ADCs) 3576-1 through 3576-i. In this case, only one of the ITV's differential outputs is used; that is, each ITV is used with a differential input and a single-ended output.
Fig. 35E shows an output block 3580. The output block 3580 includes: current-to-voltage converters (ITV) 3581-1 through 3581-i, where i is the number of bit line W+ and W- pairs received by output block 3580; and differential-input analog-to-digital converters (ADCs) 3586-1 through 3586-i. The ITV blocks 3581-1 through 3581-i include common mode input circuits 3582-1 through 3582-i and differential operational amplifiers 3583-1 through 3583-i, respectively, wherein feedback is provided by variable resistors 3584-1 through 3584-i and 3585-1 through 3585-i, respectively.
Fig. 35F shows a circuit 3590 that can be used for the common mode input circuits 3582-1 through 3582-i in fig. 35E. The circuit 3590 includes two equal variable current sources, Ibias+ and Ibias-, connected to the two current inputs BLW+ and BLW-.
Fig. 36 shows an output block 3600. The output block 3600 includes: summing circuits 3601-1 through 3601-i (e.g., current mirror circuits), where i is the number of bit line BLW+ and BLW- pairs received by output block 3600; current-to-voltage converter circuits (ITV) 3602-1 through 3602-i; a multiplexer 3603; sample and hold circuits 3604-1 through 3604-k (where k may be the same as or different from i); a channel multiplexer 3605; and an ADC 3606. The output block 3600 receives the differential weight outputs BLW+ and BLW- from the bit line pairs in the array and ultimately generates a digital output DOUTx from ADC 3606, representing the output of one bit line pair at a time.
The current summing circuits 3601-1 through 3601-i each receive current from a pair of bit lines, subtract the BLW- current from the BLW+ current, and output the result as a summed current IWO.
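The subtraction performed by the summing circuits can be expressed as a one-line behavioral model. The function name and current values below are illustrative assumptions.

```python
# One-line behavioral model of summing circuits 3601 (FIG. 36); each block
# mirrors the BLW- current and subtracts it from the BLW+ current, yielding
# one summed current IWO per bit line pair.

def summed_current(i_blw_pos, i_blw_neg):
    """IWO = IBLW+ - IBLW-: the sign shows whether W+ or W- dominates."""
    return i_blw_pos - i_blw_neg

iwo = summed_current(3e-6, 1e-6)  # 2 uA net positive weight contribution
```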
The current-to-voltage converters 3602-1 through 3602-i receive the summed currents IWO and convert the respective summed currents into differential voltages ITVO+ and ITVO-, which are then received by multiplexer 3603 and selectively provided to sample and hold circuits 3604-1 through 3604-k. The differential voltages are ultimately digitized (converted to digital output bits) by the differential-input ADC (block 3606), which offers advantages such as input noise reduction (e.g., from clock feedthrough) and more accurate comparison operations (as in a SAR ADC).
Each sample and hold circuit 3604 receives the differential voltages ITVOMX+ and ITVOMX-, samples them, and holds them as differential voltage outputs OSH+ and OSH-.
The channel multiplexer 3605 receives control signals to select one of the bit line pairs (i.e., one BLW+ and BLW- channel) and outputs the voltages held by the corresponding sample and hold circuit 3604 to the differential-input ADC 3606, which converts the voltages into a set of digital bits as DOUTx.
Fig. 37A shows an output current neuron circuit 3700, which may optionally be included in the output block 3500 of fig. 35A or the output block 3600 of fig. 36.
The output current neuron circuit 3700 includes a first variable current source 3701, a second variable current source 3702, and a bias circuit 3703. Bias circuit 3703 generates a control voltage Vbias based on a comparison of BLW+ to VREF, or of BLW- to VREF. The first variable current source 3701 generates an output current Ibias+ that is varied by the control voltage Vbias (i.e., the amount of output current Ibias+ is responsive to the value of Vbias) and is coupled to the first bit line BLW+. The second variable current source 3702 generates an output current Ibias- that is likewise varied by Vbias and is coupled to the second bit line BLW-. BLW+ is selected by a column decoder (not shown) and receives a first current from the cell storing the W+ value during a read operation, and BLW- is selected by the column decoder and receives a second current from the cell storing the W- value during the read operation. The W+ value and the associated W- value together represent the weight value W. The outputs Ibias+ and Ibias- of current sources 3701 and 3702 are equal at any given time.
VREF is applied as the input common mode voltage to generate the Vbias voltage, which controls variable current sources 3701 and 3702 to apply the common mode voltage on BLW+ and BLW-, where the input common mode voltage acts as a reference read voltage on the bit lines during a read operation. The outputs of the output current neuron circuit 3700 are Iout+ and Iout-, which form a differential signal. Iout+ is the output current from bit line BLW+ after Vbias has been applied to generate Ibias+, and Iout- is the output current from bit line BLW- after Vbias has been applied to generate Ibias-, where Iout+ = Ibias+ − IBLW+ and Iout- = Ibias- − IBLW-.
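The relationships Iout+ = Ibias+ − IBLW+ and Iout- = Ibias- − IBLW- can be checked with a small numerical sketch; the current values and function name are illustrative assumptions.

```python
# Numerical sketch of the FIG. 37A relationships: both variable current
# sources supply the same Ibias (set by Vbias), and each output is the bias
# current minus the corresponding bit line current.

def neuron_outputs(i_bias, i_blw_pos, i_blw_neg):
    """Iout+ = Ibias - IBLW+ and Iout- = Ibias - IBLW-."""
    return i_bias - i_blw_pos, i_bias - i_blw_neg

iout_p, iout_n = neuron_outputs(20e-6, 5e-6, 15e-6)
# Iout+ - Iout- = IBLW- - IBLW+, so the weight signal survives the common bias.
```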
Fig. 37B shows an output current neuron circuit 3750, which shows an embodiment of variable current sources 3701 and 3702 using PMOS transistors 3711 and 3712.
Fig. 38A shows an output current neuron circuit 3800, which may optionally be included in the output block 3500 of fig. 35A, the output block 3550 of fig. 35B, or the output block 3600 of fig. 36.
The output current neuron circuit 3800 includes: a first variable resistor 3801 (first device) including a first terminal and a second terminal, the second terminal coupled to a bit line BLW+ selected during a read operation; a second variable resistor 3802 (second device) including a third terminal and a fourth terminal, the fourth terminal coupled to a bit line BLW- selected during the read operation, wherein BLW+ is connected to a cell in the memory array storing a W+ value and BLW- is connected to a cell storing the associated W- value; a variable current source 3803; and a bias circuit operational amplifier 3804 that generates a bias voltage Vbias whose value represents the difference between BLW+ (or, alternatively, BLW-) and VREF. The first terminal of the first variable resistor 3801 and the third terminal of the second variable resistor 3802 are coupled to variable current source 3803.
VREF is used to generate the Vbias voltage, which is applied to variable current source 3803 to apply an input common mode voltage to bit lines BLW+ and BLW-, where the input common mode voltage acts as a read reference voltage on the bit lines during a read operation. The outputs of the output current neuron circuit 3800 are Iout+ (first output current) from the first variable resistor 3801 and Iout- (second output current) from the second variable resistor 3802, which form a differential current signal. Iout+ is the output current from bit line BLW+ after Vbias has been applied to generate Ibias, and Iout- is the output current from bit line BLW- after Vbias has been applied to generate Ibias, following Iout+ = Ibias − IBLW+ and Iout- = Ibias − IBLW-.
Fig. 38B illustrates an output current neuron circuit 3820, which may optionally be included in the output block 3500 of fig. 35A, the output block 3550 of fig. 35B, or the output block 3600 of fig. 36. The circuit is similar to that of fig. 38A, except that the output of operational amplifier 3804 directly drives the first and third terminals of the two variable resistors 3801 and 3802.
Fig. 39 shows a variable resistor replica 3900 that may optionally be used in place of variable resistor 3801 and/or variable resistor 3802 in figs. 38A and 38B. The variable resistor replica 3900 includes an NMOS transistor 3901. One terminal of NMOS transistor 3901 is coupled to the bias circuit 3804, and the other terminal is coupled to BLW+ or BLW-. The gate of NMOS transistor 3901 is coupled to a comparator 3902, which generates a control signal VGC that adjusts the resistance provided by NMOS transistor 3901. The resistance of NMOS transistor 3901 is thus VREF/IBIAS; by changing VREF or IBIAS, the equivalent resistance of NMOS transistor 3901 can be changed.
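The replica relationship (equivalent resistance = VREF/IBIAS) can be illustrated numerically; the values below are assumptions chosen for the example.

```python
# Numerical sketch of the FIG. 39 replica: the comparator servo forces the
# NMOS transistor to drop VREF while carrying IBIAS, so its equivalent
# resistance is VREF / IBIAS.

def replica_resistance(v_ref, i_bias):
    """Equivalent resistance of the servoed NMOS transistor."""
    return v_ref / i_bias

r1 = replica_resistance(0.5, 10e-6)   # 50 kOhm
r2 = replica_resistance(0.25, 10e-6)  # halving VREF halves the resistance
```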
Fig. 40 shows a current-to-voltage converter 4000 that can be used for the current-to-voltage converter 3501 in fig. 35A, the current-to-voltage converter 3551 in fig. 35B, or the current-to-voltage converter 3602 in fig. 36.
The current-voltage converter 4000 includes a differential amplifier 4001 configured as shown; variable integrating resistors 4002 and 4003; controlled switches 4004, 4005, 4006 and 4007; and variable sample and hold capacitors 4008 and 4009.
Current-to-voltage converter 4000 receives differential currents Iout+ and Iout- and outputs voltages VOP and VON. Output voltage VOP = Iout+ × R and output voltage VON = Iout- × R, where resistors 4002 and 4003 each have a value equal to R. Scaling of the output neurons is provided by changing the values of resistors 4002 and 4003; for example, resistors 4002 and 4003 may each be provided by the resistor replica circuit 3900. Capacitors 4008 and 4009 function as S/H hold capacitors to hold the output voltage once resistors 4002 and 4003 and the input currents are cut off. A control circuit (not shown) controls the opening and closing of switches 4004, 4005, 4006, and 4007 to set the integration time.
In another mode of operation, variable capacitors 4008 and 4009 are used to integrate the differential output currents Iout+ and Iout-. In this case, resistors 4002 and 4003 are disabled (not used). The output voltage VOP is then proportional to Iout+ × Time/C, and the output voltage VON is proportional to Iout- × Time/C. The value Time is controlled by the pulse width T of pulse 4010, and the value C is provided by capacitors 4008 and 4009. Scaling of the output neuron value is then provided by changing the pulse width T or the capacitance values of capacitors 4008 and 4009.
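The two operating modes of converter 4000 can be modeled behaviorally as follows. The component values are illustrative assumptions; the point is that output scaling follows R in resistive mode and T/C in integration mode.

```python
# Behavioral model of the two modes of converter 4000. In resistive mode the
# output tracks Iout x R; in integration mode it tracks Iout x T / C.

def itv_resistive(i_out, r):
    """Resistive mode: V = I x R, scaled by feedback resistors 4002/4003."""
    return i_out * r

def itv_integrating(i_out, pulse_width, c):
    """Integration mode: V = I x T / C, scaled by pulse width or capacitance."""
    return i_out * pulse_width / c

v_r = itv_resistive(10e-6, 100e3)           # 1.0 V with R = 100 kOhm
v_c = itv_integrating(10e-6, 1e-6, 10e-12)  # 1.0 V with T = 1 us, C = 10 pF
```

Doubling R, doubling T, or halving C doubles the output voltage, which is the neuron-scaling knob the text describes.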
Differential currents Iout+ and Iout- originate from the first bit line current BLW+ and the second bit line current BLW-. Iout+ and Iout- have complementary values (one positive and the other negative, with the same magnitude): Iout+ = ((current of BLW-) − (current of BLW+))/2 and Iout- = ((current of BLW+) − (current of BLW-))/2. For example, if the current of BLW+ is 1 μA and the current of BLW- is 31 μA, then Iout+ = (31 μA − 1 μA)/2 = 15 μA and Iout- = −15 μA.
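The decomposition above can be verified with the example figures from the text (1 μA and 31 μA); the function name is an assumption for illustration.

```python
# Check of the complementary decomposition, using the example values from
# the text (BLW+ carries 1 uA, BLW- carries 31 uA).

def differential_pair(i_blw_pos, i_blw_neg):
    """Iout+ = (IBLW- - IBLW+)/2 and Iout- = (IBLW+ - IBLW-)/2."""
    half = (i_blw_neg - i_blw_pos) / 2
    return half, -half  # equal magnitude, opposite sign

iout_p, iout_n = differential_pair(1e-6, 31e-6)  # (15 uA, -15 uA)
```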
Fig. 41 shows a differential amplifier 4100, which may optionally be included in the output block 3500 of fig. 35A, the output block 3550 of fig. 35B, or the output block 3600 of fig. 36. The differential amplifier 4100 includes PMOS transistors 4101, 4102, 4103, 4104, 4105, 4106, 4107, and 4108 and NMOS transistors 4109, 4110, 4111, 4112, and 4113, configured as shown. Differential amplifier 4100 receives inputs VINP and VINN and generates outputs VOUTP and VOUTN. VPBIAS is applied to the gates of PMOS transistors 4102, 4104, 4106, and 4108, and VNBIAS is applied to the gates of NMOS transistors 4111 and 4113. If VINP > VINN, VOUTP will be high and VOUTN will be low; if VINP < VINN, VOUTP will be low and VOUTN will be high. The common mode feedback circuit that sets the output common mode is not shown.
Fig. 42 illustrates an offset calibration method 4200 for an output block (e.g., output block 3500, 3550, 3560, 3570, 3580, 3590, or 3600 described above). The method may be performed within a sub-circuit block of the output block, such as by an ITV block or by an ADC block.
First, a nominal bias is applied to the input node. The nominal bias may be a midpoint offset trim setting, such as a 0 value, or an average (such as the average of the target input ranges for the BLW+ and BLW- inputs) (step 4202).
Second, an increased offset trim setting is applied to one of the sub-circuit blocks of the output block (such as the ITV or the ADC) (step 4203).
Third, the new trimmed output value of the entire output block is measured and compared to the expected output value to determine whether it is within a target range of the expected output value (step 4204). If so, the method proceeds to step 4207. If not, steps 4203 and 4204 are repeated, each time increasing the offset trim setting applied to the sub-circuit block, until the new trimmed output value of the entire output block is within the target range of the expected output value, at which point the method proceeds to step 4207.
After a certain number of attempts (set by a threshold T), if the new trimmed output value of the entire output block is still not within the target range of the expected output value, the offset trim setting is returned to the nominal offset trim setting and is then decreased from the nominal setting (step 4205).
A new trimmed output value of the entire output block is then measured and compared to the expected output value to determine whether it is within the target range of the expected output value (step 4206). If so, the method proceeds to step 4207. If not, steps 4205 and 4206 are repeated, each time decreasing the offset trim setting applied to the sub-circuit block, until the new trimmed output value is within the target range of the expected output value, at which point the method proceeds to step 4207.
At step 4207, the trim value that brings the output value within the target range of the expected output value is stored. The stored trim value is the trim value that will produce the smallest offset of the output block.
At step 4208, the stored trim value is optionally applied as an offset to the sub-circuit block of the output block during each operation.
Accordingly, the offset calibration method 4200 performs a trimming operation on the entire output block by trimming the sub-circuit blocks of the output block.
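A behavioral sketch of method 4200 is given below. The `measure()` callback, step size, and attempt threshold T are hypothetical stand-ins for reading the entire output block and for the implementation-specific trim granularity.

```python
# Behavioral sketch of offset calibration method 4200: step the offset trim
# up from the nominal setting; after T failed attempts, return to nominal
# and step down instead.

def calibrate_4200(measure, expected, tolerance, nominal=0, step=1, t=16):
    """Return the trim value whose output lands within tolerance, else None."""
    trim = nominal
    for _ in range(t):                                  # steps 4203/4204
        if abs(measure(trim) - expected) <= tolerance:
            return trim                                 # step 4207: store
        trim += step
    trim = nominal                                      # step 4205: reset
    for _ in range(t):                                  # steps 4205/4206
        if abs(measure(trim) - expected) <= tolerance:
            return trim
        trim -= step
    return None

# Example: an output block that reads 3 trim steps low at the nominal setting.
best = calibrate_4200(lambda trim: 100 + (trim - 3), expected=100, tolerance=0)
```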
Fig. 43 illustrates an offset calibration method 4300 for an output block (e.g., output block 3500, 3550, 3560, 3570, 3580, 3590, or 3600 described above). The method may be performed within a sub-circuit block of the output block, such as an ITV block or an ADC block.
First, a reference bias is applied to the input nodes (e.g., the BLW+ and BLW- inputs) of a sub-circuit block of the output block (step 4301).
Next, the output value of the output block is measured and compared with the target offset value (step 4302).
If the measured output value > the target offset value, then the next offset trim value in the sequence of offset trim values is applied to one of the sub-circuit blocks of the output block (such as the ITV or the ADC) (step 4303), and step 4302 is repeated.
Steps 4303 and 4302 are repeated until the measured output value <= the target offset value, at which point the offset trim value is stored (step 4304). The stored offset trim value is the trim value that produces an acceptable level of offset.
Optionally, the stored offset trim value is applied as a bias to the sub-circuit blocks of the output block during each operation (step 4305).
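A behavioral sketch of method 4300 follows; the `measure()` callback and trim sequence are hypothetical stand-ins for the hardware measurement and for the implementation's offset trim codes.

```python
# Behavioral sketch of offset calibration method 4300: sweep the offset trim
# sequence until the measured output drops to the target offset value or
# below, then keep that trim value.

def calibrate_4300(measure, trim_sequence, target_offset):
    """Return the first trim value with measured output <= target_offset."""
    for trim in trim_sequence:          # steps 4302/4303
        if measure(trim) <= target_offset:
            return trim                 # step 4304: store the trim value
    return None                         # no trim setting reached the target

# Example: each trim step removes one unit of a 5-unit residual offset.
best = calibrate_4300(lambda trim: 5 - trim, range(10), target_offset=0)  # -> 5
```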
In alternative embodiments, the resistances of the variable resistors in fig. 35E or fig. 40 are not equal. In this case, the output voltage or current from the ITV is proportional to the resistance values. For example, in fig. 35E, if resistor 3585-1 is very large, most of the current from the two bit lines (IBLW+ − IBLW-) will flow through resistor 3584-1. In another example, if resistor 3585-1 is turned off (open), all of the current from the two bit lines (IBLW+ − IBLW-) will flow through resistor 3584-1.
It should be noted that, as used herein, the terms "over" and "on" both inclusively include "directly on" (no intervening material, element, or space disposed therebetween) and "indirectly on" (intervening material, element, or space disposed therebetween). Similarly, the term "adjacent" includes "directly adjacent" (no intervening material, element, or space disposed therebetween) and "indirectly adjacent" (intervening material, element, or space disposed therebetween); "mounted to" includes "directly mounted to" (no intervening material, element, or space disposed therebetween) and "indirectly mounted to" (intervening material, element, or space disposed therebetween); and "electrically coupled to" includes "directly electrically coupled to" (no intervening material or element electrically connecting the elements together) and "indirectly electrically coupled to" (intervening material or element electrically connecting the elements together). For example, forming an element "over a substrate" can include forming the element directly on the substrate with no intervening material/element therebetween, as well as forming the element indirectly on the substrate with one or more intervening materials/elements therebetween.

Claims (33)

1. An output current neuron circuit, the output current neuron circuit comprising:
a first bit line coupled to a W+ cell in the memory array and drawing a first current during a read operation;
a second bit line coupled to a W- cell in the memory array and drawing a second current during the read operation, wherein a difference between a value stored in the W+ cell and a value stored in the W- cell is a weight value W;
a bias circuit that generates a common mode bias voltage;
a first variable current source that applies a common mode bias current to the first bit line in response to the common mode bias voltage to generate a first output; and
a second variable current source that applies the common mode bias current to the second bit line in response to the common mode bias voltage to generate a second output;
wherein the first output is equal to the common mode bias current minus the first current and the second output is equal to the common mode bias current minus the second current.
2. The output current neuron circuit according to claim 1, wherein the first variable current source comprises a first PMOS transistor.
3. The output current neuron circuit according to claim 2, wherein the second variable current source comprises a second PMOS transistor.
4. An output current neuron circuit, the output current neuron circuit comprising:
a current source;
a bias circuit for applying a control voltage to the current source;
a first variable resistor comprising a first end and a second end, the first end coupled to the current source;
a second variable resistor comprising a third terminal and a fourth terminal, the third terminal coupled to the current source, the current source providing a bias current to the first and second variable resistors to generate a common mode voltage;
a first bit line coupled to the W+ cell during a read operation; and
a second bit line coupled to a W- cell during the read operation, wherein a difference between a value stored in the W+ cell and a value stored in the W- cell is a weight value W;
a first output coupled to the second end of the first variable resistor and the first bit line to provide a first output current; and
a second output coupled to the fourth terminal of the second variable resistor and the second bit line to provide a second output current, the first output and the second output forming a common mode differential current signal.
5. The circuit of claim 4, wherein the first variable resistor comprises an NMOS transistor, wherein a voltage applied to a gate of the NMOS transistor determines a resistance of the NMOS transistor.
6. The circuit of claim 5, wherein the second variable resistor comprises an NMOS transistor, wherein a voltage applied to a gate of the NMOS transistor determines a resistance of the NMOS transistor.
7. An output current neuron circuit, the output current neuron circuit comprising:
a first output node for receiving a first current from a memory array;
a second output node for receiving a second current from the memory array;
a bias circuit to generate a bias current;
a first device to generate a first output current equal to the bias current minus the first current; and
a second device to generate a second output current equal to the bias current minus the second current.
8. The output current neuron circuit according to claim 7, wherein the first output current is generated from a read operation of a bit line coupled to one or more W+ cells.
9. The output current neuron circuit according to claim 8, wherein the second output current is generated from a read operation of a bit line coupled to one or more W- cells.
10. An output current neuron circuit, the output current neuron circuit comprising:
a first output node for receiving a first current from a memory array;
a second output node for receiving a second current from the memory array;
a bias circuit to generate a bias voltage at a bias node;
a first variable resistor coupled between the bias node and the first output node;
a second variable resistor is coupled between the bias node and the second output node.
11. A current-to-voltage converter, the current-to-voltage converter comprising:
a first bit line for receiving a first current generated during a read operation of a W+ cell;
a second bit line for receiving a second current generated during a read operation of a W- cell, wherein a difference between a value stored in the W+ cell and a value stored in the W- cell is a weight value W; and
a differential amplifier for receiving the first current and the second current and generating a differential output voltage comprising a first voltage output and a second voltage output.
12. An output block, the output block comprising:
a plurality of current-to-voltage converters, each current-to-voltage converter receiving a bit line differential pair and generating a differential voltage output; and
a plurality of differential input analog-to-digital converters, each differential input analog-to-digital converter receiving a differential voltage output from one of the plurality of current-to-voltage converters and generating a set of digital output bits.
13. An output block, the output block comprising:
a plurality of current-to-voltage converters, each current-to-voltage converter receiving a bit line differential pair and generating a voltage output; and
a plurality of differential input analog-to-digital converters, each differential input analog-to-digital converter receiving a voltage output from one of the plurality of current-to-voltage converters and generating a set of digital output bits.
14. An output block, the output block comprising:
a current-to-voltage converter for receiving a bit line differential pair, the current-to-voltage converter comprising:
a differential operational amplifier comprising first and second inputs and first and second outputs, the first and second inputs coupled to the bit line differential pair;
a first variable resistor coupled between the first input and the first output;
a second variable resistor coupled between the second input and the second output; and
a common-mode input circuit coupled between the first input and the second input; and
a differential input analog-to-digital converter for receiving the first output and the second output and generating a set of digital output bits.
15. The output block of claim 14, wherein the common-mode input circuit comprises a first variable current source coupled to the first input and a second variable current source coupled to the second input, the first and second variable current sources generating equal currents.
16. An output block, the output block comprising:
an output current neuron circuit, the output current neuron circuit comprising:
a first bit line coupled to a W+ cell in the memory array and drawing a first current during a read operation;
a second bit line coupled to a W- cell in the memory array and drawing a second current;
a first bias current coupled to the first bit line; and
a second bias current coupled to the second bit line, wherein the first bias current and the second bias current have the same value.
17. An output block, the output block comprising:
an output current neuron circuit, the output current neuron circuit comprising:
a first bit line coupled to a W+ cell in the memory array and drawing a first current during a read operation; and
a second bit line coupled to a W- cell in the memory array and drawing a second current during the read operation;
a first bias current coupled to the first bit line; and
a first output current proportional to a difference between the first current and the second current.
18. The output block of claim 17, wherein the first output current is equal to half the difference of the first current and the second current.
19. The output block of claim 17, further comprising a second output current complementary to the first output current.
20. An offset calibration method for an output block, the method comprising:
applying a nominal bias to an input node of a subcircuit block of said output block; and
an increased or decreased offset trim setting is applied to the subcircuit blocks within the output block until the output of the output block is within a threshold of a target value.
21. The method of claim 20, wherein the sub-circuit block is a current-voltage circuit.
22. The method of claim 20, wherein the sub-circuit block is an analog-to-digital converter circuit.
23. The method of claim 20, the method further comprising:
outputs from neurons are provided by the output block.
24. The method of claim 23, wherein the neuron is part of a neural memory array in a neural network.
25. An offset calibration method for an output block, the method comprising:
applying an increased offset trim setting to a sub-circuit block within the output block;
measuring a new trimmed output of the output block in response to the increased offset trim setting;
comparing the new trimmed output to a nominal offset output, wherein:
repeating the applying, measuring and comparing steps when the new trimmed output is equal to the nominal offset output; and
storing the new trimmed output as a trim value when the new trimmed output is different from the nominal offset output; and
applying the trim value to the sub-circuit block within the output block during operation.
26. The method of claim 25, the method further comprising:
outputs from neurons are provided by the output block.
27. The method of claim 26, wherein the neuron is part of a neural memory array in a neural network.
28. An offset calibration method for an output block, the method comprising:
applying a nominal bias to an input node of a subcircuit block of said output block;
measuring a nominal offset output of the output block in response to the nominal offset;
applying a reduced offset trim setting to the input node;
measuring a new trimmed output of the output block in response to the reduced offset trim setting;
Comparing the new trimmed output to a nominal bias output, wherein:
repeating the applying, measuring and comparing steps when the new trimmed output is equal to the nominal offset output; and
storing the new trimmed output as a trim value when the new trimmed output is different from the nominal bias output; and
the trim value is applied to the sub-circuit block of the output block during operation.
29. The method of claim 28, the method further comprising:
outputs from neurons are provided by the output block.
30. The method of claim 29, wherein the neuron is part of a neural memory array in a neural network.
31. An offset calibration method for an output block, the method comprising:
applying an input value to an input node of a subcircuit block of said output block;
measuring an output value in response to the input value;
comparing the output value with a target offset value, wherein:
repeating the applying, measuring and comparing steps for a next input value when the output value exceeds the target offset value; and
storing the input value as a trim value when the output value is less than or equal to the target offset value; and
The trim value is applied to the sub-circuit blocks of the output block during operation of the output block.
32. The method of claim 31, the method further comprising:
outputs from neurons are provided by the output block.
33. The method of claim 32, wherein the neuron is part of a neural memory array in a neural network.
CN202180100959.0A 2021-08-02 2021-11-13 Output circuit of analog neural memory for deep learning artificial neural network Pending CN117813653A (en)

Applications Claiming Priority (4)

US 63/228,529 — priority date 2021-08-02
US 17/521,772 (US 20230049032 A1) — priority date 2021-08-02, filed 2021-11-08 — Output circuitry for analog neural memory in a deep learning artificial neural network
PCT/US2021/059281 (WO 2023014386 A1) — priority date 2021-08-02, filed 2021-11-13 — Output circuitry for analog neural memory in a deep learning artificial neural network

Publications (1)

CN 117813653 A, published 2024-04-02


KR20240141782A (en) Multiplication array between vectors and matrices using analog inputs
KR20240026489A (en) Hybrid memory system configurable to store neural memory weight data in analog or digital form
WO2024172829A1 (en) Output block for a vector-by-matrix multiplication array of non-volatile memory cells
CN118613807A (en) Artificial neural networks including analog and digital arrays
KR20240124982A (en) Artificial neural network including analog array and digital array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination