CN111931923B - Near memory computing system - Google Patents


Info

Publication number
CN111931923B
CN111931923B
Authority
CN
China
Prior art keywords
nonvolatile memory
processing circuit
terminal
node
units
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010373249.2A
Other languages
Chinese (zh)
Other versions
CN111931923A (en)
Inventor
林春甫
林庆源
赖宗沐
陈志欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
eMemory Technology Inc
Original Assignee
eMemory Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by eMemory Technology Inc filed Critical eMemory Technology Inc
Publication of CN111931923A publication Critical patent/CN111931923A/en
Application granted granted Critical
Publication of CN111931923B publication Critical patent/CN111931923B/en


Classifications

    • G11C 16/0408: Erasable programmable read-only memories, electrically programmable, using variable threshold transistors (e.g. FAMOS) comprising cells containing floating gate transistors
    • G06F 7/5443: Sum of products
    • G06N 3/063: Physical realisation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/02: Neural networks
    • G11C 16/10: Programming or data input circuits
    • G11C 16/26: Sensing or reading circuits; data output circuits
    • G06F 2207/4802: Special implementations
    • G06F 2207/4818: Threshold devices
    • G06F 2207/4824: Neural networks
    • G11C 16/14: Circuits for erasing electrically, e.g. erase voltage switching circuits

Abstract

The invention discloses a near memory computing system that includes a plurality of operation nodes. Each operation node receives a plurality of input signals and outputs an operation result signal. Each operation node includes a plurality of nonvolatile memory cells and a processing circuit. Each nonvolatile memory cell stores a weight value in a write operation and outputs a weight signal according to the weight value in a read operation. The processing circuit is coupled to the plurality of nonvolatile memory cells. The processing circuit receives the plurality of input signals and performs an operation on the input signals and the plurality of weight signals generated by the nonvolatile memory cells to generate the operation result signal. The plurality of nonvolatile memory cells and the processing circuit are manufactured by the same or different processes.

Description

Near memory computing system
Technical Field
The present invention relates to a near-memory computing system, and more particularly, to a near-memory computing system capable of performing analog operations.
Background
As artificial intelligence (artificial intelligence, AI) is applied to more and more fields, and since algorithms of artificial intelligence require a large amount of operations, there is also an increasing demand for hardware suitable for the operations of artificial intelligence. In general, artificial intelligence systems must process a large number of operations in a short time to make accurate predictions or decisions, so data operations are typically processed in a parallel fashion.
Neural networks are a common type of artificial intelligence system. A neural network includes a plurality of nodes; each node can receive data from other nodes to perform an operation, and the operation result can in turn be used by other nodes as input data. Because artificial intelligence operates on large amounts of data, the data is typically stored in external memory and loaded into internal memory when needed. Thus, the bandwidth available for accessing data significantly affects the operation cycle of the artificial intelligence system, and the energy consumed by data access also burdens the artificial intelligence system.
Disclosure of Invention
An embodiment of the invention provides a near memory computing system. The near memory computing system comprises a plurality of computing nodes, wherein each computing node is used for receiving a plurality of input signals and outputting a computing result signal.
Each operation node comprises a plurality of nonvolatile memory units and a processing circuit. Each nonvolatile memory cell stores a weight value in a write operation and outputs a weight signal according to the weight value in a read operation. The processing circuit is coupled to the plurality of nonvolatile memory cells. The processing circuit receives a plurality of input signals and calculates a plurality of weight signals generated by a plurality of nonvolatile memory units to generate operation result signals.
The plurality of nonvolatile memory cells and the processing circuit are disposed on two different chips, and the processing circuit and the plurality of nonvolatile memory cells are directly coupled to each other through face-to-face bonding or a plurality of through-silicon vias between the two chips.
Another embodiment of the present invention provides a near memory computing system. The near memory computing system comprises a plurality of computing nodes, wherein each computing node is used for receiving a plurality of input signals and outputting a computing result signal.
Each operation node comprises a plurality of nonvolatile memory units and a processing circuit. Each nonvolatile memory cell stores a weight value in a write operation and outputs a weight signal according to the weight value in a read operation. The processing circuit is coupled to the plurality of nonvolatile memory cells. The processing circuit receives a plurality of input signals and calculates a plurality of weight signals generated by a plurality of nonvolatile memory units to generate operation result signals.
The near memory computing system is formed on a system-on-chip, and the processing circuit is directly coupled to the plurality of nonvolatile memory cells through connection structures on the chip.
Drawings
FIG. 1 is a schematic diagram of a near memory computing system according to an embodiment of the invention.
FIG. 2 is a schematic diagram of an operation node according to an embodiment of the present invention.
FIG. 3 is a diagram showing a connection structure between a nonvolatile memory cell and a processing circuit according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a nonvolatile memory cell according to an embodiment of the invention.
The reference numerals are as follows:
10: near memory computing system
100: operation node
L1 to LN: layers
1101 to 110X: nonvolatile memory cells
120: processing circuit
1221 to 122X: operation units
CT: control transistor
CM: current mirror
BD: bonding structure
SIG_WT1 to SIG_WTX: weight signals
SIG_WI1 to SIG_WIX: weighted input signals
SIG_IN1 to SIG_INX: input signals
SW1, SW2: switches
T1A, T2A: transistors
SIG_ctrl1, SIG_ctrl2: control signals
VS1: first system voltage terminal
VS2: second system voltage terminal
I_WT1 to I_WTX: currents
I_W1 to I_WX: weight currents
C1: capacitor
CMP: comparator
SIG_CR: operation result signal
Vref: reference voltage
CP1, CP2: chips
111: first select transistor
112: floating gate transistor
113: second select transistor
114: control element
115: erase element
SL: source line
BL: bit line
CL: control line
EL: erase line
Detailed Description
FIG. 1 is a schematic diagram of a near memory computing system 10 according to an embodiment of the invention. The near memory computing system 10 may include a plurality of operation nodes 100. Each operation node 100 may receive a plurality of input signals and may output an operation result signal.
In some embodiments, the near memory computing system 10 may be applied in a neural network to perform artificial intelligence operations. That is, the plurality of operation nodes 100 may be connected to each other in a hierarchical structure, so that the operation result signal generated by one operation node may be used as an input signal of another operation node. For example, in FIG. 1, the operation nodes 100 may be divided into a plurality of layers L1 to LN, where N is an integer greater than 1. The operation nodes of the first layer L1 receive the input signals given by the system, and the operation nodes of the second layer L2 receive the operation result signals generated by the operation nodes of the first layer L1 as their input signals.
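As a point of reference only, this hierarchical data flow can be summarized with a short behavioral sketch in Python; the names OperationNode and run_layers are illustrative assumptions and do not appear in the patent, and each node is reduced to a weighted sum of its inputs.

    # Behavioral sketch of the layered node structure of FIG. 1 (illustrative only).
    class OperationNode:
        def __init__(self, weights):
            self.weights = weights          # weight values stored in the nonvolatile memory cells

        def compute(self, inputs):
            # weighted sum of the input signals, i.e. the operation result signal
            return sum(w * x for w, x in zip(self.weights, inputs))

    def run_layers(layers, system_inputs):
        signals = system_inputs             # layer L1 receives the inputs given by the system
        for layer in layers:                # layers L1 to LN
            signals = [node.compute(signals) for node in layer]
        return signals                      # outputs of the last layer LN

    # Example: two layers, each node holding its own weight vector (assumed values)
    l1 = [OperationNode([0.2, 0.8]), OperationNode([0.5, 0.5])]
    l2 = [OperationNode([1.0, -1.0])]
    print(run_layers([l1, l2], [0.3, 0.7]))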
Fig. 2 is a schematic diagram of an operation node 100 according to an embodiment of the present invention. In fig. 2, the operation node 100 may include a plurality of nonvolatile memory units 1101 to 110X and a processing circuit 120, where X is an integer greater than 1. In some embodiments, each of the nonvolatile memory cells 1101 to 110X may store a weight value in a write operation and may output a weight signal according to the stored weight value in a read operation.
The processing circuit 120 may receive the input signals SIG_IN1 to SIG_INX and may perform an operation on the input signals SIG_IN1 to SIG_INX and the weight signals SIG_WT1 to SIG_WTX generated by the nonvolatile memory cells 1101 to 110X to generate an operation result signal SIG_CR. For example, the processing circuit 120 may multiply each of the input signals SIG_IN1 to SIG_INX by the corresponding weight signal among SIG_WT1 to SIG_WTX and take the sum of the products as the operation result signal SIG_CR.
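Restating the example above as a formula, using the signal names of FIG. 2, the operation result signal is the sum of the products of each input signal and its corresponding weight signal:

    SIG_CR = SIG_IN1 x SIG_WT1 + SIG_IN2 x SIG_WT2 + ... + SIG_INX x SIG_WTX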
In the prior art, the weight values are stored in an external memory, so the processing circuit 120 must access the weight values in the memory system through a bus. Because artificial intelligence operations require large amounts of data, accessing that data from memory often takes a lot of time and degrades operation performance. In the operation node 100, however, the processing circuit 120 can be directly coupled to the nonvolatile memory cells 1101 to 110X, so the time required to access the memory data can be significantly shortened.
In addition, in some embodiments, the nonvolatile memory cells 1101 to 110X and the processing circuit 120 may be manufactured by different processes to achieve a better balance of cost and performance. FIG. 3 shows a connection structure between the nonvolatile memory cells 1101 to 110X and the processing circuit 120 according to an embodiment of the present invention. In FIG. 3, the processing circuit 120 and the nonvolatile memory cells 1101 to 110X may be disposed on different chips CP1 and CP2. In this case, the processing circuit 120 and the nonvolatile memory cells 1101 to 110X may be directly coupled to each other through face-to-face bonding structures BD between the chips CP1 and CP2. That is, the micro bumps of the chips CP1 and CP2 may be directly bonded to each other.
However, in some embodiments, chip-stacking bonding techniques may also be used to connect the processing circuit 120 and the nonvolatile memory cells 1101 to 110X. For example, the processing circuit 120 and the nonvolatile memory cells 1101 to 110X may be coupled to each other through a plurality of through-silicon vias (TSVs) between the chips CP1 and CP2. Furthermore, in some embodiments, the near memory computing system 10 may be formed on a system-on-chip (SoC), and the processing circuit 120 may be directly coupled to the nonvolatile memory cells 1101 to 110X through on-chip interconnect structures.
In addition to shortening the memory access time by directly coupling the processing circuit 120 and the nonvolatile memory cells 1101 to 110X, the operation node 100 can also use the analog weight signals SIG_WT1 to SIG_WTX generated by the nonvolatile memory cells 1101 to 110X to perform analog operations, thereby reducing the burden of complex computations.
In some embodiments, the nonvolatile memory cells 1101 to 110X may be written into different write states to store different weight values. For example, in a write operation, each of the nonvolatile memory cells 1101 to 110X may be verified with a corresponding specific current or voltage according to the weight value to be stored. In this case, in a read operation, each of the nonvolatile memory cells 1101 to 110X will output a weight signal having a specific current value or voltage value according to its write state.
For example, in FIG. 2, the weight signals SIG_WT1 to SIG_WTX generated by the nonvolatile memory cells 1101 to 110X may be currents I_WT1 to I_WTX. In this case, if the weight value stored in the nonvolatile memory cell 1101 is greater than the weight value stored in the nonvolatile memory cell 110X, the current I_WT1 of the weight signal SIG_WT1 will be greater than the current I_WTX of the weight signal SIG_WTX.
In this case, the processing circuit 120 will receive the weight signals SIG_WT1 to SIG_WTX in the form of the currents I_WT1 to I_WTX and can use the weight signals SIG_WT1 to SIG_WTX to perform analog operations. For example, the processing circuit 120 may include a plurality of operation units 1221 to 122X. Each of the operation units 1221 to 122X may be coupled to a corresponding one of the nonvolatile memory cells 1101 to 110X. For example, the operation unit 1221 may be coupled to the nonvolatile memory cell 1101, and the operation unit 122X may be coupled to the nonvolatile memory cell 110X.
In FIG. 2, each of the operation units 1221 to 122X may include a current mirror CM and a control transistor CT. The current mirror CM may be coupled to the corresponding nonvolatile memory cell and may copy the weight signal generated by the corresponding nonvolatile memory cell to generate a weight current. For example, the current mirror CM of the operation unit 1221 may copy the weight signal SIG_WT1 generated by the nonvolatile memory cell 1101, i.e. the current I_WT1, to generate the weight current I_W1, while the current mirror CM of the operation unit 122X may copy the weight signal SIG_WTX generated by the nonvolatile memory cell 110X, i.e. the current I_WTX, to generate the weight current I_WX.
In addition, the control transistor CT has a first terminal, a second terminal, and a control terminal. The first terminal of the control transistor CT may receive the weight current, the second terminal of the control transistor CT may output a weighted input signal, and the control terminal of the control transistor CT may receive the corresponding one of the input signals SIG_IN1 to SIG_INX. In some embodiments, the duration of each of the input signals SIG_IN1 to SIG_INX may represent its corresponding input value. For example, if the input value represented by the input signal SIG_IN1 is greater than the input value represented by the input signal SIG_INX, the duration (pulse width) of the input signal SIG_IN1 will be longer than that of the input signal SIG_INX.
In this case, during the operation, the first terminal of the control transistor CT of the operation unit 1221 receives the weight current I_W1 related to the weight signal SIG_WT1, and the control terminal of the control transistor CT of the operation unit 1221 receives the input signal SIG_IN1, so that the control transistor CT is turned on for the duration of the pulse of SIG_IN1. This is equivalent to the operation unit 1221 multiplying the input signal SIG_IN1 by the weight signal SIG_WT1 to produce the weighted input signal SIG_WI1. Similarly, the operation unit 122X will generate the weighted input signal SIG_WIX based on the product of the input signal SIG_INX and the weight signal SIG_WTX.
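The multiplication described above is performed in the charge domain: the mirrored weight current flows only while the input pulse keeps the control transistor turned on, so the charge delivered is proportional to the product of the weight and the input value. A minimal numerical sketch of one operation unit follows (Python, illustrative only; the function name and the example values are assumptions, not taken from the patent).

    # One operation unit 122i, modeled behaviorally (illustrative sketch).
    def operation_unit_charge(weight_current_a, pulse_width_s):
        """Charge delivered by one operation unit: the current mirror copies the
        weight current I_Wi, and the control transistor CT passes it only for the
        pulse width of the input signal SIG_INi."""
        return weight_current_a * pulse_width_s   # Q_i = I_Wi * t_i

    # Example values (assumed): I_W1 = 2 uA weight current, 3 us input pulse
    q1 = operation_unit_charge(2e-6, 3e-6)
    print(f"SIG_WI1 delivers {q1:.2e} C of charge")   # 6.00e-12 C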
In FIG. 2, the processing circuit 120 may further include a capacitor C1 to integrate the weighted input signals SIG_WI1 to SIG_WIX. The capacitor C1 has a first terminal and a second terminal; the first terminal of the capacitor C1 may be coupled to the second terminals of the control transistors CT of the operation units 1221 to 122X, and the second terminal of the capacitor C1 may be coupled to the first system voltage terminal VS1. In this case, during the operation, the capacitor C1 is charged by the weighted input signals SIG_WI1 to SIG_WIX and outputs a final voltage at the first terminal of the capacitor C1.
In addition, the processing circuit 120 may further include a comparator CMP for comparing the operation result signal SIG_CR with the reference voltage Vref to generate a pulse signal that the next operation node can use directly as an input signal. However, in some embodiments, the processing circuit 120 may include other components to generate the operation result signal SIG_CR in whatever form the system requires.
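Putting the pieces together, the charges delivered by all operation units are integrated on the capacitor C1, and the resulting voltage is related to the reference voltage Vref by the comparator CMP. The sketch below (Python, illustrative only; the function name and the numerical values are assumptions) models this analog sum-of-products path end to end.

    # End-to-end behavioral model of one operation node 100 (illustrative sketch).
    def operation_node(weight_currents_a, pulse_widths_s, c1_farad, vref_v):
        # Each operation unit contributes charge Q_i = I_Wi * t_i to capacitor C1.
        total_charge = sum(i * t for i, t in zip(weight_currents_a, pulse_widths_s))
        v_c1 = total_charge / c1_farad            # final voltage at the first terminal of C1
        # Comparator CMP: the relation to Vref determines the output pulse for the next node.
        return v_c1, v_c1 > vref_v

    # Example (assumed values): three memory cells / operation units
    v, fired = operation_node([2e-6, 1e-6, 3e-6], [3e-6, 1e-6, 2e-6],
                              c1_farad=10e-12, vref_v=0.8)
    print(f"C1 voltage = {v:.3f} V, comparator output high: {fired}")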
Furthermore, in FIG. 2, the current mirror CM may include transistors T1A and T2A and switches SW1 and SW2. The transistor T1A has a first terminal, a second terminal, and a control terminal; the first terminal of the transistor T1A may be coupled to the second system voltage terminal VS2, and the second terminal of the transistor T1A may be coupled to the corresponding nonvolatile memory cell. The transistor T2A has a first terminal, a second terminal, and a control terminal; the first terminal of the transistor T2A may be coupled to the second system voltage terminal VS2, the second terminal of the transistor T2A may be coupled to the first terminal of the control transistor CT, and the control terminal of the transistor T2A may be coupled to the control terminal of the transistor T1A. The switch SW1 has a first terminal, a second terminal, and a control terminal; the first terminal of the switch SW1 is coupled to the second system voltage terminal VS2, the second terminal of the switch SW1 is coupled to the control terminal of the transistor T1A, and the control terminal of the switch SW1 receives the control signal SIG_ctrl1. The switch SW2 has a first terminal, a second terminal, and a control terminal; the first terminal of the switch SW2 is coupled to the control terminal of the transistor T1A, the second terminal of the switch SW2 is coupled to the second terminal of the transistor T1A, and the control terminal of the switch SW2 receives the control signal SIG_ctrl2.
In this case, the transistors T1A and T2A can be used to copy the weight signal generated by the corresponding nonvolatile memory cell, and the switches SW1 and SW2 can be used to control the operation timing to increase the operation accuracy.
For example, when the operation node 100 is in the standby mode, the nonvolatile memory cells 1101 to 110X of the operation node 100 may be in a read state and ready to output the weight signals SIG_WT1 to SIG_WTX. However, in the standby mode, the switches SW1 of the operation units 1221 to 122X in the processing circuit 120 are turned on, and the switches SW2 of the operation units 1221 to 122X in the processing circuit 120 are turned off. In this case, the transistors T1A and T2A are turned off, and the current mirrors CM of the operation units 1221 to 122X are disabled. As a result, the control transistors CT of the operation units 1221 to 122X do not receive the weight currents I_W1 to I_WX even though the nonvolatile memory cells 1101 to 110X are already in the read state.
In some embodiments, before the processing circuit 120 receives the input signals SIG_IN1 to SIG_INX and starts the operation, the operation node 100 will first enter a preparation mode, so that the currents I_WT1 to I_WTX of the weight signals SIG_WT1 to SIG_WTX can first precharge the parasitic capacitances on the paths from the nonvolatile memory cells 1101 to 110X to the processing circuit 120.
For example, when the operation node 100 is in the preparation mode, the nonvolatile memory cells 1101 to 110X of the operation node 100 may be in the read state, the switches SW1 of the operation units 1221 to 122X of the processing circuit 120 may be turned off, and the switches SW2 of the operation units 1221 to 122X of the processing circuit 120 may be turned on. In this case, the current mirrors CM of the operation units 1221 to 122X of the processing circuit 120 are enabled, and the currents I_WT1 to I_WTX of the weight signals SIG_WT1 to SIG_WTX precharge the parasitic capacitances between the nonvolatile memory cells 1101 to 110X and the processing circuit 120. Meanwhile, since the control transistors CT of the operation units 1221 to 122X have not yet received the input signals SIG_IN1 to SIG_INX, the control transistors CT remain turned off, and the operation node 100 does not yet perform the operation in the preparation mode.
In some embodiments, when the nonvolatile memory cells 1101 to 110X and the processing circuit 120 are disposed on different chips and coupled through bonded micro bumps, the paths coupling the nonvolatile memory cells 1101 to 110X to the processing circuit 120 have larger parasitic capacitances and require a longer charging time. However, if the nonvolatile memory cells 1101 to 110X and the processing circuit 120 are disposed on the same chip and coupled through on-chip interconnect structures, the charging time required for the parasitic capacitances may be relatively short. That is, the duration of the preparation mode can be determined according to the actual conditions.
After the preparation mode, the operation node will enter the operation mode to perform the operation. When the operation node is in the operation mode, the nonvolatile memory cells 1101 to 110X of the operation node 100 may be in the read state, the switches SW1 of the operation units 1221 to 122X of the processing circuit 120 may be turned off, and the switches SW2 of the operation units 1221 to 122X of the processing circuit 120 may be turned on. In addition, the control transistors CT of the operation units 1221 to 122X of the processing circuit 120 are turned on for durations corresponding to the received input signals, and the weight currents I_W1 to I_WX generated by the current mirrors CM of the operation units 1221 to 122X charge the capacitor C1 while the control transistors CT are turned on.
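For reference, the three modes described above (the preparation mode corresponds to the ready mode of claim 7) can be condensed into the following small Python table; it is only a restatement of the states listed in the description and in claims 6 to 8.

    # Switch and transistor states of one operation node in each mode
    # (condensed from the description and claims 6 to 8).
    MODES = {
        "standby":     {"memory cells": "read state", "SW1": "on",  "SW2": "off",
                        "CT": "off (no input signal yet)"},
        "preparation": {"memory cells": "read state", "SW1": "off", "SW2": "on",
                        "CT": "off (parasitic capacitances being precharged)"},
        "operation":   {"memory cells": "read state", "SW1": "off", "SW2": "on",
                        "CT": "on for the pulse width of the input signal"},
    }

    for mode, states in MODES.items():
        print(mode, states)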
In some embodiments, since the operation result signal generated by one operation node is used as an input signal by another operation node, the timing at which each operation is performed is predictable. In this case, while one operation node is in the operation mode, the next operation node may already enter the preparation mode to get ready for the next operation. For example, in FIG. 1, when the operation nodes 100 of the first layer L1 are in the operation mode, the operation nodes of the second layer L2 can enter the preparation mode at the same time. Furthermore, in some embodiments, if an operation node 100 requires a longer time to charge its parasitic capacitances, its preparation mode may be lengthened accordingly. In this case, when the operation nodes 100 of the first layer L1 are in the operation mode, the operation nodes of the third layer L3 may also enter the preparation mode at the same time. In this way, the operation nodes of the third layer L3 can take a longer time to charge their parasitic capacitances before they start the operation.
FIG. 4 is a schematic diagram of the nonvolatile memory cell 1101 according to an embodiment of the present invention. In some embodiments, the nonvolatile memory cells 1101 to 110X may have the same structure. In FIG. 4, the nonvolatile memory cell 1101 may include a first select transistor 111, a floating gate transistor 112, a second select transistor 113, a control element 114, and an erase element 115.
The first selection transistor 111 has a first terminal, a second terminal and a control terminal, the first terminal of the first selection transistor 111 may be coupled to the source line SL, and the control terminal of the first selection transistor 111 may be coupled to the selection gate line SGL. The floating gate transistor 112 has a first end, a second end, and a floating gate end, and the first end of the floating gate transistor 112 may be coupled to the second end of the first selection transistor 111. The second selection transistor 113 has a first terminal, a second terminal, and a control terminal, the first terminal of the second selection transistor 113 may be coupled to the second terminal of the floating gate transistor 112, the second terminal of the second selection transistor 113 may be coupled to the bit line BL, and the control terminal of the second selection transistor 113 may be coupled to the word line WL.
The control element 114 has a first terminal and a second terminal; the first terminal of the control element 114 may be coupled to the floating gate terminal of the floating gate transistor 112, and the second terminal of the control element 114 may be coupled to the control line CL. The erase element 115 has a first terminal and a second terminal; the first terminal of the erase element 115 may be coupled to the floating gate terminal of the floating gate transistor 112, and the second terminal of the erase element 115 may be coupled to the erase line EL. In some embodiments, the control element 114 and the erase element 115 may be capacitive elements and may be implemented with transistors. In this case, the first terminal of the control element 114, the first terminal of the erase element 115, and the floating gate terminal of the floating gate transistor 112 may be coupled through the same polysilicon layer, while the second terminal of the control element 114 and the second terminal of the erase element 115 may be the source/drain terminals of the two transistors that implement the control element 114 and the erase element 115.
In addition, since the coupling area of the control element 114 is large, the voltage at the floating gate terminal will be dominated by the control element 114. That is, the floating gate terminal may be coupled to the voltage of the control line CL through the control element 114. Thus, the control element 114 can be used to perform the write operation.
For example, in the write operation, the control line CL may be at a write voltage, the select gate line SGL and the word line WL may be at an operating voltage, and the bit line BL and the source line SL may be at a reference voltage. The write voltage may be greater than the operating voltage, and the operating voltage may be greater than the reference voltage. In this case, the first and second selection transistors 111 and 113 are turned on, so that the first and second terminals of the floating gate transistor 112 receive the reference voltage while the floating gate terminal of the floating gate transistor 112 is coupled to the write voltage. The resulting large voltage difference across the floating gate transistor 112 causes Fowler-Nordheim (FN) tunneling, and electrons become trapped in the floating gate of the floating gate transistor 112, thereby changing the write state of the nonvolatile memory cell 1101.
In addition, in the read operation, the source line SL, the control line CL, and the erase line EL may be at the reference voltage, the select gate line SGL and the word line WL may be at the operating voltage, and the bit line BL may be precharged to a read voltage slightly greater than the reference voltage. In this case, the first and second selection transistors 111 and 113 will be turned on, and the nonvolatile memory cell 1101 will conduct a read current whose magnitude depends on its write state.
In general, the more electrons are trapped in the floating gate of the floating gate transistor 112, the smaller the current conducted by the floating gate transistor 112 during the read operation. Therefore, in the write operation, the number of times the write voltage is applied can be adjusted according to the weight value to be stored: after each application of the write voltage, the nonvolatile memory cell 1101 is verified against the corresponding current value, and if that current value has not been reached, the write voltage is applied again until the nonvolatile memory cell 1101 reaches the desired write state.
That is, if the weight value to be stored in the nonvolatile memory cell 1101 is smaller, the write voltage may be applied to the nonvolatile memory cell 1101 more times, so that the nonvolatile memory cell 1101 generates a smaller current in the read operation.
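The write-and-verify procedure described above can be summarized by the following sketch (Python, illustrative only; the pulse-based programming model, the helper names, and the numbers are assumptions and are not circuit-accurate).

    # Program-and-verify sketch for storing a weight value (illustrative only).
    def program_weight(cell, target_read_current_a, max_pulses=100):
        """Apply write-voltage pulses until the cell's read current drops to the
        target value; a smaller target current corresponds to a smaller weight."""
        for _ in range(max_pulses):
            if cell.read_current() <= target_read_current_a:
                return True                  # desired write state reached
            cell.apply_write_pulse()         # trap more electrons, lowering the read current
        return False                         # verification failed within the pulse budget

    class _ToyCell:
        """Toy stand-in for a nonvolatile memory cell (not the FIG. 4 circuit)."""
        def __init__(self):
            self._i = 10e-6                  # initial read current
        def read_current(self):
            return self._i
        def apply_write_pulse(self):
            self._i *= 0.8                   # each pulse reduces the read current

    print(program_weight(_ToyCell(), target_read_current_a=2e-6))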
Furthermore, since the coupling area of the erase element 115 is relatively small, it can be used to erase the write state of the nonvolatile memory cell 1101. For example, in an erase operation, the first and second terminals of the floating gate transistor 112 may receive the reference voltage, and the erase line EL may be at an erase voltage similar to the write voltage. In this case, the high voltage applied to the erase line EL induces tunneling through the erase element 115 and releases the electrons trapped in the floating gate of the floating gate transistor 112.
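For convenience, the bias conditions stated above for the write, read, and erase operations of the nonvolatile memory cell 1101 are collected in the small Python table below; only the voltages explicitly mentioned in the description are listed, and terminals not mentioned for an operation are simply omitted.

    # Bias conditions of the nonvolatile memory cell 1101, as stated in the description.
    BIAS = {
        "write": {"CL": "write voltage",
                  "SGL": "operating voltage", "WL": "operating voltage",
                  "BL": "reference voltage", "SL": "reference voltage"},
        "read":  {"SL": "reference voltage", "CL": "reference voltage", "EL": "reference voltage",
                  "SGL": "operating voltage", "WL": "operating voltage",
                  "BL": "precharged to a read voltage slightly above the reference voltage"},
        "erase": {"floating gate transistor terminals": "reference voltage",
                  "EL": "erase voltage similar to the write voltage"},
    }

    print(BIAS["read"]["BL"])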
Although FIG. 4 illustrates an example of the nonvolatile memory cell 1101 of the present invention, the present invention is not limited thereto, and the nonvolatile memory cells 1101 to 110X may be implemented with other structures in other embodiments of the present invention. For example, the nonvolatile memory cells 1101 to 110X may be implemented with other multi-time programmable (MTP) memory cells or resistive random access memory (RRAM) cells. In addition, the weight signals SIG_WT1 to SIG_WTX may be transmitted in the form of voltage signals or current signals according to the structural characteristics of the nonvolatile memory cells 1101 to 110X.
In summary, the near-memory computing system provided by the embodiments of the invention can directly couple the processing circuit with the memory unit and can receive the analog signal from the memory unit to perform the analog operation. Thus, the burden of accessing external memory and performing complex operations can be significantly reduced when executing artificial intelligence algorithms.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A near memory computing system, comprising:
a plurality of operation nodes, each operation node being configured to receive a plurality of input signals and output an operation result signal, wherein each operation node comprises:
a plurality of nonvolatile memory cells, each for storing a weight value in a write operation and outputting a weight signal according to the weight value in a read operation; and
a processing circuit coupled to the plurality of nonvolatile memory cells and configured to receive the plurality of input signals and perform an operation on the plurality of input signals and a plurality of weight signals generated by the plurality of nonvolatile memory cells to generate the operation result signal;
wherein the processing circuit and the plurality of nonvolatile memory cells are disposed on two different chips and are directly coupled to each other through face-to-face bonding or a plurality of through-silicon vias between the two chips, and the plurality of nonvolatile memory cells and the processing circuit are manufactured by the same or different processes.
2. A near memory computing system, comprising:
a plurality of operation nodes, each operation node being configured to receive a plurality of input signals and output an operation result signal, wherein each operation node comprises:
a plurality of nonvolatile memory cells, each for storing a weight value in a write operation and outputting a weight signal according to the weight value in a read operation; and
a processing circuit coupled to the plurality of nonvolatile memory cells and configured to receive the plurality of input signals and perform an operation on the plurality of input signals and a plurality of weight signals generated by the plurality of nonvolatile memory cells to generate the operation result signal;
wherein the near memory computing system is formed on a system-on-chip, the processing circuit is directly coupled to the plurality of nonvolatile memory cells through connection structures on the chip, and the plurality of nonvolatile memory cells and the processing circuit are manufactured by the same or different processes.
3. The near memory computing system of claim 1 or 2, wherein the plurality of weight signals generated by the plurality of non-volatile memory cells are current signals, and the processing circuit comprises:
a plurality of operation units, each operation unit including:
a current mirror coupled to a corresponding nonvolatile memory cell of the plurality of nonvolatile memory cells and configured to copy a weight signal generated by the corresponding nonvolatile memory cell to generate a weight current; and
A control transistor having a first terminal for receiving the weighted current, a second terminal for outputting a weighted input signal, and a control terminal for receiving a corresponding input signal of the plurality of input signals.
4. The near memory computing system of claim 3, wherein the processing circuit further comprises:
a capacitor having a first terminal coupled to the second terminals of the control transistors of the operation units and a second terminal coupled to the first system voltage terminal.
5. The near memory computing system of claim 3, wherein the current mirror comprises:
a first transistor having a first terminal coupled to a second system voltage terminal, a second terminal coupled to the corresponding nonvolatile memory cell, and a control terminal;
a second transistor having a first terminal coupled to the second system voltage terminal, a second terminal coupled to the first terminal of the control transistor, and a control terminal coupled to the control terminal of the first transistor;
a first switch having a first terminal coupled to the second system voltage terminal, a second terminal coupled to the control terminal of the first transistor, and a control terminal for receiving a first control signal; and
a second switch having a first terminal coupled to the control terminal of the first transistor, a second terminal coupled to the second terminal of the first transistor, and a control terminal for receiving a second control signal.
6. The near memory computing system of claim 5, wherein when a first computing node of the plurality of computing nodes is in a standby mode:
the plurality of nonvolatile memory cells in the first operation node are in a read state;
a plurality of first switches of a plurality of operation units in a processing circuit of the first operation node are turned on; and
a plurality of second switches of the plurality of operation units in the processing circuit of the first operation node are turned off.
7. The near memory computing system of claim 6, wherein when the first compute node of the plurality of compute nodes is in a ready mode:
the plurality of nonvolatile memory cells in the first operation node are in the read state; the plurality of first switches of the plurality of operation units in the processing circuit of the first operation node are turned off; and
the plurality of second switches of the plurality of operation units in the processing circuit of the first operation node are turned on.
8. The near memory computing system of claim 7, wherein when the first compute node of the plurality of compute nodes is in operational mode:
the plurality of nonvolatile memory cells in the first operation node are in the read state; the plurality of first switches of the plurality of operation units in the processing circuit of the first operation node are turned off;
the plurality of second switches of the plurality of operation units in the processing circuit of the first operation node are turned on; and
the control terminal of the control transistor of each of the plurality of operation units in the processing circuit of the first operation node receives a corresponding input signal of the plurality of input signals.
9. The near memory computing system of claim 8, wherein:
the plurality of operation nodes belong to different layers among a first layer, a second layer, and a third layer;
the first operation node belongs to the first layer; and
When the first operation node is in the operation mode, the operation node belonging to the second layer or the third layer is in the standby mode.
CN202010373249.2A 2019-05-13 2020-05-06 Near memory computing system Active CN111931923B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962847246P 2019-05-13 2019-05-13
US62/847,246 2019-05-13
US16/826,278 US11062773B2 (en) 2019-05-13 2020-03-22 Near-memory computation system for analog computing
US16/826,278 2020-03-22

Publications (2)

Publication Number Publication Date
CN111931923A CN111931923A (en) 2020-11-13
CN111931923B (en) 2024-03-19

Family

ID=70154357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010373249.2A Active CN111931923B (en) 2019-05-13 2020-05-06 Near memory computing system

Country Status (4)

Country Link
US (1) US11062773B2 (en)
EP (1) EP3739444A1 (en)
CN (1) CN111931923B (en)
TW (1) TWI723871B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11500613B2 (en) * 2020-02-06 2022-11-15 National Tsing Hua University Memory unit with multiply-accumulate assist scheme for multi-bit convolutional neural network based computing-in-memory applications and computing method thereof


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10217045B2 (en) * 2012-07-16 2019-02-26 Cornell University Computation devices and artificial neurons based on nanoelectromechanical systems
US9501603B2 (en) * 2014-09-05 2016-11-22 International Business Machines Corporation Integrated circuit design changes using through-silicon vias
JP6697994B2 (en) * 2016-09-29 2020-05-27 ルネサスエレクトロニクス株式会社 Semiconductor device
US10991430B2 (en) * 2018-12-19 2021-04-27 Ememory Technology Inc. Non-volatile memory cell compliant to a near memory computation system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1846218A (en) * 2003-09-09 2006-10-11 西麦恩公司 An artificial neural network
CN1846302A (en) * 2003-09-15 2006-10-11 国际商业机器公司 Integrated electronic chip and interconnect device and process for making the same
CN106372723A (en) * 2016-09-26 2017-02-01 上海新储集成电路有限公司 Neural network chip-based storage structure and storage method
CN108053848A (en) * 2018-01-02 2018-05-18 清华大学 Circuit structure and neural network chip

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Circuit-Based Neural Network with Hybrid Learning of Backpropagation and Random Weight Change Algorithms; Changju Yang et al.; Sensors; pp. 1-18 *
An Analog Self-Organizing Neural Network Chip; James R. Mann et al.; NIPS; pp. 1-9 *
Photonic Neural Networks: Redefining the AI Chip (光子神经网络——重新定义AI芯片); Bai Bing (白冰) et al.; Artificial Intelligence (《人工智能》); pp. 1-10 *

Also Published As

Publication number Publication date
TW202042237A (en) 2020-11-16
TWI723871B (en) 2021-04-01
US11062773B2 (en) 2021-07-13
CN111931923A (en) 2020-11-13
EP3739444A1 (en) 2020-11-18
US20200365209A1 (en) 2020-11-19

Similar Documents

Publication Publication Date Title
TWI716215B (en) Near-memory computation system and non-volatile memory cell
JP6047844B2 (en) Redundant sense amplifier memory
TWI733706B (en) Reducing verification checks when programming a memory device
TWI228256B (en) Flash memory device with stable source line regardless of bit line coupling and loading effect
TWI435325B (en) Random access memory with cmos-compatible nonvolatile storage element in series with storage capacitor
Zheng et al. Memristor-based ternary content addressable memory (mTCAM) for data-intensive computing
DE102021106147A1 (en) READING AMPLIFIER ARCHITECTURE FOR PROCESSES WITH LOW SUPPLY VOLTAGE
KR20160008875A (en) Semiconductor memory device and operating method thereof
TW202203007A (en) Time-shared compute-in-memory bitcell
CN114341772A (en) Power management in semiconductor memory
CN111931923B (en) Near memory computing system
CN101009138B (en) Programing method of flash memory
US10916303B2 (en) Resistive memory apparatus and method of operating a resistive memory apparatus
US20150340097A1 (en) Voltage generator and semiconductor memory device
JP2002100196A (en) Semiconductor memory
JP2022060145A (en) Multi-level ultra-low power inference engine accelerator
TWI270080B (en) Nonvolatile semiconductor storage device
TWI523013B (en) Tile-level snapback detection through coupling capacitor in a cross point array
TW200951952A (en) Random access memory with CMOS-compatible nonvolatile storage element
US20230230640A1 (en) Nonvolatile memory device including combined sensing node and cache read method thereof
JP2001085633A (en) Semiconductor having capacitive structure, charge pump circuit having capacitive structure, and semiconductor device having charge pump circuit
US20220108729A1 (en) Memory device
CN115099178A (en) Flash type FPGA wiring method considering programming interference
CN114846547A (en) Regulation of voltage generating system
JP2007172739A (en) Nonvolatile semiconductor memory apparatus

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant