CN114863964A - In-memory computing circuit, memory and device based on local-multiply global-add structure - Google Patents


Info

Publication number
CN114863964A
Authority
CN
China
Prior art keywords: memory, transistor, computing, sub, line
Legal status: Pending
Application number
CN202210457204.2A
Other languages
Chinese (zh)
Inventor
王琳方
窦春萌
叶望
安俊杰
李伟增
高行行
刘琦
李泠
刘明
Current Assignee: Institute of Microelectronics of CAS
Original Assignee: Institute of Microelectronics of CAS
Application filed by Institute of Microelectronics of CAS
Priority to CN202210457204.2A
Publication of CN114863964A

Classifications

    • G11C 7/12: Bit line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, equalising circuits, for bit lines
    • G11C 5/147: Voltage reference generators, voltage or current regulators; internally lowered supply levels; compensation for voltage drops
    • G11C 7/1069: I/O lines read out arrangements
    • G11C 7/16: Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
    • G11C 7/18: Bit line organisation; bit line lay-out
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure provides an in-memory computing circuit based on a local-multiply global-add structure, including: a plurality of sub-compute arrays, a plurality of word lines, a plurality of bit lines, a plurality of complementary bit lines, and a plurality of source lines. The sub-compute arrays in each row share connected word lines, and the sub-compute arrays in each column share connected bit lines, complementary bit lines, and source lines. Each computing unit in each sub-compute array comprises a first transistor, a first memory, a second transistor, and a second memory. The gates of the first and second transistors are connected to a word line and their sources to a source line; the drain of the first transistor is connected to the first memory, whose other end is connected to a bit line; the drain of the second transistor is connected to the second memory, whose other end is connected to a complementary bit line. The gate of a third transistor is connected to the source line, its source is grounded, and its drain is connected to the source of a fourth transistor; the gate of the fourth transistor is connected to an input line and its drain is connected to a compute line through a switch. The disclosure also provides a memory and an electronic device.

Description

In-memory computing circuit, memory and device based on a local-multiply global-add structure
Technical Field
The present disclosure relates to the field of semiconductor integrated circuit technology, and in particular to an in-memory computing circuit, a memory, and an electronic device based on a local-multiply global-add structure.
Background
By reducing the power consumption and latency caused by frequent memory access, in-memory computing based on emerging non-volatile memory is expected to greatly improve computing energy efficiency and throughput, providing hardware support for data-centric computing tasks typified by artificial intelligence. Among the candidate devices, Resistive Random Access Memory (RRAM) shows great potential in operating power, integration density, and process compatibility. As an emerging non-volatile memory, RRAM offers low power consumption, low latency, high density, and good process compatibility; in-memory computing based on RRAM can effectively reduce data movement between the processor and memory and between memory levels (from non-volatile storage to working memory), greatly reducing the power consumption and latency this movement incurs and breaking the bottleneck imposed by the memory wall.
In the prior art, however, the low-resistance-state (LRS) resistance of the RRAM in a 1T1R circuit is small, so a large converged output current is generated during computation and the power overhead becomes high; moreover, the computation result of a 1T1R circuit is easily affected by non-ideal factors such as process fluctuation and resistance drift, and its robustness is poor.
Disclosure of Invention
To solve the above problems in the prior art, the present disclosure provides an in-memory computing circuit, a memory, and an electronic device based on a local-multiply global-add structure, aiming to eliminate the excessive current convergence of the conventional 1T1R circuit, improve circuit integration density, and reduce circuit power consumption.
A first aspect of the present disclosure provides an in-memory computing circuit based on a local-multiply global-add structure, including: a plurality of sub-compute arrays, a plurality of word lines, a plurality of bit lines, a plurality of complementary bit lines, and a plurality of source lines. The sub-compute arrays in each row share connected word lines, and the sub-compute arrays in each column share connected bit lines, complementary bit lines, and source lines. Each sub-compute array comprises: a plurality of computing units, a third transistor, a fourth transistor, and a CMOS switch. Each computing unit adopts a common-source-line 2T2R structure and comprises a first transistor, a first memory, a second transistor, and a second memory. The gates of the first and second transistors are connected to a word line and their sources to a source line; the drain of the first transistor is connected to the first memory, whose other end is connected to a bit line; the drain of the second transistor is connected to the second memory, whose other end is connected to a complementary bit line. The gate of the third transistor is connected to the source line, its source is grounded, and its drain is connected to the source of the fourth transistor; the gate of the fourth transistor is connected to an input line and its drain is connected to a compute line through the CMOS switch.
Further, each sub-compute array supports two calculation modes: a voltage-division calculation mode and a coupling calculation mode.
Further, when a sub-compute array works in the voltage-division calculation mode, one computing unit in the sub-compute array is selected; its word line is driven to a first bias voltage, its bit line to a first read voltage, its complementary bit line is grounded, and its source line floats. The remaining unselected bit lines and word lines in the sub-compute array are grounded, and their source lines float.
Further, when the resistance of the first memory is much larger than that of the second memory, the gate voltage of the third transistor is 0; when the resistance of the first memory is much smaller than that of the second memory, the gate voltage of the third transistor is the first read voltage.
Further, when a sub-compute array works in the coupling calculation mode, one computing unit in the sub-compute array is selected; its word line is driven to a second bias voltage, one of its bit line and complementary bit line is driven to a second read voltage, and the other floats. The remaining unselected word lines in the sub-compute array are grounded, and the bit lines float.
Further, the second bias voltage and the second read voltage are both pulse voltages, and their high-level phases are opposite.
Further, the far end of the compute line of each row of sub-compute arrays is connected to an analog-to-digital converter, which converts the analog signal output by that row into a digital signal to obtain the voltage value on the compute line.
Further, the first memory and the second memory are resistive random access memories or phase change memories, and the third and fourth transistors are NMOS transistors.
A second aspect of the present disclosure provides a memory comprising the in-memory computing circuit based on a local-multiply global-add structure provided in the first aspect of the present disclosure.
A third aspect of the present disclosure provides an electronic device, comprising: a memory as provided by the second aspect of the disclosure.
Compared with the prior art, the present disclosure has the following beneficial effects:
(1) The in-memory computing circuit provided by the present disclosure multiplies in a voltage-division/coupling mode and adds by charge sharing, with no large direct current during computation; this solves the excessive current convergence of the conventional 1T1R circuit and reduces circuit power consumption.
(2) When this structure performs multiplication in the voltage-division mode, it essentially computes with the differential signal between the two memories R_L and R_R, which suppresses the influence of non-ideal factors such as device process fluctuation on calculation accuracy.
(3) When the in-memory computing circuit operates in the coupling calculation mode, multiplication uses transient voltage coupling, further reducing direct current and circuit power consumption.
(4) The in-memory computing circuit performs addition by charge sharing, so the weighted current-convergence path is isolated (completed by three stages of switches); this effectively suppresses nonlinear error and improves computational parallelism.
(5) A 2T2R cell serves as the sub-computation unit, and a local-multiply global-add circuit structure in which multiple sub-arrays share a compute line effectively reduces circuit area overhead.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1A schematically illustrates a prior-art 1T1R in-memory computing circuit;
FIG. 1B schematically illustrates the multiply-accumulate results of a prior-art 1T1R in-memory computing circuit;
FIG. 2 schematically illustrates an in-memory computing circuit based on a local-multiply global-add structure according to an embodiment of the disclosure;
FIG. 3 schematically illustrates the voltage-division calculation mode of the in-memory computing circuit according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates the operation and voltage-control scheme of the coupling calculation mode of the in-memory computing circuit according to an embodiment of the disclosure;
FIG. 5 schematically illustrates the equivalent multiply-accumulate mode of the in-memory computing circuit according to an embodiment of the disclosure;
FIG. 6 schematically shows the multiply-accumulate results in the multiply-accumulate mode according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
FIG. 1A schematically illustrates a prior-art 1T1R in-memory computing circuit. As shown in FIG. 1A, the prior-art circuit adopts an array structure based on one-transistor one-resistor (1T1R) cells, in which each RRAM cell is connected in series with a gating transistor: one electrode of the RRAM cell is connected to a bit line (BL), the other electrode to the drain of the transistor, the source of the transistor to a source line (SL), and the gate to a word line (WL).
In general, in-memory computing performs multiply-accumulate (MAC) operations of input values and stored values in parallel by weighting and summing currents inside the memory array. In a 1T1R circuit, the input value (IN) is expressed as the level applied on a word line, and the stored value or weight (W) is the resistance of a memory cell built from a device such as RRAM. By Kirchhoff's law, reading the bit-line current multiplies the analog input value and the stored value in the analog domain, which is equivalent to a matrix-vector multiplication; the output current of the circuit is proportional to the multiply-accumulate value (MACV). The input values, stored values, and products are represented in Table 1 below.
Table 1. 1T1R in-memory computing representation of input values, stored values, and products

    Input IN (word-line level)    Stored value W (cell resistance)    Product IN x W (bit-line current)
    1 (read voltage applied)      1 (LRS)                             1 (large current)
    1 (read voltage applied)      0 (HRS)                             0 (small current)
    0 (grounded)                  1 or 0                              0 (no current)
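As an illustration of the current-summation principle described above, the following sketch models a 1T1R column's multiply-accumulate with Ohm's and Kirchhoff's laws. The read voltage and the LRS/HRS conductance values are hypothetical, chosen only to show the contrast between states; they are not from the patent.

```python
def mac_1t1r(inputs, conductances, v_read=0.2):
    """Bit-line current of a 1T1R column: each activated cell (IN = 1)
    contributes I = V_read * G (Ohm's law), and the bit line sums the
    contributions (Kirchhoff's current law)."""
    return sum(v_read * g for x, g in zip(inputs, conductances) if x)

# Illustrative device values: LRS ~ 100 uS, HRS ~ 1 uS
LRS, HRS = 100e-6, 1e-6

# IN = [1, 1, 0, 1], W = [1, 0, 1, 1]  ->  MACV = 2
i_bl = mac_1t1r([1, 1, 0, 1], [LRS, HRS, LRS, LRS])
```

The summed current is dominated by the activated LRS cells, which is why the output current tracks the multiply-accumulate value, and also why many activated LRS cells produce the large converged current criticized above.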
In summary, the prior-art 1T1R in-memory computing circuit has at least the following disadvantages: 1) because the low-resistance-state (LRS) resistance of the RRAM is small, a large converged output current is generated during computation, so the power overhead becomes high; 2) the computation result is easily affected by non-ideal factors such as process fluctuation and resistance drift, giving poor robustness; 3) the multiply-add operation is expressed by weighted current convergence, so when the current is large it is affected by parasitic resistance and the like, a nonlinear error appears between the current and the ideal value, and the parallelism of the multiply-add operation is limited, as shown in FIG. 1B.
In view of the above problems, an embodiment of the present disclosure provides an in-memory computing circuit based on a local-multiply global-add structure, including: a plurality of sub-compute arrays, a plurality of word lines, a plurality of bit lines, a plurality of complementary bit lines, and a plurality of source lines. The sub-compute arrays in each row share connected word lines, and the sub-compute arrays in each column share connected bit lines, complementary bit lines, and source lines. Each sub-compute array comprises: a plurality of computing units, a third transistor, a fourth transistor, and a CMOS switch. Each computing unit adopts a common-source-line 2T2R structure and comprises a first transistor, a first memory, a second transistor, and a second memory. The gates of the first and second transistors are connected to a word line and their sources to a source line; the drain of the first transistor is connected to the first memory, whose other end is connected to a bit line; the drain of the second transistor is connected to the second memory, whose other end is connected to a complementary bit line. The gate of the third transistor is connected to the source line, its source is grounded, and its drain is connected to the source of the fourth transistor; the gate of the fourth transistor is connected to an input line and its drain is connected to a compute line through the CMOS switch.
The circuit structure provided by the embodiment of the present disclosure is a local-multiply global-add structure that multiplies in a two-transistor two-resistor (2T2R) cascode array and adds by charge sharing on the compute line. Its basic principle is that the charging and discharging of the local parasitic capacitance of the compute line is determined jointly by the 2T2R voltage division (or 1T1R voltage coupling) and the computation input, and the final multiply-add result is then obtained by sharing the charge of all parasitic capacitances on the compute line. Compared with the conventional current-convergence computation of the 1T1R structure, the embodiment of the disclosure reduces circuit area, reduces power consumption during computation, and helps suppress nonlinear errors caused by device non-idealities, parasitic resistance, and the like.
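The local-multiply global-add principle can be sketched numerically: each sub-compute array leaves its local parasitic capacitance charged to VDD or discharged to 0 according to its local product, and closing all the CMOS switches shares the charge so the compute-line voltage is proportional to the sum of the products. The capacitance and supply values below are hypothetical, for illustration only.

```python
def local_multiply_global_add(products, c_local=10e-15, vdd=0.8):
    """products: 0/1 result of each sub-array's local multiplication.
    Each sub-array's local parasitic capacitor holds Q = C * VDD when its
    product is 1, else 0; sharing the charge over the equal capacitors on
    the compute line TBL yields V = Q_total / C_total."""
    q_total = sum(c_local * vdd * p for p in products)
    return q_total / (c_local * len(products))

# 3 of 4 local products are 1 -> V_TBL = 0.75 * VDD
v_tbl = local_multiply_global_add([1, 0, 1, 1])
```

Because the result is a ratio of charges rather than a converged current, no large direct current flows during the addition, which matches the power argument made above.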
Fig. 2 schematically illustrates the structure of an in-memory computing circuit based on a local-multiply global-add structure according to an embodiment of the disclosure.
As shown in fig. 2, the in-memory computing circuit based on the local-multiply global-add structure includes: a plurality of sub-compute arrays, a plurality of word lines WL, a plurality of bit lines BL, a plurality of complementary bit lines BLB, and a plurality of source lines SL. The sub-compute arrays in each row share connected word lines WL, and the sub-compute arrays in each column share connected bit lines BL, complementary bit lines BLB, and source lines SL.
In the embodiment of the present disclosure, as shown in fig. 2, the in-memory computing circuit exemplarily includes 64 sub-compute arrays in each row and 16 rows of sub-compute arrays; the sub-compute arrays in the first row are sub-compute array 100, sub-compute array 101, and so on, up to sub-compute array 1663 in the last row. Each sub-compute array has the same structure, that is, sub-compute arrays 100 to 1663 are all structurally identical.
It should be noted that 64 sub-compute arrays per row and 16 rows of sub-compute arrays are only exemplary; in other embodiments, each row may contain 128 or 256 sub-compute arrays, and the circuit may contain 32 or 64 rows, which the embodiments of the present disclosure do not limit.
The structure of the sub-compute array 100 is described in detail below; the other sub-compute arrays are identical to it. The sub-compute array 100 includes: a plurality of computing units 10, a third transistor 15, a fourth transistor 16, and a CMOS switch 17. Each computing unit 10 adopts a common-source-line 2T2R structure and specifically includes a first transistor 11, a first memory 12, a second transistor 13, and a second memory 14. The gates of the first transistor 11 and the second transistor 13 are connected to a word line WL and their sources to a source line SL; the drain of the first transistor 11 is connected to the first memory 12, whose other end is connected to a bit line BL; the drain of the second transistor 13 is connected to the second memory 14, whose other end is connected to a complementary bit line BLB. The gate of the third transistor 15 is connected to the source line SL, its source is grounded, and its drain is connected to the source of the fourth transistor 16; the gate of the fourth transistor 16 is connected to the input line IN, and its drain is connected to the compute line TBL through the CMOS switch 17. The connections of every other computing unit in the sub-compute array 100 to its adjacent word line, bit line, complementary bit line, and source line follow the same pattern and are not repeated here.
As shown in fig. 2, the sub-compute array 100 is a sub-array composed of 128 2T2R cells, formed by selecting 32 word lines WL, 4 bit lines BL, 4 complementary bit lines BLB, and 4 source lines SL; that is, the sub-compute array 100 includes 128 computing units. Fig. 2 shows the 32 2T2R cells formed by 32 word lines WL, 1 bit line BL, 1 complementary bit line BLB, and 1 source line SL; the remaining 2T2R cells are arranged in the same way (not shown).
It should be noted that the number of 2T2R cells in each sub-compute array is not limited to 128; this is merely an example and does not limit the embodiments of the present disclosure. Depending on the application, each sub-compute array may contain 256 or more 2T2R cells.
As shown in fig. 2, the memory between a bit line BL and its adjacent source line SL is denoted R_L, i.e., the first memory 12 is denoted R_L; the memory between a source line SL and its adjacent complementary bit line BLB is denoted R_R, i.e., the second memory 14 is denoted R_R. In the memory mode, each RRAM device can be independently selected by a word line WL and a bit line BL for reading and writing. In the compute mode, the source line SL in each sub-compute array is connected to the gate of the third transistor 15, and the input line IN is connected to the gate of the fourth transistor 16. The drains of all fourth transistors 16 in the same row of sub-arrays are connected to or disconnected from a compute line TBL through the CMOS switches 17, and each compute line is connected to an analog-to-digital converter (the subsequent-stage ADC). The ADC converts the analog signal output by each row of sub-compute arrays into a digital signal to obtain the voltage value on the compute line. By adopting the common-source-line 2T2R structure in each compute sub-array, the embodiment of the disclosure effectively reduces the array area.
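A minimal sketch of the subsequent-stage ADC reading the compute line: it quantizes the shared-charge voltage into a digital code. The reference voltage and resolution below are assumptions for illustration, not values from the patent.

```python
def adc_read(v_tbl, v_ref=0.8, n_bits=4):
    """Quantize the compute-line voltage into an n-bit digital code,
    clamping to the full-scale range."""
    code = round(v_tbl / v_ref * (2 ** n_bits - 1))
    return max(0, min(2 ** n_bits - 1, code))
```

The clamp models code saturation at full scale; any compute-line voltage at or above the reference reads as the maximum code.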
Specifically, the transistors in each sub-compute array may be NMOS transistors, that is, the first transistor 11, the second transistor 13, the third transistor 15, and the fourth transistor 16 are NMOS transistors. In practice, wherever distinct high-resistance and low-resistance values are required, the first and second memories may be resistive random access memories or phase change memories. When they are resistive random access memories, the storage medium may be a metal oxide, a sulfide, or an organic dielectric material; when they are phase change memories, the storage medium may be a phase change material such as Ge2Sb2Te5.
According to the embodiment of the present disclosure, each sub-compute array has two multiplication modes: the voltage-division calculation mode and the coupling calculation mode. In the voltage-division calculation mode, by selecting the corresponding word line WL, bit line BL, source line SL, and complementary bit line BLB and applying the corresponding bias voltages, one 2T2R cell in the sub-compute array is activated, so that the stored datum is expressed by the resistance values R_L and R_R of the 2T2R cell. Similarly, in the coupling calculation mode, the corresponding word line WL, bit line BL, source line SL, and complementary bit line BLB are selected and pulse voltages with opposite high-level phases are applied to realize the coupled computation.
Specifically, when a sub-compute array works in the voltage-division calculation mode, one computing unit in the sub-compute array is selected; its word line WL is driven to the first bias voltage VDD, its bit line BL to the first read voltage V_read, its complementary bit line BLB is grounded, and its source line SL floats. The remaining unselected bit lines BL and word lines WL in the sub-compute array are grounded, and the source lines SL float, as shown in FIG. 3(a).
In the voltage-division calculation mode, the selected computing unit determines the voltage on the source line SL from the ratio of R_L to R_R, i.e., the gate of the third transistor 15 settles to a stable V_read or 0. Specifically, when the resistance of the first memory R_L is much larger than that of the second memory R_R (R_L high, R_R low), the gate voltage of the third transistor 15 is a stable 0; when the resistance of the first memory R_L is much smaller than that of the second memory R_R (R_L low, R_R high), the gate voltage of the third transistor 15 is a stable V_read. The drain of the fourth transistor 16 is precharged to the first bias voltage VDD; if and only if the gate voltage of the third transistor 15 is V_read and the gate voltage of the fourth transistor 16 carries the input IN value (equal to the first bias voltage VDD), the fourth transistor 16 conducts. The computation result of the selected computing unit is then selected by closing the CMOS switch 17 and input to the subsequent ADC through the compute line TBL.
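The voltage-division branch can be sketched as follows: the source-line node sits at the resistive-divider voltage between the bit line (at V_read) and the grounded complementary bit line, and the third and fourth transistors then effectively AND the resulting weight with the input. The threshold at half V_read and the LRS/HRS resistances are illustrative assumptions.

```python
def voltage_division_multiply(r_l, r_r, in_bit, v_read=0.2):
    """2T2R voltage-division multiply of one computing unit.
    V_SL = V_read * R_R / (R_L + R_R): approximately V_read when
    R_L << R_R, approximately 0 when R_L >> R_R."""
    v_sl = v_read * r_r / (r_l + r_r)
    w = 1 if v_sl > v_read / 2 else 0   # third transistor conducts iff its gate is near V_read
    return in_bit & w                    # fourth transistor gates the result with the input

LRS, HRS = 10e3, 1e6  # illustrative resistances

voltage_division_multiply(LRS, HRS, in_bit=1)  # W = 1, IN = 1 -> output 1
voltage_division_multiply(HRS, LRS, in_bit=1)  # W = 0 -> output 0
```

Because the output depends only on the ratio of R_L to R_R, a common-mode drift that scales both resistances equally leaves the result unchanged, which is the differential-robustness argument made for this mode.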
The encoding of the selected sub-compute array in the voltage-division calculation mode is shown in Table 2 below:
Table 2. 2T2R voltage-division calculation encoding

    Input IN        R_L state    R_R state    Weight W    Output IN x W
    VDD (code 1)    LRS          HRS          1           1
    VDD (code 1)    HRS          LRS          0           0
    0 (code 0)      any          any          1 or 0      0
Here, IN is coded as 1 if and only if the input IN voltage equals the first bias voltage VDD. When the resistance R_L of the first memory 12 is in the low-resistance state LRS and the resistance R_R of the second memory 14 is in the high-resistance state HRS, the computing unit is coded as 1, the fourth transistor conducts, and the output code value is 1; otherwise the output code value is 0.
In the embodiment of the present disclosure, the selected computing unit realizes multiplication by voltage division; essentially, the computation uses the differential signal between the two memories R_L and R_R, which suppresses the influence of non-ideal factors such as device process fluctuation on calculation accuracy.
According to the embodiment of the present disclosure, when a sub-compute array works in the coupling calculation mode, one computing unit in the sub-compute array is selected; its word line WL is driven to the second bias voltage VDD. One of the bit line BL and the complementary bit line BLB is driven to the second read voltage V_read and the other floats, i.e., when the bit line BL (or complementary bit line BLB) receives the second read voltage V_read, the adjacent complementary bit line BLB (or bit line BL) floats. The source line SL connected to the computing unit floats. The remaining unselected word lines WL in the sub-compute array are grounded, and the bit lines BL float.
In the coupled calculation mode, the selected calculation unit determines the voltage of the source line SL according to the resistance state of the memory, i.e., the gate voltage of the third transistor 15 is a transient V_read or 0. Specifically, as shown in FIG. 4a, the bit line BL is connected to the second read voltage V_read and the adjacent BLB floats. When the first memory R_L is in the high-resistance configuration, the voltage on the gate of the third transistor 15 is a transient 0; when the first memory R_L is in the low-resistance configuration, the voltage on the gate of the third transistor 15 is a transient V_read. The drain of the fourth transistor 16 is precharged to the second bias voltage VDD; if and only if the gate voltage of the third transistor 15 is V_read and the gate voltage of the fourth transistor 16 is the input IN (equal to the second bias voltage VDD) does the fourth transistor 16 turn on. The calculation result of the selected calculation unit is then gated by closing the CMOS switch 17 and passed through the calculation line TBL to the subsequent ADC.
The encoding method of the selected sub-calculation array in the coupled calculation mode is shown in Table 3 below:
Table 3: 1T1R encoding method for coupled calculation
IN     R_L   Input code   Unit code   Output
VDD    LRS   1            1           1
VDD    HRS   1            0           0
0      LRS   0            1           0
0      HRS   0            0           0
Here, the input IN is encoded as 1 if and only if its voltage equals the second bias voltage VDD. When the resistance of the first memory 12 (R_L) is in the low configuration LRS, the computing unit is encoded as 1; in this case the fourth transistor is in the conducting state and the output code value is 1. Otherwise the output code value is 0. Note that Table 3 gives the encoding for the case where the bit line BL is connected to the second read voltage V_read and the adjacent complementary bit line BLB floats; conversely, when the complementary bit line BLB receives the second read voltage V_read and the adjacent bit line BL floats, the encoding method is similar and is not described again in the embodiment of the disclosure.
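As with Table 2, the 1T1R coupled encoding reduces to a one-bit product; a minimal sketch (the helper and the state names LRS/HRS are illustrative, not from the patent):

```python
# Sketch of the Table 3 encoding for the 1T1R coupled calculation mode
# (BL driven to V_read, BLB floating): the output is 1 iff IN equals VDD
# and R_L is in the low-resistance state, mirroring the conduction
# condition of the fourth transistor.
def coupled_output(in_is_vdd: bool, r_l_state: str) -> int:
    in_code = 1 if in_is_vdd else 0
    unit_code = 1 if r_l_state == "LRS" else 0
    return in_code * unit_code

for in_v in (True, False):
    for rl in ("LRS", "HRS"):
        print(in_v, rl, coupled_output(in_v, rl))
```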
In the embodiment of the disclosure, when the selected computing unit implements the 1T1R coupled calculation encoding mode, the second bias voltage VDD and second read voltage V_read applied in this operating mode, together with the input IN and the source line SL, vary as shown in FIG. 4b: the bit line BL applies a brief V_read pulse, then the word line WL and the input line IN switch to the VDD voltage, and the floating source line SL is either transiently coupled to near V_read or held near 0 according to the resistance state of the 1T1R cell. Specifically, the high period of the bit line BL is opposite to that of the word line WL and the input line IN, with a timing difference on the order of nanoseconds, e.g., 2 ns or less. In the coupled calculation mode, multiplication uses transient voltage coupling, which further reduces direct current and lowers the power consumption of the circuit.
In the embodiment of the disclosure, the first bias voltage VDD in the voltage-division calculation mode and the second bias voltage VDD in the coupled calculation mode are equal in magnitude, and the first read voltage V_read in the voltage-division calculation mode and the second read voltage V_read in the coupled calculation mode are equal in magnitude; the pulse voltage periods in the two modes differ, as shown in FIG. 3b and FIG. 4b. Specifically, the magnitude of the first and second bias voltages VDD is related to the channel length of the transistor: for example, when the channel length is 180 nm, VDD is 1.8 V; when the channel length is 28 nm, VDD is 0.9 V; when the channel length is 10 nm to 20 nm, VDD can be about 0.8 V. The read voltage V_read is preferably less than 0.5 V.
It should be noted that the magnitudes of the bias voltage VDD and the read voltage V_read depend on the performance parameters of the particular transistor and memory, which are not limited by the embodiments of the present disclosure.
Fig. 5 schematically illustrates an equivalent multiply-accumulate mode in a memory computing circuit according to an embodiment of the disclosure.
As shown in FIG. 5, the calculation line TBL is divided into a plurality of local calculation lines by the switch GSW (i.e., CMOS switch 17) of each sub-calculation array, and a parasitic capacitance C_para exists between each local calculation line and ground. In this circuit, the accumulation can be divided into three stages: a precharge stage, a sub-calculation-array discharge stage, and a charge-sharing stage.
Specifically: 1) Precharge stage: PRE is grounded (GND), all switches GSW are closed, and all parasitic capacitances on the calculation line TBL are charged to VDD through PM1. 2) Sub-calculation-array discharge stage: PRE is connected to VDD and all switches GSW are open. The parasitic capacitance C_para of the local calculation line corresponding to each sub-calculation array either discharges to GND or keeps VDD, depending on whether the third/fourth transistors are turned on. 3) Charge-sharing stage: all switches GSW are closed. All parasitic capacitances C_para on the calculation line TBL share their charge, yielding the final voltage of the calculation line TBL, which is sent to the subsequent ADC and converted into a digital multiply-accumulate result.
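The three stages can be sketched behaviorally as follows, assuming N equal parasitic capacitances, ideal switches, and no leakage; the constants and the function name are illustrative, not from the patent:

```python
# Behavioral sketch of the three accumulation phases on one calculation line TBL.
N = 64           # sub-calculation arrays on one TBL, as in Fig. 2
VDD = 0.9        # example supply for a 28 nm node
C_PARA = 1e-15   # hypothetical parasitic capacitance per local line, in farads

def accumulate(products):
    """products: N multiplication results (0/1), one per sub-calculation array."""
    # 1) Precharge: GSW closed, every local line charged to VDD.
    v_local = [VDD] * N
    # 2) Discharge: GSW open; a sub-array whose product is 1 turns on its
    #    third/fourth transistors and discharges its local line to ground,
    #    otherwise the line keeps VDD.
    v_local = [0.0 if p else VDD for p in products]
    # 3) Charge sharing: GSW closed again; equal capacitors average their charge.
    q_total = sum(C_PARA * v for v in v_local)
    return q_total / (N * C_PARA)

print(accumulate([1] * 64))   # MACV = 64 -> 0 V
print(accumulate([0] * 64))   # MACV = 0  -> VDD
```

Because the capacitances are equal, C_PARA cancels out and the result depends only on the fraction of local lines left undischarged.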
In embodiments of the present disclosure, the voltage V_TBL of the calculation line TBL after charge sharing satisfies the following relationship:
V_TBL = (n · C_para · VDD) / (N · C_para) = ((N - MACV) / N) · VDD
where C_para represents the parasitic capacitance on each local calculation line, and N represents the number of sub-calculation arrays on one calculation line TBL, which is 64 in the structure shown in FIG. 2; n represents the number of parasitic capacitances on one calculation line TBL that are not discharged, i.e., the number of sub-calculation arrays whose multiplication result is 0. The multiply-accumulate result MACV is the number of multiplication results equal to 1, i.e., MACV = N - n.
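Under the equal-capacitance assumption, C_para cancels out of the relation above, which can be checked numerically (the VDD value and the helper name are examples, not fixed by the patent):

```python
# Numerical check of the charge-sharing relation
# V_TBL = n/N * VDD = (N - MACV)/N * VDD, with N = 64 as in Fig. 2.
N, VDD = 64, 0.9  # example supply value

def v_tbl(macv: int) -> float:
    n = N - macv  # local lines whose parasitic capacitance was not discharged
    return n / N * VDD

print(v_tbl(64), v_tbl(0), v_tbl(32))  # 0.0, VDD, VDD/2
```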
As shown in FIG. 6, when MACV is 64, all parasitic capacitances on the calculation line TBL are discharged to GND, and the voltage of the calculation line TBL after charge sharing reaches its minimum of 0. As the number of sub-calculation arrays whose multiplication result is 0 increases (i.e., n increases), MACV decreases and the voltage on the calculation line TBL after charge sharing increases. When MACV is 0, the charge-shared voltage on TBL remains VDD. As can be seen from FIG. 6, the simulation results substantially fit the formula shown in FIG. 5, demonstrating that the charge-sharing-based accumulation circuit functions correctly. Compared with FIG. 1B, the product value can be extended to 64, which shows that the charge-sharing-based accumulation circuit can effectively suppress nonlinear errors.
Embodiments of the present disclosure also provide a memory including the memory computing circuit based on the local multiply-bulk add structure shown above.
Embodiments of the present disclosure also provide an electronic device including the above-described memory. The electronic device may be a mobile phone, a tablet computer, a wearable device, a desktop computer, a kiosk, etc.; the embodiments of the disclosure are not limited in this respect.
Compared with the prior art, the memory computing circuit based on the local multiplication-integral addition structure provided by the embodiment of the disclosure at least has the following beneficial effects:
(1) The memory computing circuit provided by the disclosure performs multiplication in a voltage-division/coupling mode and performs addition by charge sharing, with no large direct current during calculation, thereby solving the large-current convergence problem of the traditional 1T1R circuit and reducing the power consumption of the circuit.
(2) The voltage-division mode performs multiplication by, in essence, using the R_L and R_R memories to compute a differential signal, suppressing the influence of non-ideal factors such as device process fluctuation on calculation accuracy.
(3) When the memory computing circuit operates in the coupled calculation mode, multiplication uses transient voltage coupling, further reducing direct current and circuit power consumption.
(4) The memory computing circuit performs addition by charge sharing, with the switches GSW completing the three stages described above; the weighted-current convergence path is thereby isolated, nonlinear errors can be effectively suppressed, and calculation parallelism is improved.
(5) A 2T2R unit is adopted as the sub-calculation unit, and a local-multiply/global-add circuit structure in which multiple sub-arrays share one calculation line is used, which can effectively reduce circuit area overhead.
While the disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive.
Those skilled in the art will appreciate that various combinations and/or recombinations of the features recited in the various embodiments and/or claims of the disclosure can be made, even where such combinations are not expressly recited in the disclosure. In particular, various combinations and/or recombinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or recombinations fall within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (10)

1. A memory computing circuit based on a local multiply-bulk-add architecture, comprising:
a plurality of sub-compute arrays, a plurality of word lines, a plurality of bit lines, a plurality of complementary bit lines, and a plurality of source lines; the sub-computing array common word lines in each row are connected, and the sub-computing array common bit lines, the common complementary bit lines and the common source lines in each column are connected;
wherein each sub-compute array comprises: a plurality of computing units, a third transistor, a fourth transistor and a CMOS switch; each computing unit adopts a 2T2R structure with a common source line and comprises: a first transistor, a first memory, a second transistor and a second memory; the gates of the first transistor and the second transistor are connected to a word line, and their sources are connected to a source line; the drain of the first transistor is connected to the first memory, the other end of which is connected to a bit line; the drain of the second transistor is connected to the second memory, the other end of which is connected to a complementary bit line; the gate of the third transistor is connected to a source line, its source is grounded, and its drain is connected to the source of the fourth transistor; the gate of the fourth transistor is connected to the input line and its drain is connected to the computation line through the CMOS switch.
2. The local multiply-and-bulk-add structure-based memory computing circuit of claim 1, wherein the computation mode of each sub-computation array comprises a voltage-division calculation mode and a coupled calculation mode.
3. The local multiply-and-bulk-add-structure-based memory computing circuit according to claim 2, wherein when each sub-computing array operates in the voltage-division calculation mode, a word line of a computing unit is connected to a first bias voltage, a bit line of the computing unit is connected to a first read voltage, a complementary bit line of the computing unit is connected to ground, and the source line is floated; the remaining unselected bit lines and word lines in the sub-computing array are grounded, and their source lines are floated.
4. The memory calculation circuit according to claim 3, wherein when the resistance of the first memory is much larger than that of the second memory, the gate voltage of the third transistor is 0; when the resistance of the first memory is much smaller than that of the second memory, the gate voltage of the third transistor is the first read voltage.
5. The local multiply-and-whole-add structure-based memory computing circuit according to claim 2, wherein when each sub-computing array operates in the coupled calculation mode, one computing unit in each sub-computing array is selected, the word line of that computing unit is connected to a second bias voltage, one of its bit line and complementary bit line is connected to a second read voltage while the other is floated, and the source line is floated; the remaining unselected word lines in the sub-computing array are grounded, and their bit lines are floated.
6. The local multiply-bulk-add structure-based memory computing circuit of claim 5, wherein the second bias voltage and the second read voltage are both pulsed voltages with opposite high periods.
7. The local multiply-and-whole-add structure-based memory computing circuit according to claim 1, wherein the other end of the computing line in each row of the sub-computing array is connected to an analog-to-digital converter, and the analog-to-digital converter is configured to convert the analog electrical signal output by each row of the sub-computing array into a digital signal to obtain the voltage value of the computing line.
8. The local multiply-bulk-add-structure-based memory computing circuit of claim 1, wherein the first memory and the second memory are resistive memories or phase change memories, and the third transistor and the fourth transistor are NMOS transistors.
9. A memory, comprising:
the local multiply-bulk-add-structure-based in-memory computation circuit of any of claims 1 to 8.
10. An electronic device, comprising:
the memory of claim 9.
CN202210457204.2A 2022-04-27 2022-04-27 In-memory computing circuit, memory and equipment based on local multiply-integral addition structure Pending CN114863964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210457204.2A CN114863964A (en) 2022-04-27 2022-04-27 In-memory computing circuit, memory and equipment based on local multiply-integral addition structure


Publications (1)

Publication Number Publication Date
CN114863964A true CN114863964A (en) 2022-08-05

Family

ID=82633191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210457204.2A Pending CN114863964A (en) 2022-04-27 2022-04-27 In-memory computing circuit, memory and equipment based on local multiply-integral addition structure

Country Status (1)

Country Link
CN (1) CN114863964A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115691613A (en) * 2022-12-30 2023-02-03 北京大学 Charge type memory calculation implementation method based on memristor and unit structure thereof
CN115691613B (en) * 2022-12-30 2023-04-28 北京大学 Charge type memory internal calculation implementation method based on memristor and unit structure thereof

Similar Documents

Publication Publication Date Title
US20220262424A1 (en) Compute in memory system
US11783875B2 (en) Circuits and methods for in-memory computing
US11568223B2 (en) Neural network circuit
CN112581996A (en) Time domain memory computing array structure based on magnetic random access memory
CN115039177A (en) Low power consumption in-memory compute bit cell
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
CN113342126B (en) Reconfigurable current mirror weighting circuit based on ReRAM
CN111816232A (en) Memory computing array device based on 4-tube storage structure
CN113257300B (en) Ferroelectric capacitor-based memory device
CN111833934B (en) Storage and calculation integrated ferroelectric memory and operation method thereof
CN114743580B (en) Charge sharing memory computing device
US20230132411A1 (en) Devices, chips, and electronic equipment for computing-in-memory
Sharma et al. AND8T SRAM macro with improved linearity for multi-bit in-memory computing
CN114863964A (en) In-memory computing circuit, memory and equipment based on local multiply-integral addition structure
CN113658628B (en) Circuit for DRAM nonvolatile memory internal calculation
CN114496010A (en) Analog domain near memory computing array structure based on magnetic random access memory
US20240086708A1 (en) Sram architecture for convolutional neural network application
CN115210810A (en) In-memory computational dynamic random access memory
CN116204490A (en) 7T memory circuit and multiply-accumulate operation circuit based on low-voltage technology
US7269077B2 (en) Memory architecture of display device and memory writing method for the same
CN115691613A (en) Charge type memory calculation implementation method based on memristor and unit structure thereof
TW202303382A (en) Compute-in-memory devices, systems and methods of operation thereof
CN116670763A (en) In-memory computation bit cell with capacitively coupled write operation
Bharti et al. Compute-in-memory using 6T SRAM for a wide variety of workloads
Luo et al. A FeRAM based volatile/non-volatile dual-mode buffer memory for deep neural network training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination