CN115831170A - High-flexibility storage computing array - Google Patents

Publication number
CN115831170A
Authority
CN
China
Prior art keywords
array
row
input
output
multiplexer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211396106.9A
Other languages
Chinese (zh)
Inventor
蔡一茂
杨韵帆
王宗巍
鲍盛誉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202211396106.9A priority Critical patent/CN115831170A/en
Publication of CN115831170A publication Critical patent/CN115831170A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Analogue/Digital Conversion (AREA)

Abstract

The invention provides a high-flexibility in-memory computing array and belongs to the technical field of semiconductor non-volatile memory and in-memory computing. The word line direction of the 1T1R array is perpendicular to the input direction. A word line driver drives each word line of the 1T1R array to the power supply voltage or the ground voltage to turn a column of cells on or off. The input unit comprises an input register, a voltage multiplexer and a row multiplexer: the voltage multiplexer selects, according to the value of the input register, one of several voltages generated by a linear voltage regulator as the input, and the row multiplexer is connected to the source lines of the 1T1R array. The bit lines of the 1T1R array are connected to a clamping circuit and an analog-to-digital converter through the column multiplexer in the output unit. The array saves unnecessary power consumption, realizes multi-value input without a digital-to-analog converter, improves computation speed, and reduces the number of array activations and the power they consume.

Description

High-flexibility storage computing array
Technical Field
The invention belongs to the technical field of non-volatile memory and compute-in-memory in semiconductor and CMOS (complementary metal-oxide-semiconductor) ultra-large-scale integration (ULSI) circuits, and particularly relates to an array structure that performs vector matrix multiplication using a non-volatile memory array.
Background
With the development of artificial intelligence and deep learning, artificial neural networks are widely applied in natural language processing, image recognition, autonomous driving, graph neural networks and other fields. However, as networks grow, a large amount of energy is consumed in moving data between memory and traditional computing devices such as CPUs and GPUs, which is known as the von Neumann bottleneck. The dominant computation in artificial neural network algorithms is vector matrix multiplication. In compute-in-memory based on non-volatile memory, the weights are stored in non-volatile memory cells and an analog vector matrix multiplication is performed inside the array, so that frequent data transfer between the memory and the computing units is avoided. It is therefore considered a promising approach to the von Neumann bottleneck.
FIG. 1 is a schematic diagram of vector matrix multiplication based on an array of non-volatile devices. After a weight is written into a non-volatile memory device such as RRAM, PCRAM or MRAM, it is stored as the conductance of the device. The devices are organized into an array, voltages are applied at one side of the array as the input of the vector matrix multiplication, the computation is carried out in the array by Ohm's law and Kirchhoff's law, and the currents obtained at the other side of the array are the summed results of the vector matrix multiplication. The cells in the array may be 1R devices or 1T1R devices. The input may be a multi-value voltage supplied by a digital-to-analog converter (DAC), or a binary voltage supplied by a buffer. The summed result is typically read out with an analog-to-digital converter (ADC). Because the layout width of an analog-to-digital converter does not match that of an array cell, a multiplexer (MUX) is typically used so that several columns of the array share one analog-to-digital converter.
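By way of illustration only, the computation described above can be sketched in a few lines of Python. The function name `vmm_currents` and all numeric values are assumptions chosen for the example; device non-idealities and wire resistance are ignored.

```python
# Sketch: analog vector matrix multiplication on a conductance array.
# All names and numeric values are illustrative assumptions.

def vmm_currents(voltages, conductances):
    """Return the current collected on each column.

    voltages[i]        -- input voltage applied to row i (volts)
    conductances[i][j] -- conductance of the device at row i, column j (siemens)
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            # Ohm's law per device: I = V * G.
            # Kirchhoff's current law: currents on a shared column add up.
            currents[j] += v * conductances[i][j]
    return currents


v_in = [0.1, 0.2, 0.0]            # volts
g = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]                # siemens
i_out = vmm_currents(v_in, g)
print(i_out)                      # roughly 0.7 uA and 1.0 uA
```

Each column current is the dot product of the input voltage vector with that column of the conductance matrix, which is why one read of the array yields a whole output vector.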
Since 1T1R cells avoid the write crosstalk problem, they are often used in larger arrays. In the conventional nomenclature, the line connecting the gates of the transistors is the word line (WL), the line connecting the sources of the transistors is the source line (SL), and the line connecting one terminal of the memory devices is the bit line (BL). The conventional 1T1R array structures for vector matrix multiplication are shown in FIG. 2 (a) and (b). In FIG. 2 (a), all transistors are turned on by the word lines, the input voltages are applied on the source lines, and the summed currents are read from the bit lines. In FIG. 2 (b), the same read voltage is applied on every source line, the word line turns a row on or off to represent an input of "1" or "0", and the summed currents are again read from the bit lines. What these two structures and other common array structures share is that the word lines are parallel to the input direction, a design inherited from conventional memory arrays.
However, the conventional array structure has two problems when used for in-memory computing. First, if a digital-to-analog converter supplies a multi-value voltage, its layout height does not match that of an array cell, and using one digital-to-analog converter per row causes high power consumption. If a buffer supplies a binary voltage instead, a high-precision input can only be represented by applying a pulse sequence over several cycles, which increases the computation latency; the array must be activated once per input bit, which increases the number of array activations and analog-to-digital conversions and therefore also the power consumption. Second, in the conventional design in which the word lines are parallel to the input direction, when the physical array is larger than the matrix actually required by the computation, the current through devices on unused columns cannot be turned off, which wastes power.
Disclosure of Invention
In view of the above problems, the present invention provides a high-flexibility in-memory computing array.
The technical scheme provided by the invention is as follows:
A high-flexibility in-memory computing array comprises a 1T1R array. Several rows of the 1T1R array are grouped into a row segment and several columns into a column segment; each row segment corresponds to one input unit and each column segment to one output unit. Each 1T1R cell consists of a MOS transistor and a non-volatile memory device: the gate of the transistor is connected to a word line, its source to a source line, and its drain to one terminal of the non-volatile memory device, whose other terminal is connected to a bit line. A source line connects the transistor sources of all cells in one row of the array and is parallel to the input direction; a bit line connects the non-volatile devices of all cells in one column of the array and is perpendicular to the input direction. The peripheral circuit of the 1T1R array comprises a word line driver, input units, output units, a linear voltage regulator and a control module, and the transistor gates of all cells in one column are connected to the word line driver. The peripheral circuit gates one row in each row segment and one column in each column segment, floats the inputs of the unselected rows, and drives the word lines of the unselected columns to the ground voltage to turn their transistors off.
Further, the word line driver drives each word line of the 1T1R array to the power supply voltage or the ground voltage to turn a column of transistors on or off.
Further, the input unit comprises an input register, a voltage multiplexer and a row multiplexer. The input register consists of (a + 1) D flip-flops; under the control of a clock signal, an (a + 1)-bit scan chain shifts data in from the lowest input unit and upward, in sequence, into the input registers of the input units above it. The first a outputs of the input register are the voltage multiplexer decode signals and are connected to the a-to-A decoder in the voltage multiplexer; the (a + 1)-th output is the row multiplexer enable signal and is connected to the AND gates of the row multiplexer. The voltage multiplexer comprises an a-to-A decoder and A transmission gates. The a-to-A decoder converts the a decode signals output by the input register into an A-bit one-hot output, i.e. exactly one of the A output bits is high and the rest are low, so as to open one of the transmission gates, where A = 2^a. The voltage multiplexer thus selects one of the A voltages output by the LDO and sends it to the row multiplexer. The row multiplexer comprises a b-to-B decoder, B AND gates and B transmission gates. The b-to-B decoder converts the b-bit row decode signal into a B-bit one-hot output, i.e. exactly one of the B output bits is high and the rest are low, and these outputs are connected to the B two-input AND gates, where B = 2^b. The other input of each of the B two-input AND gates is the row multiplexer enable signal output by the input register. When the enable signal is high, the AND gate outputs equal the outputs of the b-to-B decoder, one of the transmission gates is opened, one of the B source lines is connected to the output of the voltage multiplexer, and the other source lines float. When the enable signal is low, all AND gate outputs are low, all transmission gates are closed, and all B source lines float.
Further, the output unit comprises a column multiplexer, a clamping circuit and an analog-to-digital converter. The column multiplexer comprises a c-to-C decoder and C transmission gates. The c-to-C decoder converts the c-bit column decode signal into a C-bit one-hot output, i.e. exactly one of the C output bits is high and the rest are low, where C = 2^c. The decoder output opens one of the transmission gates, so that one of the C bit lines is connected to the clamping circuit and the remaining bit lines float. The clamping circuit uses an operational amplifier OP and a feedback resistor R_f to clamp the selected bit line at a reference potential; the output voltage of the operational amplifier then represents the result of the vector matrix multiplication. A sample-and-hold circuit consisting of a transistor N1 and a capacitor C1 stores the output voltage on the capacitor C1, and the stored voltage is finally fed to the analog-to-digital converter and converted into an x-bit digital signal, where x is the design precision of the analog-to-digital converter. The analog-to-digital converter adopts a flash ADC or SAR ADC structure.
Further, the input unit uniformly generates a plurality of input voltages required for calculation using a linear regulator, the voltage multiplexer is used for inputting the voltages to the row multiplexer, the row multiplexer is connected with the source line of the 1T1R array, and the bit line of the 1T1R array is connected to an analog-to-digital converter through the column multiplexer.
Compared with the traditional array structure, the invention has three main features.
The first feature is that the word line direction is changed from parallel to the input to perpendicular to the input. This change brings two benefits. First, the inputs of unselected rows can be left floating, which shields a row without affecting the computation in the rest of the array. Second, the transistors of a column can be turned off by driving its word line low, which shields a column without affecting the computation in the rest of the array. The proposed design can therefore shield any number of rows and columns, and any sub-array of a large array can be selected for computation without extra power being consumed by current through the shielded devices.
The second feature is the addition of a row multiplexer within the input unit. The row multiplexer selects one of several rows of the array to connect to the voltage multiplexer. The reason is that, although replacing the digital-to-analog converter with a voltage multiplexer reduces the layout height mismatch, one voltage multiplexer is still as tall as several array rows, so several rows must share it.
The third feature is that the digital-to-analog converter or buffer of each input unit is replaced by a shared linear voltage regulator. The linear voltage regulator generates all the input voltages required by the computation and supplies them to all input units simultaneously; each input unit selects one of the voltages through its voltage multiplexer and connects it to the array. This removes both the layout mismatch caused by using one digital-to-analog converter per row and the high power consumption of the digital-to-analog converters. At the same time, the advantages of multi-value voltage input are retained: a multi-bit input no longer requires activating the array several times, as it does when a buffer supplies binary voltages, so the computation latency and the number of analog-to-digital conversions are reduced.
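The reduction in array activations can be illustrated with an integer abstraction of a single column. This is only a sketch: the function names are hypothetical, the weights are small integers standing in for conductances, and the bit-serial scheme with shift-and-add is one common way of applying binary inputs, assumed here for comparison.

```python
# Sketch: bit-serial binary input versus multi-level input for one column.
# Function names and values are illustrative assumptions.

def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))


def bit_serial_vmm(inputs, weights, bits):
    """Binary-buffer style: one array activation per input bit, then shift-and-add."""
    total, activations = 0, 0
    for k in range(bits):
        bit_plane = [(x >> k) & 1 for x in inputs]   # binary voltages for bit k
        total += dot(bit_plane, weights) << k        # weight the partial sum by 2**k
        activations += 1
    return total, activations


def multi_level_vmm(inputs, weights):
    """Voltage-multiplexer style: each input picks one of 2**bits levels, one activation."""
    return dot(inputs, weights), 1


inputs = [3, 0, 2, 1]      # 2-bit input codes
weights = [1, 2, 3, 4]     # stand-ins for device conductances
serial_result, serial_runs = bit_serial_vmm(inputs, weights, bits=2)
multi_result, multi_runs = multi_level_vmm(inputs, weights)
print(serial_result, serial_runs)   # 13 2
print(multi_result, multi_runs)     # 13 1
```

Both paths give the same sum, but the bit-serial path activates the array, and hence the analog-to-digital converter, once per input bit.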
Drawings
FIG. 1 is a schematic diagram of vector matrix multiplication based on an array of non-volatile devices;
FIG. 2 is a schematic diagram of two conventional 1T1R arrays for vector matrix multiplication;
FIG. 3 is a schematic diagram of an array structure and peripheral circuits according to the present invention;
FIG. 4 is a circuit diagram of an input unit according to the present invention;
FIG. 5 is a circuit diagram of an output unit used in the present invention;
FIG. 6 is a schematic diagram of the operating state when the array structure of the present invention selects a logic array;
FIG. 7 is a schematic diagram of the operating state when the array structure of the present invention selects a logic sub-array.
Detailed Description
The invention is described in detail below with reference to the figures and the specific examples.
Referring to FIG. 3, the high-flexibility in-memory computing array of the present invention and its peripheral circuit design include a 1T1R array, a word line driver, input units, output units and a control module. The 1T1R array is composed of 1T1R cells. Each 1T1R cell consists of a MOS transistor and a non-volatile memory device: the gate of the transistor is connected to a word line, its source to a source line, and its drain to one terminal of the non-volatile memory device, whose other terminal is connected to a bit line. A source line connects the transistor sources of all cells in one row of the array and is parallel to the input direction; a bit line connects the non-volatile devices of all cells in one column of the array and is perpendicular to the input direction; a word line connects the transistor gates of all cells in one column of the array and is perpendicular to the input direction. The word line driver drives each word line of the 1T1R array to the power supply voltage or the ground voltage to turn a column of transistors on or off. In the layout, several rows of the 1T1R array are grouped into a row segment and several columns into a column segment; each row segment corresponds to one input unit and each column segment to one output unit. As shown in the layout on the right, the 1T1R array is divided into m row segments corresponding to m input units, and into n column segments corresponding to n output units.
The input unit replaces the DAC (digital-to-analog converter) to realize the multi-value input function. The input unit circuit is shown in FIG. 4 and includes an input register, a voltage multiplexer and a row multiplexer. The input register consists of (a + 1) D flip-flops; under the control of the clock signal, the (a + 1)-bit scan chain shifts data in from the lowest input unit and upward, in sequence, into the input registers of the input units above it. The first a outputs of the input register are the voltage multiplexer decode signals, connected to the a-to-A decoder in the voltage multiplexer, and the (a + 1)-th output is the row multiplexer enable signal, connected to the AND gates of the row multiplexer. The voltage multiplexer includes an a-to-A decoder and A transmission gates. The a-to-A decoder converts the a decode signals output by the input register into an A-bit one-hot output, i.e. exactly one of the A output bits is high and the rest are low, so as to open one of the transmission gates, where A = 2^a. The voltage multiplexer thus selects one of the A voltages output by the LDO and sends it to the row multiplexer. The row multiplexer includes a b-to-B decoder, B AND gates and B transmission gates. The b-to-B decoder converts the b-bit row decode signal into a B-bit one-hot output, i.e. exactly one of the B output bits is high and the rest are low, and these outputs are connected to the B two-input AND gates, where B = 2^b. The other input of each of the B two-input AND gates is the row multiplexer enable signal output by the input register. When the enable signal is high, the AND gate outputs equal the outputs of the b-to-B decoder, one of the transmission gates is opened, one of the B source lines is connected to the output of the voltage multiplexer, and the other source lines float. When the enable signal is low, all AND gate outputs are low, all transmission gates are closed, and all B source lines float.
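The behaviour of one input unit can be sketched as follows. This is a behavioural model only, not the circuit of FIG. 4: the function names, the most-significant-bit-first ordering of the voltage code, the use of `None` for a floating source line, and the example voltages are all assumptions.

```python
# Sketch: behavioural model of one input unit (register -> voltage mux -> row mux).
# Names, bit ordering and voltage values are illustrative assumptions.

def one_hot(code, width_bits):
    """k-to-2**k decoder: exactly one of the 2**k outputs is high."""
    return [1 if i == code else 0 for i in range(2 ** width_bits)]


def input_unit(register_bits, row_code, ldo_voltages, b):
    """Return the state of the B = 2**b source lines of one row segment.

    register_bits -- a + 1 bits (a >= 1): the first a bits are the voltage code
                     (MSB first), the last bit is the row multiplexer enable
    row_code      -- b-bit row decode signal, given as an integer
    ldo_voltages  -- the A = 2**a voltages supplied by the linear regulator
    A source line is either a voltage or None (floating).
    """
    a = len(register_bits) - 1
    assert len(ldo_voltages) == 2 ** a
    volt_code = int("".join(str(bit) for bit in register_bits[:a]), 2)
    enable = register_bits[a]
    # Voltage multiplexer: the one-hot output opens one of A transmission gates.
    volt_gates = one_hot(volt_code, a)
    selected_v = next(v for v, gate in zip(ldo_voltages, volt_gates) if gate)
    # Row multiplexer: each decoder output is ANDed with the enable bit.
    row_gates = [gate & enable for gate in one_hot(row_code, b)]
    return [selected_v if gate else None for gate in row_gates]


ldo = [0.0, 0.1, 0.2, 0.3]                                  # A = 4 levels, so a = 2
print(input_unit([1, 0, 1], row_code=1, ldo_voltages=ldo, b=1))   # [None, 0.2]
print(input_unit([1, 0, 0], row_code=1, ldo_voltages=ldo, b=1))   # [None, None]
```

With the enable bit low, every source line of the segment floats regardless of the row decode signal, which is the property used later to exclude whole row segments from a computation.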
The output unit circuit is shown in FIG. 5. The output unit comprises a column multiplexer, a clamping circuit and an analog-to-digital converter. The column multiplexer includes a c-to-C decoder and C transmission gates. The c-to-C decoder converts the c-bit column decode signal into a C-bit one-hot output, i.e. exactly one of the C output bits is high and the rest are low, where C = 2^c. The decoder output opens one of the transmission gates, so that one of the C bit lines is connected to the clamping circuit and the remaining bit lines float. The clamping circuit uses an operational amplifier OP and a feedback resistor R_f to clamp the selected bit line at a reference potential; the output voltage of the operational amplifier then represents the result of the vector matrix multiplication. A sample-and-hold circuit formed by a transistor N1 and a capacitor C1 stores the output voltage on the capacitor C1, and the stored voltage is finally fed to the analog-to-digital converter and converted into an x-bit digital signal, where x is the design precision of the analog-to-digital converter. The analog-to-digital converter can use a flash ADC or SAR ADC structure.
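As a rough illustration of the readout path, the clamping circuit can be modelled as an ideal transimpedance stage followed by an ideal uniform quantizer. The formula V_out = V_ref - I * R_f, the sign convention, the ADC input range and all numeric values are assumptions made for this sketch; the source does not specify them.

```python
# Sketch: ideal clamp (current-to-voltage) stage followed by an ideal x-bit ADC.
# The transfer function, ranges and values are illustrative assumptions.

def clamp_output(bitline_current, r_f, v_ref=0.0):
    """Ideal op-amp with feedback resistor R_f.

    The bit line is held at v_ref and the summed current is converted to a
    voltage (assumed convention: current flowing into the clamp lowers V_out).
    """
    return v_ref - bitline_current * r_f


def adc(voltage, x_bits, v_low, v_high):
    """Ideal x-bit uniform quantizer over [v_low, v_high], rounding to nearest code."""
    levels = 2 ** x_bits
    clipped = min(max(voltage, v_low), v_high)
    return int((clipped - v_low) / (v_high - v_low) * (levels - 1) + 0.5)


i_bl = 2e-6                                  # summed bit line current, amperes
v_out = clamp_output(i_bl, r_f=50e3)         # about -0.1 V
code = adc(v_out, x_bits=4, v_low=-0.4, v_high=0.0)
print(v_out, code)
```

In this convention a larger bit line current gives a lower output voltage and therefore a lower code; a real design would choose the reference potential and ADC range to suit the expected current range.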
Each row multiplexer gates only one source line of its row segment, each column multiplexer gates only one bit line of its column segment, and the source lines of unselected rows and the bit lines of unselected columns are all left floating. Compared with the column multiplexer, the row multiplexer adds the ability to float all of its source lines, which enables flexible sub-array selection. Assuming that a physical array includes m row segments and n column segments, that each row segment includes B rows of memory cells and that each column segment includes C columns of memory cells, the physical array is divided into B × C logic arrays, each of size m rows × n columns. Within a logic array, any subset of rows and columns can be selected to form a logic sub-array for computation, and no power is consumed in any of the unselected memory cells.
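The partition into logic arrays can be sketched as an index computation. The sketch assumes that the rows of a segment are physically adjacent, that all row multiplexers share one row decode signal and all column multiplexers share one column decode signal, and it uses 0-based indices; the function name and the example dimensions are illustrative.

```python
# Sketch: which physical rows and columns make up each logic array.
# Adjacent rows per segment and shared decode signals are assumptions.

def logical_array(row_code, col_code, m, n, B, C):
    """Physical (0-based) rows and columns gated for one pair of decode codes."""
    rows = [segment * B + row_code for segment in range(m)]
    cols = [segment * C + col_code for segment in range(n)]
    return rows, cols


m, n, B, C = 3, 3, 2, 2        # a 6 x 6 physical array, two lines per segment
all_arrays = [logical_array(r, c, m, n, B, C)
              for r in range(B) for c in range(C)]
print(len(all_arrays))         # 4 logic arrays, i.e. B * C
print(all_arrays[0])           # ([0, 2, 4], [0, 2, 4])
```

Every physical cell belongs to exactly one of the B × C logic arrays, and each logic array has m × n cells.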
FIG. 6 is a schematic diagram of the operating state when the array structure of the present invention selects a logic array. In FIG. 6, the 1T1R physical array is, by way of example, 6 rows by 6 columns. The physical array is divided into three row segments of two rows each and into three column segments of two columns each. Each row segment corresponds to one input unit and each column segment corresponds to one output unit. In one vector matrix multiplication, the row multiplexer in an input unit can gate only one row of its row segment, and the column multiplexer in an output unit can gate only one column of its column segment. All devices on the gated rows and columns form the array used for this vector matrix multiplication, called a logic array (Logical Array). In contrast, the entire array is referred to as the physical array (Physical Array). The physical array is divided into 4 logic arrays, each comprising 3 rows by 3 columns of memory cells. Only one logic array can be selected for each vector matrix multiplication. As shown in FIG. 6, when the row decode signal is 0, all row decoders select rows 1, 3 and 5; when the column decode signal is 0, all column decoders select columns 1, 3 and 5; and the selected 3 rows × 3 columns of devices form the logic array for this computation, shown in red in the figure. When the column decoders select columns 1, 3 and 5, the word line driver simultaneously drives the word lines of columns 1, 3 and 5 to the power supply voltage (VDD) and the remaining word lines to the ground voltage (GND). The unselected cells fall into two categories: (1) cells in unselected columns, and (2) cells in unselected rows on selected columns.
Analysis shows that (1) in the unselected columns, the transistor gates are all grounded, so the path through the 1T1R cell between the source line and the bit line is cut off and no current flows; (2) in the unselected rows on the selected columns, the transistor gate is at the power supply voltage and the transistor is on, but the bit line is clamped at the reference (ground) level and the source line floats, so no current flows either. This means that for each vector matrix multiplication one logic array can be selected for computation and no power is consumed in the unselected memory cells. In the input units, the input data are loaded into the input register of each input unit through the scan chain and form the voltage multiplexer decode signals; one of the input voltages generated by the linear voltage regulator is selected and sent to the row multiplexer, which replaces the function of the digital-to-analog converter. In the output units, the bit line current of the selected column is converted by the clamping circuit into a voltage on the capacitor, which the ADC finally converts into a digital signal for readout.
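The two categories of unselected cells can be checked with a small idealized model of the 6 × 6 example. The function name, the 0-based indices and the state labels are assumptions of this sketch; leakage and sneak paths through floating lines are not modelled.

```python
# Sketch: idealized classification of every cell for the 6 x 6 example.
# Indices are 0-based; leakage and parasitic paths are ignored (assumption).

def cell_states(n_rows, n_cols, driven_rows, on_cols):
    """Classify each cell.

    driven_rows -- rows whose source line is connected to an input voltage
                   (all other source lines float)
    on_cols     -- columns whose word line is at VDD (all others at GND);
                   their bit lines are clamped by the output units
    """
    states = {}
    for r in range(n_rows):
        for c in range(n_cols):
            if c not in on_cols:
                states[(r, c)] = "off: word line grounded"
            elif r not in driven_rows:
                states[(r, c)] = "off: source line floating"
            else:
                states[(r, c)] = "conducting"
    return states


# Rows 1, 3, 5 and columns 1, 3, 5 of the text are indices 0, 2, 4 here.
states = cell_states(6, 6, driven_rows={0, 2, 4}, on_cols={0, 2, 4})
conducting = sorted(cell for cell, s in states.items() if s == "conducting")
print(len(conducting))   # 9 cells, the 3 x 3 logic array
```

Of the 36 cells, 18 are cut off by grounded word lines, 9 are isolated by floating source lines, and only the 9 cells of the selected logic array carry current in this model.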
FIG. 7 is a schematic diagram of selecting part of a logic array, a logic sub-array (Logical Sub Array), for computation. Any subset of the rows and columns of a logic array can be selected to form a logic sub-array on which the vector matrix multiplication is carried out. In the example of FIG. 7, the row gated by row multiplexer 3 and the column gated by column multiplexer 3 are shut off, and the rows and columns gated by row multiplexers 1 and 2 and column multiplexers 1 and 2 form a logic sub-array of 2 rows by 2 columns. To do this, the enable signal of the row multiplexer in input unit 3 is set low through the scan chain, which closes all of its AND gates and floats all source lines belonging to input unit 3; at the same time, the word line driver sets the word lines of all memory cells belonging to output unit 3 to the ground level, which turns off all columns belonging to output unit 3. As shown in the figure, the unselected cells fall into two categories: (1) cells in unselected columns, and (2) cells in unselected rows on selected columns. Analysis shows that (1) in the unselected columns, the transistor gates are all grounded, so the path through the 1T1R cell between the source line and the bit line is cut off and no current flows; (2) in the unselected rows on the selected columns, the transistor gate is at the power supply voltage and the transistor is on, but the bit line is clamped at the reference (ground) level and the source line floats, so no current flows either. This means that any subset of the rows and columns of a logic array can be selected to form a logic sub-array for vector matrix multiplication, and no power is consumed in the unselected memory cells. The devices on the selected rows and columns compute in the same manner as in FIG. 6.
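Loading the input registers through the scan chain, including a low enable bit that floats a whole row segment, can be sketched as a plain shift register. The flat flip-flop ordering, the function names and the example bit patterns are assumptions of this sketch; the source only states that data are shifted in from the lowest input unit toward the units above it.

```python
# Sketch: loading all input registers through one serial scan chain.
# Flip-flop ordering and bit patterns are illustrative assumptions.

def shift_in(chain, bit):
    """One clock edge: the new bit enters at the lowest unit, the rest move up."""
    return [bit] + chain[:-1]


def load_input_registers(words):
    """Shift a full frame into the chain and return the register contents.

    words[u] is the (a + 1)-bit target content of input unit u, lowest unit
    first; the last bit of each word is the row multiplexer enable.
    """
    flat = [bit for word in words for bit in word]
    chain = [0] * len(flat)
    for bit in reversed(flat):          # bits for the farthest unit go in first
        chain = shift_in(chain, bit)
    width = len(words[0])
    return [chain[i:i + width] for i in range(0, len(chain), width)]


# Three input units with a = 2: the third unit has enable = 0, so all of its
# source lines float and its row segment drops out of the computation.
words = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]
loaded = load_input_registers(words)
print(loaded)
```

A frame of m × (a + 1) clock cycles therefore sets both the input voltage code and the participation of every row segment.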
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (6)

1. A high-flexibility in-memory computing array, characterized by comprising a 1T1R array, wherein several rows of the 1T1R array are grouped into a row segment and several columns into a column segment, each row segment corresponds to one input unit, and each column segment corresponds to one output unit; each 1T1R cell consists of a MOS transistor and a non-volatile memory device, the gate of the transistor is connected to a word line, its source to a source line, and its drain to one terminal of the non-volatile memory device, whose other terminal is connected to a bit line; a source line connects the transistor sources of all cells in one row of the array and is parallel to the input direction; a bit line connects the non-volatile devices of all cells in one column of the array and is perpendicular to the input direction; the peripheral circuit of the 1T1R array comprises a word line driver, input units, output units, a linear voltage regulator and a control module, which are used to gate one row in each row segment and one column in each column segment, to float the inputs of the unselected rows, and to drive the word lines of the unselected columns to the ground voltage to turn their transistors off; if the 1T1R array is divided into m row segments corresponding to m input units in the layout design and into n column segments corresponding to n output units in the layout design, each row segment comprising B rows of memory cells and each column segment comprising C columns of memory cells, the physical array is divided into B × C logic arrays, each of size m rows × n columns; one logic array is selected for each vector matrix multiplication, no power is consumed in the unselected memory cells, and any subset of rows and columns in the logic array can be selected to form a logic sub-array for computation.
2. The high-flexibility in-memory computing array of claim 1, wherein the word line driver drives each word line of the 1T1R array to the power supply voltage or the ground voltage to turn a column of transistors on or off.
3. The high-flexibility storage computing array of claim 1, wherein each input cell comprises an input register, a voltage multiplexer and a row multiplexer. The input register consists of (a+1) D flip-flops; an (a+1)-bit scan chain is fed into the input register of the upper input cell and, under the control of a clock signal, is scanned in sequentially starting from the lowest input cell. The first a outputs of the input register are the voltage multiplexer decoding signals and are connected to the a-A decoder in the voltage multiplexer; the (a+1)-th output is the row multiplexer enable signal and is connected to the AND gates of the row multiplexer. The voltage multiplexer comprises an a-A decoder and A transmission gates, where A = 2^a; the a-A decoder converts the a decoding signals output by the input register into an A-bit one-hot output, i.e., exactly one of the A output bits is high and the rest are low, so that exactly one of the A transmission gates is turned on. The voltage multiplexer thereby selects one of the A voltages output by the LDO and sends it to the row multiplexer. The row multiplexer comprises a b-B decoder, B two-input AND gates and B transmission gates, where B = 2^b; the b-B decoder converts the b-bit row decoding signal into a B-bit one-hot output, i.e., exactly one of the B output bits is high and the rest are low, and each output bit is connected to one input of one of the B AND gates. The other input of each AND gate is the row multiplexer enable signal output by the input register. When the enable signal is high, the AND gate outputs equal the b-B decoder outputs, one of the transmission gates is turned on, and one of the B row source lines is connected to the output of the voltage multiplexer while the remaining source lines float. When the enable signal is low, the AND gate outputs are all low, all transmission gates are turned off, and all B row source lines float.
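The selection logic of one input cell in claim 3 can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented circuit: the function names `one_hot` and `input_cell` are hypothetical, the decoding bits are assumed to be ordered most significant bit first, and a floating source line is represented by `None`.

```python
from typing import List, Optional


def one_hot(code: int, width: int) -> List[int]:
    """Model an n-to-2^n decoder: exactly one of `width` outputs is 1."""
    if not 0 <= code < width:
        raise ValueError("code must be in [0, width)")
    return [1 if i == code else 0 for i in range(width)]


def input_cell(reg_bits: List[int], row_code: int, b: int,
               ldo_voltages: List[float]) -> List[Optional[float]]:
    """Return the voltage on each of the B = 2^b source lines (None = floating).

    reg_bits holds the (a+1) input-register outputs: the first a bits select
    the voltage (MSB-first ordering is an assumption of this sketch), and the
    last bit is the row multiplexer enable signal.
    """
    a = len(reg_bits) - 1
    A, B = 2 ** a, 2 ** b
    if len(ldo_voltages) != A:
        raise ValueError("expected A = 2^a LDO voltages")

    # Voltage multiplexer: a-A decoder turns on exactly one transmission gate.
    sel = 0
    for bit in reg_bits[:a]:
        sel = (sel << 1) | bit
    v_gates = one_hot(sel, A)
    v_out = ldo_voltages[v_gates.index(1)]

    # Row multiplexer: b-B decoder outputs are ANDed with the enable bit.
    enable = reg_bits[a]
    and_out = [d & enable for d in one_hot(row_code, B)]

    # A source line is driven only if its transmission gate is on.
    return [v_out if gate else None for gate in and_out]


# Example with a = 2 (four LDO voltages) and b = 2 (four source lines).
voltages = [0.0, 0.1, 0.2, 0.3]
print(input_cell([1, 0, 1], row_code=2, b=2, ldo_voltages=voltages))
print(input_cell([1, 0, 0], row_code=2, b=2, ldo_voltages=voltages))
```

With the enable bit set, only the selected source line carries the chosen LDO voltage; with the enable bit cleared, every source line floats regardless of the row code, which matches the two cases described in the claim.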
4. The high-flexibility storage computing array of claim 1, wherein each output cell comprises a column multiplexer, a clamp circuit and an analog-to-digital converter. The column multiplexer comprises a c-C decoder and C transmission gates, where C = 2^c; the c-C decoder converts the c-bit column decoding signal into a C-bit one-hot output, i.e., exactly one of the C output bits is high and the rest are low. The decoder output turns on one of the transmission gates, so that one of the C column bit lines is connected to the clamp circuit while the remaining bit lines float. The clamp circuit uses an operational amplifier OP and a feedback resistor R_f to clamp the selected bit line at a reference potential; the output voltage of the operational amplifier then represents the result of the vector-matrix multiplication. A sample-and-hold circuit consisting of the transistor N1 and the capacitor C1 stores this output voltage on the capacitor C1, and the stored voltage is finally fed into the analog-to-digital converter and converted into an x-bit digital signal, where x is the design precision of the analog-to-digital converter.
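The read-out path of claim 4 can be sketched numerically as follows. This is an idealized illustration under assumptions not stated in the source: the clamp is modeled as an ideal inverting transimpedance stage (V_out = V_ref - R_f * I), the converter is an ideal uniform quantizer over a hypothetical input range [v_lo, v_hi], and the sample-and-hold stage is treated as lossless and therefore omitted. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def bitline_current(source_v: Sequence[Optional[float]],
                    conductance: Sequence[Sequence[float]],
                    col: int, v_ref: float) -> float:
    """Sum the cell currents flowing into the selected bit line.

    The bit line is clamped at v_ref, so a driven row contributes
    (V_row - v_ref) * G[row][col]; floating rows (None) contribute nothing.
    """
    return sum((v - v_ref) * row[col]
               for v, row in zip(source_v, conductance)
               if v is not None)


def clamp_output(i_bl: float, r_f: float, v_ref: float) -> float:
    """Ideal op-amp with feedback resistor r_f (sign convention assumed)."""
    return v_ref - r_f * i_bl


def quantize(v: float, v_lo: float, v_hi: float, x_bits: int) -> int:
    """Ideal x-bit converter: map [v_lo, v_hi] onto codes 0 .. 2^x - 1."""
    levels = 2 ** x_bits - 1
    frac = min(max((v - v_lo) / (v_hi - v_lo), 0.0), 1.0)
    return round(frac * levels)


# Example: 2 x 2 array of conductances in siemens, column 1 selected.
g = [[1e-4, 2e-4],
     [3e-4, 4e-4]]
i_bl = bitline_current([0.5, None], g, col=1, v_ref=0.0)
v_out = clamp_output(i_bl, r_f=1000.0, v_ref=0.0)
code = quantize(v_out, v_lo=-1.0, v_hi=0.0, x_bits=3)
print(i_bl, v_out, code)
```

Because the bit line is held at the reference potential, each cell current depends only on its own source-line voltage and conductance, which is why the summed current (and hence the op-amp output) is proportional to one column of the vector-matrix product.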
5. The high-flexibility storage computing array of claim 4, wherein the analog-to-digital converter employs a Flash-ADC or SAR-ADC architecture.
6. The high-flexibility storage computing array of claim 3, wherein the plurality of input voltages required for computation are generated centrally by a linear regulator, the voltage multiplexer passes the selected voltage to the row multiplexer, the row multiplexer is connected to the source lines of the 1T1R array, and the bit lines of the 1T1R array are connected to an analog-to-digital converter through the column multiplexer.
CN202211396106.9A 2022-11-09 2022-11-09 High-flexibility storage computing array Pending CN115831170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211396106.9A CN115831170A (en) 2022-11-09 2022-11-09 High-flexibility storage computing array

Publications (1)

Publication Number Publication Date
CN115831170A true CN115831170A (en) 2023-03-21

Family

ID=85527278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211396106.9A Pending CN115831170A (en) 2022-11-09 2022-11-09 High-flexibility storage computing array

Country Status (1)

Country Link
CN (1) CN115831170A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116386687A (en) * 2023-04-07 2023-07-04 北京大学 Memory array for balancing voltage drop influence
CN116386687B (en) * 2023-04-07 2024-03-19 北京大学 Memory array for balancing voltage drop influence

Similar Documents

Publication Publication Date Title
US11132176B2 (en) Non-volatile computing method in flash memory
US10552510B2 (en) Vector-by-matrix multiplier modules based on non-volatile 2D and 3D memory arrays
KR102653629B1 (en) Analog neural memory system for deep learning neural networks with multiple vector matrix multiplication arrays and shared components
KR100276201B1 (en) Nonvolatile Semiconductor Memory and Method of Use thereof
Lee et al. High-density and highly-reliable binary neural networks using NAND flash memory cells as synaptic devices
WO2021076182A1 (en) Accelerating sparse matrix multiplication in storage class memory-based convolutional neural network inference
US20210020232A1 (en) Temperature effect compensation in memory arrays
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
Li et al. A 40-nm MLC-RRAM compute-in-memory macro with sparsity control, on-chip write-verify, and temperature-independent ADC references
CN115831170A (en) High-flexibility storage computing array
CN114400031B (en) Complement mapping RRAM (resistive random access memory) storage and calculation integrated chip and electronic equipment
US6292398B1 (en) Method for the in-writing verification of the threshold value in non-volatile memories
CN115794728A (en) Memory computing bit line clamping and summing peripheral circuit and application thereof
Choi et al. An in-flash binary neural network accelerator with SLC NAND flash array
US7023738B2 (en) Full-swing wordline driving circuit
CN114496010A (en) Analog domain near memory computing array structure based on magnetic random access memory
KR102630992B1 (en) Word line and control gate line tandem decoder for analog neural memory in deep learning artificial neural networks.
WO2022212282A1 (en) Compute-in-memory devices, systems and methods of operation thereof
US6028793A (en) High voltage driver circuit for a decoding circuit in multilevel non-volatile memory devices
JP4290618B2 (en) Nonvolatile memory and operation method thereof
Zhang et al. XMA: a crossbar-aware multi-task adaption framework via shift-based mask learning method
US20230410862A1 (en) In-memory computation circuit using static random access memory (sram) array segmentation
US20230386565A1 (en) In-memory computation circuit using static random access memory (sram) array segmentation and local compute tile read based on weighted current
CN112133342B (en) Memory device
Kim et al. ReRAM-based processing-in-memory (PIM)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination