CN114999544A - Memory computing circuit based on SRAM - Google Patents

Memory computing circuit based on SRAM

Info

Publication number
CN114999544A
CN114999544A
Authority
CN
China
Prior art keywords
input end
sram
memory
data
nmos tube
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210585976.4A
Other languages
Chinese (zh)
Inventor
贺雅娟
张振伟
骆宏阳
王梓霖
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210585976.4A
Publication of CN114999544A
Legal status: Pending

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C11/00Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/41Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C11/413Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of integrated circuits, and particularly relates to an SRAM-based in-memory computing circuit. The circuit comprises a 6T SRAM cell array and multiplexed computing units: weight data are written into the SRAM in advance for storage, and the multiplexed computing units then perform multiply-accumulate operations between the stored weights and the input data, so that the computation is carried out inside the memory. The computing units increase the area of the memory, but they allow data to be computed where it is stored, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.

Description

Memory computing circuit based on SRAM
Technical Field
The invention belongs to the technical field of integrated circuits, and relates to a memory computing circuit based on an SRAM.
Background
With the rapid development and large-scale deployment of artificial intelligence and Internet-of-Things technology, the volume of data that neural network algorithms must process keeps growing. In a computer based on the traditional von Neumann architecture, the memory and the central processing unit (CPU) are separate: during computation, the memory first transmits data to the CPU over a bus, and after the computation finishes, the result is written back to the memory over the same bus. The von Neumann architecture works well as long as the speeds of the memory and the bus keep pace with the processor. In recent decades, however, memory has evolved into a pyramid hierarchy, in which storage farther from the CPU is larger and cheaper but slower, while CPU speed has kept increasing rapidly following Moore's law. As a result, memory access speed falls far below the data processing speed of the CPU. This "memory wall" limits computer performance and is widely regarded as the biggest bottleneck of the von Neumann architecture. The phenomenon is most evident in applications that read, write and transfer data at high frequency, such as deep neural networks. Deep neural networks, currently the most widely applied algorithms in artificial-intelligence image recognition, must access, multiply and accumulate large amounts of image data, and the limited transmission speed and bandwidth of the system bus and memory therefore constrain both the inference speed and the range of applications of these algorithms.
To break through this bottleneck of the von Neumann architecture, the Computing-in-Memory (CIM) architecture has been proposed. As the name implies, in-memory computing performs certain computations, such as multiplication, XNOR and accumulation, inside the memory itself. Large amounts of data can thus be processed without being transmitted to the CPU over the system bus: the result is computed by computing units and control circuits inside the memory, and only that result is sent over the bus. This greatly reduces bus traffic between the CPU and the memory, alleviates both the mismatch between CPU and memory speeds and the power consumed by data access, and relaxes the bandwidth and latency constraints of the system bus.
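The contrast between the two architectures can be illustrated with a small behavioral sketch (a hypothetical illustration, not part of the patent; the function names and the transfer counting are assumptions): under the von Neumann model every operand pair crosses the bus, while under CIM only the accumulated result does.

```python
# Behavioral sketch (illustrative, not from the patent): counting bus
# transfers for a multiply-accumulate under the two architectures.

def von_neumann_mac(weights, inputs):
    """Every weight/input pair is fetched over the bus before the CPU uses it."""
    acc, bus_transfers = 0, 0
    for w, x in zip(weights, inputs):
        bus_transfers += 1        # one operand fetch per product
        acc += w * x              # multiply-accumulate inside the CPU
    return acc, bus_transfers

def in_memory_mac(weights, inputs):
    """The memory array computes locally; only one result crosses the bus."""
    acc = sum(w * x for w, x in zip(weights, inputs))
    return acc, 1                 # only the final sum is read out

weights = [1, -1, 1, 0, -1, 1, 1, -1]   # ternary weights, as in this patent
inputs  = [1,  1, 0, 1, -1, 1, 0,  1]
```

Both models produce the same dot product; only the bus traffic differs, which is exactly the power and bandwidth saving the CIM architecture targets.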
Disclosure of Invention
The invention aims to provide an in-memory computing unit that realizes multiply-accumulate operations on data by introducing a computing-unit circuit into an SRAM memory. This avoids the need, inherent in the von Neumann architecture, to move data out of the memory over a bus and send it to the CPU for computation, and can effectively reduce both the amount of data movement and the circuit power consumption it causes.
To achieve the above object, the present invention discloses an SRAM-based in-memory computing unit comprising a 6T SRAM cell array and multiplexed computing units (Multiplexing Computing Cells). Before computation, the weight data to be used are written into the 6T SRAM cells through the SRAM peripheral circuitry for storage. When the device operates in the in-memory computing mode, the input data are pre-encoded, the stored weight data are read out at the same time, and the multiplexed computing units perform the multiply-accumulate operation, realizing the in-memory computing function. Multiplexing means that every 16 6T SRAM cells share the same computing unit in a time-division manner, with only one 6T cell gated per computation. Compared with a scheme in which every 6T cell needs its own computing unit, this time-division multiplexing strategy greatly reduces the area of the in-memory computing circuit and is better suited to edge computing devices.
The technical scheme of the invention is as follows:
An SRAM-based in-memory computing circuit, comprising a 6T SRAM cell array of 16 rows and 64 columns and 64 multiplexed computing units, one per column. The 6T SRAM cells of each column are connected to the column read-write signal lines BLP and BLN, and the 6T SRAM cells of each row are connected to a read-write signal line WL. After encoding, the input data are applied to a first input terminal, a second input terminal and a third input terminal of each multiplexed computing unit, and the multiplexed computing unit obtains the weight data through the signal lines BLP and BLN. Specifically, each multiplexed computing unit comprises a first NMOS transistor, a second NMOS transistor, a third NMOS transistor and a fourth NMOS transistor. The gate of the first NMOS transistor is connected to the signal line BLP, and its drain is connected to the source of the second NMOS transistor; the gate of the second NMOS transistor is connected to the first input terminal, and its drain is connected to the second input terminal; the gate of the third NMOS transistor is connected to the signal line BLN, and its drain is connected to the source of the fourth NMOS transistor; the gate of the fourth NMOS transistor is connected to the first input terminal, and its drain is connected to the third input terminal; the sources of the first and third NMOS transistors are connected together and serve as the output terminal;
the input data are converted into voltage signals of a first input end, a second input end and a third input end after being coded, and the method specifically comprises the following steps: when the input data is 1, the first input end is the power supply voltage, the second input end is the power supply voltage and the third input end is the 0 voltage after the coding; when the input data is 0, the voltage signals of the first input end, the second input end and the third input end are all 0 after being coded; when the input data is-1, the first input end is the power supply voltage after the coding, the second input end is the 0 voltage, and the third input end is the power supply voltage.
The beneficial effects of the invention are as follows: the invention provides an SRAM-based in-memory computing unit that realizes in-memory computing by multiplexing a computing unit, so that data can be computed inside the memory, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.
Drawings
FIG. 1 is a diagram illustrating the overall circuit architecture of the present invention.
FIG. 2 is a schematic diagram of encoding an input according to the present invention.
FIG. 3 is a schematic diagram of the calculation process of the present invention.
FIG. 4 is a diagram illustrating the calculation results of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may become more apparent, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the invention mainly consists of an SRAM array and multiplexed computing units. Before computation, the weight data to be used are written into the 6T SRAM cells through the SRAM peripheral circuitry for storage. When the circuit operates in the in-memory computing mode, the input encoding circuit encodes the input data as shown in FIG. 2, i.e., different inputs correspond to different voltages on INV, INS_P and INS_N. Meanwhile, every 16 SRAM cells share the same computing unit in a time-division manner, i.e., in each clock cycle only one of every 16 6T SRAM cells is gated for computation. Compared with a scheme in which every 6T cell needs its own computing unit, this time-division multiplexing strategy greatly reduces the area of the in-memory computing circuit and is better suited to edge computing devices.
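The time-division multiplexing can be sketched behaviorally as follows (a hypothetical illustration, not from the patent; the function name is an assumption): the 16 cells of a column share one compute unit, so their 16 products are produced over 16 clock cycles rather than in parallel.

```python
def time_multiplexed_column(weights_16, inputs_16):
    """16 cells of one column share a single compute unit: one word line is
    asserted per clock cycle, so 16 products take 16 cycles through one unit."""
    assert len(weights_16) == len(inputs_16) == 16
    acc = 0
    for cycle in range(16):                      # one 6T cell gated per cycle
        acc += weights_16[cycle] * inputs_16[cycle]  # the shared unit's MAC
    return acc
```

The design choice here trades a 16x longer computation per column for roughly 16x less compute-unit area, which is the trade-off the patent argues suits edge devices.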
The specific computation process is shown in FIG. 3. First, a 6T cell is gated and its weight data appear on BLP and BLN: if the stored weight is +1, the BLP voltage is VDD and the BLN voltage is 0; if the stored weight is -1, the BLP voltage is 0 and the BLN voltage is VDD. The encoded input data appear as voltages on INV, INS_P and INS_N: if the input is +1, the voltages on INV and INS_P are VDD and the voltage on INS_N is 0. In that case the path through N1 and N2 is turned on and the path through N3 and N4 is turned off, i.e., OUTPUT is charged. A computation result of +1 appears as a charging current on OUTPUT; a result of 0 means no path conducts, so OUTPUT is neither charged nor discharged; a result of -1 appears as a discharging current on OUTPUT. The results of the multiplexed computing units in the same row are applied jointly to one output bit line, as shown in FIG. 4.
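The charge/discharge behavior of the four-NMOS unit can be modeled behaviorally (a hypothetical sketch under stated assumptions: VDD is modeled as 1, and transistor thresholds and analog non-idealities are ignored):

```python
VDD = 1  # supply voltage modeled as logic 1

def cell_output(weight, x):
    """Behavioral model of the four-NMOS compute unit: returns +1 for a
    charging current on OUTPUT, -1 for a discharging current, 0 for neither."""
    BLP, BLN = (VDD, 0) if weight == 1 else (0, VDD)   # stored weight on bit lines
    INV, INS_P, INS_N = { 1: (VDD, VDD, 0),            # encoded input rails
                          0: (0,   0,   0),
                         -1: (VDD, 0,   VDD)}[x]
    if BLP == VDD and INV == VDD:        # N1/N2 path conducts toward INS_P
        return 1 if INS_P == VDD else -1
    if BLN == VDD and INV == VDD:        # N3/N4 path conducts toward INS_N
        return 1 if INS_N == VDD else -1
    return 0                             # no conducting path: no current

# The current direction equals the signed product of input and weight:
for w in (1, -1):
    for x in (-1, 0, 1):
        assert cell_output(w, x) == w * x
```

The nested loop confirms that the switched paths implement a ternary multiplication, which is why summing the currents of a row of units on one output bit line yields the multiply-accumulate result.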
In summary, the present invention provides an SRAM-based in-memory computing unit comprising a 6T SRAM cell array and multiplexed computing units. The circuit writes the weight data into the SRAM in advance and then performs the computation between the stored weights and the input data in the multiplexed computing units, realizing in-memory computing. The computing units increase the area of the memory, but they allow the computation to be completed inside the memory, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.

Claims (2)

1. An SRAM-based in-memory computing circuit, wherein the circuit comprises a 6T SRAM cell array of 16 rows and 64 columns and 64 multiplexed computing units; the 6T SRAM cells of each column are connected to the column read-write signal lines BLP and BLN, and the 6T SRAM cells of each row are connected to a read-write signal line WL; after encoding, the input data are applied to a first input terminal, a second input terminal and a third input terminal connected to the multiplexed computing unit; the multiplexed computing unit obtains the weight data through the signal lines BLP and BLN, the weight data having been stored in the 6T SRAM cells through the signal lines BLP and BLN before computation; specifically, each multiplexed computing unit comprises a first NMOS transistor, a second NMOS transistor, a third NMOS transistor and a fourth NMOS transistor, wherein the gate of the first NMOS transistor is connected to the signal line BLP and its drain is connected to the source of the second NMOS transistor; the gate of the second NMOS transistor is connected to the first input terminal and its drain is connected to the second input terminal; the gate of the third NMOS transistor is connected to the signal line BLN and its drain is connected to the source of the fourth NMOS transistor; the gate of the fourth NMOS transistor is connected to the first input terminal and its drain is connected to the third input terminal; and the sources of the first and third NMOS transistors are connected together and serve as the output terminal;
after encoding, the input data are converted into the voltage signals of the first, second and third input terminals as follows: when the input data is 1, the first input terminal is at the supply voltage, the second input terminal is at the supply voltage, and the third input terminal is at 0 V; when the input data is 0, the voltage signals of all three input terminals are 0 V; when the input data is -1, the first input terminal is at the supply voltage, the second input terminal is at 0 V, and the third input terminal is at the supply voltage.
2. The SRAM-based in-memory computing circuit of claim 1, wherein the 16 SRAM cells of each column time-share the same computing unit, and only one 6T cell is gated in each computation.
CN202210585976.4A 2022-05-27 2022-05-27 Memory computing circuit based on SRAM Pending CN114999544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210585976.4A CN114999544A (en) 2022-05-27 2022-05-27 Memory computing circuit based on SRAM


Publications (1)

Publication Number Publication Date
CN114999544A 2022-09-02

Family

ID=83029234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585976.4A Pending CN114999544A (en) 2022-05-27 2022-05-27 Memory computing circuit based on SRAM

Country Status (1)

Country Link
CN (1) CN114999544A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115586885A (en) * 2022-09-30 2023-01-10 晶铁半导体技术(广东)有限公司 Memory computing unit and acceleration method
TWI822313B (en) * 2022-09-07 2023-11-11 財團法人工業技術研究院 Memory cell



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination