CN114999544A - Memory computing circuit based on SRAM - Google Patents

Memory computing circuit based on SRAM

Info

Publication number
CN114999544A
CN114999544A
Authority
CN
China
Prior art keywords
input end
sram
memory
data
nmos tube
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210585976.4A
Other languages
Chinese (zh)
Inventor
贺雅娟
张振伟
骆宏阳
王梓霖
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210585976.4A
Publication of CN114999544A
Legal status: Pending

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C11/00Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/41Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C11/413Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of integrated circuits, and particularly relates to an SRAM-based in-memory computing circuit. The circuit comprises a 6T SRAM cell array and multiplexed computing units: weight data are written into the SRAM in advance for storage, and the multiplexed computing units then perform multiply-accumulate operations between the stored weights and the input data, so that the computation is carried out inside the memory. The computing units increase the area of the memory, but they allow data to be computed where it is stored, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.

Description

Memory computing circuit based on SRAM
Technical Field
The invention belongs to the technical field of integrated circuits, and relates to a memory computing circuit based on an SRAM.
Background
With the rapid development and large-scale deployment of artificial intelligence and Internet-of-Things technology, the volume of data that neural network algorithms must process keeps growing. In a computer based on the traditional von Neumann architecture, the memory and the central processing unit (CPU) are separate: during computation, the memory first transmits data to the CPU over a bus, and after the computation finishes, the result is written back to the memory over the same bus. The von Neumann architecture works well as long as the speeds of the memory and the bus keep pace with the processor. In recent decades, however, memory has evolved into a pyramid hierarchy, in which storage farther from the CPU is larger and cheaper but slower, while CPU speed has kept increasing rapidly following Moore's law. As a result, memory access speed falls far below the data processing speed of the CPU. This "memory wall" limits computer performance and is widely regarded as the biggest bottleneck of the von Neumann architecture. The phenomenon is most evident in applications that read, write and transfer data at high frequency, such as deep neural networks. Deep neural networks, currently the most widely applied algorithms in artificial-intelligence image recognition, must access, multiply and accumulate large amounts of image data, and the limited transmission speed and bandwidth of the system bus and memory therefore constrain both the inference speed and the range of applications of these algorithms.
To break through this bottleneck of the von Neumann architecture, the Computing-in-Memory (CIM) architecture has been proposed. As the name implies, in-memory computing performs certain computations, such as multiplication, XNOR and accumulation, inside the memory itself. Large amounts of data can thus be processed without being transmitted to the CPU over the system bus: the result is computed by computing units and control circuits inside the memory, and only that result is sent over the bus. This greatly reduces bus traffic between the CPU and the memory, alleviates both the mismatch between CPU and memory speeds and the power consumed by data access, and relaxes the bandwidth and latency constraints of the system bus.
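The contrast between the two architectures can be illustrated with a small behavioral sketch (a hypothetical illustration, not part of the patent; the function names and the transfer counting are assumptions): under the von Neumann model every operand pair crosses the bus, while under CIM only the accumulated result does.

```python
# Behavioral sketch (illustrative, not from the patent): counting bus
# transfers for a multiply-accumulate under the two architectures.

def von_neumann_mac(weights, inputs):
    """Every weight/input pair is fetched over the bus before the CPU uses it."""
    acc, bus_transfers = 0, 0
    for w, x in zip(weights, inputs):
        bus_transfers += 1        # one operand fetch per product
        acc += w * x              # multiply-accumulate inside the CPU
    return acc, bus_transfers

def in_memory_mac(weights, inputs):
    """The memory array computes locally; only one result crosses the bus."""
    acc = sum(w * x for w, x in zip(weights, inputs))
    return acc, 1                 # only the final sum is read out

weights = [1, -1, 1, 0, -1, 1, 1, -1]   # ternary weights, as in this patent
inputs  = [1,  1, 0, 1, -1, 1, 0,  1]
```

Both models produce the same dot product; only the bus traffic differs, which is exactly the power and bandwidth saving the CIM architecture targets.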
Disclosure of Invention
The invention aims to provide an in-memory computing unit that realizes multiply-accumulate operations on data by introducing a computing-unit circuit into an SRAM memory. This avoids the need, inherent in the von Neumann architecture, to move data out of the memory over a bus and send it to the CPU for computation, and can effectively reduce both the amount of data movement and the circuit power consumption it causes.
To achieve the above object, the present invention discloses an SRAM-based in-memory computing unit comprising a 6T SRAM cell array and multiplexed computing units (Multiplexing Computing Cells). Before computation, the weight data to be used are written into the 6T SRAM cells through the SRAM peripheral circuitry for storage. When the device operates in the in-memory computing mode, the input data are pre-encoded, the stored weight data are read out at the same time, and the multiplexed computing units perform the multiply-accumulate operation, realizing the in-memory computing function. Multiplexing means that every 16 6T SRAM cells share the same computing unit in a time-division manner, with only one 6T cell gated per computation. Compared with a scheme in which every 6T cell needs its own computing unit, this time-division multiplexing strategy greatly reduces the area of the in-memory computing circuit and is better suited to edge computing devices.
The technical scheme of the invention is as follows:
An SRAM-based in-memory computing circuit, comprising a 6T SRAM cell array of 16 rows and 64 columns and 64 multiplexed computing units, one per column. The 6T SRAM cells of each column are connected to the column read-write signal lines BLP and BLN, and the 6T SRAM cells of each row are connected to a read-write signal line WL. After encoding, the input data are applied to a first input terminal, a second input terminal and a third input terminal of each multiplexed computing unit, and the multiplexed computing unit obtains the weight data through the signal lines BLP and BLN. Specifically, each multiplexed computing unit comprises a first NMOS transistor, a second NMOS transistor, a third NMOS transistor and a fourth NMOS transistor. The gate of the first NMOS transistor is connected to the signal line BLP, and its drain is connected to the source of the second NMOS transistor; the gate of the second NMOS transistor is connected to the first input terminal, and its drain is connected to the second input terminal; the gate of the third NMOS transistor is connected to the signal line BLN, and its drain is connected to the source of the fourth NMOS transistor; the gate of the fourth NMOS transistor is connected to the first input terminal, and its drain is connected to the third input terminal; the sources of the first and third NMOS transistors are connected together and serve as the output terminal;
the input data are converted into voltage signals of a first input end, a second input end and a third input end after being coded, and the method specifically comprises the following steps: when the input data is 1, the first input end is the power supply voltage, the second input end is the power supply voltage and the third input end is the 0 voltage after the coding; when the input data is 0, the voltage signals of the first input end, the second input end and the third input end are all 0 after being coded; when the input data is-1, the first input end is the power supply voltage after the coding, the second input end is the 0 voltage, and the third input end is the power supply voltage.
The beneficial effects of the invention are as follows: the invention provides an SRAM-based in-memory computing unit that realizes in-memory computing by multiplexing a computing unit, so that data can be computed inside the memory, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.
Drawings
FIG. 1 is a diagram illustrating the overall circuit architecture of the present invention.
FIG. 2 is a schematic diagram of encoding an input according to the present invention.
FIG. 3 is a schematic diagram of the calculation process of the present invention.
FIG. 4 is a diagram illustrating the calculation results of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may become more apparent, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the invention mainly consists of an SRAM array and multiplexed computing units. Before computation, the weight data to be used are written into the 6T SRAM cells through the SRAM peripheral circuitry for storage. When the circuit operates in the in-memory computing mode, the input encoding circuit encodes the input data as shown in FIG. 2, i.e., different inputs correspond to different voltages on INV, INS_P and INS_N. Meanwhile, every 16 SRAM cells share the same computing unit in a time-division manner, i.e., in each clock cycle only one of every 16 6T SRAM cells is gated for computation. Compared with a scheme in which every 6T cell needs its own computing unit, this time-division multiplexing strategy greatly reduces the area of the in-memory computing circuit and is better suited to edge computing devices.
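The time-division multiplexing can be sketched behaviorally as follows (a hypothetical illustration, not from the patent; the function name is an assumption): the 16 cells of a column share one compute unit, so their 16 products are produced over 16 clock cycles rather than in parallel.

```python
def time_multiplexed_column(weights_16, inputs_16):
    """16 cells of one column share a single compute unit: one word line is
    asserted per clock cycle, so 16 products take 16 cycles through one unit."""
    assert len(weights_16) == len(inputs_16) == 16
    acc = 0
    for cycle in range(16):                      # one 6T cell gated per cycle
        acc += weights_16[cycle] * inputs_16[cycle]  # the shared unit's MAC
    return acc
```

The design choice here trades a 16x longer computation per column for roughly 16x less compute-unit area, which is the trade-off the patent argues suits edge devices.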
The specific computation process is shown in FIG. 3. First, a 6T cell is gated and its weight data appear on BLP and BLN: if the stored weight is +1, the BLP voltage is VDD and the BLN voltage is 0; if the stored weight is -1, the BLP voltage is 0 and the BLN voltage is VDD. The encoded input data appear as voltages on INV, INS_P and INS_N: if the input is +1, the voltages on INV and INS_P are VDD and the voltage on INS_N is 0. In that case the path through N1 and N2 is turned on and the path through N3 and N4 is turned off, i.e., OUTPUT is charged. A computation result of +1 appears as a charging current on OUTPUT; a result of 0 means no path conducts, so OUTPUT is neither charged nor discharged; a result of -1 appears as a discharging current on OUTPUT. The results of the multiplexed computing units in the same row are applied jointly to one output bit line, as shown in FIG. 4.
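The charge/discharge behavior of the four-NMOS unit can be modeled behaviorally (a hypothetical sketch under stated assumptions: VDD is modeled as 1, and transistor thresholds and analog non-idealities are ignored):

```python
VDD = 1  # supply voltage modeled as logic 1

def cell_output(weight, x):
    """Behavioral model of the four-NMOS compute unit: returns +1 for a
    charging current on OUTPUT, -1 for a discharging current, 0 for neither."""
    BLP, BLN = (VDD, 0) if weight == 1 else (0, VDD)   # stored weight on bit lines
    INV, INS_P, INS_N = { 1: (VDD, VDD, 0),            # encoded input rails
                          0: (0,   0,   0),
                         -1: (VDD, 0,   VDD)}[x]
    if BLP == VDD and INV == VDD:        # N1/N2 path conducts toward INS_P
        return 1 if INS_P == VDD else -1
    if BLN == VDD and INV == VDD:        # N3/N4 path conducts toward INS_N
        return 1 if INS_N == VDD else -1
    return 0                             # no conducting path: no current

# The current direction equals the signed product of input and weight:
for w in (1, -1):
    for x in (-1, 0, 1):
        assert cell_output(w, x) == w * x
```

The nested loop confirms that the switched paths implement a ternary multiplication, which is why summing the currents of a row of units on one output bit line yields the multiply-accumulate result.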
In summary, the present invention provides an SRAM-based in-memory computing unit comprising a 6T SRAM cell array and multiplexed computing units. The circuit writes the weight data into the SRAM in advance and then performs the computation between the stored weights and the input data in the multiplexed computing units, realizing in-memory computing. The computing units increase the area of the memory, but they allow the computation to be completed inside the memory, significantly reducing the data movement and power consumption of applications such as convolutional neural networks.

Claims (2)

1. An SRAM-based in-memory computing circuit, wherein the circuit comprises a 6T SRAM cell array of 16 rows and 64 columns and 64 multiplexed computing units; the 6T SRAM cells of each column are connected to the column read-write signal lines BLP and BLN, and the 6T SRAM cells of each row are connected to a read-write signal line WL; after encoding, the input data are applied to a first input terminal, a second input terminal and a third input terminal connected to the multiplexed computing unit; the multiplexed computing unit obtains the weight data through the signal lines BLP and BLN, the weight data having been stored in the 6T SRAM cells through the signal lines BLP and BLN before computation; specifically, each multiplexed computing unit comprises a first NMOS transistor, a second NMOS transistor, a third NMOS transistor and a fourth NMOS transistor, wherein the gate of the first NMOS transistor is connected to the signal line BLP and its drain is connected to the source of the second NMOS transistor; the gate of the second NMOS transistor is connected to the first input terminal and its drain is connected to the second input terminal; the gate of the third NMOS transistor is connected to the signal line BLN and its drain is connected to the source of the fourth NMOS transistor; the gate of the fourth NMOS transistor is connected to the first input terminal and its drain is connected to the third input terminal; and the sources of the first and third NMOS transistors are connected together and serve as the output terminal;
after encoding, the input data are converted into the voltage signals of the first, second and third input terminals as follows: when the input data is 1, the first input terminal is at the supply voltage, the second input terminal is at the supply voltage, and the third input terminal is at 0 V; when the input data is 0, the voltage signals of all three input terminals are 0 V; when the input data is -1, the first input terminal is at the supply voltage, the second input terminal is at 0 V, and the third input terminal is at the supply voltage.
2. The SRAM-based in-memory computing circuit of claim 1, wherein the 16 SRAM cells of each column time-share the same computing unit, and only one 6T cell is gated in each computation.
CN202210585976.4A 2022-05-27 2022-05-27 Memory computing circuit based on SRAM Pending CN114999544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210585976.4A CN114999544A (en) 2022-05-27 2022-05-27 Memory computing circuit based on SRAM


Publications (1)

Publication Number Publication Date
CN114999544A 2022-09-02

Family

ID=83029234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585976.4A Pending CN114999544A (en) 2022-05-27 2022-05-27 Memory computing circuit based on SRAM

Country Status (1)

Country Link
CN (1) CN114999544A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115586885A (en) * 2022-09-30 2023-01-10 晶铁半导体技术(广东)有限公司 Memory computing unit and acceleration method
TWI822313B (en) * 2022-09-07 2023-11-11 財團法人工業技術研究院 Memory cell



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination