CN114937470A - Fixed point full-precision memory computing circuit based on multi-bit SRAM unit - Google Patents

Fixed point full-precision memory computing circuit based on multi-bit SRAM unit

Info

Publication number
CN114937470A
Authority
CN
China
Prior art keywords
bit
output
adder
memory
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210549764.0A
Other languages
Chinese (zh)
Other versions
CN114937470B (en)
Inventor
贺雅娟
骆宏阳
王梓霖
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210549764.0A priority Critical patent/CN114937470B/en
Publication of CN114937470A publication Critical patent/CN114937470A/en
Application granted granted Critical
Publication of CN114937470B publication Critical patent/CN114937470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/501 Half or full adders, i.e. basic adder cells for one denomination
    • G06F7/502 Half adders; Full adders consisting of two cascaded half adders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443 Sum of products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/41 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C11/413 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
    • G11C11/417 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
    • G11C11/419 Read-write [R-W] circuits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Neurology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Molecular Biology (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Evolutionary Computation (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Complex Calculations (AREA)

Abstract

The invention belongs to the technical field of integrated circuits, and particularly relates to a fixed point full-precision memory computing circuit based on a multi-bit SRAM unit. According to the invention, multiplication is realized by adding two transistors to form a transmission gate on the basis of a traditional SRAM storage array circuit, an adder tree is added for partial sum accumulation, and a bit serial input mode and a shift accumulator are adopted to complete multi-bit operation, so that precision-lossless matrix vector multiplication operation in an SRAM storage array is realized. The invention realizes the multi-bit SRAM memory calculation without precision loss, has the characteristics of small area and high parallelism, and is suitable for the convolutional neural network system which needs large-scale multiply-accumulate calculation.

Description

Fixed point full-precision memory computing circuit based on multi-bit SRAM unit
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to a fixed-point full-precision memory computing circuit based on a multi-bit SRAM unit.
Background
In recent years, computing power has increased steadily with the development of integrated circuits. The field of artificial intelligence has also advanced rapidly. Its application scenarios typically involve pictures, audio and video, which makes them data-intensive applications, as distinct from traditional compute-intensive and control-intensive ones. Convolutional neural networks (CNNs) have become widely used, especially for processing pictures and video.
However, the convolutional layers and the fully-connected layers both require a large number of weights and a large number of convolution operations. This not only challenges the computing power of the conventional von Neumann architecture; the large amount of data movement also becomes a bottleneck for the power consumption and speed of the whole system. In the embedded field in particular, more and more Internet-of-Things devices need AI to provide intelligence, but they are limited by battery life and by the computing capability of the MCU, so AI tasks can only be completed by sending data to the cloud, processing it there and returning the result. This incurs high latency, which in some cases fails to meet requirements, and it does not protect personal privacy well.
The SRAM memory computing array circuit is a solution for data-intensive applications: multiply-accumulate operations are completed inside the memory, and multi-bit data are multiplied and accumulated in parallel. This computation pattern matches that of a CNN and satisfies real-time requirements. Meanwhile, because the weights are stored in the array, the power spent moving weights back and forth is avoided, which reduces the power consumption of the whole system.
Disclosure of Invention
To address the problem that a traditional SRAM array circuit cannot perform computation internally, the invention provides a fixed-point full-precision memory computing circuit based on a multi-bit SRAM unit, which realizes multi-bit memory computing while significantly improving the energy-efficiency ratio through an innovative structural design.
The technical scheme of the invention is as follows:
the fixed-point full-precision memory computing circuit based on the multi-bit SRAM unit is characterized by comprising 64 rows and 4 columns of storage units, 1 adder tree, 4 sense amplifiers and 1 accumulator.
In the memory array, the memory cells in each column are connected to two signal lines, BL and BLB, which are the read-write bit lines used for loading data during read-write operations. The memory cells in each row are connected to three signal lines, namely WL, input and output: WL is the read-write word line used for selecting a row during read-write operations, input carries the input signal in the memory computing mode, and output carries the result of multiplying the input by the stored value in the memory computing mode.
The memory computing circuit has an SRAM mode and a memory computing mode; the input terminals of the sense amplifiers are connected to the BL signal line and the BLB signal line, the SRAM mode uses the sense amplifier outputs, and the memory computation mode combines the outputs of the 4 columns into out1[3:0] to out64[3:0] which are fed into the adder tree for accumulation.
Specifically, the adder tree has 64 input ports in1[3:0] to in64[3:0] in total, each port corresponds to an output of each row of the storage array, and one output port sum [9:0] outputs 10 bits, which represents the accumulated result of all inputs. The adder tree does not work in the SRAM mode, and the multiplication results of 4 units in each row are accumulated in parallel in the memory calculation mode.
Specifically, the accumulator is responsible for accumulating the results of the adder tree, and has a 10-bit input port iat[9:0] and a 14-bit output port result[13:0].
Specifically, the memory cell in the memory array is an 8-transistor memory cell comprising a first PMOS transistor, a second PMOS transistor, a first NMOS transistor, a second NMOS transistor, a third NMOS transistor, a fourth NMOS transistor, a fifth NMOS transistor and a sixth NMOS transistor. The source of the first PMOS transistor and the source of the second PMOS transistor are connected to the power supply. The drain of the first PMOS transistor is connected to the gate of the second PMOS transistor, the drain of the first NMOS transistor, the gate of the second NMOS transistor, the drain of the third NMOS transistor and the gate of the fifth NMOS transistor. The drain of the second PMOS transistor is connected to the gate of the first PMOS transistor, the gate of the first NMOS transistor, the drain of the second NMOS transistor, the drain of the fourth NMOS transistor and the gate of the sixth NMOS transistor. The source of the first NMOS transistor and the source of the second NMOS transistor are grounded. The gate of the third NMOS transistor is connected to the row read-write operation signal WL, and its source is connected to the column read-write operation signal BL. The gate of the fourth NMOS transistor is connected to the row read-write operation signal WL, and its source is connected to the column read-write operation signal BLB. The source of the fifth NMOS transistor is connected to the input signal input, and its drain is connected to the drain of the sixth NMOS transistor to form the output signal output. The source of the sixth NMOS transistor is grounded.
Specifically, the adder tree comprises 6 levels, which alternate between 10T full adders and 28T full adders. The 1st level consists of 32 4-bit 10T full adders and generates 32 5-bit sums; inputs 0 to 63 are combined in order, two adjacent inputs at a time. The 2nd level consists of 16 5-bit 28T full adders and generates 16 6-bit sums; the 3rd level consists of 8 6-bit 10T full adders and generates 8 7-bit sums; the 4th level consists of 4 7-bit 28T full adders and generates 4 8-bit sums; the 5th level consists of 2 8-bit 10T full adders and generates 2 9-bit sums. Levels 2 to 5 combine their inputs in the same way as the 1st level. The 6th level consists of 1 9-bit 28T full adder, which generates one 10-bit sum as the output and feeds it to the accumulator.
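The level structure described above can be tabulated and exercised with a short behavioural sketch. This is only an illustration under stated assumptions: the function names `adder_tree_levels` and `adder_tree_sum` are hypothetical, the model works on integers rather than transistor-level adders, and the pairwise reduction assumes the number of inputs is a power of two.

```python
# Behavioural sketch of the 6-level adder tree; names are illustrative, not from the patent.

def adder_tree_levels(num_inputs=64, input_bits=4):
    """Return one (level, adder_count, operand_bits, sum_bits, adder_type) tuple per level."""
    levels = []
    count, bits, level = num_inputs, input_bits, 1
    while count > 1:
        count //= 2  # two adjacent inputs feed one adder
        adder_type = "10T" if level % 2 == 1 else "28T"  # odd levels 10T, even levels 28T
        levels.append((level, count, bits, bits + 1, adder_type))
        bits += 1    # every level widens the sum by one carry bit
        level += 1
    return levels


def adder_tree_sum(row_outputs):
    """Reduce the row outputs pairwise (0+1, 2+3, ...) until a single sum remains."""
    values = list(row_outputs)
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


if __name__ == "__main__":
    for entry in adder_tree_levels():
        print(entry)
    # Worst case: every row outputs the 4-bit maximum 15.
    print(adder_tree_sum([15] * 64))
```

The worst-case sum of 64 rows each outputting 15 is 960, which is below 2^10 = 1024, so a 10-bit output sum[9:0] cannot overflow.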
Specifically, the accumulator comprises a first D flip-flop, a second D flip-flop, a third D flip-flop, a 14-bit adder and a shift circuit. The input of the first D flip-flop is connected to the output port of the adder tree, and its output is connected to one input of the 14-bit adder. The other input of the 14-bit adder is connected to the output of the shift circuit, whose input is connected to the output of the second D flip-flop; the input of the second D flip-flop is connected to the output of the 14-bit adder. The shift circuit shifts the output of the second D flip-flop left by one bit before feeding it into the 14-bit adder. The input of the third D flip-flop is connected to the output of the 14-bit adder, and its output is the output of the accumulator. In each clock cycle, the accumulator shifts the previous accumulation result left by one bit and adds to it, through the 14-bit adder, the accumulated sum generated in the current clock cycle. Specifically, the inputs are applied serially from MSB to LSB, one bit per clock cycle; in each cycle the adder tree produces a 10-bit partial sum, and these partial sums are successively shifted left and accumulated. This achieves multi-bit multiply-accumulation; for a 4-bit input, 5 cycles are needed to complete the operation.
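The 10-bit and 14-bit port widths can be sanity-checked against the worst case for unsigned operands. The sketch below is a plain arithmetic check, not part of the patent; the constant names are illustrative.

```python
# Worst-case width check for unsigned operands; constant names are illustrative.
ROWS = 64         # rows in the storage array
WEIGHT_BITS = 4   # columns, i.e. bits per stored weight
INPUT_BITS = 4    # bits per serially applied input

max_weight = 2 ** WEIGHT_BITS - 1            # 15
max_input = 2 ** INPUT_BITS - 1              # 15

# Largest value the adder tree can output in one cycle (all input bits are 1).
max_partial_sum = ROWS * max_weight
# Largest final multiply-accumulate result after all input bits are processed.
max_result = ROWS * max_weight * max_input

partial_sum_bits = max_partial_sum.bit_length()
result_bits = max_result.bit_length()

if __name__ == "__main__":
    print(max_partial_sum, partial_sum_bits)
    print(max_result, result_bits)
```

With these parameters the largest partial sum is 960 (10 bits) and the largest final result is 14400 (14 bits), which is consistent with the stated port widths and with the claim that no precision is lost.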
Specifically, the supported operation data type is an unsigned number.
The invention has the beneficial effects that: the invention realizes the multi-bit SRAM memory calculation without precision loss by modifying the SRAM basic memory cell and adding the adder tree and the shift accumulator circuit, has the characteristics of small area and high parallelism, and is suitable for the convolutional neural network system which needs large-scale multiplication and accumulation calculation.
Drawings
FIG. 1 is a schematic diagram of the fixed-point full-precision memory computing circuit based on a multi-bit SRAM cell according to the present invention.
FIG. 2 is a schematic diagram of an 8T SRAM memory operation unit.
FIG. 3 is a diagram of an adder tree architecture.
FIG. 4 is a schematic diagram of 10T and 28T full adders used in an adder tree.
Fig. 5 is a schematic diagram of an accumulator and a corresponding timing diagram.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the fixed-point full-precision memory computing circuit based on a multi-bit SRAM cell according to the present invention. The storage-and-computation integrated SRAM array circuit consists of 64 rows and 4 columns. Each row shares the two row lines WL and input, each column shares the two bit lines BL and BLB, and the output of each row is connected to the adder tree. The array circuit comprises 64 × 4 8-transistor memory cells in total; each 8-transistor memory cell comprises a first NMOS transistor, a second NMOS transistor, a third NMOS transistor, a fourth NMOS transistor, a fifth NMOS transistor, a sixth NMOS transistor, a first PMOS transistor and a second PMOS transistor.
FIG. 2 is a schematic diagram of an 8T SRAM cell. In the 8-transistor memory cell, the sources of the first and second PMOS transistors are connected to the supply voltage. The sources of the first, second and sixth NMOS transistors are grounded. The drain of the first PMOS transistor (denoted as node Q) is connected to the gate of the second PMOS transistor, the drain of the first NMOS transistor, the gate of the second NMOS transistor, the drain of the third NMOS transistor and the gate of the fifth NMOS transistor. The gate of the first PMOS transistor (denoted as node QB) is connected to the drain of the second PMOS transistor, the gate of the first NMOS transistor, the drain of the second NMOS transistor, the drain of the fourth NMOS transistor and the gate of the sixth NMOS transistor. The gates of the third and fourth NMOS transistors are connected to the word line WL. The source of the third NMOS transistor is connected to BL and the source of the fourth NMOS transistor is connected to BLB. The source of the fifth NMOS transistor is connected to input. The drain of the fifth NMOS transistor is connected to the drain of the sixth NMOS transistor to form the output node.
In the fixed-point full-precision memory computing circuit based on the multi-bit SRAM unit, the substrates (bulk terminals) of all NMOS transistors are connected to the ground voltage GND, and those of all PMOS transistors are connected to the supply voltage VDD.
In order to realize matrix-vector multiplication in the storage array, the invention adds two transistors to form a transmission gate: its input end is connected to input, and its two gates are connected to Q and QB respectively, which implements AND logic. input is the input signal, so the result of multiplying the stored data by the input data is obtained at the output port output.
In order to realize multi-bit accumulation in the memory array, the invention adopts an adder tree and a shift accumulator. The adder tree takes the multiplication results of the 4 memory cells and the corresponding input in each of the 64 rows of the memory array and accumulates them in parallel. The shift accumulator shifts left and accumulates the partial sums generated by the adder tree over 4 cycles, as described in detail later.
The operation of the memory array circuit of the present invention is described in detail below with reference to fig. 1, 2, 3, 4 and 5:
1. SRAM mode:
(1) Hold operation:
during the period in which the memory cell holds data, the word line WL is kept at a low level. At this time, the third NMOS transistor MN3 and the fourth NMOS transistor MN4 are both turned off, and neither of the read bit lines BL and BLB affects the storage node Q or QB. The latch structure formed by the first PMOS transistor MP1, the second PMOS transistor MP2, the first NMOS transistor MN1, and the second NMOS transistor MN2 latches data of the storage nodes Q and QB.
(2) Write operation:
Suppose that before a write operation the storage node Q of the 8-transistor memory cell is high and QB is low, i.e. the stored data is '1'. To write data '0', the word line WL is pulled high to select the cell, and at the same time the data '0' to be written is loaded onto the bit lines, i.e. BL is low and BLB is high. BL pulls down node Q through the third NMOS transistor MN3, BLB pulls up node QB through the fourth NMOS transistor MN4, the feedback loop of the latch is overpowered, and the data '0' is written into the memory cell. Writing data '1' proceeds analogously.
(3) Read operation
Suppose that before a read operation the storage node Q of the memory cell is high and QB is low, i.e. the stored data is '1'. At the beginning of the read operation, the bit lines BL and BLB are precharged high and the word line WL is pulled high, so the third NMOS transistor MN3 and the fourth NMOS transistor MN4 are turned on; the second NMOS transistor MN2 is on because node Q is high. BLB is then pulled low through the second NMOS transistor MN2 and the fourth NMOS transistor MN4 while BL remains unchanged, and the read of '1' is complete. Reading data '0' proceeds analogously.
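The hold, write and read behaviour described above can be summarised in a small logic-level model. This is a minimal sketch under simplifying assumptions: it ignores analog effects such as precharge timing and sense-amplifier thresholds, and the class name `SramCell` and its methods are hypothetical.

```python
class SramCell:
    """Logic-level model of the storage part of one cell (latch plus access transistors)."""

    def __init__(self, q=0):
        self.q = q  # storage node Q; QB is always its complement

    @property
    def qb(self):
        return 1 - self.q

    def write(self, wl, bl, blb):
        """With WL high and complementary bit lines, the bit lines overwrite the latch.
        With WL low (hold), MN3 and MN4 are off and the latch keeps its data."""
        if wl == 1 and bl != blb:
            self.q = bl

    def read(self, wl):
        """Bit lines start precharged high; with WL high, the side storing 0
        discharges its bit line. Returns the (BL, BLB) levels seen by the sense amplifier."""
        bl, blb = 1, 1
        if wl == 1:
            bl, blb = self.q, self.qb
        return bl, blb


if __name__ == "__main__":
    cell = SramCell(q=1)
    cell.write(wl=0, bl=0, blb=1)   # hold: WL low, data unchanged
    print(cell.q)
    cell.write(wl=1, bl=0, blb=1)   # write '0'
    print(cell.q, cell.read(wl=1))
```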
2. Memory computing mode:
In the memory computing mode, stored data '0' represents the value 0 and stored data '1' represents the value 1.
Assuming that the memory cell stores data of 1, i.e. Q point is "1", QB is "0", the fifth NMOS transistor MN5 is turned on, and the sixth NMOS transistor MN6 is turned off. At this time, output is connected with input through a fifth NMOS transistor MN5, and if input is "0", output is also "0"; if input is "1", output is also "1".
Assuming that the storage unit stores data as 0, i.e. the Q point is "0", QB is "1", at this time, the fifth NMOS transistor MN5 is turned off, the sixth NMOS transistor MN6 is turned on, and the output is "0" no matter whether the input is "0" or "1".
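The two cases above amount to a 1-bit multiplication, that is, a logical AND of the stored bit and the input bit. A minimal logic-level sketch follows; the function name `cell_output` is hypothetical, and the transistor behaviour is idealised as on/off switches.

```python
def cell_output(q, input_bit):
    """Logic-level model of the compute path formed by MN5 and MN6."""
    qb = 1 - q
    mn5_on = (q == 1)    # gate of MN5 is driven by Q
    mn6_on = (qb == 1)   # gate of MN6 is driven by QB
    if mn5_on:
        return input_bit  # output follows input through MN5
    assert mn6_on
    return 0              # MN6 ties output to ground


if __name__ == "__main__":
    # Truth table: output equals q AND input.
    for q in (0, 1):
        for x in (0, 1):
            print(q, x, cell_output(q, x))
```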
Since the four memory cells in each row share one input line, the output of the nth row is:
$$out_n = input_n \cdot \sum_{i=0}^{3} W_i \cdot 2^{i}$$
where W_i denotes the data '0' or '1' stored in the corresponding column of FIG. 1. The adder tree accumulates the 4-bit multiplication outputs generated by the 64 rows in parallel.
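A row output can be sketched by applying the shared 1-bit input to each of the four stored bits and reading the four results as one 4-bit number. The bit ordering (first list element is the most significant column) and the function name `row_output` are assumptions made for illustration.

```python
def row_output(input_bit, weight_bits):
    """Return the 4-bit row output as an integer.

    weight_bits: the four stored bits of one row, most significant first
    (this column ordering is an assumption, not stated in the source).
    """
    out_bits = [input_bit & w for w in weight_bits]  # one AND per cell
    value = 0
    for bit in out_bits:
        value = (value << 1) | bit                   # pack bits MSB-first
    return value


if __name__ == "__main__":
    print(row_output(1, [1, 0, 1, 1]))  # stored weight 0b1011, input 1
    print(row_output(0, [1, 0, 1, 1]))  # input 0 masks the weight
```

With a 1-bit input the row output is either the stored 4-bit weight or zero, which is why the adder tree inputs are 4 bits wide.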
The inputs in_i of the adder tree in FIG. 3 are connected to the corresponding outputs out_i generated by each row of the array. The 64 inputs first pass through the first-stage 10T 4-bit adders, producing 32 5-bit partial sums; the 10T adder reduces area but incurs a threshold-voltage loss. The second stage therefore uses 28T 5-bit mirror adders, so that successive threshold drops do not lead to errors. The third stage again uses 10T adders, the fourth stage 28T adders, the fifth stage 10T adders and the sixth stage 28T adders, which completes the accumulation and generates the accumulated sum output sum[9:0].
Figure BDA0003654347430000052
Figure BDA0003654347430000061
FIG. 5 depicts the timing diagram of the bit-serial input mode and the circuit diagram of the corresponding accumulator. For a 4-bit input X multiplied by a 4-bit weight W, the input bits are applied serially from MSB to LSB, one bit per clock cycle: X_3 in the 1st cycle, X_2 in the 2nd cycle, X_1 in the 3rd cycle and X_0 in the 4th cycle. In each clock cycle, the adder tree generates an accumulated sum, denoted sum_i:
$$sum_i = \sum_{n=1}^{64} X_{n,3-i} \cdot W_n, \quad i = 0, 1, 2, 3$$
where X_{n,k} is bit k of the input applied to row n and W_n is the 4-bit weight stored in row n.
In the accumulator, the output sum of the adder tree is connected to the input iat[9:0]. In each clock cycle, the previous accumulation result is shifted left by one bit and added, by the 14-bit adder, to the accumulated sum generated in the current clock cycle. The specific process is as follows:
Cycle 1: result = 0, S = (result << 1) + sum_0
Cycle 2: result = sum_0, S = (result << 1) + sum_1
Cycle 3: result = (sum_0 << 1) + sum_1, S = (result << 1) + sum_2
Cycle 4: result = (sum_0 << 2) + (sum_1 << 1) + sum_2, S = (result << 1) + sum_3
Cycle 5: result = (sum_0 << 3) + (sum_1 << 2) + (sum_2 << 1) + sum_3
In binary, shifting left by one bit is equivalent to multiplying by 2, so substituting the above expressions gives the final accumulator output:
$$result = 2^{3} \cdot sum_0 + 2^{2} \cdot sum_1 + 2^{1} \cdot sum_2 + sum_3 = \sum_{n=1}^{64} X_n \cdot W_n$$
that is, the multiply-accumulate operation of 64 4-bit inputs and 4-bit weights is completed. The vector matrix multiplication can be completed by splicing a plurality of banks.
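The cycle-by-cycle process can be checked with a short behavioural simulation. The sketch below models only the arithmetic (one input bit per cycle, MSB first, followed by shift-and-add); it does not model the three D flip-flops or their timing, and the function name `bit_serial_mac` is hypothetical.

```python
import random


def bit_serial_mac(inputs, weights, input_bits=4):
    """Multiply-accumulate with bit-serial inputs and a shift accumulator.

    inputs, weights: equal-length lists of unsigned integers (one pair per row).
    """
    result = 0
    for cycle in range(input_bits):
        bit_index = input_bits - 1 - cycle  # MSB is applied first
        # Adder tree: sum over rows of (current input bit) x (stored weight).
        partial_sum = sum(((x >> bit_index) & 1) * w
                          for x, w in zip(inputs, weights))
        # Shift accumulator: previous result shifted left by one, plus the new sum.
        result = (result << 1) + partial_sum
    return result


if __name__ == "__main__":
    random.seed(0)
    xs = [random.randrange(16) for _ in range(64)]
    ws = [random.randrange(16) for _ in range(64)]
    expected = sum(x * w for x, w in zip(xs, ws))
    print(bit_serial_mac(xs, ws) == expected)
```

Because every partial sum is kept at full width before it is shifted and added, the result equals the exact dot product of the unsigned inputs and weights.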
In summary, the fixed-point full-precision memory computing circuit based on the multi-bit SRAM cell according to the present invention implements matrix-vector multiplication through an improved structure. Compared with the traditional structure, the array circuit adds two transistors per cell to form a transmission gate that implements AND logic, and the result of multiplying the input by the stored value is obtained at the output port. The outputs of all rows are added in parallel by the adder tree, whose alternating use of 10T and 28T full adders greatly reduces area. Multi-bit input is realized by the bit-serial input mode and the shift accumulator, so that multi-bit matrix-vector multiplication is realized inside the SRAM array.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention, and it is to be understood that the scope of the invention is not limited to such specific statements and embodiments (e.g., number of rows 64 and number of columns 4). Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto and changes may be made without departing from the scope of the invention in its aspects.

Claims (4)

1. The fixed point full-precision memory computing circuit based on the multi-bit SRAM unit is characterized in that the memory computing circuit has an SRAM mode and a memory computing mode and comprises a storage array, an adder tree, sense amplifiers and an accumulator;
the memory array is composed of 64 rows and 4 columns of memory cells, and the memory cells in each column are connected with a column read operation signal BL and a column write operation signal BLB; the memory cells in each row are connected with a row read-write operation signal WL, an input signal input and an output signal output, the read-write operation signal WL is used for selecting a row during read-write operation, the input signal input is used for inputting data during the memory computing mode, and the output signal output is used for outputting multiplication results of the input data and a stored value during the memory computing mode;
the number of the sense amplifiers is 4, the input of each sense amplifier corresponds to the read operation signal BL and the column write operation signal BLB of a column of memory cells, and the output of each sense amplifier is the output of the memory computing circuit in the SRAM mode;
the adder tree has 64 4-bit input ports in1[3:0] to in64[3:0], each port corresponds to the output of each row of the storage array, and the output ports of the adder tree are 10-bit ports sum [9:0] and represent the accumulation results of all the inputs; the adder tree does not work in an SRAM mode, and multiplication results of 4 units in each row are accumulated in parallel in an in-memory calculation mode;
the accumulator is responsible for accumulating the results of the adder tree and has a 10-bit input port iat [9:0] and a 14-bit output port result [13:0], and in the memory calculation mode, the output of the accumulator is the output of the memory calculation circuit.
2. The fixed-point full-precision memory computing circuit based on the multi-bit SRAM unit as claimed in claim 1, wherein the memory cells in the memory array are 8-transistor memory cells, and include a first PMOS transistor, a second PMOS transistor, a first NMOS transistor, a second NMOS transistor, a third NMOS transistor, a fourth NMOS transistor, a fifth NMOS transistor, and a sixth NMOS transistor, wherein a source of the first PMOS transistor and a source of the second PMOS transistor are connected to a power supply, a drain of the first PMOS transistor is connected to a gate of the second PMOS transistor, a drain of the first NMOS transistor, a gate of the second NMOS transistor, a drain of the third NMOS transistor, and a gate of the fifth NMOS transistor; the drain of the second PMOS transistor is connected to the gate of the first PMOS transistor, the gate of the first NMOS transistor, the drain of the second NMOS transistor, the drain of the fourth NMOS transistor and the gate of the sixth NMOS transistor; the source of the first NMOS transistor and the source of the second NMOS transistor are grounded; the gate of the third NMOS transistor is connected to a row read-write operation signal WL, and the source of the third NMOS transistor is connected to a column read-write operation signal BL; the gate of the fourth NMOS transistor is connected to the row read-write operation signal WL, and the source of the fourth NMOS transistor is connected to a column write operation signal BLB; the source of the fifth NMOS transistor is connected to an input signal input, and the drain of the fifth NMOS transistor is connected to the drain of the sixth NMOS transistor to be used as an output signal output; and the source of the sixth NMOS transistor is grounded.
3. The fixed-point full-precision memory computing circuit based on the multi-bit SRAM unit as claimed in claim 2, wherein the adder tree comprises 6 levels that alternate between two different full-adder designs, 10T and 28T; the 1st level consists of 32 4-bit 10T full adders and generates 32 5-bit sums, the inputs numbered 0 to 63 being combined in adjacent pairs in order; the 2nd level consists of 16 5-bit 28T full adders and generates 16 6-bit sums, with the same pairing as the 1st level; the 3rd level consists of 8 6-bit 10T full adders and generates 8 7-bit sums, with the same pairing as the 1st level; the 4th level consists of 4 7-bit 28T full adders and generates 4 8-bit sums, with the same pairing as the 1st level; the 5th level consists of 2 8-bit 10T full adders and generates 2 9-bit sums, with the same pairing as the 1st level; and the 6th level consists of 1 9-bit 28T full adder and generates one 10-bit sum, which is the output of the adder tree and is input to the accumulator.
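The level structure in claim 3 can be checked with a short behavioral model. This is an illustrative sketch only: it models each level as integer addition of adjacent pairs and tracks the bit widths stated in the claim, without modelling the 10T or 28T circuits themselves. The names `adder_tree` and `ADDER_KINDS` are hypothetical.

```python
ADDER_KINDS = ("10T", "28T")  # odd levels use 10T adders, even levels use 28T


def adder_tree(values):
    """Sum 64 4-bit inputs through 6 levels of pairwise addition.

    Returns the final sum and a list of per-level tuples:
    (level, adder_count, adder_input_width, adder_kind, output_width).
    """
    if len(values) != 64:
        raise ValueError("the adder tree takes exactly 64 inputs")
    if any(not 0 <= v < 16 for v in values):
        raise ValueError("each input must fit in 4 bits")

    info = []
    width = 4
    level = 1
    while len(values) > 1:
        kind = ADDER_KINDS[(level - 1) % 2]
        # Combine adjacent inputs in order: (0, 1), (2, 3), ...
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        # Each level grows the sum by one bit, so no carry is ever lost.
        assert all(v < (1 << (width + 1)) for v in values)
        info.append((level, len(values), width, kind, width + 1))
        width += 1
        level += 1
    return values[0], info


total, levels = adder_tree([15] * 64)  # worst case: every input is 15
```

With every input at its maximum of 15, the sum is 64 × 15 = 960, which fits in the 10-bit output of the 6th level.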
4. The fixed-point full-precision memory computing circuit based on the multi-bit SRAM unit as claimed in claim 3, wherein the accumulator comprises a first D flip-flop, a second D flip-flop, a third D flip-flop, a 14-bit adder, and a shift circuit; the input of the first D flip-flop is connected to the output port of the adder tree, and the output of the first D flip-flop is connected to one input of the 14-bit adder; the other input of the 14-bit adder is connected to the output of the shift circuit; the input of the shift circuit is connected to the output of the second D flip-flop, and the input of the second D flip-flop is connected to the output of the 14-bit adder; the shift circuit shifts the output of the second D flip-flop left by one bit and inputs the result to the 14-bit adder; the input of the third D flip-flop is connected to the output of the 14-bit adder, and the output of the third D flip-flop is the output of the accumulator; in each clock cycle, the accumulator shifts the previous accumulation result left by one bit and adds to it, through the 14-bit adder, the sum generated by the adder tree in the current clock cycle.
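The shift-and-add recurrence in claim 4 can be sketched behaviorally as follows. This is a sketch under stated assumptions, not the claimed circuit: the second D flip-flop is assumed to reset to 0, the 14-bit adder is modelled by masking to 14 bits, and the choice of four clock cycles in the usage example is an assumption made for illustration (the claim does not state a cycle count). The names `accumulate` and `MASK14` are hypothetical.

```python
MASK14 = (1 << 14) - 1  # models the 14-bit width of the adder and result port


def accumulate(tree_sums):
    """Shift-and-add accumulation of successive 10-bit adder-tree outputs.

    Each clock cycle: result = (previous result << 1) + current adder-tree sum.
    """
    acc = 0  # contents of the second D flip-flop (assumed reset to 0)
    for s in tree_sums:
        if not 0 <= s < (1 << 10):
            raise ValueError("adder-tree output must fit in 10 bits")
        acc = ((acc << 1) + s) & MASK14
    return acc  # value latched by the third D flip-flop as the output


# Assumed usage: four cycles, each with the worst-case adder-tree sum of 960.
worst_case = accumulate([960, 960, 960, 960])
```

Over four cycles the first sum is weighted by 8, the second by 4, the third by 2, and the last by 1, so the worst case is 960 × 15 = 14400, which stays below 2^14 = 16384 and therefore fits the 14-bit output without overflow.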
CN202210549764.0A 2022-05-20 2022-05-20 Fixed point full-precision memory computing circuit based on multi-bit SRAM unit Active CN114937470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210549764.0A CN114937470B (en) 2022-05-20 2022-05-20 Fixed point full-precision memory computing circuit based on multi-bit SRAM unit

Publications (2)

Publication Number Publication Date
CN114937470A (en) 2022-08-23
CN114937470B (en) 2023-04-07

Family

ID=82864195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210549764.0A Active CN114937470B (en) 2022-05-20 2022-05-20 Fixed point full-precision memory computing circuit based on multi-bit SRAM unit

Country Status (1)

Country Link
CN (1) CN114937470B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6054918A (en) * 1996-09-30 2000-04-25 Advanced Micro Devices, Inc. Self-timed differential comparator
US20140347916A1 (en) * 2013-05-24 2014-11-27 Nvidia Corporation Eight transistor (8t) write assist static random access memory (sram) cell
US20180321911A1 (en) * 2015-12-01 2018-11-08 Institute Of Computing Technology, Chinese Academy Of Sciences Adder device, data accumulation method and data processing device
CN110515454A (en) * 2019-07-24 2019-11-29 电子科技大学 A kind of neural network framework electronic skin calculated based on memory
CN113035251A (en) * 2021-05-21 2021-06-25 中科院微电子研究所南京智能技术研究院 Digital memory computing array device
US20210208876A1 (en) * 2020-01-07 2021-07-08 SK Hynix Inc. Processing-in-memory (pim) system including multiplying-and-accumulating (mac) circuit
CN113345484A (en) * 2021-06-24 2021-09-03 苏州兆芯半导体科技有限公司 Data operation circuit and storage and calculation integrated chip
CN113419705A (en) * 2021-07-05 2021-09-21 南京后摩智能科技有限公司 Memory multiply-add calculation circuit, chip and calculation device
CN113593618A (en) * 2021-07-30 2021-11-02 电子科技大学 Storage and calculation integrated storage array structure suitable for differential SRAM storage unit
CN113741858A (en) * 2021-09-06 2021-12-03 南京后摩智能科技有限公司 In-memory multiply-add calculation method, device, chip and calculation equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mingu Kang et al.: "An In-Memory VLSI Architecture for Convolutional Neural Network" *
Xin Si et al.: "A Local Computing Cell and 6T SRAM-Based Computing-in-Memory Macro with 8-b MAC Operation for Edge AI Chips" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658013A (en) * 2022-09-30 2023-01-31 杭州智芯科微电子科技有限公司 ROM memory computing device and electronic apparatus of vector multiplier adder
CN115658013B (en) * 2022-09-30 2023-11-07 杭州智芯科微电子科技有限公司 ROM in-memory computing device of vector multiply adder and electronic equipment
CN116913342A (en) * 2023-09-13 2023-10-20 安徽大学 Memory circuit with in-memory Boolean logic operation function, and module and chip thereof
CN116913342B (en) * 2023-09-13 2023-12-01 安徽大学 Memory circuit with in-memory Boolean logic operation function, and module and chip thereof
CN118503203A (en) * 2024-07-10 2024-08-16 中国人民解放军国防科技大学 Configurable in-memory computing architecture based on standard cells and compiler therefor

Also Published As

Publication number Publication date
CN114937470B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114937470B (en) Fixed point full-precision memory computing circuit based on multi-bit SRAM unit
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
CN109979503B (en) Static random access memory circuit structure for realizing Hamming distance calculation in memory
CN114089950B (en) Multi-bit multiply-accumulate operation unit and in-memory calculation device
CN112992232B (en) Multi-bit positive and negative single-bit memory computing unit, array and device
CN117636945B (en) 5-bit signed bit AND OR accumulation operation circuit and CIM circuit
CN117271436B (en) SRAM-based current mirror complementary in-memory calculation macro circuit and chip
CN117608519B (en) Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM
Tsai et al. RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration
CN111193511A (en) Design of digital-analog hybrid reading circuit applied to eFlash storage and calculation integrated circuit
KR102555621B1 (en) In-memory computation circuit and method
Jiang et al. CIMAT: A transpose SRAM-based compute-in-memory architecture for deep neural network on-chip training
CN113345484A (en) Data operation circuit and storage and calculation integrated chip
CN114512161B (en) Memory computing device with symbols
CN111627479B (en) Coding type flash memory device, system and coding method
Zhao et al. A Novel Transpose 2T-DRAM based Computing-in-Memory Architecture for On-chip DNN Training and Inference
CN114895869B (en) Multi-bit memory computing device with symbols
US20230266943A1 (en) Digital in-memory computing macro based on approximate arithmetic hardware
CN116543808A (en) All-digital domain in-memory approximate calculation circuit based on SRAM unit
CN115629734A (en) In-memory computing device and electronic apparatus of parallel vector multiply-add device
CN115910152A (en) Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function
CN114647398B (en) Carry bypass adder-based in-memory computing device
US11935586B2 (en) Memory device and method for computing-in-memory (CIM)
CN116959517A (en) In-memory computing circuit based on multiplexing Booth multiplication unit
CN114911453B (en) Multi-bit multiply-accumulate full-digital memory computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant