CN118312468B - In-memory operation circuit with signed multiplication and CIM chip - Google Patents
 Publication number
 CN118312468B CN202410735739.0A
 Authority
 CN
 China
 Prior art keywords
 cbl
 bit line
 operand
 multiplication
 sram
 Prior art date
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Active
Classifications

 Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
 Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
 Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
 Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
 Static Random-Access Memory (AREA)
Abstract
The invention belongs to the technical field of integrated circuits, and particularly relates to an in-memory operation circuit with signed multiplication and a CIM chip using it. The in-memory operation circuit comprises at least one column of operation units, and each operation unit comprises a weight storage part and a calculation part. The weight storage part uses an SRAM cell with dual word lines. The calculation part is connected as follows: the drains of P1 and N3 are connected to the computation bit line CBL; the gate of N3 is connected to the bit line BL, and the gate of P1 is connected to the bit line BLB; the source of N3 is connected to the drain of N4; the source of P1 is connected to the drain of P2; the gate of N4 is connected to an input word line INN; the gate of P2 is connected to an input word line INP; the source of N4 is connected to VSS; the source of P2 is connected to VDD; the capacitor C is connected between CBL and VSS. The scheme addresses the large area overhead and low operation efficiency common to CIM circuits that provide signed multiplication and multiply-accumulate functions.
Description
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to an in-memory operation circuit with a signed multiplication function and a CIM chip using the circuit.
Background
With the rapid development and popularization of artificial intelligence, Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have become some of the most influential innovations in the field of computer vision. Data processing in neural networks such as CNNs and DNNs requires a large number of multiply and multiply-accumulate (MAC) operations. In computers based on the von Neumann architecture, performing these operations requires data to be moved frequently between the processor and the memory, which causes high power consumption and delay. This problem is known as the von Neumann bottleneck or the memory wall. Demonstrations of DNN processors and accelerators based on the von Neumann architecture show that energy consumption and latency depend primarily on the data transferred between the processor and memory. Conventional von Neumann computers are therefore poorly suited to artificial-intelligence computing tasks such as neural networks.
To overcome the von Neumann bottleneck, researchers have proposed the computing-in-memory (CIM) architecture. This computer architecture performs logic operations directly in the memory, without moving data between the memory and a processor, so it can greatly improve data processing efficiency and reduce the operating power consumption of the device.
Convolutional neural networks involve a large number of signed multiply and multiply-accumulate operations. Existing CIM circuits that implement multi-bit signed multiplication or multiply-accumulation generally separate positive and negative weights during calculation, and execute the multiplications of positive weights and of negative weights in different SRAM cells. To address the large area overhead and low operation efficiency common to CIM circuits with signed multiplication and multiply-accumulate functions, the invention provides an in-memory operation circuit with signed multiplication and a CIM chip using the circuit.
Disclosure of Invention
The technical scheme provided by the invention is as follows:
An in-memory operation circuit with signed multiplication comprises at least one column of operation units, wherein each column of operation units comprises a weight storage part and a calculation part. The weight storage part uses an SRAM cell with dual word lines. The drains of the two pass transistors in the SRAM cell are connected to the two bit lines respectively, and the gates of the two pass transistors are connected to the two word lines respectively. The calculation part consists of two NMOS transistors N3 and N4, two PMOS transistors P1 and P2, and a capacitor C, connected as follows:
The drains of P1 and N3 are connected to the computation bit line CBL; the gate of N3 is connected to the bit line BL, and the gate of P1 is connected to the bit line BLB; the source of N3 is connected to the drain of N4; the source of P1 is connected to the drain of P2; the gate of N4 is connected to an input word line INN; the gate of P2 is connected to an input word line INP; the source of N4 is connected to VSS; the source of P2 is connected to VDD; one end of the capacitor C is connected to the computation bit line CBL, and the other end is connected to VSS.
The operation unit of each column implements multiplication between a signed 2-bit first operand and an unsigned second operand. The operation logic for performing the multiplication is:
The second operand is prestored in the SRAM cell, and the computation bit line CBL is precharged to an intermediate potential; the input first operand is encoded by the level states of WLL, WLR, INN and INP; the product of the first operand and the second operand is reflected in the change of the bit line voltage of the computation bit line CBL.
As a further improvement of the invention, the SRAM cell is a 6T SRAM cell, or another SRAM cell with dual word lines obtained by adding MOS transistors to the 6T SRAM cell.
In the present invention, the 6T SRAM cell includes two NMOS transistors N1 and N2, and two inverters INV0 and INV1. The circuit is connected as follows: the input of INV0 and the output of INV1 are connected to the source of N1 and serve as the storage node Q. The output of INV0 and the input of INV1 are connected to the source of N2 and serve as the storage node QB. The drains of N1 and N2 are connected to the bit lines BL and BLB, respectively, and the gates of N1 and N2 are connected to the word lines WLL and WLR, respectively.
As a further improvement of the present invention, during the multiplication operation, when the first operand is "+1", WLL, INN and INP are set low and WLR is set high. When the first operand is "-1", WLL, INN and INP are set high and WLR is set low. When the first operand is "0", WLL and INN are set low, and WLR and INP are set high.
As a further improvement of the present invention, during the multiplication operation, when the bit line voltage of the computation bit line CBL rises, the product is "+1"; when the bit line voltage of CBL drops, the product is "-1"; when the bit line voltage of CBL remains unchanged, the product is "0".
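The encoding and read-out rules above can be written down as a small lookup table. The following Python sketch is illustrative only: the dictionary layout, the function name `decode_cbl` and the tolerance `tol` are assumptions introduced here, not part of the patent text.

```python
# Levels: 0 = low, 1 = high. Encoding of the signed first operand onto the
# four control lines, as listed in the text. The dict layout is illustrative.
ENCODING = {
    +1: {"WLL": 0, "WLR": 1, "INN": 0, "INP": 0},
    -1: {"WLL": 1, "WLR": 0, "INN": 1, "INP": 1},
    0: {"WLL": 0, "WLR": 1, "INN": 0, "INP": 1},
}


def decode_cbl(v_precharge: float, v_final: float, tol: float = 1e-3) -> int:
    """Read the product from the direction of the CBL voltage change.

    `tol` is a hypothetical noise margin, not a value from the text.
    """
    delta = v_final - v_precharge
    if delta > tol:
        return +1   # CBL rose
    if delta < -tol:
        return -1   # CBL dropped
    return 0        # CBL unchanged


if __name__ == "__main__":
    for operand, levels in ENCODING.items():
        print(f"{operand:+d}: {levels}")
    print(decode_cbl(0.5, 0.9), decode_cbl(0.5, 0.1), decode_cbl(0.5, 0.5))
```

A real read-out would use a sense circuit with its own thresholds; the tolerance here only stands in for that.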
As a further improvement of the present invention, the weight storage section includes a plurality of SRAM cells arranged in columns; each SRAM cell is connected to the same set of bit lines BL and BLB; each SRAM cell is also connected to an independent word line WLL and WLR of the corresponding row; the same column of SRAM cells share the same circuitry of the computation portion.
As a further improvement of the present invention, in the calculation section, the aspect ratio of the transistors N3 and N4 is the same; the aspect ratio of transistors P1 and P2 is the same.
As a further improvement of the invention, the weights of the second operands of the multiplications performed in the operation units of different columns are distinguished by adjusting the capacitance of the capacitor C in the calculation part of each column. The multiple of the capacitance C in each column relative to the unit capacitance is the weight of the second operand in the operation.
As a further improvement of the present invention, the in-memory operation circuit with signed multiplication includes N columns of operation units, and the capacitance of the capacitor C in the successive columns is 1, 2, 4, 8, …, 2^{N-1} times the unit capacitance. A transmission gate TG connecting the computation bit lines CBL of adjacent columns is arranged between each pair of adjacent columns of operation units.
The in-memory operation circuit with signed multiplication comprising multiple columns of operation units can multiply a 2-bit first operand by an N-bit second operand. The operation process is as follows:
(1) The transmission gates between the operation columns are opened (disconnected), and the computation bit line CBL of each column is precharged to the intermediate potential.
(2) The N-bit second operand is decomposed bit by bit into N single-bit numbers, and each single-bit number is prestored in the weight storage part of the column with the corresponding weight.
(3) The encoded first operand is input synchronously to the SRAM cells of all selected columns through WLL, WLR, INN and INP, so that each column completes the multiplication of the first operand by one bit of the second operand;
(4) The transmission gates between the operation columns are closed, and the product of the 2-bit first operand and the N-bit second operand is reflected in the change of the bit line voltage of the computation bit line CBL; the direction of the change reflects the sign of the product, and the amplitude of the change reflects its magnitude.
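The relation between the shared CBL voltage and the product in step (4) can be made explicit with a short idealized derivation. The symbols $a$ (first operand), $b_i$ (bit $i$ of the second operand), $C_u$ (unit capacitance) and $W$ (value of the second operand) are introduced here for illustration; the derivation assumes each column settles fully to VDD, VSS or VDD/2 before the gates close, and ignores parasitic capacitance and leakage.

```latex
% Column i before charge sharing: capacitance and settled voltage
C_i = 2^{i} C_u, \qquad
V_i = \frac{V_{DD}}{2}\,\bigl(1 + a\,b_i\bigr), \qquad
a \in \{-1, 0, +1\},\; b_i \in \{0, 1\}

% After the transmission gates close, charge is conserved:
V_{CBL} = \frac{\sum_{i=0}^{N-1} C_i V_i}{\sum_{i=0}^{N-1} C_i}
        = \frac{V_{DD}}{2} + \frac{V_{DD}}{2}\cdot
          \frac{a \sum_{i=0}^{N-1} 2^{i} b_i}{2^{N}-1}
        = \frac{V_{DD}}{2} + \frac{a\,W}{2^{N}-1}\cdot\frac{V_{DD}}{2},
\qquad W = \sum_{i=0}^{N-1} 2^{i} b_i
```

Under these assumptions the sign of $V_{CBL} - V_{DD}/2$ gives the sign of the product, and its magnitude is proportional to $|a\,W|$.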
The invention also comprises a CIM chip that integrates the in-memory operation circuit with signed multiplication.
The technical scheme provided by the invention has the following beneficial effects:
The invention designs an in-memory operation circuit with signed multiplication based on an SRAM cell with dual word lines and dual bit lines. The circuit stores a 1-bit weight in the SRAM cell and divides the signed 2-bit number into two parts: a 1-bit sign bit and a 1-bit unsigned value bit. The sign bit is represented by the high and low levels of the dual word lines WLL and WLR, and the unsigned value bit is represented in combination with the input word lines INN and INP of the newly added calculation part. Depending on the values of the signals that represent the weight and the signed number, the circuit controls whether the charge and discharge paths of the computation bit line CBL to the power supply and to ground conduct, and the final product is represented by the change in the bit line voltage of CBL.
The circuit performs multiplication of numbers of different signs within a single circuit unit, which saves circuit area, and it has relatively high reliability.
The scheme can also represent the bit significance of the weight in each column through the size of the capacitor C attached to the calculation part, and reflect the calculation result on the voltage of CBL, which ensures the accuracy of the result. The circuit therefore also supports multiplication of a 2-bit signed number by a multi-bit weight.
Drawings
Fig. 1 is a schematic circuit diagram of the in-memory operation circuit with signed multiplication provided in embodiment 1 of the present invention.
Fig. 2 is a circuit diagram of the arithmetic unit in each column in fig. 1.
Fig. 3 is a circuit diagram of an arithmetic unit in which the weight storage section provided in embodiment 1 of the present invention includes a plurality of SRAM cells.
Fig. 4 is a circuit diagram of the in-memory operation circuit with signed multiplication comprising multiple columns of operation units provided in embodiment 1 of the present invention.
Fig. 5 is a schematic diagram of a CIM chip according to embodiment 2 of the present invention.
Fig. 6 is a signal diagram of the computation bit line CBL during multiplication of a signed number by a single-bit weight in the test experiment.
Fig. 7 is a signal diagram of the computation bit line CBL during multiplication of the signed number "11" by a 4-bit weight in the test experiment.
Fig. 8 is a signal diagram of the computation bit line CBL during multiplication of the signed number "01" by a 4-bit weight in the test experiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The present embodiment provides an in-memory operation circuit with signed multiplication. As shown in Fig. 1, it includes at least one column of operation units, and each column of operation units includes a weight storage part and a calculation part. The weight storage part uses an SRAM cell with dual word lines. In a practical circuit of this embodiment, the SRAM cell may be a classical 6T SRAM cell, or another SRAM cell with dual word lines, such as an 8T, 10T or 12T SRAM cell, obtained by adding MOS transistors to the 6T SRAM cell.
The dual-word-line SRAM cells used in this embodiment include at least one cross-coupled latch structure with two storage nodes Q and QB, and two pass transistors, located on either side of the latch, that connect to the bit lines BL and BLB. In this embodiment, the sources of the two pass transistors are connected to the storage nodes Q and QB, the drains are connected to the two bit lines BL and BLB respectively, and the gates are connected to two independent word lines, denoted WLL and WLR.
For example, when the weight storage part of this embodiment uses a 6T SRAM cell, it includes two NMOS transistors N1 and N2, and two inverters INV0 and INV1. As shown in Fig. 2, the circuit is connected as follows: the input of INV0 and the output of INV1 are connected to the source of N1 and serve as the storage node Q. The output of INV0 and the input of INV1 are connected to the source of N2 and serve as the storage node QB. The drains of N1 and N2 are connected to the bit lines BL and BLB, respectively, and the gates of N1 and N2 are connected to the word lines WLL and WLR, respectively.
As shown in Fig. 2, the calculation part in each column of operation units in this embodiment consists of two NMOS transistors N3 and N4, two PMOS transistors P1 and P2, and a capacitor C, connected as follows:
The drains of P1 and N3 are connected to the computation bit line CBL; the gate of N3 is connected to the bit line BL, and the gate of P1 is connected to the bit line BLB; the source of N3 is connected to the drain of N4; the source of P1 is connected to the drain of P2; the gate of N4 is connected to an input word line INN; the gate of P2 is connected to an input word line INP; the source of N4 is connected to VSS; the source of P2 is connected to VDD; one end of the capacitor C is connected to the computation bit line CBL, and the other end is connected to VSS.
As can be seen from the circuit diagram, in the calculation part of this embodiment, when the level of storage node Q is passed to BL and both it and the input word line INN are high, N3 and N4 conduct and connect the computation bit line CBL to ground VSS, forming a discharge path. Because P1 and P2 are PMOS transistors, when the level of storage node QB is passed to BLB and both it and the input word line INP are low, P1 and P2 conduct and connect CBL to the power supply VDD, forming a charge path.
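The two paths can be captured in a switch-level sketch. The Python function below is an illustration under simplifying assumptions (ideal switches, logic levels 0 and 1); the function name `cbl_action` and the `"conflict"` return value are introduced here and do not come from the patent. It treats the NMOS pair N3/N4 as conducting when both gates are high and the PMOS pair P1/P2 as conducting when both gates are low, consistent with the (+1) × 1 walk-through in which a low QB turns P1 on.

```python
def cbl_action(bl: int, blb: int, inn: int, inp: int) -> str:
    """Switch-level view of one calculation part.

    bl, blb  : levels on the bit lines BL and BLB (gates of N3 and P1)
    inn, inp : levels on the input word lines INN and INP (gates of N4 and P2)
    Returns what happens to the computation bit line CBL.
    """
    discharge = bl == 1 and inn == 1   # N3 and N4 in series, CBL to VSS
    charge = blb == 0 and inp == 0     # P1 and P2 in series, CBL to VDD
    if charge and discharge:
        # Both paths on at once would short VDD to VSS through CBL; the
        # encodings in the text never drive the circuit this way.
        return "conflict"
    if charge:
        return "charge"
    if discharge:
        return "discharge"
    return "hold"


if __name__ == "__main__":
    print(cbl_action(bl=1, blb=1, inn=1, inp=1))  # discharge path only
    print(cbl_action(bl=0, blb=0, inn=0, inp=0))  # charge path only
    print(cbl_action(bl=0, blb=1, inn=0, inp=1))  # neither path
```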
In particular, to keep the charging and discharging characteristics of the charge path and the discharge path uniform, the transistors N3 and N4 in the calculation part of this embodiment have the same aspect ratio, and the transistors P1 and P2 also have the same aspect ratio.
Based on the working principle of the calculation part described above, in the in-memory operation circuit with signed multiplication provided in this embodiment, the operation unit of each column can implement multiplication between a signed 2-bit first operand and an unsigned 1-bit second operand. In detail, the circuit performs the multiplication as follows:
1. The second operand is prestored in the SRAM cell, and the computation bit line CBL is precharged to an intermediate potential.
Taking the 6T SRAM based circuit shown in Fig. 2 as an example, if the second operand of the multiplication to be performed is "0", the storage node Q is set low and QB is set high using the ordinary data storage function of the 6T SRAM cell. Conversely, if the second operand is "1", Q is set high and QB is set low.
In addition, the computation bit line CBL must be precharged to an intermediate potential before the operation. In this embodiment, the "intermediate potential" is a potential between the power supply voltage VDD and ground VSS. Specifically, this embodiment precharges CBL to VDD/2 in the initial stage.
2. The first operand of the input is encoded by the level states of WLL, WLR, INN and INP.
In this embodiment, the first operand is input by applying control signals of different level states to the word lines WLL and WLR of the weight storage part and to the input word lines INN and INP of the calculation part. Specifically, in line with the working principle of the circuit, the encoding rule of the first operand in this embodiment is as follows:
(1) When the first operand in the multiplication operation is "+1", WLL, INN and INP are set low and WLR is set high.
(2) When the first operand in the multiplication operation is "-1", WLL, INN and INP are set high and WLR is set low.
(3) When the first operand in the multiplication operation is "0", WLL and INN are set low, and WLR and INP are set high.
3. In the in-memory operation circuit with signed multiplication provided in this embodiment, once the first operand and the second operand have been input, their product is reflected in the change of the bit line voltage of the computation bit line CBL.
Specifically, in the circuit of this embodiment, the level states of Q, QB, WLL, WLR, INN and INP determine the on/off state of each MOS transistor between CBL and the low-level VSS and between CBL and the high-level VDD. A charge path between CBL and VDD or a discharge path between CBL and VSS is formed accordingly, causing the bit line voltage of CBL to rise or fall from the intermediate potential. This change in the bit line voltage on CBL represents the final product.
Specifically, in the multiplication performed by the circuit of this embodiment, when the bit line voltage of the computation bit line CBL rises, the product is "+1"; when the bit line voltage of CBL drops, the product is "-1"; when the bit line voltage of CBL remains unchanged, the product is "0".
To make the principle and performance of the in-memory operation circuit with signed multiplication provided in this embodiment clearer, the operation of the circuit is described in detail below for six different combinations of first and second operands:
1. (+1) × 1
First, the second operand "1" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is high and QB is low.
Then, WLL is set low and WLR is set high. N1 remains off and N2 turns on, so the level at QB is passed through N2 to the gate of P1, and P1 turns on. At the same time, INN is set low and INP is set low, so P2 also turns on. The charge path between CBL and VDD now conducts, and because N3 and N4 are not turned on, the discharge path between CBL and VSS remains off. The bit line voltage on the computation bit line CBL therefore rises gradually from VDD/2 toward VDD. The rise of the CBL voltage indicates that the product is (+1).
That is, the operation (+1) × 1 = (+1) is completed.
2. (-1) × 1
First, the second operand "1" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is high and QB is low.
Then, WLL is set high and WLR is set low. N1 turns on and N2 remains off, so the level at Q is passed through N1 to the gate of N3, and N3 turns on. At the same time, INN is set high and INP is set high, so N4 also turns on. The discharge path between CBL and VSS now conducts, and because P1 and P2 are not turned on, the charge path between CBL and VDD remains off. The bit line voltage on the computation bit line CBL therefore falls gradually from VDD/2 toward VSS. The fall of the CBL voltage indicates that the product is (-1).
That is, the operation (-1) × 1 = (-1) is completed.
3. (+1) × 0
First, the second operand "0" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is low and QB is high.
Then, WLL is set low and WLR is set high. N2 turns on and N1 remains off, so the level at QB is passed through N2 to the gate of P1, and P1 turns off. At the same time, INN is set low and INP is set low, so N4 is also off. In this state, because both N4 and P1 are off, the discharge path between CBL and VSS and the charge path between CBL and VDD are both off. The bit line voltage on the computation bit line CBL therefore keeps its current level, indicating that the product is 0.
That is, the operation (+1) × 0 = 0 is completed.
4. (-1) × 0
First, the second operand "0" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is low and QB is high.
Then, WLL is set high and WLR is set low. N1 turns on and N2 remains off, so the level at Q is passed through N1 to the gate of N3, and N3 turns off. At the same time, INN is set high and INP is set high, so P2 is also off. In this state, because both N3 and P2 are off, the discharge path between CBL and VSS and the charge path between CBL and VDD are both off. The bit line voltage on the computation bit line CBL therefore keeps its current level, indicating that the product is 0.
That is, the operation (-1) × 0 = 0 is completed.
5. 0 × 1
First, the second operand "1" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is high and QB is low.
Then, WLL is set low and WLR is set low, so both N1 and N2 are off. At the same time, INN is set low and INP is set high. In this state, the discharge path between CBL and VSS and the charge path between CBL and VDD are both off. The bit line voltage on the computation bit line CBL therefore keeps its current level, indicating that the product is 0.
That is, the operation 0 × 1 = 0 is completed.
6. 0 × 0
First, the second operand "0" is prestored in the 6T SRAM cell, and CBL is precharged to VDD/2. At this time, the storage node Q in the 6T SRAM cell is low and QB is high.
Then, WLL is set low and WLR is set low, so both N1 and N2 are off. At the same time, INN is set low and INP is set high. In this state, the discharge path between CBL and VSS and the charge path between CBL and VDD are both off. The bit line voltage on the computation bit line CBL therefore keeps its current level, indicating that the product is 0.
That is, the operation 0 × 0 = 0 is completed.
Combining the above processes, the truth table summarizing the signed multiplication performed by the in-memory operation circuit of this embodiment is shown in Table 1 below:
Table 1: Truth table of the operation process of the in-memory operation circuit with signed multiplication
From the truth table it can be seen that the in-memory operation circuit of this embodiment, comprising the weight storage part and the calculation part, fully implements multiplication of a signed 2-bit number by an unsigned 1-bit number.
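The six cases walked through above can be enumerated with a small behavioural model. This is a sketch under stated assumptions, not the patent's own table: it uses the word-line encoding from the encoding rule for the first operand, an assumed supply `VDD = 1.0`, ideal full-swing charging, and it treats a bit line whose pass transistor is off as undriven (`None`). For the operand 0 both paths are blocked by INN low and INP high in this model, whatever the word-line levels. The names `multiply` and `truth_table` are hypothetical.

```python
VDD = 1.0  # assumed supply voltage in arbitrary units

# First operand -> (WLL, WLR, INN, INP), 0 = low, 1 = high
ENCODING = {
    +1: (0, 1, 0, 0),
    -1: (1, 0, 1, 1),
    0: (0, 1, 0, 1),
}


def multiply(first: int, weight: int) -> float:
    """Return the settled CBL voltage for first in {-1, 0, +1}, weight in {0, 1}."""
    wll, wlr, inn, inp = ENCODING[first]
    q, qb = weight, 1 - weight          # prestored second operand
    # A bit line follows its storage node only while its pass transistor is on;
    # otherwise it is modelled as undriven (assumption).
    bl = q if wll else None
    blb = qb if wlr else None
    discharge = inn == 1 and bl == 1    # N4 and N3 conduct
    charge = inp == 0 and blb == 0      # P2 and P1 conduct
    if charge:
        return VDD
    if discharge:
        return 0.0
    return VDD / 2                      # CBL keeps its precharge level


def truth_table():
    """List (first, weight, WLL, WLR, INN, INP, V_CBL, product) for all cases."""
    rows = []
    for first in (+1, -1, 0):
        for weight in (1, 0):
            v = multiply(first, weight)
            product = (v > VDD / 2) - (v < VDD / 2)   # rise: +1, fall: -1, hold: 0
            rows.append((first, weight, *ENCODING[first], v, product))
    return rows


if __name__ == "__main__":
    print("first weight WLL WLR INN INP V_CBL product")
    for row in truth_table():
        print(*row)
```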
In a further optimized solution of the present embodiment, as shown in fig. 3, the weight storage portion of each column of the operation units includes a plurality of SRAM cells arranged in columns; each SRAM cell is connected to the same set of bit lines BL and BLB; each SRAM cell is also connected to an independent word line WLL and WLR of the corresponding row; the same column of SRAM cells share the same circuitry of the computation portion.
In the circuit design shown in Fig. 3, the SRAM cell of each row in a column, together with the calculation part below it, forms a basic unit for performing signed multiplication, which greatly reduces area overhead. Because the SRAM cells in a column share one calculation part, different second operands can be prestored in different rows during the prestoring stage, and the rows can then be enabled one after another to complete different multiplications, improving operation efficiency. Further, when the design is applied to an array containing a large number of SRAM cells, the second operand of another task can be prestored in SRAM cells of other rows and columns while one SRAM cell takes part in a multiplication, which improves the efficiency of the circuit on large-scale logic operation tasks of the same kind.
In a further optimized scheme of this embodiment, as shown in fig. 4, the inmemory operation circuit with signed multiplication may further include multiple columns of operation units, where N is equal to or greater than 2 in the number of columns of operation units in the circuit. At this time, the capacitance of the capacitor C mounted on each column calculating section is adjusted, and a transmission gate TG for connecting the calculation bit lines CBL of the two is provided between the adjacent column calculating units. It is also possible to distinguish between the weights of the second operands of the multiplication operations performed in the different columns of arithmetic units.
The principle of distinguishing the second operand weights is as follows: when the capacitors C mounted on the different columns are set to 8C, 4C, 2C and 1C respectively, the charges they hold at a given voltage are correspondingly 8q, 4q, 2q and 1q. On this basis, the operation result of each column is reflected on its own computation bit line CBL. Although the bit line voltage of each column has only three states (VDD, VDD/2 and VSS), when the CBLs of columns carrying different capacitors are connected through the transmission gates, the capacitors on the different columns share charge, so that the voltage on the connected CBL finally presents many more distinct levels. These voltage levels can be used to represent the product of the 2-bit first operand and the N-bit second operand.
Based on the circuit scheme in fig. 4, the in-memory operation circuit with signed multiplication that includes multiple columns of operation units, as provided in this embodiment, multiplies the 2-bit first operand by the N-bit second operand as follows:
(1) Open (disconnect) the transmission gates TG between the operation columns, and precharge the computation bit line CBL of every column to the intermediate potential.
(2) Decompose the N-bit second operand into N single-bit numbers, and prestore each single-bit number in the weight storage part of the column corresponding to its bit weight.
(3) Input the encoded first operand synchronously to the selected SRAM cells of all columns through WLL, WLR, INN and INP; each column then completes the multiplication of the first operand by one bit of the second operand.
(4) Close the transmission gates between the operation columns; the product of the 2-bit first operand and the N-bit second operand is then reflected in the change of the bit line voltage of the computation bit line CBL. The direction of the CBL voltage change reflects the sign of the product, and the amplitude of the change reflects its magnitude.
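The four steps above can be sketched as a small charge-conservation model. This is an idealised illustration rather than the circuit itself: it assumes VSS = 0, ideal capacitors without parasitics, and that the column holding bit i of the second operand (i = 0 for the lowest bit) carries 2^i unit capacitances. The function names are hypothetical.

```python
from fractions import Fraction


def column_voltage(first_operand, weight_bit, vdd):
    """CBL voltage of one column after the single-bit multiply (step 3).

    first_operand is +1, 0 or -1; weight_bit is the stored bit (0 or 1).
    """
    if first_operand == 0 or weight_bit == 0:
        return vdd / 2            # CBL stays at the precharge level
    if first_operand > 0:
        return vdd                # CBL is charged to VDD
    return 0 * vdd                # CBL is discharged to VSS (assumed 0)


def shared_cbl_voltage(first_operand, weight_bits, vdd=Fraction(1)):
    """CBL voltage after the transmission gates are closed (step 4).

    weight_bits lists the second operand MSB first, e.g. [1, 0] for '10'.
    """
    n = len(weight_bits)
    # Capacitance of each column in unit capacitances: 2**(n-1), ..., 2, 1.
    caps = [2 ** (n - 1 - k) for k in range(n)]
    volts = [column_voltage(first_operand, b, vdd) for b in weight_bits]
    # Charge is conserved when the columns are connected together.
    total_charge = sum(c * v for c, v in zip(caps, volts))
    return total_charge / sum(caps)
```

With the default VDD = 1, `shared_cbl_voltage(1, [1, 0])` evaluates to 5/6 and `shared_cbl_voltage(1, [0, 1])` to 2/3, in line with the 2-bit by 2-bit example discussed in this embodiment.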
Taking a 2-bit by 2-bit multiplication as an example, the operation requires two columns of operation units, one carrying a capacitance of 1C and the other a capacitance of 2C. The operation unit carrying the 1C capacitor is the low-order operation column, and the operation unit carrying the 2C capacitor is the high-order operation column.
Let the operation be "+1×11", so that the product is "+3". In this case, the bit line voltages on the CBLs of both operation units are VDD before charge sharing, and the CBL voltage after charge sharing is still VDD, so ΔV = VDD/2.
Let the operation be "+1×10", so that the product is "+2". In this case, the bit line voltage on the CBL of the low-order operation unit is VDD/2 before charge sharing, and the bit line voltage on the CBL of the high-order operation unit is VDD. Since the capacitance of the low-order operation unit is 1C and that of the high-order operation unit is 2C, the CBL voltage after charge sharing is (1C·VDD/2 + 2C·VDD)/3C = 5VDD/6, so ΔV = VDD/3.
Let the operation be "+1×01", so that the product is "+1". In this case, the bit line voltage on the CBL of the low-order operation unit is VDD before charge sharing, and the bit line voltage on the CBL of the high-order operation unit is VDD/2. Since the capacitance of the low-order operation unit is 1C and that of the high-order operation unit is 2C, the CBL voltage after charge sharing is (1C·VDD + 2C·VDD/2)/3C = 2VDD/3, so ΔV = VDD/6.
Let the operation be "+1×00", so that the product is "0". In this case, the bit line voltages on the CBLs of both operation units are VDD/2 before charge sharing, and the CBL voltage after charge sharing is still VDD/2, so ΔV = 0.
It can be seen that as the product decreases through +3, +2, +1 and 0, the change ΔV of the CBL voltage after charge sharing decreases through VDD/2, VDD/3, VDD/6 and 0, that is, by VDD/6 per step.
The rule can be summarized as follows: under the charge sharing mechanism of this embodiment, when the product can take 2^M values, the operation units divide the range of the CBL bit line voltage change (from VDD/2 to VDD) into 2^M levels, and a mapping is established between the ΔV of each level and the digital value of the corresponding product.
The above description uses the case in which the 2-bit signed number is positive; by the same principle, the same rule holds when the 2-bit signed number is negative. Similarly, the same rule holds when more operation units are used to multiply by a second operand with more bits.
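The rule follows directly from charge conservation. As a sketch, assume that column i (i = 0 for the lowest-order bit) carries a capacitance of 2^i·C and stores the weight bit b_i, and that VSS = 0; the symbols s, b_i, W and V_i are introduced here only for illustration and do not appear in the original text.

```latex
\begin{aligned}
V_i &= \frac{V_{DD}}{2}\,(1 + s\,b_i), \qquad s \in \{-1, 0, +1\},\; b_i \in \{0, 1\} \\
V_{CBL} &= \frac{\sum_{i=0}^{N-1} 2^{i} C\, V_i}{\sum_{i=0}^{N-1} 2^{i} C}
        = \frac{V_{DD}}{2} + s\,\frac{V_{DD}}{2}\cdot\frac{W}{2^{N}-1},
        \qquad W = \sum_{i=0}^{N-1} 2^{i} b_i \\
\Delta V &= V_{CBL} - \frac{V_{DD}}{2} = s\,W\,\frac{V_{DD}}{2\,(2^{N}-1)}
\end{aligned}
```

Under these assumptions the step between adjacent products is VDD/(2(2^N - 1)): VDD/6 for N = 2, and 30 mV for N = 4 with VDD = 900 mV. A negative first operand (s = -1) gives the same magnitudes with the opposite direction of change.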
Therefore, in the scheme of this embodiment, once capacitors C of different sizes are mounted on the respective operation units, the product of the 2-bit signed first operand and the M-bit unsigned second operand is reflected, under the charge sharing mechanism, in the change of the bit line voltage of the computation bit line CBL. The direction and magnitude of the voltage change on the charge-shared CBL are quantized by a successive-approximation ADC, so that the digital value of each operation result can be obtained accurately.
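The mapping from the shared CBL voltage back to a signed product can be illustrated with an idealised nearest-level quantizer. This is only a stand-in for the successive-approximation ADC, whose internal design is not described here; the function name and the assumption of evenly spaced levels between VSS = 0 and VDD are hypothetical.

```python
def decode_product(v_cbl, vdd, m_bits):
    """Map a charge-shared CBL voltage to the nearest signed product.

    Assumes ideal levels spaced (vdd / 2) / (2**m_bits - 1) apart and
    centred on the precharge level vdd / 2.
    """
    step = (vdd / 2) / (2 ** m_bits - 1)
    # The sign of the result gives the direction of the voltage change.
    return round((v_cbl - vdd / 2) / step)
```

For example, with VDD = 900 mV and a 2-bit second operand, a CBL voltage of 750 mV (5VDD/6) decodes to +2.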
Example 2
On the basis of embodiment 1, this embodiment further provides a CIM chip in which the in-memory arithmetic circuit with signed multiplication of embodiment 1 is integrated. As shown in fig. 5, the CIM chip includes an N×N SRAM array, and a calculation module containing N unit circuits, each identical to the calculation part of embodiment 1, is disposed below the array. The computation bit lines CBL of all the unit circuits in the calculation module are connected in series through N-1 transmission gates, and the capacitance values of the capacitors C mounted on the unit circuits are stepped in the ratio 8, 4, 2 and 1. In addition, the CIM chip also includes the peripheral circuits needed to implement the data read, write and hold functions of the SRAM array.
The CIM chip provided by this embodiment has the same data processing function as an SRAM chip and can also be used to multiply signed numbers by single-bit or multi-bit numbers. It is therefore suitable for the data processing tasks of neural networks such as CNNs and DNNs in artificial intelligence.
Performance testing
In order to further verify the performance of the in-memory arithmetic circuit with signed multiplication provided by the present invention, the following simulation experiments were designed to test the function of the circuit shown in fig. 5:
1. Multiplication of a signed number by a single-bit weight
In this experiment, one 6T-SRAM cell and its corresponding calculation part are first used as the test object, and a 2-bit signed number is multiplied by a 1-bit weight in order to verify the performance of the circuit when executing signed multiplication (+1×1 and -1×1). The precharge voltage of CBL before the calculation (before 2 ns) is set to VDD/2.
The signal on the computation bit line CBL during the experiment is shown in fig. 6. Analysis of the waveform in fig. 6 shows that the circuit starts the 2-bit signed number by 1-bit weight calculation at 2 ns. When the 2-bit signed number is '11' (representing -1), WLL is set high and WLR is set low; the sign bit indicates negative, the 1-bit weight is '1', and CBL is discharged to VSS. When the 2-bit signed number is '01' (representing +1), WLL is set low and WLR is set high; the sign bit indicates positive, the 1-bit weight is '1', and CBL is charged to VDD.
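The single-column behaviour observed above, combined with the WLL/WLR/INN/INP encoding and the N3/N4/P1/P2 connections recited in claim 1, can be modelled at the logic level. The sketch below is a simplified behavioural illustration, not a transistor-level simulation. It assumes that an enabled access transistor makes the bit line follow its storage node, that BL idles low and BLB idles high when their word lines are off, and that VSS = 0; these idle levels and all names in the code are assumptions.

```python
# Level states (1 = high, 0 = low) that encode the signed first operand.
ENCODING = {
    +1: {"WLL": 0, "WLR": 1, "INN": 0, "INP": 0},
    -1: {"WLL": 1, "WLR": 0, "INN": 1, "INP": 1},
    0: {"WLL": 0, "WLR": 1, "INN": 0, "INP": 1},
}


def cbl_after_multiply(first_operand, q, vdd=0.9):
    """Return the CBL voltage after a 1-bit multiply; q is the stored weight."""
    s = ENCODING[first_operand]
    bl = q if s["WLL"] else 0          # BL follows Q when WLL is high (assumed idle low)
    blb = (1 - q) if s["WLR"] else 1   # BLB follows QB when WLR is high (assumed idle high)
    pull_down = bl == 1 and s["INN"] == 1   # N3 and N4 both conduct
    pull_up = blb == 0 and s["INP"] == 0    # P1 and P2 both conduct
    if pull_up:
        return vdd        # product +1: CBL charged to VDD
    if pull_down:
        return 0.0        # product -1: CBL discharged to VSS
    return vdd / 2        # product 0: CBL keeps the precharge level
```

In this model a stored weight of 0, or a first operand of 0, leaves both the pull-up and the pull-down path off, so CBL stays at VDD/2.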
2. Multiplication of the signed number (-1) by 4-bit weights
This experiment uses four operation units in the circuit as the test object and multiplies a 2-bit signed number by a 4-bit weight. In the circuit, three transmission gates connect together the CBLs of four operation units built on 6T-SRAM cells in the same row, and the capacitors C mounted on the four units are 8C, 4C, 2C and 1C respectively. In this operation, VDD is set to 900 mV, and CBL is precharged to the set voltage of 450 mV before 2 ns.
All calculation tasks from '11' × '0000' to '11' × '1111' are executed in sequence during the experiment. The circuit starts the 2-bit signed number by 4-bit weight calculation at 2 ns and starts charge sharing at 2.2 ns; the resulting CBL waveform is shown in fig. 7.
The data in the graph show the following. When the 2-bit signed number is '11' (WLL is set high, WLR is set low, and the sign bit indicates negative) and the 4-bit weight is '0000', the four CBLs remain at 450 mV before charge sharing. The CBL voltage after charge sharing for each 4-bit weight is:
4-bit weight | CBL after charge sharing
0000 | 448.55 mV
0001 | 421.32 mV
0010 | 391.05 mV
0011 | 358.97 mV
0100 | 331.07 mV
0101 | 300.72 mV
0110 | 268.75 mV
0111 | 243.42 mV
1000 | 209.92 mV
1001 | 181.73 mV
1010 | 150.36 mV
1011 | 119.17 mV
1100 | 91.92 mV
1101 | 62.03 mV
1110 | 33.25 mV
1111 | 1.99 mV
Analysis of the data shows that, in the multi-bit multiplication, the spacing between adjacent operation results fluctuates within the allowable error range; the circuit has good linearity when discharging, and its operation results are highly reliable.
3. Multiplication of the signed number (+1) by 4-bit weights
This experiment again uses four operation units in the circuit as the test object and multiplies a 2-bit signed number by a 4-bit weight. In the circuit, three transmission gates connect together the CBLs of four operation units built on 6T-SRAM cells in the same row, and the capacitors C mounted on the four units are 8C, 4C, 2C and 1C respectively. In this operation, VDD is set to 900 mV, and CBL is precharged to the set voltage of 450 mV before 2 ns.
All calculation tasks from '01' × '0000' to '01' × '1111' are executed in sequence during the experiment. The circuit starts the 2-bit signed number by 4-bit weight calculation at 2 ns and starts charge sharing at 2.2 ns; the resulting CBL waveform is shown in fig. 8.
Analysis of the data in the graph shows the following. When the 2-bit signed number is '01' (WLL is set low, WLR is set high, and the sign bit indicates positive) and the 4-bit weight is '0000', the four CBLs remain at 450 mV before charge sharing. The CBL voltage after charge sharing for each 4-bit weight is:
4-bit weight | CBL after charge sharing
0000 | 450 mV
0001 | 481.25 mV
0010 | 510.02 mV
0011 | 539.11 mV
0100 | 570.07 mV
0101 | 599.48 mV
0110 | 630.58 mV
0111 | 661.94 mV
1000 | 688.31 mV
1001 | 719.83 mV
1010 | 752.36 mV
1011 | 780.66 mV
1100 | 811.02 mV
1101 | 840.83 mV
1110 | 868.52 mV
1111 | 898.87 mV
In the multi-bit multiplication, the spacing between adjacent operation results fluctuates within the allowable error range; the circuit also has good linearity when charging, and its operation results are highly reliable.
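The linearity statement can be checked against the reported voltages. The sketch below compares each simulated CBL voltage from the two 4-bit experiments with an ideal level, assuming evenly spaced levels of (VDD/2)/15 = 30 mV around the 450 mV precharge value; this ideal spacing and the helper names are assumptions made for the illustration, while the voltage lists are the values reported above.

```python
VDD_MV = 900.0
STEP_MV = (VDD_MV / 2) / 15  # assumed ideal spacing for 4-bit weights: 30 mV

# CBL voltages after charge sharing (mV), weights '0000' .. '1111'.
NEG = [448.55, 421.32, 391.05, 358.97, 331.07, 300.72, 268.75, 243.42,
       209.92, 181.73, 150.36, 119.17, 91.92, 62.03, 33.25, 1.99]
POS = [450.00, 481.25, 510.02, 539.11, 570.07, 599.48, 630.58, 661.94,
       688.31, 719.83, 752.36, 780.66, 811.02, 840.83, 868.52, 898.87]


def deviations(measured, sign):
    """Difference between each reported voltage and its ideal level."""
    return [v - (VDD_MV / 2 + sign * k * STEP_MV)
            for k, v in enumerate(measured)]


def step_sizes(measured):
    """Spacing between results for adjacent weights."""
    return [abs(b - a) for a, b in zip(measured, measured[1:])]


worst_neg = max(abs(d) for d in deviations(NEG, -1))
worst_pos = max(abs(d) for d in deviations(POS, +1))
print(f"worst deviation, discharge: {worst_neg:.2f} mV")
print(f"worst deviation, charge:    {worst_pos:.2f} mV")
```

Under this assumption the largest deviation is about 3.4 mV for discharging and about 2.4 mV for charging, well below half of the 30 mV step, so every reported voltage is closest to its own ideal level.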
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of the technical features contains no contradiction, it should be considered to fall within the scope of this description.
The above examples illustrate only a few embodiments of the invention. They are described in specific detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (9)
1. An in-memory arithmetic circuit with signed multiplication, characterized in that it comprises at least one column of arithmetic units, each column of arithmetic units comprising a weight storage part and a calculation part; wherein the weight storage part adopts an SRAM cell with dual word lines; the drains of the two pass transistors in the SRAM cell are respectively connected to two bit lines, and the gates of the two pass transistors are respectively connected to two word lines WLL and WLR; the calculation part comprises two NMOS transistors N3 and N4, two PMOS transistors P1 and P2, and a capacitor C; the circuit connection relationship is as follows:
The drains of P1 and N3 are connected to the computation bit line CBL; the gate of N3 is connected to the bit line BL, and the gate of P1 is connected to the bit line BLB; the source of N3 is connected to the drain of N4; the source of P1 is connected to the drain of P2; the gate of N4 is connected to an input word line INN; the gate of P2 is connected to an input word line INP; the source of N4 is connected to VSS; the source of P2 is connected to VDD; one end of the capacitor C is connected to the computation bit line CBL, and the other end is connected to VSS;
The operation unit of each column is used for realizing a multiplication operation between a signed 2-bit first operand and an unsigned second operand; the operation logic for performing the multiplication operation is:
Prestoring the second operand in the SRAM cell; precharging the computation bit line CBL to an intermediate potential; encoding the input first operand by the level states of WLL, WLR, INN and INP; the product of the first operand and the second operand is reflected in a change in the bit line voltage of the computation bit line CBL;
When the first operand in the multiplication operation is "+1", WLL, INN and INP are set to low level, and WLR is set to high level; when the first operand in the multiplication operation is "-1", WLL, INN and INP are set to high level, and WLR is set to low level; when the first operand in the multiplication operation is "0", WLL and INN are set to low level, and WLR and INP are set to high level.
2. The in-memory arithmetic circuit with signed multiplication of claim 1, wherein: the SRAM cell is a 6T-SRAM cell, or another SRAM cell with dual word lines obtained by adding MOS transistors to the 6T-SRAM cell.
3. The in-memory arithmetic circuit with signed multiplication of claim 2, wherein: the 6T-SRAM cell comprises two NMOS transistors N1 and N2 and two inverters INV0 and INV1; the circuit connection relationship is as follows: the input end of INV0 and the output end of INV1 are connected to the source of N1 and serve as a storage node Q; the output end of INV0 and the input end of INV1 are connected to the source of N2 and serve as a storage node QB; the drains of N1 and N2 are connected to the bit lines BL and BLB, respectively, and the gates of N1 and N2 are connected to the word lines WLL and WLR, respectively.
4. The in-memory arithmetic circuit with signed multiplication of claim 3, wherein: during the multiplication operation, when the bit line voltage of the computation bit line CBL rises, the product is represented as "+1"; when the bit line voltage of the computation bit line CBL drops, the product is represented as "-1"; when the bit line voltage of the computation bit line CBL remains unchanged, the product is represented as "0".
5. The in-memory arithmetic circuit with signed multiplication of claim 1, wherein: the weight storage part comprises a plurality of SRAM cells arranged in a column; each SRAM cell is connected to the same set of bit lines BL and BLB; each SRAM cell is also connected to the independent word lines WLL and WLR of its corresponding row; the SRAM cells of the same column share the same circuitry of the calculation part.
6. The in-memory arithmetic circuit with signed multiplication of claim 1, wherein: in the calculation part, the width-to-length ratios of the transistors N3 and N4 are the same; the width-to-length ratios of the transistors P1 and P2 are the same.
7. The in-memory arithmetic circuit with signed multiplication of claim 1, wherein: the weights of the second operand in the multiplication operations executed in the operation units of different columns are distinguished by adjusting the capacitance of the capacitor C in the calculation part of each column; the multiple of the capacitance C in each column relative to the unit capacitance is the weight of the second operand in the operation.
8. The in-memory arithmetic circuit with signed multiplication of claim 7, wherein: the circuit comprises N columns of operation units, wherein the capacitance multiples of the capacitors C in the respective columns of operation units are 1, 2, 4, 8, … and 2^(N-1) respectively; a transmission gate TG connecting the computation bit lines CBL of two adjacent columns of operation units is arranged between every two adjacent columns of operation units;
The strategy for implementing the multiplication of a 2-bit first operand by an N-bit second operand based on the multi-column arithmetic units is as follows:
Disconnecting the transmission gates between the operation columns, and precharging the computation bit line CBL of every column to an intermediate potential;
Decomposing the N-bit second operand into N single-bit numbers, and prestoring each single-bit number in the weight storage part of the column corresponding to its weight;
Inputting the encoded first operand synchronously to the selected SRAM cells of all columns through WLL, WLR, INN and INP, and then completing in each column the multiplication of the first operand by one bit of the second operand;
Closing the transmission gates between the operation columns, whereupon the product of the 2-bit first operand and the N-bit second operand is reflected in the change of the bit line voltage of the computation bit line CBL; the direction of the CBL bit line voltage change reflects the sign of the product, and the amplitude of the change reflects the magnitude of the product.
9. A CIM chip, characterized in that: it integrates the in-memory arithmetic circuit with signed multiplication as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
CN202410735739.0A CN118312468B (en) | 2024-06-07 | 2024-06-07 | In-memory operation circuit with symbol multiplication and CIM chip
Publications (2)
Publication Number | Publication Date
CN118312468A (en) | 2024-07-09
CN118312468B (en) | 2024-08-16
Family
ID=91724870
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title
CN202410735739.0A CN118312468B (en) | Active | 2024-06-07 | 2024-06-07 | In-memory operation circuit with symbol multiplication and CIM chip
Country Status (1)
Country | Link
CN (1) | CN118312468B (en)
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title
CN117332832A (en) * | 2023-10-09 | 2024-01-02 | 中科南京智能技术研究院 | Signed calculating unit for in-memory calculation
CN117608519A (en) * | 2024-01-24 | 2024-02-27 | 安徽大学 | Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title
US11269629B2 (en) * | 2018-11-29 | 2022-03-08 | The Regents Of The University Of Michigan | SRAM-based process in memory system
US11176991B1 (en) * | 2020-10-30 | 2021-11-16 | Qualcomm Incorporated | Compute-in-memory (CIM) employing low-power CIM circuits employing static random access memory (SRAM) bit cells, particularly for multiply-and-accumluate (MAC) operations
CN114512161B (en) * | 2022-04-19 | 2022-07-05 | 中科南京智能技术研究院 | Memory computing device with symbols
KR20240015508A (en) * | 2022-07-27 | 2024-02-05 | 연세대학교 산학협력단 | Computation apparatus in memory capable of signed weight
CN117219140B (en) * | 2023-11-03 | 2024-01-30 | 安徽大学 | In-memory computing circuit based on 8T-SRAM and current mirror
CN117636945B (en) * | 2024-01-26 | 2024-04-09 | 安徽大学 | 5-bit signed bit AND OR accumulation operation circuit and CIM circuit

Also Published As
Publication number | Publication date
CN118312468A (en) | 2024-07-09
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant  