CN117056277A - Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory) - Google Patents


Info

Publication number
CN117056277A
Authority
CN
China
Prior art keywords
read
8tsram
multiply
circuit
accumulate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311050617.XA
Other languages
Chinese (zh)
Inventor
蔺智挺
李劲铮
吴秀龙
彭春雨
刘玉
戴成虎
赵强
卢文娟
胡薇
郝礼才
周永亮
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN202311050617.XA
Publication of CN117056277A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443 Sum of products
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/41 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C11/413 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Static Random-Access Memory (AREA)

Abstract

The invention belongs to the technical field of integrated circuits, and in particular relates to a multiply-accumulate in-memory computing circuit that configures an adaptive scanning ADC (analog-to-digital converter) on the basis of a read-write separated SRAM (static random-access memory), and to a corresponding CIM (computing-in-memory) chip. The circuit comprises a memory array, row signal lines, column signal lines, a mode control circuit, and a quantization circuit. The memory array consists of a plurality of 8TSRAM cells arranged in an array. The row signal lines include WL, RWL, and SW; the column signal lines include BL, BLB, and RBL. The mode control circuit switches the access states of the row signal lines and the column signal lines, and includes a row switch group and a column switch group. The row switch group selects the port to which RBL is connected. The column switch group selects the port to which RWL is connected, the grounding state of SW, and the connection state between RWL and SW. The quantization circuit quantizes and outputs the result of the logic operation. The circuit provides both data storage and MAC functions, and overcomes the shortcomings of conventional schemes in integration level, power consumption, and energy efficiency.

Description

Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)
Technical Field
The invention belongs to the technical field of integrated circuits, and in particular relates to a multiply-accumulate in-memory computing circuit that configures an adaptive scanning ADC (analog-to-digital converter) on the basis of a read-write separated SRAM (static random-access memory), and to a corresponding CIM (computing-in-memory) chip.
Background
In recent years, the development of artificial intelligence has led to explosive growth in the computing tasks to be processed, and edge devices are required to handle ever larger volumes of data. The conventional von Neumann architecture separates the processor from the memory, so during compute-intensive workloads the movement of data between storage and computation consumes substantial resources, which makes it difficult to meet the demand for high computing power.
In-memory computing (Computing in Memory, CIM) is a novel computing architecture that fuses computation with storage to reduce data transfer between the two, so that computation is performed directly as the data is accessed, improving both computing speed and energy efficiency. Among these, in-memory computing architectures based on static random-access memory (SRAM) are one of the main directions explored in academia and industry. An SRAM-based in-memory computing design retains the basic read and write functions of the SRAM, and can additionally implement computing functions by setting the bit line voltage, changing the cell structure, or modifying the peripheral circuits. Existing in-memory computing designs mainly implement functions such as Boolean operations, content-addressable memory, and multiply-accumulate.
In-memory computing is currently applied mostly in neural network accelerators. Convolution is at the core of such accelerators: a neural network requires a massive number of convolutions, and the key algorithm behind convolution is matrix-vector multiplication, which consists of multiplications and accumulations, that is, the multiply-accumulate operation (Multiply Accumulate, MAC). Realizing energy-efficient, high-accuracy multiply-accumulate operations in an in-memory computing architecture has therefore become one of the core research directions of in-memory computing technology.
Many architectures already exist for in-memory computing circuits capable of executing multiply-accumulate tasks, but most rely on complex peripheral circuits to manage the input signals and to quantize and output the computing results. These peripheral circuits increase the power consumed when the device executes logic operations and reduce the integration level of the chip, which in turn lowers the processing rate of the chip. How to provide a more efficient multiply-accumulate in-memory computing circuit is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To address the shortcomings of prior-art multiply-accumulate in-memory computing circuits in integration level, power consumption, and energy efficiency, the invention provides a multiply-accumulate in-memory computing circuit that configures an adaptive scanning ADC on the basis of a read-write separated SRAM, together with a corresponding CIM chip.
The invention is realized by adopting the following technical scheme:
A multiply-accumulate in-memory computing circuit that configures an adaptive scanning ADC on the basis of a read-write separated SRAM provides data read, write and hold functions as well as multiplication and multiply-accumulate functions. Divided by circuit function, it comprises: a memory array, row signal lines, column signal lines, a mode control circuit, and a quantization circuit.
The memory array is formed by arranging a plurality of 8TSRAM cells in an array; each 8TSRAM cell is composed of 2 PMOS transistors P1-P2 and 6 NMOS transistors N1-N6, where P1, P2 and N1-N4 form a classical 6T memory cell with two storage nodes Q and QB. The gate of N5 is connected to the storage node Q, and the drain of N5 is connected to the source of N6.
The row signal lines are connected to all 8TSRAM cells in the same row in the memory array, including word lines WL connected to the gates of each N1 and N2, read word lines RWL connected to the gates of each N6, and switch word lines SW connected to the sources of each N5.
The column signal lines are connected to all 8TSRAM cells in the same column in the memory array, including BL connected to the drain of each N1, BLB connected to the drain of each N2, and RBL connected to the drain of each N6.
The mode control circuit switches the access states of the row signal lines and the column signal lines so as to set the working mode of the multiply-accumulate in-memory computing circuit. The mode control circuit includes a row switch group and a column switch group. The row switch group is connected to the column signal lines of each column of the memory array: the read bit line RBL is connected to the RBL port through a switch S3 and to the NIN port through a switch S6. The column switch group is connected to the row signal lines of each row of the memory array: the read word line RWL is connected to the RWL port through a switch S1 and to the sampling current input port Current through a switch S5. The switching word line SW is connected to Vss through the switch S2. The switch S4 is connected between the read word line RWL and the switching word line SW.
The quantization circuit is used for quantizing and outputting the result of multiplication or multiply-accumulate operation of the calculation circuit in the multiply-accumulate memory.
In the memory array, each 8TSRAM cell serves as a basic unit for performing data read, write, hold and multiply operations, and all 8TSRAM cells in the same row serve as basic units for performing multiply-accumulate operations.
As a further improvement of the present invention, the operation logic of each 8TSRAM cell to perform data write and data hold operations is as follows:
First, the switches S1 to S6 connected to the corresponding 8TSRAM cell are opened, i.e., the circuit is switched to the data write mode. Then, the bit lines BL and BLB of the corresponding column are driven to a high level or a low level according to the data to be written to the storage nodes Q and QB. Finally, the word line WL of the corresponding row is asserted to complete the data write.
After the data writing is completed, the word line WL is restored to a low level, and the bit lines BL and BLB are disconnected from the storage node, i.e., enter a data hold state.
As a further improvement of the present invention, the operating logic of each 8TSRAM cell to perform a data read operation is as follows:
First, the switches S1 to S3 connected to the corresponding 8TSRAM cell are closed and the switches S4 to S6 are opened, i.e., the circuit is switched to the data read mode. Then, the read bit line RBL of the corresponding column is precharged to a high level through the RBL port. Next, the read word line RWL of the corresponding row is set high through the RWL port. Finally, if the read bit line RBL remains at a high level, the data stored in the storage node Q of the corresponding 8TSRAM cell is "1"; if the read bit line RBL falls to a low level, the data stored in the storage node Q is "0".
As a further improvement of the present invention, the operation logic of each 8TSRAM cell to perform the multiplication operation is as follows:
First, in the data hold state, the value of the storage node Q is taken as one of the operands. Then, the switches S1 to S3 connected to the corresponding 8TSRAM cell are opened and the switches S4 to S6 are closed, i.e., the circuit is switched to the operation mode. Next, a sampling current is input to the read word line RWL through the Current port, and the inverted value NIN of the other operand IN is input through the NIN port. Finally, whether the sampling current can flow through S4, N5 and N6 and out through the NIN port, i.e., the conduction state of that path, is taken as the value representing the multiplication result.
As a further improvement of the invention, the multiplication result in the 8TSRAM cell can also be quantized through the calculated voltage V_RWL generated on the connected read word line RWL:
(1) When the calculated voltage V_RWL falls relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 1.
(2) When the calculated voltage V_RWL remains unchanged relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 0.
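The single-cell multiplication described above can be illustrated with a minimal behavioural sketch. The function name `multiply_cell` is hypothetical, and the sketch assumes that the conduction path only exists when N5 is on (Q = 1) and the NIN port is at a low level (IN = 1); it models logic values only, not the analog voltage on RWL.

```python
def multiply_cell(q: int, in_bit: int) -> int:
    """Behavioural model of the 1-bit multiply in one 8TSRAM cell.

    Returns 1 when the sampling current finds a conduction path, which the
    text associates with V_RWL falling below V_RWL0, and 0 otherwise.
    """
    nin = 1 - in_bit            # the NIN port carries the inverted operand
    n5_on = (q == 1)            # the gate of N5 is tied to storage node Q
    # Assumption: a low level on the NIN port is what lets current flow out.
    sink_available = (nin == 0)
    return int(n5_on and sink_available)


# Enumerate all operand combinations; the result equals Q AND IN.
truth_table = {(q, i): multiply_cell(q, i) for q in (0, 1) for i in (0, 1)}
```

Under these assumptions the conduction state reproduces the product of the two 1-bit operands, which is why it can stand in for the multiplication result.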
As a further improvement of the invention, in the multiply-accumulate in-memory computing circuit of the self-adaptive scanning ADC based on the read-write separation SRAM configuration, the operation logic of executing multiply-accumulate operation of all 8TSRAM units in each row is as follows:
First, in the data hold state, the value of the storage node Q of each 8TSRAM cell in the same row is taken as one of the operands. Then, the switches S1 to S3 connected to the 8TSRAM cells are opened and the switches S4 to S6 are closed. Next, a sampling current is input to the read word line RWL through the Current port, and the inverted value NIN of the other operand IN is input through the NIN port of each 8TSRAM cell in the same row. Finally, the number m of 8TSRAM cells in the conducting state on the same read word line RWL is taken as the value representing the result of the multiply-accumulate operation.
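The row-wise multiply-accumulate can be sketched as counting the conducting cells in one row. The function name `row_mac` and the example vectors are illustrative assumptions; the sketch treats each cell as conducting exactly when its stored bit and its input operand are both 1.

```python
def row_mac(stored_bits, inputs):
    """Count the conducting cells m in one row: m = sum(Q_i AND IN_i)."""
    if len(stored_bits) != len(inputs):
        raise ValueError("one input operand is needed per cell in the row")
    # A cell conducts only when its stored bit Q and its operand IN are both 1.
    return sum(q & x for q, x in zip(stored_bits, inputs))


# Hypothetical row of four cells: Q = 1,0,1,1 and IN = 1,1,0,1.
m = row_mac([1, 0, 1, 1], [1, 1, 0, 1])
```

Here only the first and last cells conduct, so m is 2, which is the dot product of the stored row and the input vector.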
As a further improvement of the invention, the quantization circuit comprises a replica line, a current steering DAC, an ADC logic control circuit, a comparator, and a sampling controller.
The replica row is formed by arranging a plurality of replica cells in a row; each replica cell consists of two NMOS transistors connected in the same way as N5 and N6 in an 8TSRAM cell. The gate of each N5 in the replica row is connected to a signal line that is always at a high level; the other ports are connected to the read word line RWL, the read bit lines RBL and the switching word line SW corresponding to the replica row.
The current steering DAC synchronously outputs, in each successive period and according to the received timing control signal, a unit reference current and a sampling current whose value is a multiple of the reference current. The reference current is input to the read word line RWL of the replica row, and the sampling current is input to the sampling current input ports Current of the respective rows of the memory array.
The ADC logic control circuit sequentially generates, according to the clock signal, the timing control signals that control the current steering DAC. On receiving each timing control signal, the current steering DAC generates the required unit reference current and a sampling current whose multiple increases step by step. The ADC logic control circuit also receives the output of a comparator as a feedback signal and terminates the generation of timing control signals according to that feedback signal.
The comparator comprises two inputs and one output; one of the input ports is connected to the read word line RWL of each row in the memory array, and the other input port is connected to the read word line RWL of the replica row. The output of the comparator is used as a feedback signal required by the ADC logic control circuit.
The sampling controller is composed of a plurality of sampling switches Pre_calcualte, one of which is connected between each read word line RWL and the input port of the comparator. The sampling switch is opened while the current steering DAC outputs the reference current or the sampling current required by the logic operation, and is closed after the logic operation finishes so that the comparator can quantize the result.
As a further improvement of the invention, the ADC logic control circuit comprises a timing control module and a result transfer module. The timing control module generates the required timing control signals. The result transfer module sends a termination signal to the timing control module when the acquired feedback signal flips, and outputs the current sequence number m of the timing control signal minus one as the result of the multiply-accumulate operation.
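The scanning procedure of the quantization circuit can be illustrated with a deliberately idealised sketch. It assumes that every conducting cell and the replica cell behave as equal linear conductances, and that the comparator flips as soon as the row voltage exceeds the replica-row voltage; the names `adaptive_scan`, `i_ref` and `g_cell` are hypothetical, and a real circuit is nonlinear, so this shows only the control flow.

```python
def adaptive_scan(m_conducting: int, n_cells: int,
                  i_ref: float = 1.0, g_cell: float = 1.0) -> int:
    """Scan sampling-current multiples k = 1, 2, ... until the comparator flips.

    Idealised assumption: m equal conductances in parallel carry k * i_ref,
    while the replica row carries i_ref through a single conductance.
    """
    v_ref = i_ref / g_cell                      # replica-row voltage, fixed
    for k in range(1, n_cells + 2):             # step number of the timing signal
        if m_conducting == 0:
            v_row = float("inf")                # no path: row node is not pulled down
        else:
            v_row = (k * i_ref) / (m_conducting * g_cell)
        if v_row > v_ref:                       # comparator output flips
            return k - 1                        # result = step number minus one
    raise RuntimeError("comparator never flipped within the scan range")


# For a hypothetical 8-cell row, the scan recovers every possible count m.
recovered = [adaptive_scan(m, 8) for m in range(9)]
```

In this model the flip happens at step k = m + 1, so the scan stops early for small results, and subtracting one from the step number returns m.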
As a further improvement of the present invention, the circuit connection relationship of the 6T memory cell is as follows:
P1 and N3 form one inverter, and P2 and N4 form the other inverter; the two inverters are cross-coupled to form the storage nodes Q and QB. The storage node Q is connected to the bit line BL through the transfer transistor N1, and the storage node QB is connected to the bit line BLB through the transfer transistor N2. The gates of N1 and N2 are connected to the word line WL, the sources of P1 and P2 are connected to VDD, and the sources of N3 and N4 are connected to VSS.
The invention also comprises a CIM chip which is formed by packaging the multiply-accumulate in-memory computing circuit of the self-adaptive scanning ADC based on the read-write separation SRAM configuration.
The technical scheme provided by the invention has the following beneficial effects:
On the basis of a classical read-write separated 8TSRAM, the invention uses suitably configured word lines and bit lines together with the switches in the mode control circuit to adjust the connection state of the circuit elements and the signal paths. Each 8TSRAM cell can thereby execute complete data storage functions and multiplication functions in different modes. The improved circuit uses the conduction state of the computing part of the 8TSRAM cell as the multiplication result, and uses the voltage of the read word line of the same row as the state data representing the multiply-accumulate operation.
The scheme of the invention is a transposed multiply-accumulate in-memory computing circuit that operates by rows; its circuit, operation logic and quantization method differ completely from conventional schemes. For this row-operated logic circuit, the invention also designs an adaptive scanning ADC as the quantization circuit. The quantization circuit accurately quantizes the nonlinear calculation result by keeping the reference voltage fixed while scaling the calculated voltage that represents the result by successively increasing multiples.
The new architecture eliminates a number of non-ideal effects caused by the voltage mode in conventional in-memory computing and improves the energy efficiency and accuracy of fully parallel multiply-accumulate computation. The adaptive scanning ADC used for quantization balances chip area and integration level while improving the power consumption and energy efficiency of the chip.
The scheme of the invention can complete high-precision multiply-accumulate operation based on the classical architecture of the SRAM circuit, thereby having wide prospect in the aspect of neural network application.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a circuit diagram of a multiply-accumulate in-memory computing circuit of an adaptive scanning ADC configured based on a read-write separated SRAM according to embodiment 1 of the present invention.
Fig. 2 is a circuit diagram of an 8TSRAM cell used in the multiply-accumulate in-memory computing circuit in embodiment 1 of the present invention.
Fig. 3 is a signal flow diagram of the multiply-accumulate in-memory computing circuit according to embodiment 1 of the present invention when performing data read and write operations; the left two columns in the figure are performing data read operations and the right two columns are performing data write operations.
Fig. 4 is an equivalent circuit diagram of the multiply-accumulate in-memory computing circuit provided in embodiment 1 of the present invention in the logic operation state; the left side is the equivalent circuit of each 8TSRAM cell, and the right side is the equivalent circuit of the whole multiply-accumulate in-memory computing circuit.
Fig. 5 is a circuit schematic of the quantization circuit part designed in embodiment 1 of the present invention.
Fig. 6 is a signal timing chart of the quantization circuit designed in embodiment 1 of the present invention when sampling and quantizing the calculation result.
Fig. 7 is a chip architecture diagram of a CIM chip provided in embodiment 1 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The embodiment provides a multiply-accumulate in-memory computing circuit of an adaptive scanning ADC based on read-write separation SRAM configuration, which has the functions of data read-write holding, multiplication and multiply-accumulate operation. As shown in fig. 1, the multiply-accumulate in-memory computing circuit includes: a memory array, a row signal line, a column signal line, a mode control circuit, and a quantization circuit.
The memory array is formed by arranging a plurality of 8TSRAM cells in an array. The 8TSRAM in this embodiment is based on the widely used standard read-write separated SRAM device with 8 MOS transistors, and the mode control circuit switches it between the storage mode and the calculation mode. Specifically, as shown in fig. 2, each 8TSRAM cell is composed of 2 PMOS transistors P1 to P2 and 6 NMOS transistors N1 to N6, where P1, P2 and N1 to N4 constitute a classical 6T memory cell having two storage nodes Q and QB. In the 6T memory cell, P1 and N3 form one inverter, and P2 and N4 form the other inverter; the two inverters are cross-coupled to form the storage nodes Q and QB. The storage node Q is connected to the bit line BL through the transfer transistor N1, and the storage node QB is connected to the bit line BLB through the transfer transistor N2. The gates of N1 and N2 are connected to the word line WL, the sources of P1 and P2 are connected to VDD, and the sources of N3 and N4 are connected to VSS. This portion implements data writing and data holding. The other two MOS transistors, N5 and N6, implement the data read and logic operation functions. Specifically, the gate of N5 is connected to the storage node Q, and the drain of N5 is connected to the source of N6.
To manage the data storage function and the logic operation function of the 8TSRAM cells more efficiently, this embodiment provides 6 signal lines on each 8TSRAM cell, divided into 3 row signal lines and 3 column signal lines. The row signal lines are connected to all 8TSRAM cells in the same row of the memory array, and include the word line WL connected to the gates of each N1 and N2, the read word line RWL connected to the gate of each N6, and the switching word line SW connected to the source of each N5. The column signal lines are connected to all 8TSRAM cells in the same column of the memory array, and include the bit line BL connected to the drain of each N1, the bit line BLB connected to the drain of each N2, and the read bit line RBL connected to the drain of each N6.
The mode control circuit switches the access states of the row signal lines and the column signal lines so as to set the working mode of the multiply-accumulate in-memory computing circuit. The mode control circuit includes a row switch group and a column switch group. The row switch group is connected to the column signal lines of each column of the memory array: the read bit line RBL is connected to the RBL port through a switch S3 and to the NIN port through a switch S6. The column switch group is connected to the row signal lines of each row of the memory array: the read word line RWL is connected to the RWL port through a switch S1 and to the sampling current input port Current through a switch S5. The switching word line SW is connected to Vss through the switch S2. The switch S4 is connected between the read word line RWL and the switching word line SW.
In the mode control circuit, the switches S1 and S5 switch the read word line RWL between two ports according to the operation mode: the port RWL is used to input the word line voltage signal during a data read operation, and the port Current is used to input the sampling current required by the logic operation into the 8TSRAM cell during the logic operation stage. Likewise, the switches S3 and S6 switch the read bit line RBL between two ports: the port RBL is used to precharge the read bit line RBL to a high level during a data read operation, and the port NIN is used to input the inverted value of one of the operands of the multiplication to the 8TSRAM cell. The switch S2 controls whether the ground path of the switching word line SW is connected. The switch S4 connects the read word line RWL to the switching word line SW so that the sampling current can flow into the 8TSRAM cell.
Based on this control logic, the multiply-accumulate in-memory computing circuit of this embodiment can be switched by the mode control circuit into three modes according to circuit function: the data write mode, the data read mode and the logic operation mode. In the data write mode, all switches are opened, so that only the 6T memory cell portion of the 8TSRAM cell remains connected to the word line WL and the bit lines BL and BLB, which allows data writing and data holding. In the data read mode, the 6T memory cell portion is in the data hold state, the switches S1 to S3 are closed and the switches S4 to S6 are opened; the ports RBL and RWL are then connected to the read bit line and the read word line, so that data can be read. In the logic operation mode, the 6T memory cell portion is in the data hold state, the switches S1 to S3 are opened and the switches S4 to S6 are closed; the ports NIN and Current are then connected to the read bit line and the read word line, so that the operand and the sampling current can be injected and the logic operation can be performed.
This embodiment defines specific operation logic for storage and computation in the above circuit. It also provides a new quantization circuit for quantizing and outputting the result of the multiplication or multiply-accumulate operation of the multiply-accumulate in-memory computing circuit, which is described in detail later.
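The three modes can be summarised as a small lookup of switch states. The following sketch is only an illustration of the mapping described above; the names `Mode`, `SWITCHES` and `switch_states` are hypothetical, and `True` denotes a closed switch.

```python
from enum import Enum


class Mode(Enum):
    WRITE = "data write"          # also covers data hold once WL returns low
    READ = "data read"
    COMPUTE = "logic operation"


SWITCHES = ("S1", "S2", "S3", "S4", "S5", "S6")

# Closed switches per mode, as described in the text; all others are open.
_CLOSED = {
    Mode.WRITE: set(),
    Mode.READ: {"S1", "S2", "S3"},
    Mode.COMPUTE: {"S4", "S5", "S6"},
}


def switch_states(mode: Mode) -> dict:
    """Return a mapping from switch name to True (closed) or False (open)."""
    return {name: name in _CLOSED[mode] for name in SWITCHES}


read_cfg = switch_states(Mode.READ)
compute_cfg = switch_states(Mode.COMPUTE)
```

The read and logic operation configurations are exact complements of each other, which reflects that RWL and RBL are each routed to one port at a time.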
In the memory array provided in this embodiment, each 8TSRAM cell serves as a basic unit for performing data read, write, hold, and multiply operations. In particular, all 8TSRAM cells in the same row in the memory array together serve as the base unit for performing multiply-accumulate operations. The present embodiment adopts a "row-wise operation" mode for the multiply-accumulate operation, and the "transpose" design is a brand-new design scheme of the multiply-accumulate logic operation circuit.
The principle and operation logic of the multiply-accumulate in-memory computing circuit of the adaptive scan ADC configured based on the read-write separation SRAM provided in this embodiment will be described in detail in the order of the data writing+data holding function, the data reading function, and the logic operation function.
1. Data writing and data holding function
As seen in connection with fig. 2 and the right half of fig. 3, the operation logic of each 8TSRAM cell in this embodiment to perform the data write and data hold operations is as follows:
First, the mode control circuit opens the switches S1 to S6 connected to the corresponding 8TSRAM cell, i.e., switches to the data write mode. At this point, RWL is inactive and the sense amplifier at the end of RBL does not receive the enable signal. The 6T memory cell composed of P1-P2 and N1-N4 serves as the basic unit for executing the data write.
Then, the bit line BL or BLB of the corresponding column is precharged to a high level or a low level according to data to be written to the storage node Q or QB. For example, when it is necessary to write data "1" into the storage node Q and data "0" into the storage node QB, it is necessary to set the bit line BL to a high level and the bit line BLB to a low level. Conversely, if it is necessary to write data "0" into the storage node Q and data "1" into the storage node QB, the bit line BL needs to be set to a low level and the bit line BLB needs to be set to a high level.
Finally, the word line WL of the 8TSRAM cells in the corresponding row is asserted to complete the data write. After the word line WL is turned on, both transfer transistors N1 and N2 are turned on. Taking the case where "1" is written to the storage node Q and "0" to the storage node QB as an example, the bit line BL charges the storage node Q, that is, the bit line BL at high level raises the connected storage node Q to a high potential and writes data "1". Meanwhile, the storage node QB discharges through the bit line BLB, that is, the bit line BLB at low level pulls the connected storage node QB low and writes data "0".
After the data writing is completed, the word line WL is restored to a low level, and the bit lines BL and BLB are disconnected from the storage node, i.e., enter a data hold state. In the data holding state, the cross-coupled latch structure formed by P1, P2, N3, N4 can stably hold the data stored in the storage node.
2. Data reading function
In the multiply-accumulate in-memory computing circuit of the adaptive scan ADC configured based on the read-write separation SRAM provided in this embodiment, the operation logic of each 8TSRAM unit to perform the data read operation is as follows:
First, the switches S1 to S3 connected to the corresponding 8TSRAM cells are closed by the mode control circuit and the switches S4 to S6 are opened, that is, the circuit is switched to the data read mode. In the data read mode, as shown in the left half of fig. 3, the word line WL is inactive, both NMOS transistors N1 and N2 are in the off state, and the middle 6T cell is in the data holding state.
Then, the read bit line RBL of the corresponding column is precharged to a high level through the RBL port, and the read word line RWL of the corresponding row is set to a high level through the RWL port, i.e. RWL=1, RBL=1. At this time, the NMOS transistor N6 is in a conductive state, and the conductive state of the NMOS transistor N5 is determined by the data stored in the storage node Q. When the data stored in the storage node Q is "1", Q is at high level and the NMOS transistor N5 is turned on; the read bit line RBL is then connected to ground through N6 and N5 in sequence and falls to a low level after discharging. In contrast, when the data stored in the storage node Q is "0", Q is at low level and the NMOS transistor N5 is turned off; the read bit line RBL cannot discharge through N6 and N5 and therefore remains at high level.
Therefore, during a data read operation, when the read bit line RBL remains at a high level, the data stored in the storage node Q of the corresponding 8TSRAM cell is "0", and when the read bit line RBL falls to a low level, the data stored in the storage node Q of the corresponding 8TSRAM cell is "1".
The multiply-accumulate in-memory computing circuit based on the read-write separation SRAM configuration self-adaptive scanning ADC provided by the embodiment adopts a read-write separation design, realizes the isolation of a read bit line RBL and a storage node, has no "0" node voltage division problem, and can greatly relieve the read damage phenomenon in the SRAM unit of non-separation design.
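The read path through N6 and N5 can be illustrated with a simple logic-level sketch. It is an idealized model under the assumption that RBL is precharged to 1 and discharges fully whenever both N6 (gated by RWL) and N5 (gated by Q) conduct; the function names `read_rbl` and `sensed_data` are hypothetical.

```python
def read_rbl(q, rwl=1):
    """Return the final level of RBL after precharging it to 1.

    Idealized assumption: RBL is pulled to 0 only when N6 (gate = RWL)
    and N5 (gate = Q) are both on, giving a path from RBL to ground.
    """
    rbl = 1                 # precharge RBL to high level
    n6_on = rwl == 1        # N6 conducts when the read word line is high
    n5_on = q == 1          # N5 conducts when the storage node Q holds "1"
    if n6_on and n5_on:
        rbl = 0             # RBL discharges through N6 and N5
    return rbl


def sensed_data(rbl):
    """Recover the stored bit: a discharged RBL means Q was 1."""
    return 1 - rbl


for q in (0, 1):
    level = read_rbl(q)
    print("Q =", q, "-> RBL =", level, "-> read data =", sensed_data(level))
```

Because the storage node only drives the gate of N5, the read current never flows through Q itself, which is the isolation property the read-write separation design relies on.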
3. Logic operation function
In the multiply-accumulate in-memory computing circuit of the adaptive scan ADC configured based on the read-write separation SRAM provided in this embodiment, the logic operation functions performed include two kinds: the multiplication operation and the multiply-accumulate operation.
3.1 multiplication operations
The execution body of the multiplication task is each 8TSRAM unit, and the operation logic of the multiplication operation executed by the calculation circuit in the multiply-accumulate memory provided by the embodiment is as follows:
First, in the data hold state, the value of the storage node Q is taken as one of the operands. Then, the switches S1 to S3 connected to the corresponding 8TSRAM cells are opened and the switches S4 to S6 are closed, that is, the circuit is switched to the operation mode. When the circuit of this embodiment performs a logic operation, neither the structure of the 8TSRAM cell nor its internal connections need to be changed; only the state of the mode control circuit is adjusted. In the operation mode, the word line WL connected to the 6T memory cell is still in the off state and the 6T memory cell is in the data holding state, so that the stored data is completely isolated from the computing process.
After the switch S4 is closed, each 8TSRAM cell in fig. 1 and 2 switches to the state shown in fig. 4. As can be seen from fig. 4, the gate of N5 is connected to the storage node, while the source of N5 is directly connected to the gate of N6. After switches S5 and S6 are closed, RWL/SW is further connected to the sampling current port Current, and RBL is connected to the NIN port. At this time, the two NMOS transistors N5 and N6 originally used for the read operation form a structure similar to a low voltage Cascode, as shown on the left side of fig. 4; at this point the operation of the 8T SRAM array in the computation preparation phase is completed.
Then, in the calculation process, a sampling current is input to the read word line RWL through the Current port, and the inverted value NIN of the other operand IN of the multiplication operation is input through the NIN port. In the equivalent circuit on the left side of fig. 4, the Q value in each calculation unit is provided by the stable stored value (weight) in the SRAM cell; it determines whether N5 is turned on and serves as one of the operands of the multiplication operation. Since the input NIN applied to RBL is the inverse of the top-level input signal IN, NIN is low or high according to whether IN is high or low. Four different combinations occur for the different values of Q and NIN; the behaviour of the cell in each case is described in detail below:
In the calculation process, the sampling current is injected into the unit through the RWL signal line, and if the storage node Q is at a low level, the NMOS transistor N5 is turned off, and no matter the NIN is at a high level or a low level, the sampling current cannot smoothly flow into the calculation unit.
Next, consider the case where Q is at high level and NIN is at high level. For the transistor N6, the drain and source swap roles because of the NIN potential, so the original low voltage Cascode connection is no longer established. This is equivalent to tying the gate and source of N6 together, which directly turns the MOS transistor off, so no inflow path is formed for the sampling current injected from RWL/SW.
When the storage node Q is at a high level and NIN is at a low level, the NMOS transistors N5 and N6 are connected as a low voltage Cascode. In a sense, this connection can be regarded as an improvement on the MOS diode connection; it is a typical structure for converting an input current into a gate voltage and is widely used in cascode current mirrors. In this case, when the sampling current is injected from the connected RWL/SW, the current flows smoothly out of NIN. Of particular importance is the following: in this embodiment, because of the low voltage Cascode structure, the sampling current is also converted into a corresponding gate voltage at the gate of N6, which serves as the theoretical basis for sampling and quantizing the calculation result.
From the above discussion of the combinations of Q and NIN (IN), the principle of implementing the multiplication operation within this 8T SRAM cell can be summarized as follows: when the sampling current is injected from the connected RWL/SW, it flows smoothly into the cell only when Q=1 and IN=1 (NIN=0). In all other combinations the sampling current cannot flow into the cell. In the 8TSRAM cell provided in this embodiment, whether the sampling current can flow through S4, N5 and N6 and out through the NIN port is used as the value characterizing the result of the multiplication operation. In this embodiment, the multiplication result of a single 8T SRAM cell can be quantized by analyzing the calculated voltage V_RWL generated on the read word line RWL (which reflects the gate voltage of N6). The change of the word line voltage on RWL during multiplication characterizes the result as follows:
(1) When the calculated voltage V_RWL drops relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 1.
(2) When the calculated voltage V_RWL is unchanged relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 0.
Based on the above, the operation logic and the truth table thereof when the 8TSRAM unit provided in this embodiment performs multiplication are summarized as follows:
Table 1: multiplication truth table of 8TSRAM in this embodiment
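The body of Table 1 is not reproduced in this text. As a hedged illustration only, the four cases discussed above can be enumerated with the following sketch; the function name `multiply_cell` is hypothetical, and the column layout printed here is an assumption rather than the layout of the original table.

```python
def multiply_cell(q, in_bit):
    """Logic-level model of one 8TSRAM cell in operation mode.

    The sampling current finds a path only when Q = 1 and NIN = 0
    (that is, IN = 1), which is taken as the product Q * IN.
    """
    nin = 1 - in_bit                    # RBL receives the inverted input
    conducts = (q == 1) and (nin == 0)  # low voltage Cascode path established
    return int(conducts)


print("Q  IN  NIN  current path  result")
for q in (0, 1):
    for in_bit in (0, 1):
        result = multiply_cell(q, in_bit)
        path = "yes" if result else "no"
        print(q, "", in_bit, " ", 1 - in_bit, "  ", path.ljust(12), result)
```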
3.2 multiply accumulate operations
Based on the multiplication operation, the present embodiment implements the multiply-accumulate operation by adopting a row transpose design, that is, using all 8TSRAM cells in the same row to complete the multiply-accumulate operation. The working principle of the multiply-accumulate in-memory computing circuit is as follows:
With reference to the right half of the circuit in fig. 4 and according to the computational equivalence model, the 8T SRAM cells of each row operate fully in parallel: they share a common RWL/SW and each has its own independent RBL. As described above, RWL/SW is used to inject the calculation sampling current, and RBL carries the input of each calculation unit. If a given total sampling current is injected from RWL/SW, it follows from the multiplication principle of the calculation unit that a calculation unit whose partial product is "1" provides a path for the sampling current to flow in, whereas a calculation unit whose partial product is "0" cannot establish any current path.
Because all conducting calculation units have identical design specifications, and therefore the same equivalent input impedance, the injected sampling current is divided equally among the conducting units according to Ohm's law, and the ratio of the current received by each conducting unit to the total sampling current encodes the number of conducting units. According to the structure of the unit circuit, the current flowing into the unit is converted into a corresponding stable voltage that appears on the RWL/SW signal line. The number of units in a row whose multiplication result is 1, that is, the number of units whose calculation channel is open, is the required multiply-accumulate value, so both the divided current and the converted voltage represent the result of the multiply-accumulate operation of the row.
Based on the above circuit principle, in the multiply-accumulate in-memory computing circuit of the adaptive scan ADC configured based on the read-write separation SRAM provided in this embodiment, the operation logic of executing the multiply-accumulate operation by all 8TSRAM units in each row is as follows:
First, in the data hold state, the value of the storage node Q of each 8TSRAM cell in the same row is taken as one of the operands. Then, the switches S1 to S3 connected to the 8TSRAM cells are opened and the switches S4 to S6 are closed. Next, a sampling current is input to the read word line RWL through the Current port, and the inverted value NIN of the other operand IN is input through the NIN port of each 8TSRAM cell in the same row. Finally, the number m of 8TSRAM cells in the on state connected to the same read word line RWL is used as the value representing the result of the multiply-accumulate operation.
In summary, the units in a row determine whether the calculation channel is opened or not according to multiplication logic, when sampling current is injected, the units with the opened channels shunt the sampling current, and the current value after shunt is determined by the number of the units with the opened channels, that is, the current value can obtain the accumulation information required after multiplication is completed in the row.
The current value of the multiply-accumulate calculation result is represented as I/n, where "I" is the total sampling current and "n" is the number of multiplication results of "1". Because the current obtained by the unit with each channel opened in one row is the same, the current value after the current is divided is converted into a corresponding grid voltage according to the characteristics of the unit circuit and is reflected on the signal line RWL/SW, namely, the voltage on the RWL/SW signal line carries multiply-accumulate operation information. The voltage value obtained by converting the current value is related to the impedance of the equivalent model, and the calculated result voltage and the result current have a one-to-one correspondence. The quantization module can obtain a digitized multiply-accumulate calculation result through sampling and processing the voltage.
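The relation between the row result and the per-cell current I/n can be sketched as follows. This assumes ideal, perfectly equal current sharing among conducting cells; the names `mac_row` and `per_cell_current` are hypothetical, and the conversion from current to RWL/SW voltage is not modelled because it depends on device parameters not given here.

```python
from fractions import Fraction


def mac_row(weights, inputs):
    """Number of cells in a row whose product Q * IN equals 1."""
    return sum(q & x for q, x in zip(weights, inputs))


def per_cell_current(total_current, weights, inputs):
    """Current through each conducting cell, assuming ideal equal sharing.

    Returns None when no cell conducts, since the injected current then
    has no path into the row.
    """
    n = mac_row(weights, inputs)
    if n == 0:
        return None
    return Fraction(total_current) / n


weights = [1, 1, 0, 1, 0, 1, 1, 0]   # stored values Q (illustrative)
inputs = [1, 0, 1, 1, 0, 1, 1, 1]    # input bits IN (illustrative)
print("MAC result n =", mac_row(weights, inputs))
print("per-cell current (in units of I) =", per_cell_current(1, weights, inputs))
```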
Therefore, although the 8T SRAM cell provided in this embodiment operates in the current domain and the calculation result is reflected in a current signal, the result can still be converted into a voltage value for analysis and quantization when it is output.
The above has fully introduced the circuit principle and the operation logic of the multiply-accumulate in-memory computing circuit based on the read-write separation SRAM configuration adaptive scan ADC provided by the embodiment in realizing the data storage and logic operation functions. The quantization circuit of the present embodiment for a multiply-accumulate in-memory calculation circuit design will be further described below:
As shown in fig. 5, the quantization circuit provided in the present embodiment includes a replica row, a current steering DAC, an ADC logic control circuit, a comparator, and a sampling controller.
The replica row is formed by arranging a plurality of replica cells in a row. Each replica cell consists of two NMOS transistors, N5 and N6, connected in the same way as N5 and N6 in an 8TSRAM cell. The gate of each N5 in the replica row is connected to a signal line that is always at high level; the other ports are connected to the read word line RWL, the read bit line RBL and the switch word line SW of the replica row, respectively. The replica row in this embodiment adaptively generates a reference voltage of corresponding magnitude from the one-unit reference current input by the current steering DAC, so that the comparator can compare it with the calculated voltage and generate a quantization result.
In this embodiment, the number of replica cells in the replica row is not equal to the number of 8TSRAM cells in a calculation row, but is determined by the number of bits of the quantization result and the size of the memory array. The number of replica cells in the replica row equals the row length of the memory array divided by the number of quantization levels corresponding to the number of quantization bits. For example, when the memory array is a 64×64 square matrix and the quantization result is a 3-bit number (8 levels), the number of replica cells in the replica row is 8 (64 ÷ 8 = 8). When the quantization result is a 2-bit number (4 levels), the number of replica cells in the replica row is 16 (64 ÷ 4 = 16).
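The sizing rule for the replica row can be checked with a short calculation. The function name `replica_cells` is hypothetical; the rule itself (row length divided by the number of quantization levels) and the two examples come from the paragraph above.

```python
def replica_cells(row_length, quant_bits):
    """Number of replica cells = row length / number of quantization levels."""
    levels = 2 ** quant_bits
    if row_length % levels != 0:
        raise ValueError("row length must be a multiple of the level count")
    return row_length // levels


print(replica_cells(64, 3))  # 3-bit output, 8 levels
print(replica_cells(64, 2))  # 2-bit output, 4 levels
```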
The current steering DAC is composed of a current source module, a current mirror module and a switch circuit module. The function of the current steering DAC is to synchronously output a single-time reference current and a sampling current with a current value equal to multiple times of the reference current according to the received time sequence control signals generated by the ADC logic control circuit in each continuous period. The reference Current is input to the read word line RWL of the replica row, and the sampling Current is input to the sampling Current input ports Current of the respective rows of the memory array.
In the current steering DAC, the current source module provides a unit current that can be replicated by the current mirror module. Conventionally, a variable resistor is placed off-chip so that the unit current can be adjusted after tape-out. The current mirror module and the switch circuit module are integrated at the circuit level, so they are discussed together as the switch current mirror unit of the current steering DAC. As shown in fig. 5, the Vdsat available for allocation is small when designing in an advanced process, and, because of the quantization logic, the switch current mirror units need to control their error relative to one another rather than the absolute error of the replicated current; the current mirror therefore adopts a conventional cascode current mirror.
For the switch design in the switch current mirror unit, the most important issue is the clock feedthrough of the switch at high frequency. The switch length is therefore kept at the minimum size to ensure switching speed, and the width-to-length ratio is set to 10 to reduce the voltage drop. At the same time, the overall area of the switch should be as small as possible to reduce the MOS parasitic capacitance. In addition, normally-on shielding transistors are placed at both ends of the switch to protect the output node of the current source and the overall output node of the current steering DAC, respectively, thereby reducing the influence of clock feedthrough. Each switch current mirror unit is controlled by its own control signals CM_A and CM_B, which select the output path to the array and the output path to ground, respectively; the latter serves as a preparatory path in the quantization logic.
The ADC logic control circuit is used for sequentially generating, according to the clock signal, the timing control signals that control the current steering DAC. After receiving the timing control signals, the current steering DAC sequentially generates the required one-unit reference current and a sampling current whose multiple increases step by step. The ADC logic control circuit is also used for receiving the output of the comparator as a feedback signal and terminating the generation of the timing control signals according to the feedback signal. The ADC logic control circuit comprises a timing control module and a result transfer module. The timing control module is used for generating the required timing control signals. The result transfer module is used for sending a termination signal to the timing control module when the acquired feedback signal flips, and for outputting the current sequence number m of the timing control signal minus one as the result of the multiply-accumulate operation.
In the operation Mode of the computation circuit in the multiply-accumulate memory of the adaptive scan ADC configured based on the read-write separation SRAM according to the embodiment, after the operation in the computation preparation stage is completed, the Mode control circuit sends an enable signal mode_ctr to the ADC logic control circuit, and after the enable signal reaches the ADC logic control circuit, the enable signal is linked with the clock signal clk_c to control the enabling of the quantization timing control module. What needs to be specified is: in this embodiment, the clock signal adopted by the ADC logic control circuit is different from the clock of the memory cell, and the fastest clock clk_c given by the LOOP module is adopted.
After being enabled, the register chain in the timing control module signals according to the clock cycle beat, and the control for the current steering DAC is composed by the combination logic in the register chain as follows: the first calculation cycle gives CM_A <1>, and CM_B <2> low causes the corresponding switch to open, the remainder being high. The second calculation cycle gives CM_A <2>, and the CM_B <3> low causes the corresponding switch to open, keeping CM_A <1> low and the rest high. The third calculation period gives CM_A <3>, and CM_B <4> low level enables the corresponding switch to be opened, and keeps CM_A <1>, CM_A <2> low level and the rest is high level … …, and the switch control logic controls the switch array in the current steering DAC until the timing control module receives the feedback stop calculation signal.
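The per-cycle control pattern described above can be written out as a small generator. This is a sketch under stated assumptions: signals are modelled as active low (0 turns the corresponding switch on), the DAC is assumed to have 7 switch current mirror units (a value chosen here to match a 3-bit scan, not stated in this paragraph), and the function names `dac_controls` and `active_low` are hypothetical.

```python
def dac_controls(cycle, n_mirrors=7):
    """Return (CM_A, CM_B) levels for a given calculation cycle.

    Assumed encoding: 0 = low (switch on), 1 = high (switch off).
    In cycle k, CM_A<1..k> are low (k units drive the array) and
    CM_B<k+1> is low (the next unit is held on its preparatory path).
    """
    if not 1 <= cycle <= n_mirrors:
        raise ValueError("cycle out of range")
    indices = range(1, n_mirrors + 1)
    cm_a = {i: 0 if i <= cycle else 1 for i in indices}
    cm_b = {i: 0 if i == cycle + 1 else 1 for i in indices}
    return cm_a, cm_b


def active_low(signals):
    """List the indices whose control signal is low."""
    return [i for i, level in signals.items() if level == 0]


for k in (1, 2, 3):
    cm_a, cm_b = dac_controls(k)
    print("cycle", k, "CM_A low:", active_low(cm_a), "CM_B low:", active_low(cm_b))
```

In the last cycle no further unit needs to be prepared, so in this model every CM_B signal stays high.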
The advantages of using such control logic to control the current steering DAC in this embodiment include the following. On the one hand, the combined encoding reduces the average current consumed by the DAC module, since the current mirrors not needed in a given calculation cycle do not have to be kept on. On the other hand, the CM_B series of switches serves as a preparatory path in the logic: if the current mirror for the next calculation cycle were left completely off and floating during the previous calculation cycle, a large amount of charge would accumulate at its output node, and when its calculation cycle arrived, turning on the switch would cause a large glitch that requires considerable time to settle.
The function of the result transfer module in the ADC logic control circuit is to finish the buffer storage of the comparison result of the comparator and send out a calculation stopping signal for the quantization time sequence control module. The comparison result of each calculation period comparator is sent to a shift register of the module for buffering. The stop calculation signal mainly consists of two signals, and when one signal is enabled, the stop calculation signal is enabled. One is the count signal that has completed the last calculation cycle. The other is that the comparison result of the comparator is output as "0", which means that during the scan comparison, a flipped calculation period has been found, so that a quantized result can be obtained, and this stop signal is provided by the buffer shift register, avoiding oscillation caused by the feedback signal.
The comparator comprises two inputs and one output; one of the input ports is connected to the read word line RWL of each row in the memory array, and the other input port is connected to the read word line RWL of the replica row. The output of the comparator is used as a feedback signal required by the ADC logic control circuit.
The sampling controller is composed of a plurality of sampling switches Pre_calculate, and one sampling switch Pre_calculate is connected between each read word line RWL and the input port of the comparator; the sampling switch is opened when the current steering DAC outputs a reference current or a sampling current required by logic operation; and closed after the logic operation is completed to allow the comparator to quantize.
In the following, taking a 64×64 memory array and a 3-bit (8-level) quantized output circuit as an example, the operation logic for realizing the multiply-accumulate operation and the quantized output in the scheme of this embodiment is described in detail in combination with the signal flow diagram in fig. 6 and the quantization circuit architecture in fig. 5.
First, preliminary explanation is made on quantization logic: for a row of computing units, the number of units that are turned on in the operational mode if the respective binary inputs and the stored data are combined is N, as described according to the above computing principle, i.e. the MAC result of this row is N. If the injected sampling current is I, the resulting calculation current is I/N (the calculation voltage is converted from current and corresponds one to one, and thus the calculation current is discussed).
Each row of computing units performing the multiply-accumulate operation has a 1×64 structure and produces a 3-bit (8-level) output. The number of replica cells in the replica row is 8, i.e. the step of the generated reference current is I/8. When the MAC result is quantized by conventional ramp scanning, the calculated current I/N is compared in successive cycles with the reference currents I/8, I/16, I/24, I/32, I/40, I/48 and I/56. If the comparator output flips at the m-th comparison, the quantized MAC result is m-1, which is then converted to binary. The quantization logic is improved here: in successive cycles, I/N, 2I/N, 3I/N, 4I/N, 5I/N, 6I/N and 7I/N are compared in turn with the fixed reference I/8, which yields an identical quantization result. The adaptive ramp ADC employed in this embodiment designs its quantization logic accordingly.
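That the two scanning schemes flip at the same step can be verified numerically. The sketch below is an illustration only: it assumes the comparator flips as soon as the calculated quantity exceeds the reference, uses exact rational arithmetic to avoid rounding, and considers N from 1 to 64. The function names are hypothetical.

```python
from fractions import Fraction


def flip_step_scaled_reference(n, total=Fraction(1), step=8, steps=7):
    """Conventional ramp: compare I/N against I/(8m) for m = 1..7."""
    for m in range(1, steps + 1):
        if total / n > total / (step * m):
            return m
    return None  # the comparator never flips


def flip_step_scaled_sample(n, total=Fraction(1), step=8, steps=7):
    """Improved ramp: compare m*I/N against the fixed reference I/8."""
    for m in range(1, steps + 1):
        if m * total / n > total / step:
            return m
    return None  # the comparator never flips


same = all(
    flip_step_scaled_reference(n) == flip_step_scaled_sample(n)
    for n in range(1, 65)
)
print("both schemes flip at the same step for N = 1..64:", same)
print("N = 34 flips at step", flip_step_scaled_sample(34))
```

Both conditions reduce to 8m > N, which is why scaling the injected sampling current instead of the reference leaves the quantized result unchanged.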
In the whole operation process, the ADC logic control circuit designed in this embodiment sequentially needs to complete three main steps of sampling current injection on the memory array, sampling the calculation result of the memory array, and quantizing the analog signal of the sampled calculation result.
Specifically, in the first working cycle after the ADC logic control circuit receives the start signal, it sends control signals to the switch circuit of the current steering DAC so that CM_A <1> and CM_B <2> are at low level and the remaining signals are at high level. This causes the PMOS current mirror CM1 to deliver one unit of sampling current to the array and, at the same time, prepares the current mirror CM2 for being switched in. If the current mirror CM2 were in the floating off state (both switches CM_A and CM_B turned off), its output node would accumulate charge up to a relatively high potential, and CM2 could not be restored in a short time to a state in which it delivers current accurately when the next comparison cycle arrives. The adjacent PMOS current mirror CM2 is therefore set to the ready state, while the remaining current mirrors are in the floating off state.
After the sampling current is injected into the memory array and the calculation result is obtained, a sampling switch pre_calculation in the sampling controller is opened. At this time, the calculation result voltage generated on the RWL/SW of the calculation row and the reference voltage generated on the RWL/SW of the replica row are simultaneously connected to the two input terminals of the comparator. The ADC logic control circuit sends out an enabling signal of the comparator, and the calculated voltage is compared with the reference voltage. The output of the comparator is fed back as a feedback signal to a shift register in the ADC logic control circuit. The ADC logic control circuit combines the feedback signals to judge whether to continue the comparison, and if so, the ADC logic control circuit sends out a time sequence control signal of the next period.
In the second working cycle after the ADC logic control circuit receives the start signal, it sends control signals to the switch circuit of the current steering DAC so that CM_A <1>, CM_A <2> and CM_B <3> are at low level and the remaining signals are at high level. These control signals act on the current steering DAC in the same way as in the first cycle: the PMOS current mirrors CM1 and CM2 are turned on and each delivers one unit of calculation sampling current to the array, so that the current steering DAC outputs a total of two units of calculation sampling current into the SRAM array. The ADC logic control circuit samples, compares and outputs in the following cycles according to the same logic, until it determines from the received feedback signal that the termination condition is reached.
By analogy, up to 8 comparisons can be completed, and each comparison corresponds to an increment of 8 in the multiply-accumulate result. Thus, if the multiply-accumulate result exceeds 56, the comparator needs to complete at least 7 comparisons, i.e. the number of cycles is 7. If the multiply-accumulate result is 7, the comparator only needs to complete 1 comparison, i.e. the number of cycles is 1. If the multiply-accumulate result is 34, the comparator needs to complete at least 5 comparisons, i.e. the number of cycles is 5. The scan quantization ADC therefore adapts the length of its conversion to the calculation result.
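The adaptive cycle count can be reproduced with a short behavioural model of the scan. This is a sketch, not the circuit: it assumes the comparator flips in cycle m as soon as 8m exceeds the row result N, that the output code is m-1 at the flip, and that the code is 7 when the last cycle completes without a flip. The name `scan_quantize` is hypothetical.

```python
def scan_quantize(n_on, step=8, max_cycles=7):
    """Return (output code, number of comparison cycles used).

    n_on is the number of conducting cells in the row (the MAC result).
    The scan stops at the first cycle m with step * m > n_on.
    """
    for m in range(1, max_cycles + 1):
        if step * m > n_on:
            return m - 1, m
    # Assumed behaviour: no flip after the last cycle gives the top code.
    return max_cycles, max_cycles


for n in (7, 34, 60):
    code, cycles = scan_quantize(n)
    print("N =", n, "-> code", code, "after", cycles, "cycle(s)")
```

Small results terminate the scan early, while only results near the top of the range use all seven cycles.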
Example 2
Based on the scheme of embodiment 1, this embodiment further provides a CIM chip, which is formed by jointly packaging a multiply-accumulate in-memory computing circuit of the adaptive scan ADC configured based on read-write separation SRAM as in embodiment 1 and peripheral circuits in other SRAM circuits. That is, the circuit design in embodiment 1 can be mass-produced and sold to the outside as a chip in this embodiment.
Because the CIM chip provided by the embodiment is an in-memory computing circuit with further updated functions designed on the basis of the traditional SRAM circuit, the circuit also has all functions of the traditional SRAM circuit. Therefore, the CIM chip also has a series of peripheral circuits of a conventional SRAM. Specifically, an architecture diagram of the CIM chip provided in this embodiment is shown in fig. 7. As can be seen in connection with fig. 7: in addition to the memory array, row signal lines, column signal lines, mode control circuitry, and quantization circuitry, the CIM chip also includes address decoding modules, read/write pre-charge modules, read/write control modules, sense amplifiers, etc. required to implement the complete data storage function.
In the CIM chip provided in this embodiment, the operations for implementing reading and writing are controlled by two separate sets of modules, namely, a read control module and a write control module, and the timing sequence of signals is implemented by a read-write enabling part in the timing sequence control module. After the read control module controls address decoding, the address decoding module selects corresponding read word lines RWL and read bit lines RBL; thereby completing the data reading operation. After the address decoding is controlled by the write control module, the corresponding word line WL, bit line BL and bit line BLB are selected by the address decoding module, and then the data writing operation is completed.
The address decoding module is divided into a row decoding part and a column decoding part. The address signal consists of a 9-bit binary code, in which the first 3 bits determine the column decoding and the last 6 bits determine the row decoding. That is, for a 64x64 SRAM array, the column decoding adopts a 3-8 decoder, so that eight interleaved bits of the same row are operated on simultaneously during writing/reading. The row decoding adopts two-stage decoding, consisting of a 2-4 decoder and 4-16 decoders, in which the decoding result of the 2-4 decoder is used as the enable signal of the second-stage 4-16 decoders, and one row of word lines WL in the array is selected for control during writing/reading.
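The address split can be illustrated with a few bit operations. Several details are assumptions made only for this sketch: the "first 3 bits" are taken to be the most significant bits of the 9-bit address, the upper two row bits are assumed to feed the 2-4 decoder, and the function name `decode_address` is hypothetical.

```python
def decode_address(addr):
    """Split a 9-bit address into column and two-stage row selections.

    Assumed bit layout (MSB first): 3 column bits, then 6 row bits.
    The upper 2 row bits drive the 2-4 decoder, which enables one of
    four 4-16 decoders; the lower 4 row bits select a row within it.
    """
    if not 0 <= addr < 2 ** 9:
        raise ValueError("address must fit in 9 bits")
    column_select = addr >> 6        # 3-8 column decoder input (0..7)
    row = addr & 0x3F                # 6-bit row address (0..63)
    stage1 = row >> 4                # 2-4 decoder output index (0..3)
    stage2 = row & 0xF               # 4-16 decoder output index (0..15)
    return column_select, row, stage1, stage2


col, row, stage1, stage2 = decode_address(0b101_11_0110)
print("column select:", col)
print("row:", row, "= 16 *", stage1, "+", stage2)
```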
The CIM chip provided in this embodiment has two working modes, a storage mode and a logic operation mode. The mode control circuit uses its switches to change the connection states of the row signal lines, the column signal lines, and the signal input ports, thereby switching the working mode of the CIM chip.
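The mode switching can be summarized as a lookup from operation to switch states. The Python sketch below uses the switch states recited in claims 2 to 4 (write/hold: S1 to S6 open; read: S1 to S3 closed, S4 to S6 open; multiply: S1 to S3 open, S4 to S6 closed). The names `Mode` and `switch_states` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the operations described in claims 2 to 4."""
    WRITE_HOLD = "write_hold"
    READ = "read"
    COMPUTE = "compute"


SWITCHES = ("S1", "S2", "S3", "S4", "S5", "S6")


def switch_states(mode: Mode) -> dict[str, bool]:
    """Return a map of switch name to state (True = closed, False = open)."""
    if mode is Mode.WRITE_HOLD:
        closed = set()                   # all six switches open
    elif mode is Mode.READ:
        closed = {"S1", "S2", "S3"}      # RWL port, Vss, and RBL port connected
    elif mode is Mode.COMPUTE:
        closed = {"S4", "S5", "S6"}      # NIN port and Current port connected
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {name: name in closed for name in SWITCHES}


if __name__ == "__main__":
    for mode in Mode:
        print(mode.name, switch_states(mode))
```

Keeping the two switch groups mutually exclusive in this table mirrors the separation between the storage path and the compute path.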
In the operation mode, the mode control circuit reconfigures the 8T SRAM array into a compute array. In the compute array, the 6T memory cell at the core of each 8T SRAM cell is in the data holding state, the remaining part serves as the compute cell, and the compute cell is isolated from the memory cell in the data holding state. When executing multiplication and multiply-accumulate tasks, the timing control module sends enable signals to the current steering DAC and the ADC timing control circuit; the 8T SRAM array performs the calculation, and the calculation results are sampled and quantized.
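As a hedged illustration of the multiply and multiply-accumulate behavior in the operation mode, the following Python sketch models each cell as a 1-bit multiplier and a row as the count of conducting cells. The function names are hypothetical, and the conduction condition (a cell conducts only when the stored value Q is 1 and the inverted input NIN is 0) is an assumption inferred from the 1-bit product, not a description of the circuit's electrical behavior.

```python
def cell_multiply(q: int, in_bit: int) -> int:
    """Behavioral model of one 8T cell in compute mode (assumed, not electrical).

    q      : value held at storage node Q (0 or 1)
    in_bit : the other operand IN (0 or 1); its inverse is applied on NIN
    """
    if q not in (0, 1) or in_bit not in (0, 1):
        raise ValueError("operands must be single bits")
    nin = 1 - in_bit                     # inverted operand driven on the NIN port
    conducts = (q == 1) and (nin == 0)   # assumed conduction condition
    return int(conducts)                 # 1 when the cell conducts, else 0


def row_mac(stored: list[int], inputs: list[int]) -> int:
    """Count the conducting cells on one read word line (the value m)."""
    if len(stored) != len(inputs):
        raise ValueError("stored bits and inputs must have the same length")
    return sum(cell_multiply(q, x) for q, x in zip(stored, inputs))


if __name__ == "__main__":
    print(row_mac([1, 0, 1, 1], [1, 1, 0, 1]))
```

In this model the row result equals the dot product of the stored bits and the input bits, which is the quantity the quantization circuit then has to digitize.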
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. A multiply-accumulate in-memory computing circuit based on a read-write separation SRAM and configured with an adaptive scan ADC, characterized in that the circuit has the functions of data reading, writing, and holding as well as multiplication and multiply-accumulate operation, and comprises:
a memory array formed by arranging a plurality of 8TSRAM cells in an array; each 8TSRAM cell consists of 2 PMOS transistors P1-P2 and 6 NMOS transistors N1-N6, wherein P1, P2, and N1-N4 form a classical 6T memory cell with two storage nodes Q and QB; the gate of N5 is connected to the storage node Q, and the drain of N5 is connected to the source of N6;
a row signal line connected to all 8TSRAM cells in the same row in the memory array, including a word line WL connected to the gate of each of N1 and N2, a read word line RWL connected to the gate of each of N6, and a switch word line SW connected to the source of each of N5;
a column signal line connected to all 8TSRAM cells in the same column in the memory array, including BL connected to the drain of each N1, BLB connected to the drain of each N2, and RBL connected to the drain of each N6;
a mode control circuit for switching the connection states of the row signal lines and the column signal lines so as to adjust the working mode of the multiply-accumulate in-memory computing circuit; the mode control circuit comprises a row switch group and a column switch group; the row switch group is connected to the column signal lines of each column of the memory array, wherein the read bit line RBL is connected to an RBL port through a switch S3 and to an NIN port through a switch S4; the column switch group is connected to the row signal lines of each row of the memory array, wherein the read word line RWL is connected to an RWL port through a switch S1 and to the sampling current input port Current through a switch S5; the switch word line SW is connected to Vss through a switch S2; the switch S4 is connected between the read word line RWL and the switch word line SW; and
a quantization circuit for quantizing and outputting a result of the multiply or multiply-accumulate operation of the multiply-accumulate in-memory calculation circuit;
in the memory array, each 8TSRAM unit is used as a basic unit for performing data reading, writing, maintaining and multiplying operations, and all 8TSRAM units in the same row are used as basic units for performing multiply-accumulate operations.
2. The multiply-accumulate in-memory computing circuit of claim 1, wherein the operation logic for each 8TSRAM cell to perform the data write and data hold operations is as follows:
first, the switches S1 to S6 connected to the corresponding 8TSRAM cell are opened; then, the bit line BL or BLB of the corresponding column is precharged to a high level or a low level according to the data to be written to the storage node Q or QB; finally, the word line WL of the row containing the 8TSRAM cell is asserted to complete the data write;
after the data write is completed, the word line WL is restored to a low level and the bit lines BL and BLB are disconnected from the storage nodes, so that the cell enters the data holding state.
3. The multiply-accumulate in-memory computing circuit of claim 1, wherein the operation logic for each 8TSRAM cell to perform the data read operation is as follows:
firstly, the switches S1-S3 connected with the corresponding 8TSRAM units are closed, and the switches S4-S6 are opened; then, the read bit lines RBL of the corresponding columns are precharged to a high level through the RBL ports; then, the read word line RWL of the corresponding row is set to a high level through the RWL port; finally, when the read bit line RBL is maintained at a high level, the data stored in the storage node Q corresponding to the 8TSRAM cell is "1", and when the read bit line RBL is lowered to a low level, the data stored in the storage node Q corresponding to the 8TSRAM cell is "0".
4. The multiply-accumulate in-memory computing circuit of claim 1, wherein the operation logic for each 8TSRAM cell to perform the multiplication operation is as follows:
first, in the data holding state, the value of the storage node Q is taken as one operand; then, the switches S1 to S3 connected to the corresponding 8TSRAM cell are opened, and the switches S4 to S6 are closed; then, a sampling current is input to the read word line RWL through the Current port, and the inverted value of the other operand IN of the multiplication is input through the NIN port; finally, the conduction state, that is, whether the sampling current can flow through S4, N5, and N6 and out through the IN port, is taken as the value representing the multiplication result.
5. The multiply-accumulate in-memory computing circuit of claim 4, wherein the multiplication result in the 8TSRAM cell can also be quantized through the calculated voltage V_RWL generated on the connected read word line RWL: when the calculated voltage V_RWL drops relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 1; when the calculated voltage V_RWL is unchanged relative to the initial voltage V_RWL0 of the read word line RWL before the operation, the multiplication result is 0.
6. The multiply-accumulate in-memory computing circuit of claim 4, wherein the operation logic for all 8TSRAM cells in each row to perform the multiply-accumulate operation is as follows:
first, in the data holding state, the value of the storage node Q of each 8TSRAM cell in the same row is taken as one operand; then, the switches S1 to S3 connected to the corresponding 8TSRAM cells are opened, and the switches S4 to S6 are closed; then, a sampling current is input to the read word line RWL through the Current port, and the inverted value NIN of the other operand IN is input through the NIN port of each 8TSRAM cell in the same row; finally, the number m of 8TSRAM cells in the conducting state that are connected to the same read word line RWL is taken as the value representing the multiply-accumulate result.
7. The multiply-accumulate in-memory computing circuit of claim 6, wherein the quantization circuit includes:
a replica row formed by arranging a plurality of replica units in a row; each replica unit consists of two NMOS transistors connected in the same way as N5 and N6 in the 8TSRAM cell, and the gate of each N5 in the replica row is connected to a high-level signal line; the other ports are connected to the read word line RWL, read bit line RBL, and switch word line SW of the replica row, respectively;
a current steering DAC for synchronously outputting, in each successive period and according to the received timing control signal, a unit (1x) reference current and a sampling current whose value is a multiple of the reference current; the reference current is input to the read word line RWL of the replica row, and the sampling current is input to the sampling current input port Current of each row of the memory array;
an ADC logic control circuit for sequentially generating, according to a clock signal, the timing control signals that control the current steering DAC; upon receiving each timing control signal, the current steering DAC generates the required unit reference current and a sampling current whose multiple increases step by step; the ADC logic control circuit also receives the output of a comparator as a feedback signal and terminates the generation of timing control signals according to the feedback signal;
a comparator comprising two inputs and one output; one input port is connected to the read word line RWL of each row in the memory array, and the other input port is connected to the read word line RWL of the replica row; the output of the comparator is used as a feedback signal required by the ADC logic control circuit; and
a sampling controller composed of a plurality of sampling switches Pre_calcualte, one sampling switch Pre_calcualte being connected between each read word line RWL and an input port of the comparator; the sampling switch is open while the current steering DAC outputs the reference current or the sampling current required by the logic operation, and is closed after the logic operation is completed so that the comparator can quantize.
8. The multiply-accumulate in-memory computing circuit of claim 7, wherein the ADC logic control circuit comprises a timing control module and a result transfer module; the timing control module is used for generating the required timing control signals; the result transfer module is used for sending a termination signal to the timing control module when the acquired feedback signal flips, and for outputting the current sequence number m of the timing control signal minus one as the result of the multiply-accumulate operation.
9. The multiply-accumulate in-memory computing circuit of claim 1, wherein the circuit connections of the 6T memory cell are as follows:
P1 and N3 form one inverter, and P2 and N4 form the other inverter; the two inverters are cross-coupled to form the storage nodes Q and QB; the storage node Q is connected to the bit line BL through the pass transistor N1, the storage node QB is connected to the bit line BLB through the pass transistor N2, the gates of N1 and N2 are connected to the word line WL, the sources of P1 and P2 are connected to VDD, and the sources of N3 and N4 are connected to VSS.
10. A CIM chip, characterized in that it is formed by packaging the multiply-accumulate in-memory computing circuit based on a read-write separation SRAM and configured with an adaptive scan ADC as claimed in any one of claims 1 to 9.
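The scanning quantization recited in claims 7 and 8 can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the claimed circuit: the function name `adaptive_scan_quantize` and the parameter `max_steps` are hypothetical, the sampling current at step k is assumed to be k times the reference current, and the comparator is assumed to flip as soon as that multiple exceeds the number of conducting cells m. Only the termination rule (stop on the flip, output the step number minus one) is taken from claim 8.

```python
def adaptive_scan_quantize(m_conducting: int, max_steps: int = 65) -> tuple[int, int]:
    """Scan sampling-current multiples until the comparator flips.

    Returns (result, steps_used). Hypothetical behavioral model:
    m_conducting is the number of conducting cells on the row, and
    max_steps bounds the scan (65 assumed for a 64-cell row).
    """
    if m_conducting < 0:
        raise ValueError("m_conducting must be non-negative")

    for step in range(1, max_steps + 1):
        sampling_multiple = step                     # assumed: step k drives k x reference
        # Assumed comparator model: the output flips once the sampling
        # current exceeds what the m conducting cells can sink.
        flipped = sampling_multiple > m_conducting
        if flipped:
            # Claim 8: terminate and output the step number minus one.
            return step - 1, step
    raise RuntimeError("comparator did not flip within max_steps")


if __name__ == "__main__":
    for m in (0, 5, 64):
        print(m, adaptive_scan_quantize(m))
```

In this model the number of scan steps grows with m, so rows with small multiply-accumulate results finish early, which is one plausible reading of why the scan is called adaptive.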
CN202311050617.XA 2023-08-18 2023-08-18 Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory) Pending CN117056277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311050617.XA CN117056277A (en) 2023-08-18 2023-08-18 Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311050617.XA CN117056277A (en) 2023-08-18 2023-08-18 Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)

Publications (1)

Publication Number Publication Date
CN117056277A true CN117056277A (en) 2023-11-14

Family

ID=88665977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311050617.XA Pending CN117056277A (en) 2023-08-18 2023-08-18 Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)

Country Status (1)

Country Link
CN (1) CN117056277A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117608519A (en) * 2024-01-24 2024-02-27 安徽大学 Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM
CN117608519B (en) * 2024-01-24 2024-04-05 安徽大学 Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM

Similar Documents

Publication Publication Date Title
CN111816234B (en) Voltage accumulation in-memory computing circuit based on SRAM bit line exclusive nor
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
CN117056277A (en) Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)
CN117271436B (en) SRAM-based current mirror complementary in-memory calculation macro circuit and chip
Mu et al. SRAM-based in-memory computing macro featuring voltage-mode accumulator and row-by-row ADC for processing neural networks
CN116364137A (en) Same-side double-bit-line 8T unit, logic operation circuit and CIM chip
CN114038492B (en) Multiphase sampling memory internal computing circuit
CN117219140B (en) In-memory computing circuit based on 8T-SRAM and current mirror
CN114496010A (en) Analog domain near memory computing array structure based on magnetic random access memory
CN113935479A (en) High-energy-efficiency binary neural network accelerator for artificial intelligence Internet of things
US20230297235A1 (en) Sram-based cell for in-memory computing and hybrid computations/storage memory architecture
CN116204490A (en) 7T memory circuit and multiply-accumulate operation circuit based on low-voltage technology
US20220262426A1 (en) Memory System Capable of Performing a Bit Partitioning Process and an Internal Computation Process
US20230066113A1 (en) Computing-in-memory apparatus
KR102318820B1 (en) Multi-Bit Memory Cell and In-Memory Device Using The Same
Qiao et al. A 16.38 TOPS and 4.55 POPS/W SRAM Computing-in-Memory Macro for Signed Operands Computation and Batch Normalization Implementation
CN117636945B (en) 5-bit signed bit AND OR accumulation operation circuit and CIM circuit
US20230410862A1 (en) In-memory computation circuit using static random access memory (sram) array segmentation
CN117079688A (en) Current domain 8TSRAM unit and dynamic self-adaptive quantized memory circuit
US20230386565A1 (en) In-memory computation circuit using static random access memory (sram) array segmentation and local compute tile read based on weighted current
CN115995256B (en) Self-calibration current programming and current calculation type memory calculation circuit and application thereof
CN115658010A (en) Pulse width modulation circuit, quantization circuit, storage circuit and chip
Song et al. A 4-bit Calibration-Free Computing-In-Memory Macro With 3T1C Current-Programed Dynamic-Cascode Multi-Level-Cell eDRAM
Saragada et al. An in-memory architecture for machine learning classifier using logistic regression
Liu et al. Design of switched-current based low-power PIM vision system for IoT applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination