CN114093394B - Transposable in-memory computing circuit and implementation method thereof - Google Patents

Info

Publication number
CN114093394B
Authority
CN
China
Prior art keywords
line
input
switch
word line
charge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111273336.1A
Other languages
Chinese (zh)
Other versions
CN114093394A (en)
Inventor
王润声
宋嘉豪
王源
唐希源
黄如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN202111273336.1A
Publication of CN114093394A
Application granted
Publication of CN114093394B
Legal status: Active

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C5/00 Details of stores covered by group G11C11/00
    • G11C5/02 Disposition of storage elements, e.g. in the form of a matrix array
    • G11C5/14 Power supply arrangements, e.g. power down, chip selection or deselection, layout of wirings or power grids, or multiple supply levels
    • G11C5/147 Voltage reference generators, voltage or current regulators; Internally lowered supply levels; Compensation for voltage drops
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/18 Bit line organisation; Bit line lay-out
    • G11C8/00 Arrangements for selecting an address in a digital store
    • G11C8/14 Word line organisation; Word line lay-out

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Analogue/Digital Conversion (AREA)

Abstract

The invention discloses a transposable in-memory computing circuit and an implementation method thereof. The transposable in-memory computing circuit comprises a transposable in-memory computing array and peripheral circuits. The array comprises 16 local arrays; each local array comprises 128 storage-and-compute columns, which are connected together through a row compute line, and the storage-and-compute columns located in the same column are connected together through a global bit line and an inverted global bit line. Each storage-and-compute column comprises 8 six-transistor (6T) memory cells and 1 charge compute unit, connected in parallel through a local bit line and an inverted local bit line. The peripheral circuits comprise a word line driver, a read-write peripheral circuit, a forward input driver circuit, 16 row analog-to-digital converters, 16 8-to-1 multiplexers, 16 column analog-to-digital converters, and a global timing control circuit. The transposition function of the invention allows an intelligent chip at the edge to be retrained at the edge with lower power consumption; at the same time, charge-domain computation improves the stability and accuracy of the computation.

Description

Transposable in-memory computing circuit and implementation method thereof
Technical Field
The invention relates to the field of integrated circuit design, and in particular to a transposable in-memory computing circuit and an implementation method thereof.
Background
In recent years, deep learning algorithms have achieved very good results in many fields. At the same time, the parameter scale of deep neural networks keeps growing. As a result, when deep learning tasks are processed on a conventional architecture in which memory and computation are separate, moving the neural network parameters consumes a large amount of power, a problem known as the memory wall. This power consumption makes it difficult to deploy deep learning algorithms on edge devices, which are highly sensitive to power consumption. To solve the memory wall problem, designers have in recent years proposed a new computing architecture, in-memory computing.
In-memory computing circuits are particularly energy efficient because they compute in the analog domain and process data in parallel. Various in-memory computing chips have been proposed in recent years. According to the type of analog computation, they fall into two classes: current-domain in-memory computing and charge-domain in-memory computing.
In a current-domain in-memory computing chip, the input is applied as a voltage, and the product of the input and the weight is represented by the magnitude of a current. The currents obtained by multiplying multiple inputs and weights are summed on the same compute node and discharge the capacitance of that node, completing the multiply-accumulate operation entirely in the analog domain. However, because the threshold voltage of a transistor is subject to random variation, the compute current may deviate from the ideal result, which degrades the computational accuracy. This effect is particularly severe at advanced process nodes.
In a charge-domain in-memory computing chip, the result of the multiplication determines whether a compute capacitor is charged, and accumulation is performed by connecting the compute capacitors together so that they share charge. The compute capacitor is often implemented as a metal-oxide-metal (MOM) capacitor, whose capacitance is highly accurate at advanced process nodes, so the computed result is very accurate. The charge-domain structure has the advantage of high accuracy, but it requires additional transistors to control the charging and discharging of the capacitor, so the cell area is larger than that of a conventional six-transistor (6T) cell and the storage and computation density is lower.
In addition, for privacy protection, some data cannot be uploaded to the cloud to train a neural network. A more practical solution is to train a generic neural network model on a public dataset, download it to the local device, and fine-tune part of the model with the user's own data, so that the best result is achieved for each user. This solution requires training at the edge. However, unlike inference, training a neural network requires the transpose of the weight matrix, which current in-memory computing chips rarely support.
Therefore, a transposable and more area-efficient charge-domain in-memory computing circuit is important for deploying neural networks at the edge.
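The charge-sharing accumulation described above can be illustrated with a minimal sketch. It assumes ideal capacitors and lossless switches; the function name `shared_voltage` and the example values are illustrative only and do not come from the patent.

```python
def shared_voltage(voltages, capacitances=None):
    """Voltage after ideal charge sharing between capacitors connected together.

    Charge is conserved, so V_final = sum(C_i * V_i) / sum(C_i).
    Equal capacitances are assumed when none are given (an idealization).
    """
    if capacitances is None:
        capacitances = [1.0] * len(voltages)
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)


# Four equal compute capacitors: three charged to 0.9 V, one left at 0 V.
# The shared voltage is proportional to the number of charged capacitors.
print(shared_voltage([0.9, 0.9, 0.9, 0.0]))
```

With equal capacitances the result is simply the mean of the individual voltages, which is why the shared voltage encodes the accumulated sum of the products.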
Disclosure of Invention
To address the above problems in the prior art, the invention provides a transposable in-memory computing circuit and an implementation method thereof, which perform charge-domain computation based on charge sharing with six-transistor (6T) memory cells and support transposed computation.
One object of the invention is to provide a transposable in-memory computing circuit.
The transposable in-memory computing circuit of the invention comprises a transposable in-memory computing array and peripheral circuits;
the transposable in-memory computing array comprises 16×N local arrays, where N is a natural number;
each local array comprises 128×M storage-and-compute columns, where M is a natural number; all 128×M storage-and-compute columns in the same local array are connected together through a row compute line; all storage-and-compute columns located in the same column of the 16×N local arrays are connected together through a global bit line and an inverted global bit line; there are 16×N row compute lines, 128×M global bit lines, and 128×M inverted global bit lines;
Each storage-and-compute column comprises 8×K six-transistor (6T) memory cells and 1 charge compute unit, where K is a natural number, connected in parallel through a local bit line and an inverted local bit line; the local bit line has a parasitic capacitance; the 6T memory cells store the weight values; the 128×M corresponding 6T memory cells of all storage-and-compute columns in each local array lie in the same row, so each local array has 8×K rows of 6T memory cells; the 16×N local arrays together have 16×N×8×K = 128×N×K rows of 6T memory cells;
The 128×M 6T memory cells in the same row of all storage-and-compute columns in the same local array are connected in parallel to one word line and one inverted word line, so each local array has 8×K word lines and 8×K inverted word lines; the 16×N local arrays together have 16×N×8×K = 128×N×K word lines and the same number of inverted word lines;
Each charge compute unit comprises two precharge transistors, a row enable switch, an inverted row enable switch, an input switch, and an inverted input switch; a precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control terminal of the row enable switch, and the inverted row enable line is connected to the control terminal of the inverted row enable switch; one end of each of the row enable switch and the inverted row enable switch is connected to the row compute line; the other end of the row enable switch is connected to the local bit line, and the other end of the inverted row enable switch is connected to the inverted local bit line; the input line is connected to the control terminal of the input switch, and the inverted input line is connected to the control terminal of the inverted input switch; one end of the input switch is connected to the local bit line, and one end of the inverted input switch is connected to the inverted local bit line; the other end of the input switch is connected to the global bit line, and the other end of the inverted input switch is connected to the inverted global bit line; the control terminals of the input switches and inverted input switches located in the same column of the 16×N local arrays are connected to the same input line and the same inverted input line respectively, so there are 128×M input lines and 128×M inverted input lines;
The peripheral circuits comprise a word line driver, a read-write peripheral circuit, a forward input driver circuit, 16×N row analog-to-digital converters, 16×M 8-to-1 multiplexers, 16×M column analog-to-digital converters, and a global timing control circuit;
The forward input driver circuit is connected to the control terminals of the input switch and the inverted input switch of each charge compute unit through the input lines and inverted input lines, and to the control terminals of the row enable switch and the inverted row enable switch of each charge compute unit through the row enable lines and inverted row enable lines; each channel of the read-write peripheral circuit is connected to a respective global bit line and inverted global bit line; each global bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of each multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected to the input port of its row analog-to-digital converter through its row compute line; the word line driver registers the backward input values and provides the backward input drive function, and each channel of the word line driver is connected to a corresponding word line; the global timing control circuit is connected to the forward input driver circuit, the read-write peripheral circuit, and the word line driver; the global timing control circuit is also connected to the precharge line, and through it to the precharge transistors of each charge compute unit; during computation the word line driver has two configuration modes, namely a forward word line drive mode and a backward word line drive mode;
In the initial state, the row enable switches, the inverted row enable switches, the input switches, and the inverted input switches are all open; the word lines and inverted word lines, the row enable lines and inverted row enable lines, the input lines and inverted input lines, and the precharge line are all at the low level; the bit lines and inverted bit lines are at the precharge voltage;
in forward mode, the input lines and inverted input lines are driven according to the forward input values; in backward mode, the input lines and inverted input lines are driven independently of the backward input values;
Forward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and the inverted local bit line are precharged to the precharge voltage through the precharge transistors in the charge compute unit;
b) The global timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and the inverted word line of the row whose weight values are to be read in each of the 16×N local arrays, selecting that row; according to the weight value stored in the 6T memory cell on the selected word line and inverted word line, one of the local bit line and the inverted local bit line is discharged to ground while the other holds the precharge voltage; the word line and the inverted word line are then returned to the low level, completing the weight read operation;
c) In each channel, the forward input driver circuit applies a high level to the input line or the inverted input line according to the registered forward input value, closing the input switch or the inverted input switch; the local bit line or inverted local bit line connected to the closed switch is discharged to ground; the input line or inverted input line to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value;
d) The forward input driver circuit applies a high level to all row enable lines and inverted row enable lines, closing the row enable switches and inverted row enable switches; the products of the 128×M forward input values and the corresponding weight values are accumulated on the row compute line and passed to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
Backward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and the inverted local bit line are precharged to the precharge voltage through the precharge transistors in the charge compute unit;
b) The global timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered backward input value, the word line driver either applies a high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; any word line to which the high level was applied is then returned to the low level, completing the multiplication of the backward input value and the weight value;
c) The global timing control circuit applies a high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together and the accumulation is completed; the accumulated result is passed to the column analog-to-digital converters, which quantize it and produce 16×M column outputs, thereby realizing the transposed computation.
The precharge voltage is 0.7 to 0.9 V; the low level is 0.0 V and the high level is 0.7 to 0.9 V.
1 ≤ N ≤ 4, 1 ≤ M ≤ 9, and 1 ≤ K ≤ 4.
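The counts stated above all follow from the three parameters N, M, and K. The following sketch collects that arithmetic in one place; the function name `array_dimensions` and the dictionary keys are hypothetical, and only the formulas and the parameter ranges are taken from the text.

```python
def array_dimensions(n=1, m=1, k=1):
    """Element counts of the array and peripherals for given N, M, K.

    The formulas and the ranges 1<=N<=4, 1<=M<=9, 1<=K<=4 come from the
    text; the key names are illustrative.
    """
    if not (1 <= n <= 4 and 1 <= m <= 9 and 1 <= k <= 4):
        raise ValueError("N, M, K must satisfy 1<=N<=4, 1<=M<=9, 1<=K<=4")
    local_arrays = 16 * n
    columns = 128 * m
    rows_per_local_array = 8 * k
    total_rows = local_arrays * rows_per_local_array  # 128 * N * K
    return {
        "local_arrays": local_arrays,
        "columns": columns,
        "rows_per_local_array": rows_per_local_array,
        "total_rows": total_rows,
        "row_compute_lines": local_arrays,
        "global_bit_lines": columns,
        "input_lines": columns,
        "word_lines": total_rows,
        "row_adcs": 16 * n,
        "mux_8_to_1": 16 * m,
        "column_adcs": 16 * m,
        "memory_cells": total_rows * columns,
    }


dims = array_dimensions(1, 1, 1)
print(dims["total_rows"], dims["columns"], dims["memory_cells"])  # 128 128 16384
```

Note that the number of 8-to-1 multiplexers, 16×M, is exactly the number of global bit lines, 128×M, divided by 8.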
Another object of the invention is to provide an implementation method for the transposable in-memory computing circuit.
The implementation method of the transposable in-memory computing circuit of the invention comprises the following steps:
1) Initial state:
the row enable switches, the inverted row enable switches, the input switches, and the inverted input switches are all open; the word lines and inverted word lines, the row enable lines and inverted row enable lines, the input lines and inverted input lines, and the precharge line are all at the low level; the bit lines and inverted bit lines are at the precharge voltage;
in forward mode, the input lines and inverted input lines are driven according to the forward input values; in backward mode, the input lines and inverted input lines are driven independently of the backward input values;
2) Forward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and the inverted local bit line are precharged to the precharge voltage through the precharge transistors in the charge compute unit;
b) The global timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and the inverted word line of the row whose weight values are to be read in each of the 16×N local arrays, selecting that row; according to the weight value stored in the 6T memory cell on the selected word line and inverted word line, one of the local bit line and the inverted local bit line is discharged to ground while the other holds the precharge voltage; the word line and the inverted word line are then returned to the low level, completing the weight read operation;
c) In each channel, the forward input driver circuit applies a high level to the input line or the inverted input line according to the registered forward input value, closing the input switch or the inverted input switch; the local bit line or inverted local bit line connected to the closed switch is discharged to ground; the input line or inverted input line to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value;
d) The forward input driver circuit applies a high level to all row enable lines and inverted row enable lines, closing the row enable switches and inverted row enable switches; the products of the 128×M forward input values and the corresponding weight values are accumulated on the row compute line and passed to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
3) Backward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and the inverted local bit line are precharged to the precharge voltage through the precharge transistors in the charge compute unit;
b) The global timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered backward input value, the word line driver either applies a high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; any word line to which the high level was applied is then returned to the low level, completing the multiplication of the backward input value and the weight value;
c) The global timing control circuit applies a high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together and the accumulation is completed; the accumulated result is passed to the column analog-to-digital converters, which quantize it and produce 16×M column outputs, thereby realizing the transposed computation.
In step 2) c), when the forward input value is 1, the input switch and the inverted input switch are closed; when the forward input value is 0, the input switch and the inverted input switch are open.
In step 3) b), the word line is driven high when the backward input value is 1 and remains low when the backward input value is 0; when the backward input value and the weight value are both 1, the local bit line is discharged to ground; otherwise the local bit line remains at the precharge voltage.
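Functionally, the two modes described above compute a matrix-vector product and its transpose over the same stored weights. The following is a minimal idealized sketch, assuming binary weights and inputs and one selected weight row per local array; it ignores the analog voltage encoding on the bit lines and the analog-to-digital quantization, and the function names are illustrative rather than taken from the patent.

```python
def forward(weights, x):
    """Forward mode: one accumulated sum per local array (row compute line).

    weights[a][j] is the weight read from the selected row of local array a
    in column j; x[j] is the forward input for column j. Returns W x.
    """
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def backward(weights, e):
    """Backward mode: one accumulated sum per column (global bit line).

    e[a] is the backward input gating the word line of local array a.
    Returns W^T e, computed without rearranging the stored weights.
    """
    n_cols = len(weights[0])
    return [
        sum(weights[a][j] * e[a] for a in range(len(weights)))
        for j in range(n_cols)
    ]


# Toy example: 3 local arrays, 4 columns (the embodiment uses 16 and 128).
W = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0]]
print(forward(W, [1, 0, 1, 1]))   # [3, 1, 1]
print(backward(W, [1, 0, 1]))     # [2, 1, 1, 1]
```

The same weight matrix serves both directions: the forward pass sums along rows, and the backward pass sums along columns, which is the transposed product needed for training.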
The advantages of the invention are as follows:
compared with conventional non-transposable in-memory computing circuits, the transposed computation function of the invention allows an intelligent chip at the edge to be retrained at the edge with lower power consumption; at the same time, charge-domain computation improves the stability and accuracy of the computation.
Drawings
FIG. 1 is a block diagram of one embodiment of the transposable in-memory computing circuit of the invention;
FIG. 2 is a schematic diagram of a local array of one embodiment of the transposable in-memory computing circuit of the invention, wherein (a) is a block diagram of the local array and (b) is a block diagram of a storage-and-compute column;
FIG. 3 is a flow chart of the forward word line drive mode of one embodiment of the implementation method of the transposable in-memory computing circuit of the invention, wherein (a) to (d) are the four steps of the flow, each with a schematic diagram on the left and a timing diagram on the right;
FIG. 4 is a flow chart of the backward word line drive mode of one embodiment of the implementation method of the transposable in-memory computing circuit of the invention, wherein (a) to (c) are the three steps of the flow, each with a schematic diagram on the left and a timing diagram on the right.
Detailed Description
The invention is further described below through specific embodiments with reference to the accompanying drawings.
As shown in FIG. 1, in this embodiment N = M = K = 1, and the transposable in-memory computing circuit comprises a 128×128 transposable in-memory computing array and peripheral circuits;
The transposable in-memory computing array comprises first to sixteenth local arrays;
Each local array comprises first to 128th storage-and-compute columns; all 128 storage-and-compute columns in the same local array are connected together through a row compute line; the storage-and-compute columns located in the same column of all 16 local arrays are connected together through a global bit line and an inverted global bit line; there are 16 row compute lines, 128 global bit lines, and 128 inverted global bit lines, namely the first to 128th global bit lines and the first to 128th inverted global bit lines;
Each storage-and-compute column comprises first to eighth six-transistor (6T) memory cells and 1 charge compute unit, connected in parallel through a local bit line and an inverted local bit line; the local bit line has a parasitic capacitance; the 6T memory cells store the weight values; the 128 corresponding 6T memory cells of all storage-and-compute columns in each local array lie in the same row, so each local array has 8 rows of 6T memory cells; the 16 local arrays together have 16×8 = 128 rows of 6T memory cells;
Each 6T memory cell comprises two cross-coupled inverters and two access transistors controlled by the word line and the inverted word line respectively; the 128 6T memory cells in the same row of all storage-and-compute columns in the same local array are connected in parallel to one word line and one inverted word line, so each local array has 8 word lines and 8 inverted word lines; the 16 local arrays together have 16×8 = 128 word lines and 128 inverted word lines;
Each charge compute unit comprises two precharge transistors, a row enable switch, an inverted row enable switch, an input switch, and an inverted input switch; a precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control terminal of the row enable switch, and the inverted row enable line is connected to the control terminal of the inverted row enable switch; one end of each of the row enable switch and the inverted row enable switch is connected to the row compute line; the other end of the row enable switch is connected to the local bit line, and the other end of the inverted row enable switch is connected to the inverted local bit line; the input line is connected to the control terminal of the input switch, and the inverted input line is connected to the control terminal of the inverted input switch; one end of the input switch is connected to the local bit line, and one end of the inverted input switch is connected to the inverted local bit line; the other end of the input switch is connected to the global bit line, and the other end of the inverted input switch is connected to the inverted global bit line; the control terminals of the input switches and inverted input switches located in the same column of the 16 local arrays are connected to the same input line and the same inverted input line respectively, so there are 128 input lines and 128 inverted input lines;
The peripheral circuits comprise a word line driver, a read-write peripheral circuit, a forward input driver circuit, 16 row analog-to-digital converters, 16 8-to-1 multiplexers, 16 column analog-to-digital converters, and a global timing control circuit;
The forward input driver circuit is connected to the control terminals of the input switch and the inverted input switch of each charge compute unit through the input lines and inverted input lines, and to the control terminals of the row enable switch and the inverted row enable switch of each charge compute unit through the row enable lines and inverted row enable lines; each channel of the read-write peripheral circuit is connected to a respective global bit line and inverted global bit line; each global bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of each multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected to the input port of its row analog-to-digital converter through its row compute line; the word line driver registers the backward input values and provides the backward input drive function, and each channel of the word line driver is connected to a corresponding word line; the global timing control circuit is connected to the forward input driver circuit, the read-write peripheral circuit, and the word line driver; the global timing control circuit is also connected to the precharge line, and through it to the precharge transistors of each charge compute unit; during computation the word line driver has two configuration modes, namely a forward word line drive mode and a backward word line drive mode.
The implementation method of the computer circuit in the transferable memory of the embodiment comprises the following steps:
1) Initial state:
the driving enabling switch and the driving enabling inverse switch, the input switch and the input inverse switch are all in an off state, the word line and the word line are inverse, the driving enabling line and the driving enabling line are inverse, the input line and the input line are inverse and the pre-charge line are all in a low level, and the bit line are inverse in pre-charge voltage;
in the forward mode, the control of the input line and the input line inverse is related to the forward transmission input value; in the back transmission mode, the control of the input line and the back of the input line is irrelevant to the back transmission input value;
2) Forward mode, as shown in fig. 3:
a) A pre-charging stage: the precharge line is at a low level (0.0 v) so that the local bit line and the local bit line are reversely precharged to a precharge voltage (0.9 v) through the precharge transistor in the charge calculation unit, as shown in fig. 3 (a);
b) The total time sequence control circuit applies a high level (0.9V) to the pre-charge line, and the pre-charge stage is ended; then, the word line driving simultaneously applies high level to the word line and the word line of the row where the weight value is required to be read in the 16 local area arrays, the local bit line and the local bit line are reversely discharged to the ground according to the weight value stored in the six-tube memory unit on the selected word line and the word line, the precharge voltage is kept, and then the word line and the word line are reversely reapplied with low level to complete the weight reading operation, as shown in fig. 3 (b);
c) The front transmission driving circuit reversely applies high level to the input line or the input line in each channel according to the registered front transmission input value, so as to control the input switch or the input inverse switch to be closed, the front transmission input value is closed when 1, and the front transmission input value is opened when 0, and the corresponding local bit line or the local bit line of the closed input switch or the input inverse switch is reversely discharged to the ground;
the input line to which the high level is applied and the input line to which the low level is applied are reversed, and multiplication of the input value and the weight value before completion is performed, as shown in fig. 3 (c);
d) The front transmission-in driving circuit reversely applies high level to all the row enable lines and the row enable lines, the row enable switch and the row enable reverse switch are closed, the results of multiplication of 128 front transmission-in values and corresponding weight values are accumulated on a row calculation line and transmitted to a row analog-to-digital converter, and the row analog-to-digital converter performs quantization, as shown in fig. 3 (d);
3) Reverse mode, as shown in fig. 4:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the local bit line inverse are precharged to the precharge voltage (0.9 V) through the precharge transistors in the charge calculation unit, as shown in fig. 4 (a);
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage. According to the registered reverse input value, the word line driver either applies a high level to the word line of the row holding the weight value to be read in the local array or keeps that word line at the low level: if the reverse input value is 1, the word line is at the high level (0.9 V); if the reverse input value is 0, the word line stays at the low level. If the reverse input value and the weight value are both 1, the local bit line is discharged to ground; otherwise the local bit line remains at the precharge voltage. The word line to which the high level was applied is then returned to the low level, completing the multiplication of the reverse input value and the weight value, as shown in fig. 4 (b);
c) The overall timing control circuit applies a high level to all input lines, closing the input switches so that all local bit lines in the same column of the different local arrays are connected together; the accumulated result is transmitted to the column analog-to-digital converter, which quantizes it, as shown in fig. 4 (c).
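The forward and reverse steps above can be summarised by an idealised functional model. The following Python sketch is an illustration only and is not part of the disclosed circuit. The function names, the representation of weights and inputs as lists of 0/1 values, the assumption that every local bit line has the same parasitic capacitance, and the neglect of bus bit line capacitance and converter non-idealities are all assumptions made for this example. The forward function models only the arithmetic result (one multiply-accumulate value per local array), not the charge polarity on the row calculation line.

```python
V_PRE = 0.9  # precharge voltage used in the embodiment, in volts


def forward_row_sums(weights, forward_inputs):
    """Ideal forward-mode result: one multiply-accumulate value per local array.

    weights[i][j] is the bit in the selected row of local array i, column j;
    forward_inputs[j] is the registered forward input bit for column j.
    """
    return [sum(x * w for x, w in zip(forward_inputs, row)) for row in weights]


def reverse_column_voltages(weights, reverse_inputs, v_pre=V_PRE):
    """Ideal reverse-mode voltage on each column after charge sharing."""
    n_arrays = len(weights)
    voltages = []
    for j in range(len(weights[0])):
        # Step b): a local bit line is discharged only if the reverse input
        # and the stored weight are both 1; otherwise it keeps v_pre.
        local = [0.0 if (y and row[j]) else v_pre
                 for y, row in zip(reverse_inputs, weights)]
        # Step c): the closed input switches tie the local bit lines of the
        # column together; with equal capacitances (assumed) they average.
        voltages.append(sum(local) / n_arrays)
    return voltages


def reverse_column_sums(weights, reverse_inputs, v_pre=V_PRE):
    """Ideal column converter: turn each shared voltage back into a count."""
    n_arrays = len(weights)
    return [round((1.0 - v / v_pre) * n_arrays)
            for v in reverse_column_voltages(weights, reverse_inputs, v_pre)]


# Toy example: 4 local arrays x 3 columns instead of 16 x 128.
W = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]
print(forward_row_sums(W, [1, 0, 1]))        # [2, 1, 1, 2], one value per local array
print(reverse_column_sums(W, [1, 1, 0, 1]))  # [3, 2, 2], one value per column
```

In this model the forward mode returns the product of the weight matrix with the input vector, and the reverse mode returns the product of the transposed weight matrix with the input vector, using the same stored weights in both cases.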
Finally, it should be noted that the examples are disclosed for the purpose of aiding in the further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.

Claims (6)

1. A transposable in-memory computing circuit, comprising a transposable in-memory computing array and a peripheral circuit;
the transposable in-memory computing array comprises 16×N local arrays, where N is a natural number;
each local array comprises 128×M storage and calculation columns, where M is a natural number; all 128×M storage and calculation columns in the same local array are connected together through a row calculation line; all storage and calculation columns located in the same column of the 16×N local arrays are connected together through a bus bit line and a bus bit line inverse; there are 16×N row calculation lines, 128×M bus bit lines and 128×M bus bit line inverses;
each storage and calculation column comprises 8×K six-transistor memory cells and one charge calculation unit, connected in parallel to a local bit line and a local bit line inverse, where K is a natural number; the local bit line carries parasitic capacitance; the six-transistor memory cells store weight values; the corresponding six-transistor memory cells of the 128×M storage and calculation columns in each local array lie in the same row, so that each local array has 8×K rows of six-transistor memory cells; the 16×N local arrays have 16×N×8×K = 128×N×K rows of six-transistor memory cells in total;
the six-transistor memory cells located in the same row of the 128×M storage and calculation columns of a local array are connected in parallel to one word line and one word line inverse, so that each local array has 8×K word lines and 8×K word line inverses; the 16×N local arrays have 16×N×8×K = 128×N×K word lines and the same number of word line inverses;
each charge calculation unit comprises two precharge transistors, a row enable switch, a row enable inverse switch, an input switch and an input inverse switch; a precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control end of the row enable switch, and the row enable line inverse is connected to the control end of the row enable inverse switch; one end of each of the row enable switch and the row enable inverse switch is connected to the row calculation line, the other end of the row enable switch is connected to the local bit line, and the other end of the row enable inverse switch is connected to the local bit line inverse; the input line is connected to the control end of the input switch, and the input line inverse is connected to the control end of the input inverse switch; one end of the input switch is connected to the local bit line and one end of the input inverse switch is connected to the local bit line inverse; the other end of the input switch is connected to the bus bit line, and the other end of the input inverse switch is connected to the bus bit line inverse; the control ends of the input switches and of the input inverse switches located in the same column of the 16×N local arrays are connected to the same input line and the same input line inverse, respectively, so that there are 128×M input lines and 128×M input line inverses;
the peripheral circuit comprises a word line driver, a read-write peripheral circuit, a forward input driving circuit, 16×N row analog-to-digital converters, 16×M 8-to-1 multiplexers, 16×M column analog-to-digital converters and an overall timing control circuit;
the forward input driving circuit is connected to the control ends of the input switch and the input inverse switch of each charge calculation unit through the input lines and input line inverses, and to the control ends of the row enable switch and the row enable inverse switch of each charge calculation unit through the row enable lines and row enable line inverses; each channel of the read-write peripheral circuit is connected to a respective bus bit line and bus bit line inverse; each bus bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of each multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected to the input port of its row analog-to-digital converter through its row calculation line; the word line driver registers the reverse input values and provides the reverse input driving function, and each channel of the word line driver is connected to a corresponding word line; the overall timing control circuit is connected to the forward input driving circuit, the read-write peripheral circuit and the word line driver, respectively; the overall timing control circuit is also connected to the precharge line and, through it, to the precharge transistors of each charge calculation unit; during calculation the word line driver has two configuration modes, namely a forward word line driving mode and a reverse word line driving mode;
in the initial state, the row enable switch, the row enable inverse switch, the input switch and the input inverse switch are all open; the word lines and word line inverses, the row enable lines and row enable line inverses, the input lines and input line inverses, and the precharge line are all at the low level; and the bit lines and bit line inverses are at the precharge voltage;
in the forward mode, the control of the input lines and input line inverses depends on the forward input values; in the reverse mode, the control of the input lines and input line inverses is independent of the reverse input values;
forward mode:
a) precharge stage: the precharge line is at a low level, so that the local bit line and the local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) the overall timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and word line inverse of the row holding the weight values to be read in each of the 16×N local arrays, and the word line and word line inverse at the high level are selected; depending on the weight value stored in the six-transistor memory cell on the selected word line and word line inverse, one of the local bit line and the local bit line inverse is discharged to ground while the other remains at the precharge voltage; the word line and word line inverse are then returned to the low level, completing the weight read operation;
c) the forward input driving circuit applies a high level to the input line or the input line inverse of each channel according to the registered forward input value, thereby controlling the input switch or the input inverse switch to close; the local bit line or local bit line inverse corresponding to a closed input switch or input inverse switch is discharged to ground; the input line or input line inverse to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value;
d) the forward input driving circuit applies a high level to all row enable lines and row enable line inverses, closing the row enable switches and row enable inverse switches; the results of multiplying the 128×M forward input values by the corresponding weight values are accumulated on the row calculation line and transmitted to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
reverse mode:
a) precharge stage: the precharge line is at a low level, so that the local bit line and the local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) the overall timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered reverse input value, the word line driver either applies a high level to the word line of the row holding the weight value to be read in the local array or keeps that word line at the low level; the word line to which the high level was applied is then returned to the low level, completing the multiplication of the reverse input value and the weight value;
c) the overall timing control circuit applies a high level to all input lines, closing the input switches so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated result is transmitted to the column analog-to-digital converter, which quantizes it, producing 16×M outputs and thereby realizing the transposed calculation.
2. The transposable in-memory computing circuit of claim 1, wherein the precharge voltage is 0.7-0.9 V, the low level is 0.09 V, and the high level is 0.7-0.9 V.
3. The transposable in-memory computing circuit of claim 1, wherein N satisfies 1 ≤ N ≤ 4, M satisfies 1 ≤ M ≤ 9, and K satisfies 1 ≤ K ≤ 4.
4. A method of implementing the transposable in-memory computing circuit of claim 1, the method comprising the steps of:
1) Initial state:
the row enable switch, the row enable inverse switch, the input switch and the input inverse switch are all open; the word lines and word line inverses, the row enable lines and row enable line inverses, the input lines and input line inverses, and the precharge line are all at the low level; and the bit lines and bit line inverses are at the precharge voltage;
in the forward mode, the control of the input lines and input line inverses depends on the forward input values; in the reverse mode, the control of the input lines and input line inverses is independent of the reverse input values;
2) Forward mode:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and word line inverse of the row holding the weight values to be read in each of the 16×N local arrays, and the word line and word line inverse at the high level are selected; depending on the weight value stored in the six-transistor memory cell on the selected word line and word line inverse, one of the local bit line and the local bit line inverse is discharged to ground while the other remains at the precharge voltage; the word line and word line inverse are then returned to the low level, completing the weight read operation;
c) The forward input driving circuit applies a high level to the input line or the input line inverse of each channel according to the registered forward input value, thereby controlling the input switch or the input inverse switch to close; the local bit line or local bit line inverse corresponding to a closed input switch or input inverse switch is discharged to ground; the input line or input line inverse to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value;
d) The forward input driving circuit applies a high level to all row enable lines and row enable line inverses, closing the row enable switches and row enable inverse switches; the results of multiplying the 128×M forward input values by the corresponding weight values are accumulated on the row calculation line and transmitted to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
3) Reverse mode:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered reverse input value, the word line driver either applies a high level to the word line of the row holding the weight value to be read in the local array or keeps that word line at the low level; the word line to which the high level was applied is then returned to the low level, completing the multiplication of the reverse input value and the weight value;
c) The overall timing control circuit applies a high level to all input lines, closing the input switches so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated result is transmitted to the column analog-to-digital converter, which quantizes it, producing 16×M outputs and thereby realizing the transposed calculation.
5. The method of claim 4, wherein in c) of step 2), when the forward input value is 1, the input switch and the input inverse switch are closed; when the forward input value is 0, the input switch and the input inverse switch are open.
6. The method of claim 4, wherein in b) of step 3), the word line is at the high level when the reverse input value is 1 and remains at the low level when the reverse input value is 0; when the reverse input value and the weight value are both 1, the local bit line is discharged to ground; otherwise the local bit line remains at the precharge voltage.
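The element counts recited in claims 1 and 3 follow directly from N, M and K. The Python sketch below tabulates them. It is an illustration only: the function name `array_dimensions` and the dictionary keys are hypothetical, and natural numbers are assumed here to start at 1, consistent with the lower bounds in claim 3.

```python
def array_dimensions(n, m, k):
    """Return the element counts recited in claim 1 for given N, M, K."""
    for name, value in (("n", n), ("m", m), ("k", k)):
        # Assumption: a natural number here means an integer >= 1.
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a natural number, got {value!r}")

    local_arrays = 16 * n                 # 16 x N local arrays
    columns = 128 * m                     # 128 x M storage and calculation columns
    rows_per_local_array = 8 * k          # 8 x K memory cells per column
    weight_rows = local_arrays * rows_per_local_array  # 128 x N x K

    return {
        "local_arrays": local_arrays,
        "columns": columns,
        "weight_rows": weight_rows,
        "word_lines": weight_rows,            # same count of word line inverses
        "bus_bit_lines": columns,             # same count of bus bit line inverses
        "input_lines": columns,               # same count of input line inverses
        "row_calculation_lines": local_arrays,
        "row_adcs": local_arrays,             # one per local array
        "column_multiplexers": columns // 8,  # 8-to-1, i.e. 16 x M
        "column_adcs": columns // 8,          # 16 x M
        "memory_cells": weight_rows * columns,
    }


if __name__ == "__main__":
    for key, value in array_dimensions(1, 1, 1).items():
        print(f"{key}: {value}")
```

For N = M = K = 1 this gives 128 weight rows by 128 columns, that is, 16384 six-transistor memory cells served by 16 row analog-to-digital converters and 16 column analog-to-digital converters.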
CN202111273336.1A 2021-10-29 2021-10-29 Rotatable internal computing circuit and implementation method thereof Active CN114093394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111273336.1A CN114093394B (en) 2021-10-29 2021-10-29 Rotatable internal computing circuit and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111273336.1A CN114093394B (en) 2021-10-29 2021-10-29 Rotatable internal computing circuit and implementation method thereof

Publications (2)

Publication Number Publication Date
CN114093394A (en) 2022-02-25
CN114093394B (en) 2024-05-24

Family

ID=80298229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111273336.1A Active CN114093394B (en) 2021-10-29 2021-10-29 Rotatable internal computing circuit and implementation method thereof

Country Status (1)

Country Link
CN (1) CN114093394B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115995256B (en) * 2023-03-23 2023-05-16 北京大学 Self-calibration current programming and current calculation type memory calculation circuit and application thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111512375A (en) * 2017-12-18 2020-08-07 高通股份有限公司 Transposed non-volatile (NV) memory (NVM) bit cells and associated data arrays configured for row and column transposed access operations
CN111816231A (en) * 2020-07-30 2020-10-23 中科院微电子研究所南京智能技术研究院 Memory computing device with double-6T SRAM structure
CN112071343A (en) * 2020-08-18 2020-12-11 安徽大学 SRAM circuit structure for realizing multiplication by combining capacitor in memory
CN112151091A (en) * 2020-09-29 2020-12-29 中科院微电子研究所南京智能技术研究院 8T SRAM unit and memory computing device
CN112509620A (en) * 2020-11-30 2021-03-16 安徽大学 Data reading circuit based on balance pre-charging and group decoding
CN112992223A (en) * 2021-05-20 2021-06-18 中科院微电子研究所南京智能技术研究院 Memory computing unit, memory computing array and memory computing device

Also Published As

Publication number Publication date
CN114093394A (en) 2022-02-25

Similar Documents

Publication Publication Date Title
US11521051B2 (en) Memristive neural network computing engine using CMOS-compatible charge-trap-transistor (CTT)
CN112183739A (en) Hardware architecture of memristor-based low-power-consumption pulse convolution neural network
US11151439B2 (en) Computing in-memory system and method based on skyrmion racetrack memory
CN115039177A (en) Low power consumption in-memory compute bit cell
CN112599165A (en) Memory computing unit for multi-bit input and multi-bit weight multiplication accumulation
CN114093394B (en) Rotatable internal computing circuit and implementation method thereof
CN115390789A (en) Magnetic tunnel junction calculation unit-based analog domain full-precision memory calculation circuit and method
CN113936717B (en) Storage and calculation integrated circuit for multiplexing weight
CN113703718B (en) Multi-bit memory computing device with variable weight
CN114743580A (en) Charge sharing memory computing device
Liu et al. A 40-nm 202.3 nJ/classification neuromorphic architecture employing in-SRAM charge-domain compute
CN114038492A (en) Multi-phase sampling memory computing circuit
CN117056277A (en) Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory)
CN114882921B (en) Multi-bit computing device
CN114895869B (en) Multi-bit memory computing device with symbols
CN115691613A (en) Charge type memory calculation implementation method based on memristor and unit structure thereof
CN116964675A (en) In-memory computing with ternary activation
CN114944180A (en) Weight-configurable pulse generating device based on copy column
US20230027768A1 (en) Neural network computing device and computing method thereof
CN117636945B (en) 5-bit signed bit AND OR accumulation operation circuit and CIM circuit
CN115995256B (en) Self-calibration current programming and current calculation type memory calculation circuit and application thereof
CN117807021B (en) 2T-2MTJ memory cell and MRAM in-memory computing circuit
CN117877553A (en) In-memory computing circuit for nonvolatile random access memory
CN115658010A (en) Pulse width modulation circuit, quantization circuit, storage circuit and chip
Saragada et al. An in-memory architecture for machine learning classifier using logistic regression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant