CN114093394B - Rotatable internal computing circuit and implementation method thereof - Google Patents
- Publication number
- CN114093394B CN202111273336.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C5/00—Details of stores covered by group G11C11/00
- G11C5/02—Disposition of storage elements, e.g. in the form of a matrix array
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C5/00—Details of stores covered by group G11C11/00
- G11C5/14—Power supply arrangements, e.g. power down, chip selection or deselection, layout of wirings or power grids, or multiple supply levels
- G11C5/147—Voltage reference generators, voltage or current regulators; Internally lowered supply levels; Compensation for voltage drops
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/18—Bit line organisation; Bit line lay-out
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C8/00—Arrangements for selecting an address in a digital store
- G11C8/14—Word line organisation; Word line lay-out
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Analogue/Digital Conversion (AREA)
Abstract
The invention discloses a transposable in-memory computing circuit and an implementation method thereof. The transposable in-memory computing circuit comprises a transposable in-memory computing array and a peripheral circuit. The array comprises 16 local arrays, each of which comprises 128 storage-and-compute columns connected together through a row compute line; storage-and-compute columns located in the same column are connected together through a global bit line and a global bit line inverse. Each storage-and-compute column comprises 8 six-transistor (6T) memory cells and 1 charge computing unit, connected in parallel through a local bit line and a local bit line inverse. The peripheral circuit comprises a word line driver, a read/write peripheral circuit, a forward-pass input driving circuit, 16 row analog-to-digital converters, 16 8-to-1 multiplexers, 16 column analog-to-digital converters, and a global timing control circuit. The transposition function enables an edge-side intelligent chip to retrain at the edge with lower power consumption, while charge-domain computation improves the stability and accuracy of the computation.
Description
Technical Field
The invention relates to the field of integrated circuit design, and in particular to a transposable in-memory computing circuit and an implementation method thereof.
Background
In recent years, deep learning algorithms have achieved very good results in many fields. At the same time, the parameter scale of deep neural networks keeps growing. As a result, when deep learning tasks are processed on a conventional computing architecture in which memory and computation are separated, moving the neural network parameters consumes a large amount of power; this is known as the memory wall problem. This power consumption makes it difficult to deploy deep learning algorithms on edge devices, which are highly sensitive to power consumption. To address the memory wall problem, designers have in recent years proposed a new computing architecture: in-memory computing.
In-memory computing circuits are particularly energy efficient because of their analog computation and parallel processing. Many new in-memory computing chips have been proposed in recent years; according to the type of analog computation, they fall into two classes: current-domain in-memory computing and charge-domain in-memory computing.
In a current-domain in-memory computing chip, the input is controlled by a voltage, and the result of multiplying the input by the weight is expressed as the magnitude of a current. The currents produced by multiplying multiple inputs by their weights are superposed on the same computing node and discharge the capacitance of that node, completing the multiply-accumulate operation entirely in the analog domain. However, because the threshold voltage of a transistor is subject to random variation, the computing current may deviate from the ideal result, which degrades the computing accuracy. This effect is particularly severe at advanced process nodes.
In a charge-domain in-memory computing chip, the result of the multiplication determines whether a computing capacitor is charged, and accumulation is performed by connecting the computing capacitors together so that they share charge. The computing capacitor is often implemented as a metal-oxide-metal (MOM) capacitor, which is highly accurate at advanced process nodes, so the computation result is very accurate. The charge-domain structure has the advantage of high precision, but it requires additional transistors to control the charging and discharging of the capacitor, so its cell area is larger than that of a conventional six-transistor (6T) cell and its storage and computing density is lower.
In addition, for privacy reasons, some data cannot be uploaded to the cloud to train a neural network. A more effective solution is to train a general neural network model on a public data set, download it to the local device, and fine-tune part of the model with the user's own data so that it works best for each user. This solution requires training at the edge. However, unlike inference, training a neural network requires the transpose of the weight matrix, which current in-memory computing chips rarely support.
Therefore, a transposable and more area-efficient charge-domain in-memory computing circuit is important for deploying neural networks at the edge.
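To make the transpose requirement concrete, the following minimal sketch (plain Python, not taken from the patent) computes a forward pass with a weight matrix W and then propagates an error vector backward through the transpose of W. The matrix values and the helper names `matvec` and `transpose` are assumptions chosen only for illustration.

```python
# Illustrative sketch: inference uses W, training additionally needs W^T.
# The numbers and helper names are assumptions, not part of the patent.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

W = [[1, 0, 1],
     [0, 1, 1]]            # 2 outputs x 3 inputs
x = [1, 1, 0]              # forward-pass input
y = matvec(W, x)           # forward pass (inference): y = W x

delta_out = [1, 0]                           # error arriving at the outputs
delta_in = matvec(transpose(W), delta_out)   # backward pass: W^T delta_out

print(y)
print(delta_in)
```

A memory that can only be read along one direction supports `matvec(W, x)` directly but has to move or copy the weights to evaluate the second product, which is the cost a transposable array is meant to avoid.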
Disclosure of Invention
In view of the problems in the prior art, the invention provides a transposable in-memory computing circuit and an implementation method thereof, which perform charge-domain computation based on charge sharing with six-transistor (6T) memory cells and support transposed computation.
It is an object of the present invention to provide a transposable in-memory computing circuit.
The transposable in-memory computing circuit of the invention comprises a transposable in-memory computing array and a peripheral circuit;
the transposable in-memory computing array comprises 16×N local arrays, where N is a natural number;
each local array comprises 128×M storage-and-compute columns, where M is a natural number; all 128×M storage-and-compute columns in the same local array are connected together through a row compute line; the storage-and-compute columns located in the same column of the 16×N local arrays are connected together through a global bit line and a global bit line inverse; there are 16×N row compute lines, 128×M global bit lines, and 128×M global bit line inverses;
Each storage-and-compute column comprises 8×K 6T memory cells and 1 charge computing unit, where K is a natural number, connected in parallel through a local bit line and a local bit line inverse; the local bit line has parasitic capacitance; the 6T memory cells store weight values; the corresponding 6T memory cells of all 128×M storage-and-compute columns in each local array are located in the same row, giving 8×K rows of 6T memory cells per local array; the 16×N local arrays have 16×N×8×K = 128×N×K rows of 6T memory cells in total;
The 6T memory cells located in the same row of all 128×M storage-and-compute columns in the same local array are connected in parallel to a word line and a word line inverse, so each local array has 8×K word lines and 8×K word line inverses; the 16×N local arrays have 16×N×8×K = 128×N×K word lines and the same number of word line inverses;
Each charge computing unit comprises two precharge transistors, a row enable switch, a row enable inverse switch, an input switch, and an input inverse switch; a precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control terminal of the row enable switch, and the row enable line inverse is connected to the control terminal of the row enable inverse switch; one end of the row enable switch and one end of the row enable inverse switch are connected to the row compute line, the other end of the row enable switch is connected to the local bit line, and the other end of the row enable inverse switch is connected to the local bit line inverse; the input line is connected to the control terminal of the input switch, and the input line inverse is connected to the control terminal of the input inverse switch; one end of the input switch is connected to the local bit line and the other end to the global bit line; one end of the input inverse switch is connected to the local bit line inverse and the other end to the global bit line inverse; the control terminals of the input switches and of the input inverse switches located in the same column of the 16×N local arrays are connected to the same input line and the same input line inverse, respectively, so there are 128×M input lines and 128×M input line inverses;
The peripheral circuit comprises a word line driver, a read/write peripheral circuit, a forward-pass input driving circuit, 16×N row analog-to-digital converters, 16×M 8-to-1 multiplexers, 16×M column analog-to-digital converters, and a global timing control circuit;
The forward-pass input driving circuit is connected to the control terminals of the input switch and the input inverse switch of each charge computing unit through the input lines and input line inverses, and to the control terminals of the row enable switch and the row enable inverse switch of each charge computing unit through the row enable lines and row enable line inverses; each channel of the read/write peripheral circuit is connected to a global bit line and its global bit line inverse; each global bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of each multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected to the input port of its row analog-to-digital converter through its row compute line; the word line driver registers the backward-pass input values and also acts as the backward-pass input driver, and each channel of the word line driver is connected to a corresponding word line; the global timing control circuit is connected to the forward-pass input driving circuit, the read/write peripheral circuit, and the word line driver; the global timing control circuit is also connected to the precharge line, and through it to the precharge transistors of each charge computing unit; during computation the word line driver has two configuration modes: a forward-pass word line driving mode and a backward-pass word line driving mode;
In the initial state, the row enable switch, the row enable inverse switch, the input switch, and the input inverse switch are all in the off state; the word line and word line inverse, the row enable line and row enable line inverse, the input line and input line inverse, and the precharge line are all at the low level; the bit line and bit line inverse are at the precharge voltage;
in forward mode, control of the input line and input line inverse depends on the forward-pass input value; in backward mode, control of the input line and input line inverse is independent of the backward-pass input value;
Forward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge computing unit;
b) The global timing control circuit applies the high level to the precharge line, ending the precharge stage; the word line driver then applies the high level to the word line and word line inverse of the row whose weight values are to be read in each of the 16×N local arrays, selecting that word line and word line inverse; according to the weight value stored in the 6T memory cell on the selected word line and word line inverse, the local bit line and local bit line inverse are either discharged to ground or kept at the precharge voltage; the word line and word line inverse are then returned to the low level, completing the weight read operation;
c) According to the registered forward-pass input value, the forward-pass input driving circuit applies the high level to the input line or input line inverse of each channel, closing the corresponding input switch or input inverse switch, and the local bit line or local bit line inverse connected to a closed switch is discharged to ground; the input line or input line inverse that was driven high is then returned to the low level, completing the multiplication of the forward-pass input value and the weight value;
d) The forward-pass input driving circuit applies the high level to all row enable lines and row enable line inverses, closing the row enable switches and row enable inverse switches; the products of the 128×M forward-pass input values and their corresponding weight values are accumulated on the row compute line and passed to the row analog-to-digital converter, which quantizes them, yielding 16×N row outputs;
Backward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge computing unit;
b) The global timing control circuit applies the high level to the precharge line, ending the precharge stage; according to the registered backward-pass input value, the word line driver either applies the high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; any word line that was driven high is then returned to the low level, completing the multiplication of the backward-pass input value and the weight value;
c) The global timing control circuit applies the high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated result is passed to the column analog-to-digital converters, which quantize it and produce 16×M column outputs, thereby realizing transposed computation.
The precharge voltage is 0.7 V to 0.9 V; the low level is 0.0 V, and the high level is 0.7 V to 0.9 V.
1 ≤ N ≤ 4, 1 ≤ M ≤ 9, and 1 ≤ K ≤ 4.
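The line and converter counts given in the structural description all follow from N, M, and K. As a quick consistency check, the sketch below derives them in Python; the `ArrayConfig` class and its dictionary keys are hypothetical names introduced here for illustration and are not part of the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayConfig:
    """Scaling parameters of the array (hypothetical helper, for illustration)."""
    n: int = 1  # number of local arrays is 16 x N
    m: int = 1  # storage-and-compute columns per local array is 128 x M
    k: int = 1  # 6T memory cells per column is 8 x K

    def summary(self):
        local_arrays = 16 * self.n
        columns = 128 * self.m
        cells_per_column = 8 * self.k
        return {
            "local_arrays": local_arrays,
            "columns_per_local_array": columns,
            "cell_rows_total": local_arrays * cells_per_column,  # 128 x N x K
            "word_lines": local_arrays * cells_per_column,       # same count for inverses
            "row_compute_lines": local_arrays,
            "global_bit_lines": columns,                         # same count for inverses
            "input_lines": columns,                              # same count for inverses
            "row_adcs": local_arrays,
            "mux_8_to_1": 16 * self.m,
            "column_adcs": 16 * self.m,
        }

info = ArrayConfig().summary()          # N = M = K = 1
for name, value in info.items():
    print(f"{name}: {value}")
```

With N = M = K = 1 this gives 128 rows of cells and 128 global bit lines, that is, a 128×128 array.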
Another object of the present invention is to provide an implementation method for the transposable in-memory computing circuit.
The implementation method of the transposable in-memory computing circuit of the invention comprises the following steps:
1) Initial state:
the row enable switch, the row enable inverse switch, the input switch, and the input inverse switch are all in the off state; the word line and word line inverse, the row enable line and row enable line inverse, the input line and input line inverse, and the precharge line are all at the low level; the bit line and bit line inverse are at the precharge voltage;
in forward mode, control of the input line and input line inverse depends on the forward-pass input value; in backward mode, control of the input line and input line inverse is independent of the backward-pass input value;
2) Forward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge computing unit;
b) The global timing control circuit applies the high level to the precharge line, ending the precharge stage; the word line driver then applies the high level to the word line and word line inverse of the row whose weight values are to be read in each of the 16×N local arrays, selecting that word line and word line inverse; according to the weight value stored in the 6T memory cell on the selected word line and word line inverse, the local bit line and local bit line inverse are either discharged to ground or kept at the precharge voltage; the word line and word line inverse are then returned to the low level, completing the weight read operation;
c) According to the registered forward-pass input value, the forward-pass input driving circuit applies the high level to the input line or input line inverse of each channel, closing the corresponding input switch or input inverse switch, and the local bit line or local bit line inverse connected to a closed switch is discharged to ground; the input line or input line inverse that was driven high is then returned to the low level, completing the multiplication of the forward-pass input value and the weight value;
d) The forward-pass input driving circuit applies the high level to all row enable lines and row enable line inverses, closing the row enable switches and row enable inverse switches; the products of the 128×M forward-pass input values and their corresponding weight values are accumulated on the row compute line and passed to the row analog-to-digital converter, which quantizes them, yielding 16×N row outputs;
3) Backward mode:
a) Precharge stage: the precharge line is at the low level, so the local bit line and local bit line inverse are precharged to the precharge voltage through the precharge transistors in the charge computing unit;
b) The global timing control circuit applies the high level to the precharge line, ending the precharge stage; according to the registered backward-pass input value, the word line driver either applies the high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; any word line that was driven high is then returned to the low level, completing the multiplication of the backward-pass input value and the weight value;
c) The global timing control circuit applies the high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated result is passed to the column analog-to-digital converters, which quantize it and produce 16×M column outputs, thereby realizing transposed computation.
In step 2) c), when the forward-pass input value is 1, the input switch and the input inverse switch are closed; when the forward-pass input value is 0, the input switch and the input inverse switch are open.
In step 3) b), the word line is driven to the high level when the backward-pass input value is 1 and remains at the low level when it is 0; when the backward-pass input value and the weight value are both 1, the local bit line is discharged to ground; otherwise the local bit line remains at the precharge voltage.
The invention has the advantages that:
compared with a conventional non-transposable in-memory computing circuit, the transposed computation function of the invention enables an edge-side intelligent chip to retrain at the edge with lower power consumption; in addition, charge-domain computation improves the stability and accuracy of the computation.
Drawings
FIG. 1 is a block diagram of one embodiment of the transposable in-memory computing circuit of the present invention;
FIG. 2 is a schematic diagram of a local array of one embodiment of the transposable in-memory computing circuit of the present invention, wherein (a) is a block diagram of the local array and (b) is a block diagram of a storage-and-compute column;
FIG. 3 is a flow chart of the forward word line driving mode of one embodiment of the implementation method of the transposable in-memory computing circuit of the present invention, wherein (a) to (d) show the four steps of the flow, with a schematic diagram on the left and a timing diagram on the right;
FIG. 4 is a flow chart of the backward word line driving mode of one embodiment of the implementation method of the transposable in-memory computing circuit of the present invention, wherein (a) to (c) show the three steps of the flow, with a schematic diagram on the left and a timing diagram on the right.
Detailed Description
The invention will be further elucidated by means of specific embodiments in conjunction with the accompanying drawings.
As shown in FIG. 1, in this embodiment N = M = K = 1, and the transposable in-memory computing circuit of this embodiment comprises a 128×128 transposable in-memory computing array and a peripheral circuit;
The transposable in-memory computing array comprises first through sixteenth local arrays;
Each local array comprises first through 128th storage-and-compute columns; all 128 storage-and-compute columns in the same local array are connected together through a row compute line; the storage-and-compute columns located in the same column of all 16 local arrays are connected together through a global bit line and a global bit line inverse; there are 16 row compute lines, 128 global bit lines, and 128 global bit line inverses, namely the first through 128th global bit lines and the first through 128th global bit line inverses;
Each storage-and-compute column comprises first through eighth 6T memory cells and 1 charge computing unit, connected in parallel through a local bit line and a local bit line inverse; the local bit line has parasitic capacitance; the 6T memory cells store weight values; the corresponding 6T memory cells of all 128 storage-and-compute columns in each local array are located in the same row, giving 8 rows of 6T memory cells per local array; the 16 local arrays have 16×8 = 128 rows of 6T memory cells in total;
Each 6T memory cell comprises two cross-coupled inverters and two access transistors controlled by the word line and the word line inverse, respectively; the 6T memory cells located in the same row of all 128 storage-and-compute columns in the same local array are connected in parallel to a word line and a word line inverse, so each local array has 8 word lines and 8 word line inverses; the 16 local arrays have 16×8 = 128 word lines and 128 word line inverses;
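Each charge computing unit comprises two precharge transistors, a row enable switch, a row enable inverse switch, an input switch, and an input inverse switch; a precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control terminal of the row enable switch, and the row enable line inverse is connected to the control terminal of the row enable inverse switch; one end of the row enable switch and one end of the row enable inverse switch are connected to the row compute line, the other end of the row enable switch is connected to the local bit line, and the other end of the row enable inverse switch is connected to the local bit line inverse; the input line is connected to the control terminal of the input switch, and the input line inverse is connected to the control terminal of the input inverse switch; one end of the input switch is connected to the local bit line and the other end to the global bit line; one end of the input inverse switch is connected to the local bit line inverse and the other end to the global bit line inverse; the control terminals of the input switches and of the input inverse switches located in the same column of the 16 local arrays are connected to the same input line and the same input line inverse, respectively, so there are 128 input lines and 128 input line inverses;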
The peripheral circuit comprises a word line driver, a read/write peripheral circuit, a forward-pass input driving circuit, 16 row analog-to-digital converters, 16 8-to-1 multiplexers, 16 column analog-to-digital converters, and a global timing control circuit;
The forward-pass input driving circuit is connected to the control terminals of the input switch and the input inverse switch of each charge computing unit through the input lines and input line inverses, and to the control terminals of the row enable switch and the row enable inverse switch of each charge computing unit through the row enable lines and row enable line inverses; each channel of the read/write peripheral circuit is connected to a global bit line and its global bit line inverse; each global bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of each multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected to the input port of its row analog-to-digital converter through its row compute line; the word line driver registers the backward-pass input values and also acts as the backward-pass input driver, and each channel of the word line driver is connected to a corresponding word line; the global timing control circuit is connected to the forward-pass input driving circuit, the read/write peripheral circuit, and the word line driver; the global timing control circuit is also connected to the precharge line, and through it to the precharge transistors of each charge computing unit; during computation the word line driver has two configuration modes: a forward-pass word line driving mode and a backward-pass word line driving mode.
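In this embodiment the 128 global bit lines share 16 column analog-to-digital converters through 16 8-to-1 multiplexers. The sketch below illustrates one way the columns could be visited. The text does not state which global bit lines feed which multiplexer, so the contiguous grouping used here (multiplexer m serving columns 8m to 8m+7) and the function name `columns_for_select` are assumptions made purely for illustration.

```python
COLUMNS = 128                        # global bit lines in the embodiment
MUX_INPUTS = 8                       # each multiplexer is 8-to-1
NUM_MUXES = COLUMNS // MUX_INPUTS    # 16 multiplexers, one column ADC each

def columns_for_select(select):
    """Columns routed to the 16 column ADCs for one multiplexer select value.

    Assumption: multiplexer m serves columns 8*m .. 8*m + 7 (contiguous
    grouping); the patent text does not specify the actual wiring.
    """
    if not 0 <= select < MUX_INPUTS:
        raise ValueError("select must be in 0..7")
    return [m * MUX_INPUTS + select for m in range(NUM_MUXES)]

# Stepping the select value through 0..7 visits every column exactly once.
schedule = [columns_for_select(s) for s in range(MUX_INPUTS)]
covered = sorted(c for step in schedule for c in step)

print(NUM_MUXES, len(schedule))
print(schedule[0])
```

Under this assumption, reading all 128 column results takes 8 multiplexer selections of 16 conversions each.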
The implementation method of the transposable in-memory computing circuit of this embodiment comprises the following steps:
1) Initial state:
the row enable switch, the row enable inverse switch, the input switch, and the input inverse switch are all in the off state; the word line and word line inverse, the row enable line and row enable line inverse, the input line and input line inverse, and the precharge line are all at the low level; the bit line and bit line inverse are at the precharge voltage;
in forward mode, control of the input line and input line inverse depends on the forward-pass input value; in backward mode, control of the input line and input line inverse is independent of the backward-pass input value;
2) Forward mode, as shown in fig. 3:
a) Precharge stage: the precharge line is at the low level (0.0 V), so the local bit line and local bit line inverse are precharged to the precharge voltage (0.9 V) through the precharge transistors in the charge computing unit, as shown in FIG. 3(a);
b) The global timing control circuit applies the high level (0.9 V) to the precharge line, ending the precharge stage; the word line driver then simultaneously applies the high level to the word line and word line inverse of the row whose weight values are to be read in each of the 16 local arrays; according to the weight value stored in the 6T memory cell on the selected word line and word line inverse, the local bit line and local bit line inverse are either discharged to ground or kept at the precharge voltage; the word line and word line inverse are then returned to the low level, completing the weight read operation, as shown in FIG. 3(b);
c) According to the registered forward-pass input value, the forward-pass input driving circuit applies the high level to the input line or input line inverse of each channel, controlling the input switch or input inverse switch: the switch is closed when the forward-pass input value is 1 and open when it is 0; the local bit line or local bit line inverse connected to a closed switch is discharged to ground; the input line or input line inverse that was driven high is then returned to the low level, completing the multiplication of the forward-pass input value and the weight value, as shown in FIG. 3(c);
d) The forward-pass input driving circuit applies the high level to all row enable lines and row enable line inverses, closing the row enable switches and row enable inverse switches; the products of the 128 forward-pass input values and their corresponding weight values are accumulated on the row compute line and passed to the row analog-to-digital converter, which quantizes them, as shown in FIG. 3(d);
3) Reverse mode, as shown in fig. 4:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the inverse local bit line are precharged to the precharge voltage (0.9 V) through the precharge transistors in the charge calculation unit, as shown in fig. 4 (a);
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered backward input value, the word line driver either applies a high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level: if the backward input value is 1, the word line is driven to the high level (0.9 V), and if it is 0, the word line stays at the low level; if the backward input value and the weight value are both 1, the local bit line is discharged to ground, otherwise it remains at the precharge voltage; the word line to which the high level was applied is then returned to the low level, completing the multiplication of the backward input value and the weight value, as shown in fig. 4 (b);
c) The overall timing control circuit applies a high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together; the accumulated data is transferred to the column analog-to-digital converter and quantized by it, as shown in fig. 4 (c).
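The forward and reverse modes described above can be summarized by a purely functional sketch. It assumes binary input values and weights and ideal analog-to-digital converters, and it ignores the charge-domain details and the differential use of the local bit line and its inverse. The names `forward`, `reverse` and `transpose` are hypothetical and do not appear in the patent; `weights[i][j]` stands for the weight stored in the selected row of local array `i`, column `j`.

```python
from typing import List

Matrix = List[List[int]]

NUM_LOCAL_ARRAYS = 16  # local arrays in the described embodiment
NUM_COLUMNS = 128      # storage-and-calculation columns per local array


def forward(weights: Matrix, x: List[int]) -> List[int]:
    """Forward mode: one sum per local array (row calculation line).

    x holds one input value per column.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def reverse(weights: Matrix, y: List[int]) -> List[int]:
    """Reverse mode: one sum per column (shared bit line).

    y holds one input value per local array; it gates the word line,
    so a product is 1 only when both the input and the weight are 1.
    """
    num_columns = len(weights[0])
    return [
        sum(weights[i][j] * y[i] for i in range(len(weights)))
        for j in range(num_columns)
    ]


def transpose(weights: Matrix) -> Matrix:
    """Plain matrix transpose, used only to check the reverse mode."""
    return [list(column) for column in zip(*weights)]


# Small illustrative case (2 local arrays, 3 columns instead of 16 x 128).
example_weights = [[1, 0, 1],
                   [1, 1, 0]]
row_sums = forward(example_weights, [1, 1, 0])
column_sums = reverse(example_weights, [1, 1])
```

Under these assumptions, the reverse mode gives the same result as a forward pass over the transposed weight matrix, which is the sense in which the same stored weights serve both computations.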
Finally, it should be noted that the examples are disclosed for the purpose of aiding in the further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.
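As a further illustration of steps b) and c) of the reverse mode in the embodiment, the following sketch models the bit-line voltages. It is an idealized model under stated assumptions, not the circuit itself: every local bit line is assumed to have the same parasitic capacitance, the capacitance of the shared bit line is neglected, and the analog-to-digital converter is assumed ideal. The function names are hypothetical; only the 0.9 V precharge voltage and the discharge rule are taken from the text.

```python
from typing import Sequence

V_PRE = 0.9  # precharge voltage in volts, as given in the embodiment


def local_bit_line_voltage(input_bit: int, weight_bit: int,
                           v_pre: float = V_PRE) -> float:
    """Step b): the line discharges to ground only if input and weight are both 1."""
    return 0.0 if (input_bit == 1 and weight_bit == 1) else v_pre


def shared_column_voltage(input_bits: Sequence[int],
                          weight_bits: Sequence[int],
                          v_pre: float = V_PRE) -> float:
    """Step c): connecting equal capacitances averages their voltages (assumption)."""
    voltages = [local_bit_line_voltage(x, w, v_pre)
                for x, w in zip(input_bits, weight_bits)]
    return sum(voltages) / len(voltages)


def quantize(voltage: float, num_lines: int, v_pre: float = V_PRE) -> int:
    """Hypothetical ideal converter: recover how many lines were discharged."""
    return round((v_pre - voltage) / v_pre * num_lines)


# 16 local arrays share one column; two of them see input = weight = 1.
inputs = [1, 1, 1, 1] + [0] * 12
weights = [1, 1, 0, 0] + [1] * 12
v_shared = shared_column_voltage(inputs, weights)
discharged_count = quantize(v_shared, len(inputs))
```

In this model the shared voltage drops below the precharge voltage in proportion to the number of positions where the input value and the weight are both 1, so the quantized value equals the column-wise sum of products.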
Claims (6)
1. A rotatable in-memory computing circuit, wherein the rotatable in-memory computing circuit comprises a rotatable in-memory computing array and a peripheral circuit;
the rotatable in-memory computing array comprises 16×N local arrays, where N is a natural number;
each local array comprises 128×M storage-and-calculation columns, where M is a natural number; all 128×M storage-and-calculation columns in the same local array are connected together through a row calculation line; all storage-and-calculation columns located in the same column of the 16×N local arrays are connected together through a total bit line and an inverse total bit line; the number of row calculation lines is 16×N, and the numbers of total bit lines and inverse total bit lines are each 128×M;
each storage-and-calculation column comprises 8×K six-transistor (6T) memory cells and one charge calculation unit connected in parallel to a local bit line and an inverse local bit line, where K is a natural number; the local bit line has a parasitic capacitance; the 6T memory cells store weight values; the corresponding 6T memory cells of all 128×M storage-and-calculation columns in each local array lie in the same row, giving 8×K rows of 6T memory cells; the 16×N local arrays together have 16×N×8×K = 128×N×K rows of 6T memory cells;
the 6T memory cells in the same row of all 128×M storage-and-calculation columns in the same local array are connected in parallel to a word line and an inverse word line, the numbers of word lines and inverse word lines each being 8×K; the numbers of word lines and inverse word lines of the 16×N local arrays are each 16×N×8×K = 128×N×K;
each charge calculation unit comprises two precharge transistors, a row enable switch, an inverse row enable switch, an input switch and an inverse input switch; one precharge line is connected to the gate terminals of the two precharge transistors; the row enable line is connected to the control terminal of the row enable switch, and the inverse row enable line is connected to the control terminal of the inverse row enable switch; one end of each of the row enable switch and the inverse row enable switch is connected to the row calculation line, the other end of the row enable switch is connected to the local bit line, and the other end of the inverse row enable switch is connected to the inverse local bit line; the input line is connected to the control terminal of the input switch, and the inverse input line is connected to the control terminal of the inverse input switch; one end of the input switch is connected to the local bit line and one end of the inverse input switch is connected to the inverse local bit line; the other end of the input switch is connected to the total bit line and the other end of the inverse input switch is connected to the inverse total bit line; the control terminals of the input switches and inverse input switches located in the same column of the 16×N local arrays are connected to the same input line and inverse input line respectively, so that the numbers of input lines and inverse input lines are each 128×M;
the peripheral circuit comprises a word line driver, a read-write peripheral circuit, a forward input driving circuit, 16×N row analog-to-digital converters, 16×M 8-to-1 multiplexers, 16×M column analog-to-digital converters and an overall timing control circuit;
the forward input driving circuit is connected to the control terminals of the input switch and the inverse input switch of each charge calculation unit through the input lines and inverse input lines, and to the control terminals of the row enable switch and the inverse row enable switch of each charge calculation unit through the row enable lines and inverse row enable lines; each channel of the read-write peripheral circuit is connected to a respective total bit line and inverse total bit line; each total bit line is also connected to a corresponding input port of an 8-to-1 multiplexer, and the output port of the multiplexer is connected to a column analog-to-digital converter; each row analog-to-digital converter corresponds to one local array, and each local array is connected through its row calculation line to the input port of the corresponding row analog-to-digital converter; the word line driver registers the backward input value and provides the backward input driving function, and each channel of the word line driver is connected to a corresponding word line; the overall timing control circuit is connected to the forward input driving circuit, the read-write peripheral circuit and the word line driver respectively; the overall timing control circuit is also connected to the precharge line and, through it, to the precharge transistors of each charge calculation unit; during calculation the word line driver has two configuration modes, namely a forward word line driving mode and a reverse word line driving mode;
in the initial state, the row enable switch, the inverse row enable switch, the input switch and the inverse input switch are all in the off state; the word line and inverse word line, the row enable line and inverse row enable line, the input line and inverse input line, and the precharge line are all at a low level; and the bit line and the inverse bit line are at the precharge voltage;
in the forward mode, control of the input line and the inverse input line depends on the forward input value; in the reverse mode, control of the input line and the inverse input line is independent of the backward input value;
forward mode: a) precharge stage: the precharge line is at a low level, so that the local bit line and the inverse local bit line are precharged to the precharge voltage through the precharge transistors in the charge calculation unit; b) the overall timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and the inverse word line of the row whose weight values are to be read in each of the 16×N local arrays, the word line and inverse word line at the high level being selected; according to the weight value stored in the 6T memory cell on the selected word line and inverse word line, the local bit line and the inverse local bit line are each either discharged to ground or kept at the precharge voltage; the word line and the inverse word line are then returned to the low level, completing the weight read operation; c) the forward input driving circuit applies a high level to the input line or the inverse input line of each channel according to the registered forward input value, thereby closing the input switch or the inverse input switch, and the local bit line or inverse local bit line corresponding to the closed input switch or inverse input switch is discharged to ground; the input line or inverse input line to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value; d) the forward input driving circuit applies a high level to all row enable lines and inverse row enable lines, closing the row enable switches and the inverse row enable switches; the results of multiplying the 128×M forward input values by the corresponding weight values are accumulated on the row calculation line and transferred to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
reverse mode: a) precharge stage: the precharge line is at a low level, so that the local bit line and the inverse local bit line are precharged to the precharge voltage through the precharge transistors in the charge calculation unit; b) the overall timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered backward input value, the word line driver either applies a high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; the word line to which the high level was applied is then returned to the low level, completing the multiplication of the backward input value and the weight value; c) the overall timing control circuit applies a high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated data is transferred to the column analog-to-digital converters, which quantize it and produce 16×M outputs, thereby realizing the transposed calculation.
2. The rotatable in-memory computing circuit of claim 1, wherein the precharge voltage is 0.7-0.9 V, the low level is 0.09 V, and the high level is 0.7-0.9 V.
3. The rotatable in-memory computing circuit of claim 1, wherein N satisfies 1 ≤ N ≤ 4, M satisfies 1 ≤ M ≤ 9, and K satisfies 1 ≤ K ≤ 4.
4. A method of implementing the rotatable in-memory computing circuit of claim 1, the method comprising the steps of:
1) Initial state:
the row enable switch, the inverse row enable switch, the input switch and the inverse input switch are all in the off state; the word line and inverse word line, the row enable line and inverse row enable line, the input line and inverse input line, and the precharge line are all at a low level; and the bit line and the inverse bit line are at the precharge voltage;
in the forward mode, control of the input line and the inverse input line depends on the forward input value; in the reverse mode, control of the input line and the inverse input line is independent of the backward input value;
2) Forward mode:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the inverse local bit line are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage; the word line driver then applies a high level to the word line and the inverse word line of the row whose weight values are to be read in each of the 16×N local arrays, the word line and inverse word line at the high level being selected; according to the weight value stored in the 6T memory cell on the selected word line and inverse word line, the local bit line and the inverse local bit line are each either discharged to ground or kept at the precharge voltage; the word line and the inverse word line are then returned to the low level, completing the weight read operation;
c) The forward input driving circuit applies a high level to the input line or the inverse input line of each channel according to the registered forward input value, thereby closing the input switch or the inverse input switch, and the local bit line or inverse local bit line corresponding to the closed input switch or inverse input switch is discharged to ground; the input line or inverse input line to which the high level was applied is then returned to the low level, completing the multiplication of the forward input value and the weight value;
d) The forward input driving circuit applies a high level to all row enable lines and inverse row enable lines, closing the row enable switches and the inverse row enable switches; the results of multiplying the 128×M forward input values by the corresponding weight values are accumulated on the row calculation line and transferred to the row analog-to-digital converter, which quantizes them, producing 16×N row outputs;
3) Reverse mode:
a) Precharge stage: the precharge line is at a low level, so that the local bit line and the inverse local bit line are precharged to the precharge voltage through the precharge transistors in the charge calculation unit;
b) The overall timing control circuit applies a high level to the precharge line, ending the precharge stage; according to the registered backward input value, the word line driver either applies a high level to the word line of the row whose weight values are to be read in the local array or keeps that word line at the low level; the word line to which the high level was applied is then returned to the low level, completing the multiplication of the backward input value and the weight value;
c) The overall timing control circuit applies a high level to all input lines, closing the input switches, so that all local bit lines in the same column of the different local arrays are connected together to complete the accumulation; the accumulated data is transferred to the column analog-to-digital converters, which quantize it and produce 16×M outputs, thereby realizing the transposed calculation.
5. The method of claim 4, wherein in step 2) c), when the forward input value is 1, the input switch and the inverse input switch are closed; when the forward input value is 0, the input switch and the inverse input switch are open.
6. The method of claim 4, wherein in step 3) b), the word line is at the high level when the backward input value is 1 and remains at the low level when the backward input value is 0; when the backward input value and the weight value are both 1, the local bit line is discharged to ground; otherwise the local bit line remains at the precharge voltage.
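The element counts recited in claims 1 and 3 can be tallied with a short sketch. The class `ArrayDimensions` and its attribute names are hypothetical helpers introduced only for illustration; the formulas are those stated in claim 1, and the range check follows claim 3 as written (claim 1 alone only requires natural numbers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayDimensions:
    """Hypothetical helper tallying the counts recited in claims 1 and 3."""

    n: int = 1  # scales the number of local arrays (16 x N)
    m: int = 1  # scales the number of columns per local array (128 x M)
    k: int = 1  # scales the number of rows per local array (8 x K)

    def __post_init__(self) -> None:
        # Ranges taken from claim 3; claim 1 only requires natural numbers.
        if not (1 <= self.n <= 4 and 1 <= self.m <= 9 and 1 <= self.k <= 4):
            raise ValueError("N, M or K is outside the ranges given in claim 3")

    @property
    def local_arrays(self) -> int:
        return 16 * self.n

    @property
    def columns_per_local_array(self) -> int:
        return 128 * self.m

    @property
    def rows_per_local_array(self) -> int:
        return 8 * self.k

    @property
    def total_rows(self) -> int:
        # 16 x N x 8 x K = 128 x N x K rows, one word line per row
        return self.local_arrays * self.rows_per_local_array

    @property
    def total_bit_lines(self) -> int:
        return self.columns_per_local_array

    @property
    def row_adcs(self) -> int:
        return 16 * self.n

    @property
    def multiplexers(self) -> int:
        return 16 * self.m

    @property
    def column_adcs(self) -> int:
        return 16 * self.m


dims = ArrayDimensions(n=1, m=1, k=1)  # the 16-array, 128-column embodiment
```

With N = M = K = 1 this gives the 16 local arrays of 128 columns used in the described embodiment, and the 16×M 8-to-1 multiplexers together cover exactly the 128×M total bit lines.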
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111273336.1A CN114093394B (en) | 2021-10-29 | 2021-10-29 | Rotatable internal computing circuit and implementation method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111273336.1A CN114093394B (en) | 2021-10-29 | 2021-10-29 | Rotatable internal computing circuit and implementation method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114093394A CN114093394A (en) | 2022-02-25 |
CN114093394B true CN114093394B (en) | 2024-05-24 |
Family
ID=80298229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111273336.1A Active CN114093394B (en) | 2021-10-29 | 2021-10-29 | Rotatable internal computing circuit and implementation method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114093394B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115995256B (en) * | 2023-03-23 | 2023-05-16 | 北京大学 | Self-calibration current programming and current calculation type memory calculation circuit and application thereof |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111512375A (en) * | 2017-12-18 | 2020-08-07 | 高通股份有限公司 | Transposed non-volatile (NV) memory (NVM) bit cells and associated data arrays configured for row and column transposed access operations |
CN111816231A (en) * | 2020-07-30 | 2020-10-23 | 中科院微电子研究所南京智能技术研究院 | Memory computing device with double-6T SRAM structure |
CN112071343A (en) * | 2020-08-18 | 2020-12-11 | 安徽大学 | SRAM circuit structure for realizing multiplication by combining capacitor in memory |
CN112151091A (en) * | 2020-09-29 | 2020-12-29 | 中科院微电子研究所南京智能技术研究院 | 8T SRAM unit and memory computing device |
CN112509620A (en) * | 2020-11-30 | 2021-03-16 | 安徽大学 | Data reading circuit based on balance pre-charging and group decoding |
CN112992223A (en) * | 2021-05-20 | 2021-06-18 | 中科院微电子研究所南京智能技术研究院 | Memory computing unit, memory computing array and memory computing device |
Also Published As
Publication number | Publication date |
---|---|
CN114093394A (en) | 2022-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11521051B2 (en) | Memristive neural network computing engine using CMOS-compatible charge-trap-transistor (CTT) | |
CN112183739A (en) | Hardware architecture of memristor-based low-power-consumption pulse convolution neural network | |
US11151439B2 (en) | Computing in-memory system and method based on skyrmion racetrack memory | |
CN115039177A (en) | Low power consumption in-memory compute bit cell | |
CN112599165A (en) | Memory computing unit for multi-bit input and multi-bit weight multiplication accumulation | |
CN114093394B (en) | Rotatable internal computing circuit and implementation method thereof | |
CN115390789A (en) | Magnetic tunnel junction calculation unit-based analog domain full-precision memory calculation circuit and method | |
CN113936717B (en) | Storage and calculation integrated circuit for multiplexing weight | |
CN113703718B (en) | Multi-bit memory computing device with variable weight | |
CN114743580A (en) | Charge sharing memory computing device | |
Liu et al. | A 40-nm 202.3 nJ/classification neuromorphic architecture employing in-SRAM charge-domain compute | |
CN114038492A (en) | Multi-phase sampling memory computing circuit | |
CN117056277A (en) | Multiply-accumulate in-memory computing circuit for configuring self-adaptive scanning ADC (analog-to-digital converter) based on read-write separation SRAM (static random Access memory) | |
CN114882921B (en) | Multi-bit computing device | |
CN114895869B (en) | Multi-bit memory computing device with symbols | |
CN115691613A (en) | Charge type memory calculation implementation method based on memristor and unit structure thereof | |
CN116964675A (en) | In-memory computing with ternary activation | |
CN114944180A (en) | Weight-configurable pulse generating device based on copy column | |
US20230027768A1 (en) | Neural network computing device and computing method thereof | |
CN117636945B (en) | 5-bit signed bit AND OR accumulation operation circuit and CIM circuit | |
CN115995256B (en) | Self-calibration current programming and current calculation type memory calculation circuit and application thereof | |
CN117807021B (en) | 2T-2MTJ memory cell and MRAM in-memory computing circuit | |
CN117877553A (en) | In-memory computing circuit for nonvolatile random access memory | |
CN115658010A (en) | Pulse width modulation circuit, quantization circuit, storage circuit and chip | |
Saragada et al. | An in-memory architecture for machine learning classifier using logistic regression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |