CN112581996B - Time-domain in-memory computing array structure based on magnetic random access memory - Google Patents

Time-domain in-memory computing array structure based on magnetic random access memory

Info

Publication number
CN112581996B
CN112581996B (application CN202011515150.8A)
Authority
CN
China
Prior art keywords
gate
mode
memory
node
array
Prior art date
Legal status
Active
Application number
CN202011515150.8A
Other languages
Chinese (zh)
Other versions
CN112581996A (en)
Inventor
蔡浩
周永亮
张优优
刘波
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202011515150.8A
Publication of CN112581996A
Application granted
Publication of CN112581996B


Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C 11/00: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165: Auxiliary circuits
    • G11C 11/1653: Address circuits or decoders
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C 11/00: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165: Auxiliary circuits
    • G11C 11/1673: Reading or sensing circuits or methods
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C 11/00: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16: Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165: Auxiliary circuits
    • G11C 11/1675: Writing or programming circuits or methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a time-domain in-memory computing array structure based on magnetic random access memory (MRAM), belonging to the field of integrated circuit design. The structure comprises a dual-mode memory array, an adaptive pipelined decoder, a precharge circuit, a column selector, a sense amplifier, an input/output unit, a delay-delta unit, a counting unit, a timing control circuit and a mode selection module. The invention provides a standard read/write mode and an in-memory computing mode: in the standard read/write mode, data in the memory array can be read and written; in the in-memory computing mode, the multiply-accumulate operations of binary neural network computation can be performed. The invention can read data while completing multiply-accumulate computation, and the delay quantization unit is integrated with the memory array to reduce memory-access energy consumption.

Description

Time-domain in-memory computing array structure based on magnetic random access memory
Technical Field
The invention relates to the field of integrated circuit design, and in particular to a time-domain in-memory computing array structure based on magnetic random access memory (MRAM) and an energy-efficient circuit design method for realizing binary neural network convolution computation in memory.
Background
In recent years, convolutional neural networks (Convolutional Neural Networks, CNN) have sparked a new wave of artificial intelligence and achieved remarkable results in fields such as image recognition. A convolutional neural network is a hierarchical network structure, as shown in fig. 1, mainly comprising the following layers: a data input layer, a convolution layer, an activation layer, a pooling layer and a fully connected layer. As shown in fig. 2, the calculation process can be summarized as follows: the current network layer performs a weighted summation over the activation values of the previous layer, adds a bias term, and finally obtains the activation values of the next layer through an activation function; this process can be organized as a matrix-vector multiplication, as shown in formula (1). However, under limited hardware conditions, the memory bandwidth bottleneck and the computational power consumption severely limit the pace of CNN development. It is therefore significant to break through the limitations of memory and realize convolution computation with high energy efficiency. One idea is to use a binary neural network (Binary Neural Network, BNN), which restricts the activation values and weights to 1-bit numbers. The accumulation operation is then simplified to a bit-counting operation, i.e., counting the number of 1s and the number of 0s in the XNOR result; the activation function is a sign function, i.e., judging whether the sum of the bias and the difference between the number of 1s and the number of 0s in the XNOR result is greater than or equal to 0, outputting 1 if so and 0 otherwise. This reduces the amount of computed data and thereby the power consumption.
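The BNN reduction described above can be sketched as a short behavioral model. This is an illustrative sketch of the general XNOR-and-popcount technique, not the patented circuit; the function name and bit encoding (0 representing the value -1) are our assumptions.

```python
def bnn_neuron(activations, weights, bias):
    """Binary MAC: XNOR each activation/weight pair, count 1s and 0s,
    then apply the sign activation described in the text."""
    assert len(activations) == len(weights)
    # XNOR: 1 when the bits agree, 0 when they differ
    xnor = [1 - (a ^ w) for a, w in zip(activations, weights)]
    ones = sum(xnor)               # number of 1s in the XNOR result
    zeros = len(xnor) - ones       # number of 0s in the XNOR result
    # Sign activation: output 1 if (ones - zeros) + bias >= 0, else 0
    return 1 if (ones - zeros) + bias >= 0 else 0
```

With four matching/mismatching pairs and zero bias, `bnn_neuron([1, 1, 0, 0], [1, 0, 0, 1], 0)` yields 1, since the 1s and 0s in the XNOR result balance out.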
In the conventional von Neumann architecture, the memory and the computing unit are two independent parts: when a computer performs a computing operation, data must be fetched from the memory, transferred to the computing unit for computation, and then written back to the memory. All functions of a computer are built from countless repetitions of this process, in which memory accesses and computation on data account for 92% of the total energy consumption. In-memory computing (Computing In Memory, CIM) breaks through the traditional von Neumann architecture by embedding computing circuits into the memory and integrating storage with computation, which greatly reduces data movement and memory-access energy. CIM structures further adopt an analog computing mode, which reduces computation power consumption, but realizing analog computing circuits that are both sufficiently flexible and sufficiently reliable is a key challenge of the CIM architecture. Most CIM analog computing techniques are based on the voltage domain (i.e., they use voltage as the processing signal, converting a digital quantity into a linear voltage value), typically controlling the voltage by manipulating current or charge. However, the voltage cannot be subdivided indefinitely because of the upper limit of the circuit operating voltage, so voltage-domain CIM faces a bottleneck. In time-domain CIM architectures, by contrast, the digital quantity is represented as a linear pulse width or path delay; since time has no upper limit, time-domain computation in theory has great advantages over voltage-domain computation.
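The time-domain encoding idea can be illustrated with a minimal sketch. The unit-delay value and function names below are assumptions for illustration only; the point is that a digital quantity maps linearly onto a pulse width, with no supply-voltage ceiling limiting the representable range.

```python
UNIT_DELAY_NS = 0.5  # assumed per-unit delay contribution (illustrative)

def encode_as_delay(value):
    """Represent a digital quantity as a linear pulse width in ns;
    the range grows with time rather than saturating at a voltage rail."""
    return value * UNIT_DELAY_NS

def decode_delay(delay_ns):
    """Recover the digital quantity by counting unit delays in the pulse."""
    return round(delay_ns / UNIT_DELAY_NS)
```

Round-tripping any value through `encode_as_delay` and `decode_delay` recovers it exactly, regardless of magnitude, which is the theoretical advantage over a bounded voltage representation.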
Disclosure of Invention
Technical problems: aiming at the defects of the prior art, the invention discloses a time-domain in-memory computing array structure based on magnetic random access memory (MRAM). The disclosed computing array is a time-domain CIM structure based on delay accumulation; it realizes convolution computation while retaining the standard MRAM read/write operating mode, yielding an MRAM computing array that integrates computation and storage. The MRAM computing array is optimized for power consumption in terms of circuit structure, network structure and other aspects, and the delay-delta unit disclosed by the invention solves the problem of low quantization precision in conventional in-memory computing.
The technical scheme is as follows: the invention discloses an MRAM-based time-domain in-memory computing array structure, realized by the following technical scheme:
the array structure comprises a dual-mode storage array, a self-adaptive pipeline decoder, a pre-charging circuit, a column selector, a sensitive amplifier, an input/output unit, a delay delta unit, a counting unit, a time sequence control circuit and a mode selection module;
the dual mode memory array includes: the nonvolatile memory units formed by the magnetic tunnel structures are arranged in a matrix, delay accumulation logic is designed among each column of memory columns, and the memory array can realize function switching between a standard read-write mode and a binary neural network convolution calculation mode; nonvolatile memory cell
The mode selection module is used for switching the dual-mode memory array between the standard read/write function and the binary neural network convolution calculation function;
The adaptive pipelined decoder and the column selector are used for decoding the rows and columns of the memory array in the standard read/write mode; in the binary neural network convolution calculation mode, they are used to enable the word lines so as to discharge the bit-line capacitance and traverse the whole array row by row, thereby generating an accumulated delay signal delay1;
The precharge circuit, the sense amplifier, the input/output unit and the timing control circuit are used to realize the read/write operations and convolution calculation operations of the dual-mode memory array; the precharge circuit charges the bit-line capacitance in both the standard read/write mode and the convolution calculation mode;
the delay delta unit and the counting unit are used for quantizing the delay obtained by traversing the storage array in a binary neural network convolution calculation mode, so that a digital result is obtained, and the quantized result is the result of the binary neural network convolution calculation.
The dual-mode memory array comprises N columns and M rows of memory cells, and a logic control circuit is inserted between the word lines of adjacent memory cells in each row. The logic control circuit comprises a first AND gate and a second AND gate, both with two inputs and a single output: the first input of the first AND gate is connected to the mode selection signal and its second input to the previous-stage word line; the first input of the second AND gate is connected to the previous-stage word line and its second input to the output signal of the previous-stage sense amplifier; the outputs of the first and second AND gates are connected to the next-stage word line.
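A behavioral sketch of this word-line chaining logic follows. Note one assumption of ours: the text says both AND-gate outputs connect to the next-stage word line, so we model them as OR-combined; the patent text itself does not state how the two outputs are merged.

```python
def next_word_line(men, prev_wl, prev_q):
    """Next-stage word line from the two AND gates described above.
    men: mode selection signal; prev_wl: previous-stage word line;
    prev_q: previous-stage sense-amplifier output Q. All bits (0/1)."""
    and1 = men & prev_wl      # first AND gate: MEN with previous word line
    and2 = prev_wl & prev_q   # second AND gate: previous word line with Q
    return and1 | and2        # assumed OR-combination onto the next word line
```

Under this model the next word line can only fire when the previous one is active, which matches the row-by-row traversal that accumulates delay1.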
The mode selection module generates the mode selection signal according to an external enable signal: when the mode selection signal is high (1), the dual-mode memory array performs the standard read/write function; when the mode selection signal is low (0), it performs the convolution calculation function.
In the standard read/write mode, the adaptive pipelined decoder and the column selector perform read/write access to the corresponding memory cells in the array according to external address signals; in the convolution calculation mode, the adaptive pipelined decoder accesses all memory cells stage by stage to generate the accumulated delay signal delay1.
The nonvolatile memory cell comprises a first NMOS transistor and a first magnetic tunnel junction device; the first magnetic tunnel junction device can be set into a high-resistance state or a low-resistance state according to the direction of the write current. One end of the first magnetic tunnel junction device is connected to the bit line and the other end to the source of the first NMOS transistor; the gate of the first NMOS transistor is connected to the word line and the drain to the source line.
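The reason the stored bit produces a usable delay can be sketched with a simple RC model. The resistance and capacitance values below are assumed, illustrative numbers, not figures from the patent: the bit-line capacitance discharges through the MTJ, so a high-resistance (anti-parallel) cell contributes a longer discharge delay than a low-resistance (parallel) cell.

```python
R_P_OHM = 5e3     # assumed low-resistance (parallel) MTJ state
R_AP_OHM = 10e3   # assumed high-resistance (anti-parallel) MTJ state
C_BL_F = 50e-15   # assumed bit-line capacitance

def discharge_delay(stored_bit):
    """Approximate RC time constant of one cell's bit-line discharge
    (seconds); the stored bit selects the MTJ resistance state."""
    r = R_AP_OHM if stored_bit else R_P_OHM
    return r * C_BL_F
```

The per-cell delay difference between the two states is what the row-by-row traversal accumulates into delay1.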
The sense amplifier comprises:
a first PMOS transistor, whose gate is connected to the first node, source to the power supply, and drain to the first node;
a second PMOS transistor, whose gate is connected to the first node, source to the power supply, and drain to the second node;
a first NMOS transistor, whose gate is connected to an external reference voltage source, source to the third node, and drain to the first node;
a second NMOS transistor, whose gate is connected to the bit line, source to the third node, and drain to the second node;
a third NMOS transistor, whose gate is connected to an external enable signal, source to ground, and drain to the third node;
and a first CMOS inverter, whose input is connected to the second node and whose output signal is Q.
The delay-delta unit comprises a reference delay generation circuit and a unit clock generation circuit. The reference delay generation circuit comprises N columns and M rows of memory cells, with a logic control circuit inserted between the word lines of adjacent memory cells in each row. The logic control circuit comprises a first AND gate and a second AND gate, both with two inputs and a single output: the first input of the first AND gate is connected to the mode selection signal and its second input to the previous-stage word line; the first input of the second AND gate is connected to the previous-stage word line and its second input to the output signal Q of the previous-stage sense amplifier; the outputs of the first and second AND gates are connected to the next-stage word line. The data in the memory cells are all 0. In the binary neural network convolution calculation mode, the reference delay generation circuit performs the same bit-line capacitance discharge operation as the dual-mode memory array and traverses the whole array stage by stage, thereby generating a reference delay signal delay0;
the unit clock generation circuit includes:
the first copying column, the second copying column and the logic control unit have the same structure as the storage column structure in the dual-mode storage array; the first replica column comprises a first differential resistor, a first replica bit line, a first replica source line and a first sense amplifier; the second replica column includes a second differential resistor, a second replica bit line, a second replica source line, and a second sense amplifier.
The first replica column comprises:
the first differential resistor, a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance states of the nonvolatile memory cell; its first terminal is connected to the replica bit line of the first replica column, and its second terminal to the source of a first NMOS transistor whose gate is connected to the first word line and whose drain is connected to the first replica source line; a first charging PMOS transistor, whose gate is connected to the first word line, source to the power supply, and drain to the first replica bit line. The word-line signals of the redundant memory cells in the first replica column are all set to 0, and the output signal of the first sense amplifier is C1.
The second replica column comprises:
the second differential resistor, a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance states of the nonvolatile memory cell; its first terminal is connected to the replica bit line of the second replica column, and its second terminal to the source of a second NMOS transistor whose gate is connected to the second word line and whose drain is connected to the second replica source line; a second charging PMOS transistor, whose gate is connected to the second word line, source to the power supply, and drain to the second replica bit line. The word-line signals of the redundant memory cells in the second replica column are all set to 0, and the output signal of the second sense amplifier is C2.
The logic control unit comprises:
a first NOR gate, a two-input single-output circuit, whose first input is connected to an external control signal, whose second input is connected to the output signal C2 of the second sense amplifier, and whose output is connected to a node;
a first PMOS transistor, whose gate is connected to the node, source to the power supply, and drain to the node;
a first NMOS transistor, whose gate is connected to the output signal C2 of the second sense amplifier, source to ground, and drain to a third node;
a second NMOS transistor, whose gate is connected to an external control signal, source to ground, and drain to the third node;
a second NOR gate, a two-input single-output circuit, whose first input is connected to a node, whose second input is connected to an external control signal, and whose output is connected to the first word line;
a first inverter, whose input is connected to the output signal C1 of the first sense amplifier and whose output is connected to a node;
a third NMOS transistor, whose gate is connected to the output signal C1 of the first sense amplifier, source to ground, and drain to a node;
a second PMOS transistor, whose gate is connected to the node, source to the power supply, and drain to the node;
and a second inverter, whose input is connected to the node and whose output is connected to the second word line.
Beneficial effects: by adopting the above technical scheme, the invention has the following beneficial effects:
(1) Using the replica columns in the MRAM array, the invention completes the multiply-accumulate operations needed in a binary neural network through bit-line discharge delay accumulation and delay quantization; that is, convolution is performed while the memory is accessed, so the memory-access power is shared with the computation. This reduces memory-access power consumption and, compared with accelerators adopting the von Neumann architecture, can significantly reduce the overall power consumption of the network and improve energy efficiency.
(2) The invention realizes the accumulation and activation operations of the binary neural network in the time domain. Based on a time-domain CIM structure with delay accumulation, convolution calculation is realized while the standard MRAM operating mode is retained, yielding an MRAM computing array that integrates computation and storage.
(3) The invention uses replica columns to realize the delay-delta unit; the time-difference quantization unit quantizes the accumulated delay generated in the array calculation mode and converts it into a digital signal. The replica columns effectively track process variations of the memory array and offer high quantization precision, so the system can operate over a wide voltage range, reducing power consumption and improving energy efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the development history of artificial neural networks in the prior art;
FIG. 2 is a schematic diagram of the basic structure of a convolutional neural network in the prior art;
FIG. 3 is a block diagram of the MRAM-based time-domain in-memory computing array structure according to an embodiment of the present invention;
FIG. 4 is a truth table of the binary neural network operation in the structure according to an embodiment of the present invention;
FIG. 5 is a circuit diagram of the memory array in the structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a 2-bit time-domain calculation example in the structure according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the sense amplifier in the structure according to an embodiment of the present invention;
FIG. 8 is a diagram of the delay-delta unit in the structure according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of transient simulation waveforms of the delay-delta unit in the structure according to an embodiment of the present invention;
FIG. 10 is a diagram of Monte Carlo simulation results of a 4-bit convolution operation in the structure according to an embodiment of the present invention;
FIG. 11 is a graph of transient simulation results of a convolution operation in the structure according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of system power consumption and energy efficiency versus operating voltage in the structure according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The array structure comprises a dual-mode memory array, an adaptive pipelined decoder, a precharge circuit, a column selector, a sense amplifier, an input/output unit, a delay-delta unit, a counting unit, a timing control circuit and a mode selection module;
The dual-mode memory array comprises nonvolatile memory cells formed from magnetic tunnel junctions (MTJs), arranged in a matrix; delay accumulation logic is designed between the memory columns, and the memory array can switch between a standard read/write mode and a binary neural network convolution calculation mode;
The mode selection module is used for switching the dual-mode memory array between the standard read/write function and the binary neural network convolution calculation function;
The adaptive pipelined decoder and the column selector are used for decoding the rows and columns of the memory array in the standard read/write mode; in the binary neural network convolution calculation mode, they are used to enable the word lines so as to discharge the bit-line capacitance C_BL and traverse the whole array row by row, thereby generating an accumulated delay signal delay1;
The precharge circuit, the sense amplifier, the input/output unit and the timing control circuit are used to realize the read/write operations and convolution calculation operations of the dual-mode memory array; the precharge circuit charges the bit-line capacitance in both the standard read/write mode and the convolution calculation mode;
The delay-delta unit and the counting unit are used to quantize the delay obtained by traversing the memory array in the binary neural network convolution calculation mode, thereby obtaining a digital result; the quantized result is the result of the binary neural network convolution calculation.
The dual-mode memory array comprises N columns and M rows of memory cells, and a logic control circuit is inserted between the word lines of adjacent memory cells in each row. The logic control circuit comprises a first AND gate AND1 and a second AND gate AND2, both with two inputs and a single output: the first input of the first AND gate is connected to the mode selection signal MEN and its second input to the previous-stage word line; the first input of the second AND gate is connected to the previous-stage word line and its second input to the output signal Q of the previous-stage sense amplifier; the outputs of the first and second AND gates are connected to the next-stage word line.
The mode selection module generates the mode selection signal MEN according to an external enable signal: when MEN is high (1), the dual-mode memory array performs the standard read/write function; when MEN is low (0), it performs the convolution calculation function.
In the standard read/write mode, the adaptive pipelined decoder and the column selector perform read/write access to the corresponding memory cells in the array according to external address signals; in the convolution calculation mode, the adaptive pipelined decoder accesses all memory cells stage by stage to generate the accumulated delay signal delay1.
The nonvolatile memory cell comprises a first NMOS transistor N1 and a first magnetic tunnel junction device MTJ; the MTJ can be set into a high-resistance or low-resistance state according to the direction of the write current. One end of the MTJ is connected to the bit line BL and the other end to the source of N1; the gate of N1 is connected to the word line WL and the drain to the source line SL.
The sense amplifier comprises:
a first PMOS transistor P1, whose gate is connected to the first node NET1, source to the power supply VDD, and drain to the first node NET1;
a second PMOS transistor P2, whose gate is connected to the first node NET1, source to the power supply VDD, and drain to the second node NET2;
a first NMOS transistor N1, whose gate is connected to an external reference voltage source REF, source to the third node NET3, and drain to the first node NET1;
a second NMOS transistor N2, whose gate is connected to the bit line BL, source to the third node NET3, and drain to the second node NET2;
a third NMOS transistor N3, whose gate is connected to the external enable signal SAE, source to ground GND, and drain to the third node NET3;
and a first CMOS inverter INV1, whose input is connected to the second node NET2 and whose output signal is Q.
The delay delta unit comprises a reference delay generation circuit and a unit clock generation circuit. The reference delay generation circuit comprises N columns and M rows of memory cells, with a logic control circuit inserted between the word lines of adjacent memory cells in each row. The logic control circuit comprises a first AND gate AND1 and a second AND gate AND2, both with double-ended input and single-ended output; the first input end of the first AND gate is connected with the mode selection signal MEN, and its second input end with the previous-stage word line; the first input end of the second AND gate is connected with the previous-stage word line, and its second input end with the output signal Q of the previous-stage sense amplifier; the output ends of the first and second AND gates are connected with the next-stage word line. The data in the memory cells are all 0. In the binary neural network convolution calculation mode, the reference delay generation circuit performs the same bit line capacitance C_BL discharging operation as the dual-mode memory array and traverses the whole array step by step, thereby generating a reference delay0;
the unit clock generation circuit includes:
a first replica column, a second replica column and a logic control unit; the replica columns have the same structure as the memory columns in the dual-mode memory array. The first replica column comprises a first differential resistor, a first replica bit line RBL, a first replica source line RSL and a first sense amplifier SA1; the second replica column comprises a second differential resistor, a second replica bit line RBLB, a second replica source line RSLB and a second sense amplifier SA2.
The first replica column includes:
the first differential resistor R1, a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance values of the nonvolatile memory cell; its first terminal is connected to the replica bit line RBL of the first replica column, and its second terminal to the source of the first NMOS transistor N1, whose gate is connected to the first word line WL1 and drain to the first replica source line RSL; a first charging PMOS transistor P1, whose gate is connected to the first word line WL1, source to the power supply VDD, and drain to the first replica bit line RBL. The word line signals of the redundant memory cells in the first replica column are all set to 0, and the first sense amplifier SA1 outputs the signal C1.
The second replica column includes:
the second differential resistor R2, a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance values of the nonvolatile memory cell; its first terminal is connected to the replica bit line RBLB of the second replica column, and its second terminal to the source of the second NMOS transistor N2, whose gate is connected to the second word line WL2 and drain to the second replica source line RSLB; a second charging PMOS transistor P2, whose gate is connected to the second word line WL2, source to the power supply VDD, and drain to the second replica bit line RBLB. The word line signals of the redundant memory cells in the second replica column are all set to 0, and the second sense amplifier SA2 outputs the signal C2.
The logic control unit includes:
the first Nor gate Nor1, a two-input, single-output gate whose first input is connected to the external control signal EN, second input to the output signal C2 of the second sense amplifier SA2, and output to the first node NET1;
a first PMOS transistor, whose gate is connected to the second node NET2, source to the power supply VDD, and drain to the third node NET3;
a first NMOS transistor, whose gate is connected to the output signal C2 of the second sense amplifier SA2, source to ground GND, and drain to the third node NET3;
a second NMOS transistor, whose gate is connected to the external control signal EN, source to ground GND, and drain to the third node NET3;
the second Nor gate Nor2, a two-input, single-output gate whose first input is connected to the third node NET3, second input to the external control signal EN, and output to the first word line WL1;
a first inverter INV1, whose input is connected to the output signal C1 of the first sense amplifier SA1 and whose output is connected to the second node NET2;
a third NMOS transistor, whose gate is connected to the output signal C1 of the first sense amplifier SA1, source to ground GND, and drain to the fourth node NET4;
a second PMOS transistor, whose gate is connected to the first node NET1, source to the power supply VDD, and drain to the fourth node NET4;
a second inverter INV2, whose input is connected to the fourth node NET4 and whose output is connected to the second word line WL2.
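Under the all-zero assumption stated above for the reference array, the reference delay reduces to the number of traversed columns times the low-resistance discharge delay. A minimal sketch, with illustrative values of our own:

```python
# Behavioral sketch of the reference delay generation circuit: the
# reference cells all store 0, so traversing its N columns step by step
# accumulates N low-resistance discharge delays.  T_LRS and N are
# illustrative values, not from the patent.
T_LRS = 1.0   # discharge delay of a cell storing 0, arbitrary units
N = 32        # number of columns traversed

delay0 = sum(T_LRS for _ in range(N))
print(delay0)
```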
Examples
An in-memory computing circuit for a fully connected binarized neural network specifically includes: a dual-mode memory array, an adaptive pipeline decoder, a precharge circuit, a column selector, a sense amplifier, an input/output unit, a delay delta unit, a counting unit, a timing control circuit and a mode selection module.
The weight matrix M in equation (2) is mapped in the dual mode storage array disclosed in the present invention as:
the mapping is as follows: the weight matrix M in equation (2) is transposed along its diagonal, as in equation (3), and the transposed matrix is stored coordinate by coordinate in the memory cells of the dual-mode memory array disclosed by the invention.
The activation value vector V in equation (2) is mapped in the dual mode storage array disclosed herein as:
the mapping is as follows: the activation value vector V in equation (2) is applied to the dual-mode memory array disclosed by the invention in the form of word line signals, as in equation (4); when an element of V is 1, the corresponding word line is turned on, and when an element of V is 0, the word line is turned off.
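The two mappings above can be sketched as follows. The function names and small matrix are ours, for illustration only:

```python
# Illustrative sketch: the weight matrix is transposed along its diagonal
# before being written to the array, and the activation vector drives the
# word lines (element 1 -> word line on, 0 -> off).
def map_weights(M):
    rows, cols = len(M), len(M[0])
    # transpose: stored[i][j] = M[j][i]
    return [[M[j][i] for j in range(rows)] for i in range(cols)]

def word_lines(V):
    return ["on" if v == 1 else "off" for v in V]

M = [[1, 0],
     [1, 1]]
print(map_weights(M))      # [[1, 1], [0, 1]]
print(word_lines([1, 0]))  # ['on', 'off']
```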
A time domain memory internal computing array structure based on a magnetic random access memory comprises a dual-mode memory array, a self-adaptive pipeline decoder, a pre-charging circuit, a column selector, a sense amplifier, an input/output unit, a delay delta unit, a counting unit, a time sequence control circuit and a mode selection module;
the dual mode memory array includes: the nonvolatile memory units formed by Magnetic Tunnel Junctions (MTJs) are arranged in a matrix, delay accumulation logic is designed between each column of memory columns, and the memory array can realize function switching between a standard read-write mode and a binary neural network convolution calculation mode.
The mode selection module is used for switching the standard read-write function and the binary neural network convolution calculation function of the dual-mode storage array.
The adaptive pipeline decoder and the column selector are used to decode the rows and columns of the memory array in the standard read-write mode, and to turn on the word lines in the binary neural network convolution calculation mode so as to discharge the bit line capacitance C_BL and traverse the entire array row by row, thereby generating an accumulated delay.
The precharge circuit, sense amplifier, input-output unit and timing control circuit are used to implement read-write operations and convolution calculation operations of the dual mode memory array.
The delay delta unit and the counting unit are used to quantize the delay obtained by traversing the memory array in the binary neural network convolution calculation mode into a digital result; the quantized result is the result of the binary neural network convolution calculation.
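A minimal sketch of this quantization step, assuming the counter effectively divides the compute/reference delay difference by the per-cell delta dtau (our simplification of the counter behavior):

```python
# The counting unit converts the delay difference between the compute
# array (delay1) and the reference array (delay0) into a digital count,
# in units of the per-cell delay difference dtau.
def quantize(delay1, delay0, dtau):
    return round((delay1 - delay0) / dtau)

# Example: five cells storing 1 each add one dtau on top of the reference.
print(quantize(delay1=10.0 + 5 * 0.4, delay0=10.0, dtau=0.4))
```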
The dual-mode memory array shown in fig. 3 includes nonvolatile memory cells formed by magnetic tunnel junctions (MTJs) arranged in a matrix, with delay accumulation logic designed between adjacent memory columns; the memory array can switch between a standard read-write mode and a binary neural network convolution calculation mode. The mode selection module switches the dual-mode memory array between the standard read-write function and the binary neural network convolution calculation function. The adaptive pipeline decoder and the column selector decode the rows and columns of the memory array in the standard read-write mode, and turn on the word lines in the binary neural network convolution calculation mode so as to discharge the bit line capacitance C_BL and traverse the whole array row by row, thereby generating an accumulated delay1. The precharge circuit, the sense amplifier, the input/output unit and the timing control circuit implement the read-write and convolution calculation operations of the dual-mode memory array. The delay delta unit and the counting unit quantize the delay obtained by traversing the memory array in the binary neural network convolution calculation mode into a digital result; the quantized result is the result of the binary neural network convolution calculation.
In this embodiment, a 32×32 fully connected layer is used as the design object (M=N=32); that is, the weights form a 32×32 matrix and the activation value vector has 32 elements, and the system completes the following calculation:
the weight matrix M in equation (5) is mapped in the dual mode storage array disclosed in the present invention as:
the mapping is as follows: the weight matrix M in equation (5) is transposed along its diagonal, as in equation (6), and the transposed matrix is stored coordinate by coordinate in the memory cells of the dual-mode memory array disclosed by the invention.
The activation value vector V in equation (5) is mapped in the dual mode storage array disclosed herein as:
the mapping is as follows: the activation value vector V in equation (5) is applied to the dual-mode memory array disclosed by the invention in the form of word line signals, as in equation (7); when an element of V is 1, the corresponding word line is turned on, and when an element of V is 0, the word line is turned off.
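Assuming the word-line/weight interaction accumulates delay only where both activation and weight are 1 (an AND-accumulate model we adopt for illustration), the embodiment's computation can be sketched as:

```python
# Behavioral model of the fully connected computation: with activations on
# the word lines and transposed weights in the array, each output element
# is the count of positions where both activation and weight are 1,
# recovered from the accumulated bit-line delay.  The 3x2 sizes are
# illustrative; the embodiment uses 32x32.
def fc_layer(V, M):
    n_out = len(M[0])
    return [sum(V[i] & M[i][j] for i in range(len(V))) for j in range(n_out)]

V = [1, 0, 1]
M = [[1, 1],
     [1, 0],
     [0, 1]]
print(fc_layer(V, M))  # [1, 2]
```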
As shown in fig. 3, the present invention discloses a time domain in-memory computing array structure based on a magnetic random access memory, comprising a dual-mode memory array, an adaptive pipeline decoder, a precharge circuit, a column selector, a sense amplifier, an input/output unit, a delay delta unit, a counting unit, a timing control circuit and a mode selection module.
The invention discloses a time domain in-memory computing array structure based on a magnetic random access memory, which uses delay information to implement the convolution operation of a binary neural network in an in-memory computing manner, as shown in the truth table of the binary neural network operation in fig. 4.
As shown in FIG. 5, when the activation value is 0, the word line is turned off and the compute array generates no delay; when the activation value is 1, the word line is turned on, and the delay generated by a memory cell storing data 0 is T_LRS, determined by the bit line capacitance C_BL, the low-resistance state LRS of the memory cell, the precharge voltage V_pre and the reference voltage V_ref:
T_LRS = R_LRS C_BL ln[V_pre / (V_pre - V_ref)]    (8)
the delay generated by a memory cell storing data 1 is T_HRS, determined by the bit line capacitance C_BL, the high-resistance state HRS of the memory cell, the precharge voltage V_pre and the reference voltage V_ref:
T_HRS = R_HRS C_BL ln[V_pre / (V_pre - V_ref)]    (9)
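Equations (8) and (9) can be checked numerically. The component values below are illustrative assumptions, not values from the patent:

```python
import math

# Time for the bit line capacitance C_BL, precharged to V_pre, to
# discharge below V_ref through the cell resistance R.
def discharge_delay(R, C_BL, V_pre, V_ref):
    return R * C_BL * math.log(V_pre / (V_pre - V_ref))

R_LRS, R_HRS = 5e3, 10e3               # ohms, illustrative
C_BL, V_pre, V_ref = 50e-15, 1.0, 0.5  # farads, volts, illustrative

T_LRS = discharge_delay(R_LRS, C_BL, V_pre, V_ref)
T_HRS = discharge_delay(R_HRS, C_BL, V_pre, V_ref)
print(T_HRS > T_LRS)  # True: the high-resistance state discharges more slowly
```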
FIG. 6 is a schematic diagram of a 2-bit time domain calculation example provided in an embodiment of the present invention. When the word line WL is turned on, BL[1] starts to discharge; when the voltage on bit line BL[1] falls below the sense amplifier threshold reference voltage REF, the output Q is set to high level 1 and BL[2] starts to discharge; when the voltage on bit line BL[2] falls below REF, the output Delay is set to high level 1. Different stored data produce different output delays: a 2-bit cell pair storing all 0s has the shortest delay, and one storing all 1s the longest. The discharge delay difference Δτ between a memory cell storing data 0 and one storing data 1 can be expressed as:
Δτ = R_HRS C_BL ln[V_pre / (V_pre - V_ref)] - R_LRS C_BL ln[V_pre / (V_pre - V_ref)]    (10)
Δτ = (R_HRS - R_LRS) C_BL ln[V_pre / (V_pre - V_ref)]    (11)
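Equation (11) is just the factored form of equation (10), which can be confirmed numerically. All component values are illustrative assumptions:

```python
import math

# Verify that equations (10) and (11) give the same delay difference.
R_LRS, R_HRS = 5e3, 10e3               # ohms
C_BL, V_pre, V_ref = 50e-15, 1.0, 0.5  # farads, volts

k = math.log(V_pre / (V_pre - V_ref))
dtau_eq10 = R_HRS * C_BL * k - R_LRS * C_BL * k   # equation (10)
dtau_eq11 = (R_HRS - R_LRS) * C_BL * k            # equation (11)
print(math.isclose(dtau_eq10, dtau_eq11))  # True
```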
FIG. 7 shows the structure of the sense amplifier provided by the embodiment of the invention. The first PMOS transistor P1 has its gate connected to the first node NET1, source to the power supply VDD, and drain to the first node NET1. The second PMOS transistor P2 has its gate connected to the first node NET1, source to the power supply VDD, and drain to the second node NET2. The first NMOS transistor N1 has its gate connected to the external reference voltage source REF, source to the third node NET3, and drain to the first node NET1. The second NMOS transistor N2 has its gate connected to the bit line BL, source to the third node NET3, and drain to the second node NET2. The third NMOS transistor N3 has its gate connected to the external enable signal SAE, source to ground GND, and drain to the third node NET3. The input of the first CMOS inverter INV1 is connected to the second node NET2, and its output signal is Q. When the reference voltage is lower than the bit line voltage, the output Q is low level 0; when the reference voltage is higher than the bit line voltage, the output Q is high level 1.
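The input-output behavior described above can be modeled minimally; this is a behavioral sketch only, not the transistor-level circuit:

```python
# Behavioral model of the FIG. 7 sense amplifier: once the enable signal
# SAE is asserted, Q is 0 while the bit line voltage is above the
# reference REF and 1 once the bit line has discharged below it.
def sense_amp(v_bl, v_ref, sae=True):
    if not sae:
        return 0  # amplifier disabled
    return 1 if v_ref > v_bl else 0

print(sense_amp(0.8, 0.5), sense_amp(0.3, 0.5))  # 0 1
```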
Fig. 8 shows a block diagram of the delay delta unit in the time domain in-memory computing array structure based on a magnetic random access memory according to an embodiment of the present invention. The delay delta unit disclosed by the invention works as follows. First, when the EN signal arrives, the switched discharge module enables the first word line WL1 and turns off the second word line WL2; the replica bit line RBL of the first replica column discharges through the first differential resistor R1, while the replica bit line RBLB of the second replica column is charged high. Second, when RBL has finished discharging, the logic control unit turns off WL1 and enables WL2; RBLB then discharges through the second differential resistor R2 while RBL is recharged to a high level. Third, after RBLB has finished discharging, the logic control unit re-enables WL1 and turns off WL2, and RBL discharges again, as in the first step. WL1 and WL2 are thus enabled alternately by the switched discharge module, and RBL and RBLB discharge alternately, as shown in fig. 9. The delay delta unit generates a periodic CK signal whose period is the sum of the times taken to discharge RBL and RBLB once each. Since the resistance of each differential resistor equals the difference between the high and low resistances of a memory cell, the period of the CK signal is twice the discharge delay difference of a memory cell (2Δτ).
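The two-phase operation described above behaves like a relaxation oscillator; a minimal sketch, with an illustrative dtau value of our own:

```python
# Sketch of the delay delta unit: RBL and RBLB discharge alternately,
# each half-period equal to the discharge delay difference dtau set by the
# differential resistor, so one CK period is 2 * dtau.
dtau = 0.4                   # per-cell delay difference, ns (illustrative)
half_periods = [dtau, dtau]  # RBL discharge, then RBLB discharge
ck_period = sum(half_periods)
print(ck_period == 2 * dtau)  # True
```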
FIG. 10 shows Monte Carlo simulation results of a 4-bit convolution operation in the time domain in-memory computing array structure based on a magnetic random access memory according to an embodiment of the present invention; the average delay difference between adjacent data values is 0.4 ns.
As shown in fig. 11, in the time domain in-memory computing array structure based on a magnetic random access memory according to the embodiment of the present invention, after the word line signal WL is turned on, the bit lines of each stage discharge serially; the compute array generates Delay1 and the reference array generates Delay0, with delay difference ΔT. The delay difference is quantized by the delay delta unit, and the quantized result is converted into a digital result by the counting unit; this result matches the theoretical value.
Fig. 12 shows how the system power consumption and energy efficiency vary with the operating voltage in the time domain in-memory computing array structure based on a magnetic random access memory according to the embodiment of the invention. The abscissa is the operating voltage, the left ordinate the power consumption, and the right ordinate the energy efficiency. As the figure shows, as the operating voltage decreases, the power consumption decreases and the energy efficiency improves; the circuit operates down to 0.5 V, where the power consumption and energy efficiency are 38.71 µW and 98.74 TOPS/W respectively, a substantial improvement over a conventional digital implementation.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. The time domain memory internal computing array structure based on the magnetic random access memory is characterized by comprising a dual-mode storage array, a self-adaptive pipeline decoder, a pre-charging circuit, a column selector, a sense amplifier, an input/output unit, a delay delta unit, a counting unit, a time sequence control circuit and a mode selection module;
the dual mode memory array includes: the nonvolatile memory units formed by Magnetic Tunnel Junctions (MTJs) are arranged in a matrix, delay accumulation logic is designed among each column of memory columns, and the dual-mode memory array realizes function switching between a standard read-write mode and a binary neural network convolution calculation mode;
the mode selection module is used for switching the standard read-write function and the binary neural network convolution calculation function of the dual-mode storage array;
the adaptive pipeline decoder and the column selector are used for decoding the rows and columns of the dual-mode memory array in a standard read-write mode, and for turning on the word lines in a binary neural network convolution calculation mode so as to discharge the bit line capacitance (C_BL) and traverse the whole array row by row, thereby generating an accumulated delay1;
the pre-charge circuit, the sense amplifier, the input/output unit and the time sequence control circuit are used for realizing read-write operation and convolution calculation operation of the dual-mode memory array; the precharge circuit charges the bit line capacitor in a standard read-write mode and a binary neural network convolution calculation mode;
the delay delta unit and the counting unit are used for quantizing the delay obtained by traversing the dual-mode storage array in a binary neural network convolution calculation mode, so that a digital result is obtained, and the quantized result is the result of the binary neural network convolution calculation;
the dual-mode memory array comprises N columns and M rows of memory cells, with a logic control circuit inserted between the word lines of adjacent memory cells in each row; the logic control circuit comprises a first AND gate (AND1) and a second AND gate (AND2), both double-ended input and single-ended output; a first input end of the first AND gate is connected with a mode selection signal (MEN), and a second input end of the first AND gate is connected with the previous-stage word line; the first input end of the second AND gate is connected with the previous-stage word line, and the second input end of the second AND gate is connected with the output signal (Q) of the sense amplifier of the previous stage; the output ends of the first AND gate and the second AND gate are connected with the word line of the next stage.
2. The mram-based time domain memory computational array structure of claim 1 wherein: the mode selection module generates the mode selection signal (MEN) according to an external enable signal; when the mode selection signal (MEN) is high level 1, the dual-mode memory array performs the standard read-write function; when the mode selection signal (MEN) is low level 0, the dual-mode memory array performs the binary neural network convolution calculation function.
3. The mram-based time domain memory computational array structure of claim 1 wherein: in a standard read-write mode, the self-adaptive pipelined decoder and the column selector perform read-write access on corresponding memory cells in the array according to external address signals; in the binary neural network convolution calculation mode, the self-adaptive pipeline decoder accesses all storage units step by step to generate a delay accumulated signal delay1.
4. The mram-based time domain memory computational array structure of claim 1 wherein: the nonvolatile memory cell comprises a first NMOS transistor (N1) and a first magnetic tunnel junction device (MTJ); the first magnetic tunnel junction device (MTJ) presents either a high-resistance or a low-resistance state depending on the direction of the write current; one end of the first magnetic tunnel junction device (MTJ) is connected with a Bit Line (BL), and the other end is connected with the source of the first NMOS transistor (N1); the gate of the first NMOS transistor (N1) is connected with a Word Line (WL), and the drain is connected with a Source Line (SL).
5. The mram-based time domain memory computational array structure of claim 1 wherein: the sense amplifier includes
a first PMOS transistor (P1) having a gate connected to the first node (NET1), a source connected to the power supply (VDD), and a drain connected to the first node (NET1);
a second PMOS transistor (P2) having a gate connected to the first node (NET1), a source connected to the power supply (VDD), and a drain connected to the second node (NET2);
a first NMOS transistor (N1) having a gate connected to an external reference voltage source (REF), a source connected to the third node (NET3), and a drain connected to the first node (NET1);
a second NMOS transistor (N2) having a gate connected to the Bit Line (BL), a source connected to the third node (NET 3), and a drain connected to the second node (NET 2);
a third NMOS transistor (N3) having a gate connected to an external enable Signal (SAE), a source connected to Ground (GND), and a drain connected to a third node (NET 3);
the first CMOS inverter (INV 1) has an input end connected to the second node (NET 2) and an output signal of Q.
6. The magnetic random access memory-based time domain in-memory computing array structure as recited in claim 1, wherein the delay delta unit comprises a reference delay generation circuit and a unit clock generation circuit, the reference delay generation circuit comprising: N columns and M rows of memory cells, with a logic control circuit inserted between the word lines of adjacent memory cells in each row; the logic control circuit comprises a first AND gate (AND1) and a second AND gate (AND2), both double-ended input and single-ended output; a first input end of the first AND gate is connected with a mode selection signal (MEN), and a second input end of the first AND gate is connected with the previous-stage word line; the first input end of the second AND gate is connected with the previous-stage word line, and the second input end of the second AND gate is connected with the output signal Q of the previous-stage sense amplifier; the output ends of the first AND gate and the second AND gate are connected with the next-stage word line; the data in the memory cells are all 0; in the binary neural network convolution calculation mode, the reference delay generation circuit performs the same bit line capacitance (C_BL) discharging operation as the dual-mode memory array and traverses the whole array step by step, thereby generating a reference delay0;
the unit clock generation circuit includes:
a first replica column, a second replica column and a logic control unit, the replica columns having the same structure as the memory columns in the dual-mode memory array; the first replica column comprises a first differential resistor (R1), a first Replica Bit Line (RBL), a first Replica Source Line (RSL) and a first sense amplifier (SA1); the second replica column comprises a second differential resistor (R2), a second Replica Bit Line (RBLB), a second Replica Source Line (RSLB) and a second sense amplifier (SA2).
7. The mram-based time domain memory computing array structure of claim 6, wherein the first replica column comprises:
the first differential resistor (R1), a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance values of the nonvolatile memory cell; a first terminal of the first differential resistor (R1) is connected with the Replica Bit Line (RBL) of the first replica column, and a second terminal with the source of the first NMOS transistor (N1), whose gate is connected with the first word line (WL1) and drain with the first Replica Source Line (RSL); a first charging PMOS transistor (P1) having a gate connected to the first word line (WL1), a source connected to the power supply (VDD), and a drain connected to the first Replica Bit Line (RBL); the word line signals of the redundant memory cells in the first replica column are all set to 0, and the first sense amplifier (SA1) outputs the signal C1.
8. The mram-based time domain memory computing array structure of claim 7, wherein the second replica column comprises:
the second differential resistor (R2), a two-terminal device whose resistance equals the difference between the high-resistance and low-resistance values of the nonvolatile memory cell; a first terminal of the second differential resistor (R2) is connected with the Replica Bit Line (RBLB) of the second replica column, and a second terminal with the source of the second NMOS transistor (N2), whose gate is connected with the second word line (WL2) and drain with the second Replica Source Line (RSLB); a second charging PMOS transistor (P2) having a gate connected to the second word line (WL2), a source connected to the power supply (VDD), and a drain connected to the second Replica Bit Line (RBLB); the word line signals of the redundant memory cells in the second replica column are all set to 0, and the second sense amplifier (SA2) outputs the signal C2.
9. The mram-based time domain memory computing array architecture of claim 8, wherein the logic control unit comprises:
a first Nor gate (Nor 1) having a first input terminal connected to an external control signal (EN), a second input terminal connected to an output signal C2 of the second sense amplifier (SA 2), and an output terminal connected to a first node (NET 1);
a first PMOS transistor having a gate connected to the second node (NET2), a source connected to the power supply (VDD), and a drain connected to the third node (NET3);
a first NMOS transistor having a gate connected to the output signal C2 of the second sense amplifier (SA2), a source connected to Ground (GND), and a drain connected to the third node (NET3);
a second NMOS transistor having a gate connected to an external control signal (EN), a source connected to Ground (GND), and a drain connected to the third node (NET3);
a second Nor gate (Nor 2) having a first input terminal connected to the third node (NET 3), a second input terminal connected to an external control signal (EN), and an output terminal connected to the first word line (WL 1);
a first inverter (INV 1) having an input connected to the output signal C1 of the first sense amplifier (SA 1) and an output connected to a second node (NET 2);
a third NMOS transistor having a gate connected to the output signal C1 of the first sense amplifier (SA 1), a source connected to Ground (GND), and a drain connected to a fourth node (NET 4);
a second PMOS transistor having a gate connected to the first node (NET1), a source connected to the power supply (VDD), and a drain connected to the fourth node (NET4);
and the input end of the second inverter (INV 2) is connected with the fourth node (NET 4), and the output end of the second inverter is connected with the second word line (WL 2).
CN202011515150.8A 2020-12-21 2020-12-21 Time domain memory internal computing array structure based on magnetic random access memory Active CN112581996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011515150.8A CN112581996B (en) 2020-12-21 2020-12-21 Time domain memory internal computing array structure based on magnetic random access memory

Publications (2)

Publication Number Publication Date
CN112581996A CN112581996A (en) 2021-03-30
CN112581996B true CN112581996B (en) 2023-07-25

Family

ID=75136518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011515150.8A Active CN112581996B (en) 2020-12-21 2020-12-21 Time domain memory internal computing array structure based on magnetic random access memory

Country Status (1)

Country Link
CN (1) CN112581996B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077050B (en) * 2021-06-07 2021-08-17 中科院微电子研究所南京智能技术研究院 Digital domain computing circuit device for neural network processing
CN113488092A (en) * 2021-07-02 2021-10-08 上海新氦类脑智能科技有限公司 Circuit for realizing multi-bit weight storage and calculation based on SRAM (static random Access memory) and storage and analog calculation system
CN113539318B (en) * 2021-07-16 2024-04-09 南京后摩智能科技有限公司 In-memory computing circuit chip and computing device based on magnetic cache
CN113467751B (en) * 2021-07-16 2023-12-29 东南大学 Analog domain memory internal computing array structure based on magnetic random access memory
CN113643736A (en) * 2021-07-23 2021-11-12 上海亘存科技有限责任公司 Magnetic random access memory and read operation method thereof
CN113658623B (en) * 2021-08-20 2024-03-01 湘潭大学 Ferroelectric memory array capable of realizing multi-operand memory calculation
CN113688984B (en) * 2021-08-25 2024-01-30 东南大学 Memory binarization neural network calculation circuit based on magnetic random access memory
CN114254743B (en) * 2021-09-14 2024-03-15 安徽大学 Circuit for parallel multiply-accumulate operation in binary neural network based on RRAM array
WO2023044813A1 (en) * 2021-09-24 2023-03-30 华为技术有限公司 Computing circuit and electronic device
CN116089840A (en) * 2021-11-02 2023-05-09 华为技术有限公司 Similarity calculation device, method and storage device
CN114038492B (en) * 2021-11-11 2024-04-16 桂林电子科技大学 Multiphase sampling memory internal computing circuit
CN115083462B (en) * 2022-07-14 2022-11-11 中科南京智能技术研究院 Digital in-memory computing device based on Sram
CN116070685B (en) * 2023-03-27 2023-07-21 南京大学 Memory computing unit, memory computing array and memory computing chip
CN116206651B (en) * 2023-05-05 2023-07-14 华中科技大学 Wide-voltage-domain SRAM read-write time sequence control circuit and method
CN116486857B (en) * 2023-05-17 2024-04-02 北京大学 In-memory computing circuit based on charge redistribution
CN117271436B (en) * 2023-11-21 2024-02-02 安徽大学 SRAM-based current mirror complementary in-memory calculation macro circuit and chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003249083A (en) * 2001-12-20 2003-09-05 Toshiba Corp Nonvolatile semiconductor memory device
CN110364203A (en) * 2019-06-20 2019-10-22 中山大学 A kind of interior storage system and calculation method calculated of support storage
CN111191775A (en) * 2018-11-15 2020-05-22 南京博芯电子技术有限公司 Memory of acceleration convolution neural network with sandwich structure
CN111863071A (en) * 2020-07-22 2020-10-30 上海高性能集成电路设计中心 Circuit structure for realizing internal operation based on SRAM
CN112015673A (en) * 2020-10-22 2020-12-01 之江实验室 Memory computing method and device based on hybrid storage

Also Published As

Publication number Publication date
CN112581996A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112581996B (en) Time domain memory internal computing array structure based on magnetic random access memory
EP1642298B1 (en) Method for reducing power consumption when sensing a resistive memory
Imani et al. Approximate computing using multiple-access single-charge associative memory
CN109979503B (en) Static random access memory circuit structure for realizing Hamming distance calculation in memory
US11151439B2 (en) Computing in-memory system and method based on skyrmion racetrack memory
US9019737B2 (en) Power savings in a content addressable memory device using masked pre-compare operations
CN112133348B (en) Storage unit, storage array and memory computing device based on 6T unit
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
CN113688984B (en) Memory binarization neural network calculation circuit based on magnetic random access memory
Hong et al. Ternary cache: Three-valued MLC STT-RAM caches
CN115039177A (en) Low power consumption in-memory compute bit cell
WO2020173040A1 (en) Reversible logic circuit and operation method thereof
CN112599161A (en) Multi-resistance-state spin electronic device, read-write circuit and memory Boolean logic arithmetic unit
Zheng et al. Memristor-based ternary content addressable memory (mTCAM) for data-intensive computing
US8031501B1 (en) Segmented content addressable memory device having pipelined compare operations
CN114496010A (en) Analog domain near memory computing array structure based on magnetic random access memory
CN113658628B (en) Circuit for DRAM nonvolatile memory internal calculation
Raman et al. Enabling In-memory computations in non-volatile SRAM designs
CN114895869B (en) Multi-bit memory computing device with symbols
CN114974337A (en) Time domain memory computing circuit based on spin magnetic random access memory
Nemati et al. A Hybrid SRAM/RRAM In-Memory Computing Architecture Based on a Reconfigurable SRAM Sense Amplifier
CN112951290B (en) Memory computing circuit and device based on nonvolatile random access memory
CN114898789A (en) Circuit structure and method of multi-bit exclusive OR operation based on SRAM array
Wang et al. A novel content addressable memory based on hybrid memristor-CMOS architecture
JP2024510970A (en) Compute-in-memory using ternary activation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant