CN112183739B - Hardware architecture of memristor-based low-power-consumption pulse convolution neural network


Info

Publication number
CN112183739B
CN112183739B
Authority
CN
China
Prior art keywords
memristor
pulse
neural network
input
neuron
Prior art date
Legal status
Active
Application number
CN202011203894.6A
Other languages
Chinese (zh)
Other versions
CN112183739A (en)
Inventor
吴启樵
孙文浩
蔡元鹏
陈松
Current Assignee
Chen Song
Cheng Lin
Hefei Chengling Microelectronics Co ltd
Wu Feng
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202011203894.6A priority Critical patent/CN112183739B/en
Publication of CN112183739A publication Critical patent/CN112183739A/en
Application granted granted Critical
Publication of CN112183739B publication Critical patent/CN112183739B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Existing neural-network acceleration hardware architectures rely on digital multipliers, digital-to-analog converters and analog-to-digital converters, which consume substantial hardware power and area and make it difficult for such architectures to keep pace with ever-growing network scale and computational complexity. In contrast, the invention uses a memristor array and sense amplifiers to emulate the IF neurons of a pulse convolutional neural network, so that the digital-to-analog and analog-to-digital converters are omitted and a large amount of hardware area is saved. Moreover, whereas the traditional convolutional neural network suffers from a heavy operation count and excessive energy consumption, the pulse convolutional neural network computes with high energy efficiency, so the memristor-based pulse convolutional neural network hardware architecture provided by the invention greatly improves performance over the prior art.

Description

Hardware architecture of memristor-based low-power-consumption pulse convolution neural network
Technical Field
The invention relates to the technical field of neural networks, in particular to a hardware architecture of a memristor-based low-power-consumption pulse convolution neural network.
Background
In recent years, artificial neural networks have become a hot research direction in the field of artificial intelligence. Driven by big data and high-performance computing hardware, convolutional neural networks have attracted wide attention in image recognition, object detection, pattern recognition and other fields. However, today's central processing units (CPUs) and graphics processing units (GPUs), built on the traditional von Neumann computing architecture, expend a great deal of energy moving data between memory and processor. As the data volume and computational load of neural networks grow, the traditional von Neumann architecture consumes enormous hardware energy when processing the large number of multiply-accumulate operations, making it difficult to deploy on low-power platforms such as embedded devices and portable mobile devices. Designing hardware around the pulse neural network (SNN), which is closer to biological neural networks, and the memristor (RRAM), a novel nonvolatile memory device, is expected to be a breakthrough for this energy problem and has drawn continuing attention from researchers at home and abroad in recent years.
The pulse convolutional neural network (SCNN) achieves a higher level of biological fidelity thanks to the operating principle of the pulse neural network, while adopting the network structure of the convolutional neural network (CNN) with its local receptive fields and parameter sharing. In addition to the neuronal and synaptic states found in traditional convolutional neural networks, the pulse convolutional neural network incorporates a notion of time into its operation, aiming to bridge the gap between neuroscience and machine learning by computing with a model that closely fits biological neuronal mechanisms. Compared with the traditional convolutional neural network, the pulse convolutional neural network combines the high recognition accuracy of the convolutional neural network with the low energy consumption of the pulse neural network, making it well suited for neural network accelerators and neuromorphic hardware.
Because the specific learning mechanisms of biological neural networks are not yet deeply understood, the pulse convolutional neural network currently lacks both an efficient training algorithm and a firm biological basis, so its recognition accuracy in fields such as image recognition still falls short of the traditional convolutional neural network. Moreover, hardware designs based on the traditional von Neumann architecture suffer from high power and resource consumption. A hardware architecture that implements the pulse convolutional neural network on nonvolatile memory can therefore adopt a novel non-von-Neumann architecture, combining the high accuracy of the convolutional neural network with the low energy consumption of the pulse neural network, and is expected to serve application scenarios with strict energy budgets, such as portable mobile devices.
Disclosure of Invention
The invention aims to provide a hardware architecture for a memristor-based low-power-consumption pulse convolutional neural network that implements the pulse convolutional neural network at low power and adapts to different network structures.
The purpose of the invention is realized by the following technical scheme:
a hardware architecture for a memristor-based low-power pulse convolutional neural network, comprising: the memristor-based power amplifier comprises an input buffer, a memristor control module, a memristor array, a sensitive amplifier module and an output buffer;
the input buffer is used for storing picture input data and outputting a decoding bus signal and a control bus signal during calculation;
the memristor control module is used for writing corresponding picture input data into the memristor array according to the decoding bus signal and the control bus signal;
the memristor array is internally stored with weight data, memory calculation is carried out by combining picture input data, and a collection current value is output;
the sensitive amplifier module is used for combining a collection current value output by the memristor array and a reference threshold current and outputting a high-level voltage or a low-level voltage;
the output buffer receives and senses the output result of the amplifier module and stores or outputs the output result to the outside.
According to the technical scheme provided by the invention, a low-power memristor array performs efficient analog calculation and emulates the neuron function of the pulse convolutional neural network in hardware. This overcomes the drawbacks of existing neural network hardware acceleration architectures, such as complex calculation, high energy consumption and large area, and provides a hardware architecture that uses the memristor, a novel nonvolatile device, as both calculation and storage unit, achieving hardware acceleration of the pulse convolutional neural network while reducing power consumption.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of a hardware architecture of a memristor-based low-power-consumption pulse convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an embodiment of converting input data into pulse input;
FIG. 3 is a schematic diagram of a memristor control module provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a pulse simulation calculation performed by a memristor according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a memristor array architecture provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a pulse convolution neural network according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an IF neuron according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a hardware architecture of a memristor-based low-power-consumption pulse convolutional neural network, which, as shown in fig. 1, mainly comprises: an INPUT buffer, a memristor control module (RRAM control module), a memristor array (RRAM array), a Sense Amplifier (SA) module, and an OUTPUT buffer. The input buffer is used for storing picture input data and outputting a decoding bus signal and a control bus signal during calculation; the memristor control module writes the corresponding picture input data into the memristor array according to those signals; the memristor array stores weight data internally, performs in-memory calculation in combination with the picture input data, and outputs a collected current value; the sense amplifier module compares the collected current value with a reference threshold current and outputs a high-level or low-level voltage; the output buffer receives the output result of the sense amplifier module and stores it or outputs it to the outside.
For ease of understanding, the following detailed description is directed to various portions of the hardware architecture.
1. Input buffer.
The input buffer stores the picture input data in a memory outside or inside the chip, controls the memory's address and timing during calculation, and sequentially sends the input data to the memristor array for calculation.
The input buffer includes: a memory and a controller.
1) A memory.
The memory is used for storing picture input data, where the picture input data is a pulse-input (Spike Input) picture stored as 0/1 bit data, obtained by converting an original-input (Original Input) picture in floating-point form.
The picture conversion process is shown in fig. 2. The principle is as follows: each pixel of the original input picture is converted, by frequency coding, into a pulse sequence whose discharge frequency is proportional to the gray value of that pixel. The conversion proceeds as follows: each pixel of the original input picture is compared with a random number in the range 0 to Max, where Max is the maximum pixel value, typically 255. The conversion of a floating-point picture is expressed by the following judgment expression:
Spike_input(x, y) = (Original_input(x, y) > Random(0, Max)) ? 1 : 0
where Spike_input(x, y) is the pixel value in row x, column y of the binary input data, Original_input(x, y) is the pixel value in row x, column y of the original input picture, Max is the maximum pixel value, and Random(0, Max) is a random number between 0 and Max. The part after the equals sign and before the question mark is a judgment expression: when it holds, the expression result is 1; when it does not hold, the result is 0.
Based on the formula, one original input picture can be converted into a plurality of pulse input pictures, each pulse input picture is a binary picture, and the result of accumulation of the plurality of pulse input pictures is similar to that of the original picture and can be used as an input pulse sequence of a pulse convolution neural network.
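The frequency-coding step above can be sketched in plain Python. The helper name `encode_spike_frames` and the nested-list picture layout are illustrative assumptions, not part of the patent; the judgment expression itself follows the formula above.

```python
import random

def encode_spike_frames(image, num_frames, max_val=255):
    """Rate-code a grayscale picture into binary spike frames.

    Per the judgment expression
        Spike(x, y) = (Original(x, y) > Random(0, Max)) ? 1 : 0,
    each pixel fires in a given frame with probability proportional
    to its intensity, so accumulating many frames approximates the
    original picture."""
    frames = []
    for _ in range(num_frames):
        frame = [[1 if px > random.uniform(0, max_val) else 0 for px in row]
                 for row in image]
        frames.append(frame)
    return frames

# One row of three pixels: black, mid-gray, white.
image = [[0, 128, 255]]
frames = encode_spike_frames(image, 200)
# Empirical firing rates track the pixel intensities.
rates = [sum(f[0][j] for f in frames) / 200 for j in range(3)]
```

A black pixel (0) never fires and a white pixel (255) fires in essentially every frame, which is why summing the binary frames recovers an approximation of the original picture.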
2) Controller.
The controller is used for providing decoding bus signals and control bus signals for the memristor control module so as to write data into the memristor array.
2. Memristor control module.
The memristor control module is designed to match the architecture and performance of the memristor array. In general, a memristor has four operating modes: initialization (FORM), write 1 (SET), write 0 (RESET) and read (READ), which respectively initialize the memristor, decrease its resistance, increase its resistance, and read the current through it. The memristor control module uses a number of transmission gates to deliver the analog voltages of the word lines, bit lines and source lines of the memristor array in the four operating modes; the number of transmission gates matches the size of the memristor array, and one or more gates are selected for conduction by changing the decoding and control signals. The inputs to the memristor control module thus include several different analog signals and a series of control signals, typically decoded signals chosen according to the memristor array architecture. The output of the memristor control module delivers the analog signals to the memristor array.
As shown in fig. 3, the memristor control module includes a BL module, an SL module and a WL module, which receive the decoding bus signal and control bus signal output by the input buffer and control the bit lines, source lines and word lines of the memristor array, respectively. Solid arrows in fig. 3 represent digital signals and open arrows represent analog signals. The three modules receive the decoding bus signal and control bus signal transmitted from the previous stage and control the opening and closing of the transmission gates, thereby controlling the transmission of the analog signals. The analog inputs to each module are the input voltages V_FORM, V_SET, V_RESET and V_READ for the four operating modes. For a memristor, the FORM voltage generally needs to be 3-4 V and the SET and RESET voltages 2-3 V, while the READ voltage can be chosen around several hundred millivolts so that a read operation does not change the resistance of the RRAM.
3. Memristor array.
To eliminate the influence of sneak currents, the memristor array uses as its basic unit a 1T1R three-terminal device cell, formed by a single memristor connected in series with an NMOS transistor, to perform multiplication. In a single 1T1R cell, the gate of the NMOS transistor is the word terminal, its source is the source terminal, its drain connects to one end of the memristor, and the other end of the memristor is the bit terminal. Connecting the word terminals, source terminals and bit terminals of different 1T1R cells forms the word lines, source lines and bit lines of the memristor array.
As shown in FIG. 4, the bit terminals of two 1T1R three-terminal device cells are connected to form a simple multiply-accumulate computation unit of the pulse convolutional neural network. The input data of the pulse convolutional neural network is a pulse sequence of logic 0s and 1s applied to the word terminals of the 1T1R cells; the network weights are stored in the memristors, represented by their conductance values. According to Kirchhoff's law, and neglecting the voltage drop between the NMOS source and drain, the collected current of the two cells is I_out = V_read × (G1 × 0/1 + G2 × 0/1), where V_read is the voltage drop between the word terminal and the source terminal of the 1T1R device during the multiply-accumulate calculation, and G1 and G2 are the conductance values of the memristors in the two 1T1R three-terminal devices. The truth values are shown in Table 1:
Input    I_out
00       0
01       Vread × G1
10       Vread × G2
11       Vread × G1 + Vread × G2
Table 1. Truth values for each input combination
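The Kirchhoff current sum behind Table 1 can be checked numerically. The function name `column_current` and the numeric values of the read voltage and conductances are illustrative placeholders, not values from the patent.

```python
def column_current(inputs, conductances, v_read):
    """Collected bit-line current of one memristor column (Kirchhoff's law):
    each 1T1R cell whose word-line input is 1 contributes V_read * G;
    cells whose input is 0 are cut off and contribute nothing."""
    return sum(v_read * g for s, g in zip(inputs, conductances) if s == 1)

# Illustrative read voltage and cell conductances.
v_read, g1, g2 = 2.0, 3.0, 5.0
for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    i_out = column_current(pattern, (g1, g2), v_read)
```

With both inputs at 1 the current is V_read × (G1 + G2), matching the bottom row of the table; with both at 0 the current is 0.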
In the embodiment of the invention, the memristor array is a low-power-consumption memristor array, and the calculation method is different from that of the traditional memristor array. According to the traditional calculation method, gating of memristor units in a memristor array is controlled through word lines, input data are transmitted into the memristor array by applying voltage to source lines, and multiplication calculation of source line voltage values and memristor conductance values is completed by detecting current magnitude on the bit lines according to kirchhoff's law. In a low power memristor array module, input data is transferred by applying a voltage to a word line.
In the embodiment of the invention, the memristor-based pulse convolutional neural network hardware architecture combines the analog-computation capability of the memristor with the fact that the pulse convolutional neural network transmits only binary 0/1 data. The digital-to-analog converter (DAC) and analog-to-digital converter (ADC) of the traditional memristor hardware architecture are not used, greatly reducing hardware power consumption and area. Convolutional-layer, pooling-layer and fully-connected-layer calculations can all be configured through the number of gated rows in the array, making the choice of network structure very flexible.
Fig. 5 shows the structure of the memristor array provided by the invention. Compared with a traditional memristor array, both hardware power consumption and hardware area are greatly reduced. First, regarding power consumption: when the input is 0, the NMOS transistor stays off, whereas in the traditional memristor array architecture the NMOS transistor stays on regardless of the input. During pulse convolutional neural network calculation, the more 0s there are in the pulse-sequence input converted from the data set, the less power the memristor array consumes. Regarding area: a traditional memristor array requires a digital-to-analog converter and an analog-to-digital converter as data input and output interfaces, whereas in the memristor array design provided by the invention the input is a pulse sequence requiring no digital-to-analog conversion, and the output is also a pulse sequence: the output current is collected on a capacitor and high/low logic levels are produced by a sense amplifier without analog-to-digital conversion, saving hardware area.
4. Sense amplifier module.
As shown in fig. 5, each column of memristors in the memristor array is connected to a sense amplifier and a corresponding capacitor; all the sense amplifiers and their capacitors form the sense amplifier module.
The sense amplifier module is a necessary module for converting the analog signal output by the nonvolatile device into a digital signal. Its function is to compare the magnitudes of the voltages at its two inputs and output a high-level or low-level voltage accordingly. The sense amplifier module can emulate the computation of neurons in the pulse convolutional neural network: a pulse neuron collects the outputs of preceding neurons and emits a pulse when the collected amount exceeds a threshold; the capacitor in the sense amplifier module collects the current of multiple memristor cells and converts it into a voltage, and when the total voltage exceeds the threshold voltage, a high level is output and the capacitor is discharged.
In the embodiment of the invention, the memristor array performs analog multiply-accumulate calculation: each column of memristors, together with its sense amplifier and corresponding capacitor, forms a pulse neural computation unit, calculated as follows:
I_j(t-1) = Σ_{i=1..n} G_ij · S_i(t-1) · V_READ
V_c(t) = V_c(t-1) + I_j(t-1) · Δt / C
V_j = (V_c(t) > V_refj) ? 1 : 0
where I_j(t-1) is the collected current of column j of the memristor array at time t-1, G_ij is the conductance of the memristor in row i, column j of the array, S_i(t-1) is the input of row i at time t-1, V_READ is the read operating voltage, V_c(t) and V_c(t-1) are the capacitor voltages at times t and t-1 respectively, C is the capacitance, Δt is the time difference between t and t-1, V_j is the output of the column-j sense amplifier, and V_refj is the reference voltage supplied to the column-j sense amplifier. In the formula for V_j, the part between the equals sign and the question mark is a judgment expression: if at time t V_c(t) is greater than V_refj, the column-j sense amplifier outputs digital level 1; otherwise it outputs digital level 0.
The pulse neural computation unit can perform a multiply-accumulate of n inputs with n weights. Denote the n inputs S_1 to S_n, applied to rows 1 to n of the memristor array; the n weights are the conductance values of the memristors in column j, rows 1 to n. Let the total input duration be T, i.e. each row's input is a 0/1 sequence of total length T. In each clock cycle, column j of the memristor array produces its collected current by analog calculation, charging the capacitor and raising its voltage. If at time t the capacitor voltage exceeds the reference voltage of the sense amplifier, the sense amplifier outputs a high-level pulse, which turns on the NMOS transistor, discharging the capacitor and resetting its voltage.
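The column-plus-capacitor-plus-sense-amplifier behaviour described above can be sketched behaviourally. This is a discrete-time software model of an analog circuit; the function name `simulate_column` and all component values are illustrative assumptions.

```python
def simulate_column(spike_frames, conductances, v_read, cap, dt, v_ref):
    """Pulse neural computation unit of one column: each clock cycle the
    collected current I_j = sum_i(G_ij * S_i * V_READ) charges the capacitor
    (V_c += I_j * dt / C); when V_c exceeds V_ref the sense amplifier
    outputs a 1 and the capacitor is discharged back to zero."""
    v_c = 0.0
    outputs = []
    for spikes in spike_frames:
        i_j = sum(g * s * v_read for g, s in zip(conductances, spikes))
        v_c += i_j * dt / cap
        if v_c > v_ref:
            outputs.append(1)
            v_c = 0.0  # high output turns on the discharge NMOS
        else:
            outputs.append(0)
    return outputs

# Two always-firing inputs into unit conductances: the column integrates
# and emits an output pulse periodically (illustrative values).
out = simulate_column([(1, 1)] * 6, [1.0, 1.0], 1.0, 1.0, 0.1, 0.5)
```

The output firing rate grows with the weighted input sum, which is exactly the integrate-and-fire behaviour the hardware exploits.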
In the embodiment of the invention, the memristor array can adapt to different network structures; a typical artificial neural network includes a convolutional layer, a pooling layer, and a fully-connected layer. Fig. 6 is an example of an adaptable pulse convolutional neural network structure that contains two convolutional layers, two downsampled (pooled) layers, and three fully-connected layers (including one layer of gaussian connections).
For the convolution layer, the size of the convolution kernel is KxK, and then the multiplication accumulation calculation of KxK input data and KxK weight data is completed by single convolution calculation; and writing the KxK weight data into a certain column of the memristor array, converting the KxK input data into pulse sequences, and then sending the pulse sequences to the corresponding KxK rows, thereby completing the KxK convolution operation. Illustratively, K may be 3, 5, 11, etc.
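The K×K mapping can be made concrete with a small flattening sketch. The helper names are hypothetical, and the sketch assumes non-negative weights; real designs would need a scheme such as differential columns for negative weights, which the patent text does not detail.

```python
def flatten_kernel_and_patch(kernel, patch):
    """Map one K x K convolution onto one memristor column: the K*K kernel
    weights become the conductances of rows 1..K*K of that column, and the
    K*K binary input patch becomes the word-line inputs of those rows.
    Assumes non-negative weights (conductances cannot be negative)."""
    conductances = [w for row in kernel for w in row]
    inputs = [s for row in patch for s in row]
    return conductances, inputs

def column_mac(inputs, conductances, v_read=1.0):
    # Single-column multiply-accumulate (Kirchhoff current sum).
    return sum(v_read * g * s for g, s in zip(conductances, inputs))

kernel = [[1.0, 2.0], [3.0, 4.0]]   # K = 2, illustrative weights
patch = [[1, 0], [1, 1]]            # binary spike patch
g, s = flatten_kernel_and_patch(kernel, patch)
result = column_mac(s, g)
```

One array column thus computes one convolution output per cycle; sliding the patch over the picture reuses the same column of weights, which is the parameter sharing the patent relies on.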
For the pooling layer, the size of the pooled region is denoted as P × P, and the pooling operation corresponds to P × P input data and P × P weight data. For example, P may be 2, and the weight value is 0.25.
For the fully connected layer, it is noted that the fully connected layer has a inputs and b outputs, the fully connected layer has a × b total weight data, the a × b weight data is written into the array space of the (i + 1) th to (i + a) th rows and the (j + 1) th to (j + b) th columns in the array, and a inputs are sent to the (i + 1) th to (i + a) th rows, and b calculation results of the fully connected layer are read from the (j + 1) th to (j + b) th columns.
The pulse convolutional neural network is a pulse neural network with convolutional layers. Since no sufficiently good method yet exists for training it directly, the network is first trained as a traditional convolutional neural network and then converted so that the pulse convolutional neural network can work normally. Not every convolutional neural network can be converted: in general, a convolutional neural network whose convolution kernels have no bias, whose activation function is ReLU and whose pooling layers use average pooling converts well to a pulse convolutional neural network. Two conversion algorithms exist: 1. normalize the trained network model; 2. normalize over the data set processed by the network. The data-set-based algorithm incurs less latency than the model-based one, so this embodiment uses the data-set-based conversion algorithm to convert the convolutional neural network into a pulse convolutional neural network.
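The data-set-based conversion step can be sketched as follows. The patent only names the algorithm, so the exact scaling rule here is an assumption: it follows the commonly used data-based weight normalization, where each layer's weights are rescaled by the largest activation that layer produced on the training set.

```python
def databased_normalize(layer_weights, layer_max_activations):
    """Data-based weight normalization for CNN -> pulse-CNN conversion
    (assumed rule, not quoted from the patent): each layer's weights are
    scaled by prev_factor / max_activation, so post-conversion membrane
    potentials stay within a reasonable firing range."""
    normalized = []
    prev_factor = 1.0
    for weights, max_act in zip(layer_weights, layer_max_activations):
        scale = prev_factor / max_act
        normalized.append([w * scale for w in weights])
        prev_factor = max_act
    return normalized

# Two illustrative layers with their max activations on the training set.
out = databased_normalize([[1.0, 2.0], [3.0]], [2.0, 6.0])
```

The rescaled values then serve as the synaptic weights written into the memristor array.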
IF neurons were simulated using a spiking neural computation unit, as shown in fig. 7, a single IF neuron works as follows:
V(t) = { Vr,                          if V(t-1) > V_ref
       { V(t-1) + Σ_k w_k · S_k(t),   otherwise
S(t) = (V(t) > V_ref) ? 1 : 0
where V(t) is the membrane potential of the IF neuron at time t, Vr is the membrane potential after the IF neuron resets, w_k is the strength of the synaptic connection between the IF neuron and its presynaptic neuron k, S_k(t) is the pulse value transmitted by presynaptic neuron k to the IF neuron at time t, and V_ref is the membrane-potential threshold at which the IF neuron emits a pulse. In the expression for S(t), the part between the equals sign and the question mark is a judgment expression: if at time t V(t) is greater than V_ref, then S(t) equals 1; otherwise S(t) equals 0.
When the membrane potential of the IF neuron exceeds the membrane-potential threshold, the neuron emits a pulse and the membrane potential is reset; the reset voltage of the IF neuron is emulated by setting the reference voltage of the sense amplifier. After reset, the products of the pulses emitted by preceding IF neurons and the corresponding synaptic connection strengths are again accumulated onto the membrane potential until it exceeds the threshold, the neuron fires again, and the process repeats. When S(t) equals 1 the IF neuron transmits a pulse; when it equals 0 the neuron does not.
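The IF neuron equations above translate directly into a behavioural model. The class name `IFNeuron` and the numeric values are illustrative, not from the patent.

```python
class IFNeuron:
    """Integrate-and-fire neuron: accumulates w_k * S_k(t) onto the membrane
    potential; when V(t) exceeds the threshold V_ref it emits a pulse
    (S(t) = 1) and the membrane potential resets to Vr."""

    def __init__(self, v_ref, v_reset=0.0):
        self.v_ref = v_ref
        self.v_reset = v_reset
        self.v = v_reset

    def step(self, weights, input_spikes):
        # Accumulate the weighted presynaptic pulses for this time step.
        self.v += sum(w * s for w, s in zip(weights, input_spikes))
        if self.v > self.v_ref:
            self.v = self.v_reset  # fire and reset
            return 1
        return 0

neuron = IFNeuron(v_ref=1.0)
w = [0.4, 0.3]
outputs = [neuron.step(w, (1, 1)) for _ in range(2)]
```

Comparing `step` with the column simulation earlier in the section makes plain why the memristor column plus sense amplifier emulates the IF neuron: the capacitor plays the membrane, the reference voltage plays the threshold, and the discharge plays the reset.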
Comparing the working mode of the IF neuron with the computation of the pulse neuron built from the memristor array shows that the pulse neural computation unit formed by the memristor array and the sense amplifier module emulates the IF neuron's behavior well.
5. Output buffer.
The output buffer is a memory that stores intermediate calculation results and output data. Its input signals are the output of the sense amplifier and external digital control signals; the received data are written into the memory by controlling the memory's address and timing.
The working process of the invention is further illustrated below with reference to examples.
First, a network structure is selected as needed, for example a convolutional neural network comprising convolutional layers, pooling layers and fully connected layers, together with the data set used to train it. When converting a convolutional neural network to a pulse convolutional neural network, the bias terms of the convolution kernels are usually removed, the ReLU activation function is used, and the pooling layers use average pooling, so the convolutional neural network is trained with these settings. Once good validation accuracy is obtained, the network's parameters and weights are saved. The trained convolutional-neural-network weights are then updated with the data-set-based normalization algorithm, and the updated values serve as the synaptic weights of the pulse convolutional neural network.
Second, the synaptic weights are written into the input buffer, and the conductance values of the memristors in the memristor array are set accordingly. After all memristor devices in the array are initialized, the control signals of the memristor control module are generated from the relation between the write voltage pulse and the resulting memristor resistance, and a write operation is performed on the memristor array so that the device conductance at each corresponding position matches its synaptic weight.
Third, forward inference verification of the pulse convolutional neural network is performed on the hardware architecture. Each original picture of the data set is converted into a number of binary pictures, which constitute the pulse sequence fed to the network. The more binary pictures are generated, the higher the verification accuracy, but the longer the latency. The analog multiply-accumulate operations, including the computations of the convolutional layers, the pooling layers, and the fully connected layers, are completed in the memristor array. The array output current charges the capacitor, whose voltage is sampled by the sense amplifier; when the sense amplifier outputs a high level, the transistor conducts and the capacitor is discharged and reset. The output pulse sequence of the sense amplifier is stored in the output buffer as an intermediate calculation result, and this process is repeated until the computation of the whole network is complete.
Conventional neural network acceleration hardware relies on digital multipliers, digital-to-analog converters, and analog-to-digital converters, which consume large amounts of power and area and struggle to keep pace with growing network scale and computational complexity. The proposed memristor-based pulse convolutional neural network hardware architecture instead uses the array and the sense amplifiers to emulate the IF neurons of the pulse convolutional network, eliminating the digital-to-analog and analog-to-digital converters and saving substantial hardware area. Moreover, whereas the traditional convolutional neural network suffers from a heavy computation load and excessive energy consumption, the pulse convolutional neural network computes with high energy efficiency, so the proposed architecture offers a substantial performance improvement over the prior art.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. A hardware architecture of a memristor-based low-power-consumption pulse convolutional neural network, the hardware architecture comprising: an input buffer, a memristor control module, a memristor array, a sense amplifier module, and an output buffer;
the input buffer is used for storing picture input data and outputting a decoding bus signal and a control bus signal during calculation;
the memristor control module is used for writing the corresponding picture input data into the memristor array according to the decoding bus signal and the control bus signal;
the memristor array stores weight data internally, performs in-memory calculation in combination with the picture input data, and outputs a collection current value;
the sense amplifier module is used for outputting a high-level or low-level voltage according to the collection current value output by the memristor array and a reference voltage;
the output buffer receives the output result of the sense amplifier module and stores it or outputs it to the outside;
each column of memristors of the memristor array is connected to a sense amplifier and a corresponding capacitor; each column of memristors, together with its sense amplifier and capacitor, forms a pulse neural computation unit, whose calculation formula is as follows:
I_j(t-1) = Σ_{i=1}^{n} G_ij · S_i(t-1) · V_READ
V_c(t) = V_c(t-1) + I_j(t-1) · Δt / C
V_j = (V_c(t) > V_refj) ? 1 : 0
wherein I_j(t-1) is the collection current of the j-th column of the memristor array at time t-1, G_ij is the conductance value of the memristor in row i, column j of the memristor array, S_i(t-1) is the input of row i at time t-1, V_READ is the read operating voltage value, V_c(t) and V_c(t-1) are the voltage values of the capacitor at times t and t-1 respectively, C is the capacitance value of the capacitor, Δt is the time difference between time t and time t-1, V_j is the output of the j-th column sense amplifier, and V_refj is the reference voltage supplied to the sense amplifier of the j-th column; the expression between the equals sign and the question mark in the formula for V_j is a conditional expression: if V_c(t) is greater than V_refj at time t, the sense amplifier of the j-th column outputs digital level 1; otherwise, it outputs digital level 0.
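The three formulas of the pulse neural computation unit can be modeled behaviorally in Python (an informal sketch, not part of the claim; the component values are illustrative assumptions):

```python
import numpy as np

def column_spike_output(G_col, S, v_read=0.1, C=1e-12, dt=1e-9, v_ref=0.5):
    """One memristor column with its capacitor and sense amplifier.
    G_col: conductances G_ij of the column (siemens);
    S: 0/1 inputs S_i(t), one column per timestep."""
    v_c = 0.0
    out = []
    for t in range(S.shape[1]):
        i_col = np.dot(G_col, S[:, t]) * v_read  # I_j = sum_i G_ij * S_i * V_READ
        v_c += i_col * dt / C                    # capacitor integrates the current
        if v_c > v_ref:
            out.append(1)   # sense amplifier outputs digital level 1 ...
            v_c = 0.0       # ... and the discharge transistor resets V_c
        else:
            out.append(0)
    return out
```

With one row of constant pulses, a 2 mS conductance, and the values above, each pulse raises the capacitor voltage by 0.2 V, so the column fires every third cycle, exactly the IF-neuron behavior the claim describes.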
2. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network according to claim 1, wherein the input buffer comprises: a memory and a controller;
the memory is used for storing the picture input data, the picture input data being pulse input pictures stored as 0/1 bit data and obtained by converting an original input picture in floating-point form;
the controller is used for providing the decoding bus signal and the control bus signal to the memristor control module.
3. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network according to claim 2, wherein the formula for converting a picture in floating-point form is as follows:
Spike_input(x, y) = (Original_input(x, y) > Random(0, Max)) ? 1 : 0
wherein Spike_input(x, y) is the pixel value in row x, column y of the binary input data, Original_input(x, y) is the pixel value in row x, column y of the original input picture, Max is the maximum pixel value, and Random(0, Max) is a random number between 0 and the maximum pixel value; the part after the equals sign and before the question mark is a conditional expression: when it holds, the result of the expression is 1; when it does not hold, the result is 0;
based on the above formula, one original input picture can be converted into a plurality of pulse input pictures, which serve as the input pulse sequence of the pulse convolutional neural network.
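The conversion formula of claim 3 amounts to Bernoulli rate coding; a sketch in Python (the function name and the seeding are illustrative assumptions):

```python
import numpy as np

def to_spike_frames(img, n_frames, max_val=255, seed=0):
    """Convert one floating-point picture into n_frames binary pulse pictures:
    a pixel fires when Original_input(x, y) > Random(0, Max)."""
    rng = np.random.default_rng(seed)
    frames = [(img > rng.uniform(0, max_val, size=img.shape)).astype(np.uint8)
              for _ in range(n_frames)]
    return np.stack(frames)
```

A pixel at the maximum value fires in every frame and a zero pixel never fires; intermediate values fire with probability proportional to their brightness, so more frames give a finer rate estimate at the cost of latency.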
4. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network according to claim 1, wherein the memristor control module comprises: a BL module, an SL module, and a WL module; the three modules receive the decoding bus signal and the control bus signal output by the input buffer and control the bit lines, source lines, and word lines of the memristor array, respectively.
5. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network, wherein in the memristor array, a 1T1R three-terminal device cell formed by connecting a single memristor and an NMOS transistor in series serves as the basic unit for completing a multiplication calculation; the picture input data controls whether the NMOS transistor in the 1T1R cell is turned on or off;
in a single 1T1R cell, the gate of the NMOS transistor is the word terminal, the source of the NMOS transistor is the source terminal, the drain of the NMOS transistor is connected to one end of the memristor, and the other end of the memristor is the bit terminal; the word terminals, source terminals, and bit terminals of the different 1T1R cells are connected to form the word lines, source lines, and bit lines of the memristor array, respectively.
6. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network, wherein the pulse neural computation unit can compute the multiply-accumulate operation of n inputs and n weights; the n inputs, denoted S_1 to S_n, serve as the inputs to rows 1 to n of the memristor array, and the n weights are the conductance values of the memristors in rows 1 to n of the j-th column of the memristor array; assuming the total input duration is T, the input of each row is a 0/1 sequence of total length T; in each clock cycle, the j-th column of the memristor array performs an analog calculation to obtain the collection current of that column, which charges the capacitor and raises its voltage; if the capacitor voltage exceeds the reference voltage of the sense amplifier at time t, the sense amplifier outputs a high-level pulse that turns on the NMOS transistor, discharging the capacitor and resetting its voltage.
7. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network according to claim 1, wherein the memristor array can be adapted to different network structures;
for a convolutional layer with a convolution kernel of size K×K, a single convolution calculation completes the multiply-accumulate of K×K input data with K×K weight data; the K×K weight data are written into one column of the memristor array, and the K×K input data are converted into pulse sequences and fed to the corresponding K×K rows, thereby completing the K×K convolution operation;
for a pooling layer with a pooled region of size P×P, the pooling operation corresponds to P×P input data and P×P weight data;
for a fully connected layer with a inputs and b outputs, there are a×b weight data in total; the a×b weight data are written into the array region spanning rows i+1 to i+a and columns j+1 to j+b, the a inputs are fed to rows i+1 to i+a, and the b calculation results of the fully connected layer are read from columns j+1 to j+b.
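The fully connected mapping of claim 7 can be sketched as follows (an illustrative model over an ideal conductance array; a K×K convolution maps the same way, with the flattened kernel occupying K·K rows of a single column). The function names and the 0-based indexing are assumptions:

```python
import numpy as np

def map_fc_layer(array, W, i, j):
    """Write an a-input, b-output layer's a*b weights into rows i..i+a-1
    and columns j..j+b-1 of the conductance array."""
    a, b = W.shape
    array[i:i + a, j:j + b] = W
    return array

def fc_column_currents(array, x, i, j, a, b, v_read=1.0):
    """Ideal readout: the b column collection currents are the dot products
    of the a inputs with each mapped column's conductances."""
    G = array[i:i + a, j:j + b]
    return x @ G * v_read
```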
8. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network, wherein a trained convolutional neural network is converted into the pulse convolutional neural network using a data-set-based conversion algorithm; the pulse neural computation unit is used to emulate the IF neuron, and a single IF neuron works as follows:
V(t) = V(t-1) + Σ_k w_k · S_k(t), if S(t-1) = 0
V(t) = V_r + Σ_k w_k · S_k(t), if S(t-1) = 1
S(t) = (V(t) > V_ref) ? 1 : 0
wherein V(t) is the membrane potential of the IF neuron at time t, V_r is the membrane potential after the IF neuron is reset, w_k is the synaptic connection strength between the IF neuron and its pre-synaptic neuron k, S_k(t) is the value of the pulse transmitted by pre-synaptic neuron k to the IF neuron at time t, and V_ref is the membrane potential threshold at which the neuron emits a pulse; the expression between the equals sign and the question mark in the formula for S(t) is a conditional expression: if V(t) is greater than V_ref at time t, S(t) equals 1; otherwise, S(t) equals 0;
when the membrane potential of the IF neuron exceeds the membrane potential threshold, the IF neuron emits a pulse and the membrane potential is reset; the reset voltage of the IF neuron is emulated by setting the reference voltage of the sense amplifier; after the reset, the pulses emitted by the preceding IF neurons, each multiplied by its synaptic connection strength to the current IF neuron, accumulate on the membrane potential until it again exceeds the threshold, whereupon the IF neuron emits another pulse, and the process repeats; when S(t) equals 1, the IF neuron emits a pulse; when S(t) equals 0, it does not.
9. The hardware architecture of the memristor-based low-power-consumption pulse convolutional neural network according to claim 8, wherein the synaptic weights of the pulse convolutional neural network correspond to the conductance values of the memristors in the memristor array; the resistance values of the memristors in the memristor array are written to the corresponding values by controlling the write voltage pulses, after which inference verification of the network is performed.
CN202011203894.6A 2020-11-02 2020-11-02 Hardware architecture of memristor-based low-power-consumption pulse convolution neural network Active CN112183739B (en)


Publications (2)

Publication Number Publication Date
CN112183739A (en) 2021-01-05
CN112183739B (en) 2022-10-04

Family

ID=73917412






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220124

Address after: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Applicant after: University of Science and Technology of China

Applicant after: Cheng Lin

Applicant after: Wu Feng

Applicant after: Chen Song

Address before: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Applicant before: University of Science and Technology of China

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221028

Address after: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Patentee after: Cheng Lin

Patentee after: Wu Feng

Patentee after: Chen Song

Address before: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Patentee before: University of Science and Technology of China

Patentee before: Cheng Lin

Patentee before: Wu Feng

Patentee before: Chen Song

Effective date of registration: 20221028

Address after: Room 1216, Main Building, Future Center, Advanced Technology Research Institute, University of Science and Technology of China, No. 5089, West Wangjiang Road, High tech Zone, Hefei, Anhui 230031

Patentee after: Hefei Chengling Microelectronics Co.,Ltd.

Address before: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Patentee before: Cheng Lin

Patentee before: Wu Feng

Patentee before: Chen Song