CN117010466A - Memristor-based hardware convolutional neural network model suitable for implementation on FPGA - Google Patents


Info

Publication number
CN117010466A
Authority
CN
China
Prior art keywords
memristor
neural network
fpga
conductance
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310714956.7A
Other languages
Chinese (zh)
Inventor
翟亚红
王健竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202310714956.7A priority Critical patent/CN117010466A/en
Publication of CN117010466A publication Critical patent/CN117010466A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0009 RRAM elements whose operation depends upon chemical change


Abstract

The invention discloses a memristor-based hardware convolutional neural network model suitable for implementation on an FPGA, and relates to the fields of semiconductor integrated circuits and neural networks. The core of the convolutional neural network system comprises convolutional and fully-connected layers built from memristors, a scheme for mapping neural network weights onto the conductance of memristor differential pairs, a closed-loop programming scheme for the memristors, an inter-layer data compression method suited to FPGAs, and a line buffer used for further hardware acceleration of the system. The model exploits the memristor's ability to realize multiple resistance states: the weights of the convolutional neural network are mapped onto memristor conductances, the inputs of the network are mapped onto voltages across the memristors, and the currents flowing through the memristors are read out to obtain the results of the convolutional and fully-connected layers. For the transfer of data between layers, an efficient data compression method suitable for running on an FPGA is designed and its feasibility demonstrated. In addition, a line buffer structure further accelerates the hardware neural network. Compared with a conventional convolutional neural network, the proposed network can be realized in hardware, fully exploits the multi-level resistance characteristics of the memristor, and offers low power consumption, high efficiency, high integration density, and good compatibility with CMOS processes.

Description

Memristor-based hardware convolutional neural network model suitable for implementation on FPGA
Technical Field
The invention relates to the fields of semiconductor integrated circuits and neural networks, and in particular to the application to image recognition of a memristor-based hardware convolutional neural network suitable for implementation on an FPGA.
Background
Convolutional neural networks contain a large number of convolution (multiply-accumulate) operations, which consume most of the resources during network operation and are the dominant factor limiting network speed. At present, the convolution (multiply-accumulate) operations in neural networks are mostly performed in software.
The conductance of a memristor, or RRAM (Resistive Random-Access Memory), varies with the voltage applied between its top and bottom electrodes, and the device retains its previous conductance when no voltage is applied. Memristors are two-terminal passive devices and are compatible with CMOS processes. Their mature operating modes at present are: switching between two well-separated conductance states, the HRS and the LRS, for data storage as a non-volatile memory; or using the two conductance states to build a binary neural network, and so on. However, the conductance of a memristor does not merely jump between two states; it can also increase or decrease continuously and monotonically with the applied voltage. Given its high integration density, passivity, non-volatility, and CMOS-process compatibility, the conductance of a memristor can be used to represent the trained weights of a neural network, so that Kirchhoff's laws can be used to perform convolution (multiply-accumulate) operations at high speed and low power at the hardware level.
Disclosure of Invention
In order to use memristors to optimize the computation of a convolutional neural network at the hardware level, the invention provides a memristor-based hardware convolutional neural network model suitable for implementation on an FPGA, which can be used for (but is not limited to) image recognition on the MNIST dataset, and which comprises the following modules.
Memristor array module. This module is the computational unit of the whole model. A target memristor is selected by controlling the voltages on WL, BL and SL; its conductance is then programmed by applying pulse voltages on BL, or convolution is performed on the input data by exploiting Kirchhoff's laws and reading the current on SL. Acceleration of the neural network is achieved through this change in computing paradigm.
Line buffer module. This module comprises a shift register and a corresponding control module; the line buffer structure further accelerates the neural network.
Peripheral circuit module matched to the memristor array. This module serves the memristor array module and comprises a voltage generation module, a digital-to-analog conversion module, and the like; the voltage generation module generates the specific voltage values required to drive the memristor array.
FPGA module. This module mainly performs the control function: it controls the operation of the memristor array module and the peripheral circuit module, and implements basic data-processing functions such as padding, pooling, and inter-layer data compression.
Drawings
Fig. 1 is the method of representing weights with RRAM differential pairs.
Fig. 2 is a schematic diagram of the closed-loop programming method for a memristor.
Fig. 3 is a schematic diagram of the memristor circuit implementing the convolutional/fully-connected layer computation.
Fig. 4 is a schematic diagram of the method of delivering convolutional/fully-connected layer input data through voltages.
Fig. 5 is a schematic diagram of the inter-layer data compression method.
Fig. 6 is a schematic diagram illustrating the operation of the line buffer structure.
Detailed Description
The present invention is described in further detail below with reference to the drawings and examples, to help those of ordinary skill in the art understand and practice it. For ease of understanding, the following description takes a convolutional neural network for the MNIST dataset as an example; it should be understood that the implementation examples described herein are merely illustrative and explanatory, and the invention is not limited to convolutional neural networks for the MNIST dataset.
The method of representing neural network weights with RRAM differential pairs is shown in Fig. 1. The weights obtained after software training are floating-point numbers between -1 and +1, which must be mapped onto RRAM conductances so that they can be represented in hardware. For example, from an RRAM device that can realize 32 relatively independent, well-separated conductance states (2 μS to 20 μS, Δ ≈ 0.58 μS), 8 states are selected (2.5 μS to 20 μS, Δ = 2.5 μS). Besides the selection of conductance states, another problem to be solved is the sign of the weights. In a neural network the weights range from -1 to +1, the sign representing excitation or inhibition of the neuron by the upstream neuron, but the conductance of the RRAM device representing the weight cannot be negative. The idea here is therefore to move the minus sign in the input-times-weight multiplication from the weight to the input: the conductance of the RRAM is left unchanged, but a reversed voltage is applied to obtain a result equivalent to a negative weight. To this end, a pair of RRAM devices forms a differential pair representing one weight; the two RRAMs are called the "positive RRAM" and the "negative RRAM" (note that, here and below, the conductances of both devices are in reality positive; the names serve only to distinguish them and to ease understanding). In actual operation, for one input, voltages of opposite polarity are applied in two passes to the positive RRAM and the negative RRAM respectively, and the resulting currents are collected and summed, realizing a multiply-accumulate operation that distinguishes positive and negative weights. Thus the positive RRAM realizes a conductance range of +2.5 μS to +20 μS with Δ = 2.5 μS; the negative RRAM realizes an equivalent conductance range of -2.5 μS to -20 μS with Δ = 2.5 μS; and the differential pair as a whole realizes an equivalent conductance range of -17.5 μS to +17.5 μS with Δ = 2.5 μS, for a total of 15 conductance states. The trained weights can then be mapped interval by interval onto the conductance of the RRAM differential pair: for example, weights between -1/15 and +1/15 map to a differential-pair conductance of 0 μS, weights between +1/15 and +3/15 map to 2.5 μS, and so on, giving the method of representing network weights with RRAM conductance values.
Selection principle for the differential-pair conductances. When a pair of RRAMs is used to compose a target conductance, the same target can be reached by several combinations: for a target of 5 μS, for instance, the equivalent conductances of the two RRAMs (the conductances represented in the computation, not the physical conductances) could be +7.5 μS/-2.5 μS, +10 μS/-5 μS, +12.5 μS/-7.5 μS, and so on. Considering ease of programming, both devices can first be programmed to the same physical conductance of 2.5 μS (2.5 μS in the physical sense, representing equivalent conductances of +2.5 μS and -2.5 μS respectively), which represents a weight of 0 μS. If the weight is positive, the conductance of the negative RRAM is kept unchanged and the conductance of the positive RRAM is raised; if the weight is negative, the opposite is done. In this way, after all RRAMs are uniformly reset to 2.5 μS at the start, only one RRAM of each differential pair needs its conductance adjusted, whatever the weight to be represented. This simplifies the flow, avoids the loss of precision caused by device non-idealities during conductance tuning, minimizes the number of write operations on each device, and thus extends its lifetime (endurance, the number of times a device can be rewritten, is one of the most important figures of merit of RRAM and a key consideration in its practical use).
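For illustration, the mapping and the single-device programming rule can be sketched in a few lines of Python (an explanatory model accompanying this description, not part of the claimed hardware; the function and constant names are ours, and only the 2.5 μS step and 15-level range come from the text above):

import numpy as np

G_STEP = 2.5   # conductance step in uS (8 physical states: 2.5 ... 20 uS)
G_BASE = 2.5   # common reset state of both devices of a pair, in uS
MAX_LEVEL = 7  # the pair spans -7..+7 steps: 15 equivalent states

def weight_to_pair(w: float) -> tuple[float, float]:
    """Map a trained weight in [-1, +1] to physical (G_pos, G_neg) in uS.

    The equivalent conductance of the pair is G_pos - G_neg; only one
    device is ever moved away from the shared 2.5 uS reset state.
    """
    # Quantize by intervals of width 2/15: (-1/15, +1/15) -> level 0,
    # (+1/15, +3/15) -> level 1, and so on.
    level = int(np.clip(np.round(w * 15 / 2), -MAX_LEVEL, MAX_LEVEL))
    if level >= 0:
        return G_BASE + level * G_STEP, G_BASE   # raise only the positive RRAM
    return G_BASE, G_BASE + (-level) * G_STEP    # raise only the negative RRAM

g_pos, g_neg = weight_to_pair(0.3)   # level 2 -> (7.5, 2.5): equivalent +5 uS,
                                     # the first combination named above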
The closed-loop programming method for the RRAM conductance is shown in Fig. 2. Once the target conductance of a differential pair has been determined, the device must be programmed to that conductance accurately and by a suitable method; based on the foregoing, a closed-loop programming method is designed here. The conductance itself is inconvenient to measure directly, but at a given read voltage V_read the conductance g of the device is linearly related to the read current I, so I can be used as the criterion for conductance tuning. First, the current I_target corresponding to the target conductance at the read voltage V_read is computed, and the maximum number of programming pulses N and the target current margin ΔI are set. Before programming begins, all RRAM devices are refreshed to a uniform reference state (2.5 μS for the RRAM selected here), after which programming of the target RRAM starts. The programming count n is initialized to 0; a read pulse V_read is applied to the device and the real-time current I is read. If I satisfies I_target - ΔI ≤ I ≤ I_target + ΔI, programming has succeeded and the flow ends. Otherwise, if the current programming count satisfies n ≥ N, the maximum number of pulses has been reached and programming of this RRAM is considered to have failed. If not, n is incremented by 1 and it is checked whether I > I_target + ΔI: if so, a pulse voltage V_reset is applied to the current programming cell to reduce its conductance; otherwise a pulse voltage V_set is applied to raise its conductance. The read pulse V_read is then applied again to read the current of the target RRAM after this programming cycle and to judge whether the target has been reached, and the flow repeats until programming of the target RRAM is complete.
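This write-verify loop can be sketched as follows (illustrative Python only; the ToyRRAM behavioural model and the pulse step, read voltage and margin values are assumptions standing in for a real device and its peripheral circuits):

class ToyRRAM:
    """Crude behavioural model: each SET/RESET pulse nudges the conductance."""
    def __init__(self, g_init=2.5e-6, g_delta=0.2e-6):
        self.g, self.g_delta = g_init, g_delta

    def read(self, v_read):            # Ohm's law: I = g * V_read
        return self.g * v_read

    def pulse(self, polarity):         # +1 = SET (raise g), -1 = RESET (lower g)
        self.g = max(0.0, self.g + polarity * self.g_delta)

def program_closed_loop(cell, g_target, v_read=0.2, n_max=100, di_ratio=0.05):
    """Pulse `cell` until its read current is within +/- dI of the target."""
    i_target = g_target * v_read               # target read current
    di = di_ratio * i_target                   # acceptance margin dI
    for n in range(n_max):                     # at most N pulses
        i = cell.read(v_read)
        if i_target - di <= i <= i_target + di:
            return True                        # verified: success
        # Too conductive -> RESET pulse; not conductive enough -> SET pulse
        cell.pulse(-1 if i > i_target + di else +1)
    return False                               # failed within N pulses

cell = ToyRRAM()                               # starts at the 2.5 uS reference
assert program_closed_loop(cell, g_target=7.5e-6)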
The mapping of the convolution/fully-connected computation onto a hardware circuit is shown in Fig. 3. Arranging the cells of Fig. 1 side by side yields an array implementing one convolution kernel or even several. For the convolution window in the figure, x_{2-2} and the like are the gray values of the original input image, and w_{2-2} and the like are the weights of the convolution kernel. After the RRAM conductances have been programmed to the corresponding kernel weights by the method above, voltages representing the gray values of the different image pixels are applied to the BLs, and the difference between the currents on SL+ and SL- is the result of convolving the nine points covered by the kernel. The sampled current value can be used for the subsequent pooling and for the computations of the other layers. The fully-connected layer works in the same way.
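The physics behind the readout is Kirchhoff's current law: each source line sums I = G·V contributions from its column. A minimal Python sketch of one window follows (an idealized model for illustration; the names and the read-voltage range are our assumptions):

import numpy as np

def window_mac(v_in, g_pos, g_neg):
    """One convolution window on a differential RRAM column pair.

    v_in:  read voltages on the nine BLs (encode the pixel gray values)
    g_pos: conductances of the nine positive RRAMs (uS)
    g_neg: conductances of the nine negative RRAMs (uS)
    Returns I(SL+) - I(SL-), the signed dot product of inputs and weights.
    """
    i_pos = np.dot(g_pos, v_in)    # Kirchhoff current sum on SL+
    i_neg = np.dot(g_neg, v_in)    # Kirchhoff current sum on SL-
    return i_pos - i_neg

v = np.linspace(0.02, 0.2, 9)          # nine window pixels as read voltages
g_plus  = np.full(9, 7.5)              # every weight programmed to +5 uS ...
g_minus = np.full(9, 2.5)              # ... via the (7.5, 2.5) pair
print(window_mac(v, g_plus, g_minus))  # equals 5 * sum(v) = 4.95 (uS*V, i.e. uA)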
The method of delivering the convolutional/fully-connected layer input data through voltages is shown in Fig. 4. Controlling the input data through the voltage amplitude alone cannot simultaneously satisfy the normal operating conditions of the RRAM device and software-like precision (too high a voltage may break down the device, while the precision of low voltages is difficult to control), so the input is instead encoded digitally in time. For the first convolutional layer, the input data are the gray values of the image pixels, ranging from 0 to 255, which can be represented exactly by an 8-bit binary value; the data can therefore be encoded as a pulse-voltage train of 8 clock cycles, where a '1' or '0' in a given binary bit indicates whether the voltage carries a pulse in the corresponding cycle. Taking the pixel x_{2-2} of the convolution shown in Fig. 3 as an example, if its gray value is 109, the corresponding binary value is 01101101, and the pulse voltage waveform applied to the RRAM array for x_{2-2} is as shown in Fig. 4. Let I_k be the current collected on the SL in the k-th cycle. The currents on the SL are sampled in each of the 8 cycles, shifted according to their bit weights, and added to obtain the current I_SL, which is the convolution result corresponding to the pixel x_{2-2}; I_SL is computed as in formula (1):

I_SL = Σ_{k=1}^{8} 2^(8-k) · I_k,    (1)

where the k-th cycle carries the k-th most significant bit of the input.
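A bit-serial software model of this scheme follows (illustrative; it assumes the MSB-first pulse order of formula (1) and ideal current sampling), confirming that the shifted-and-added currents equal the full-precision dot product:

import numpy as np

def bit_serial_mac(pixels, g_eq, n_bits=8):
    """Bit-serial convolution of one window.

    pixels: integer gray values (0..255) on the nine inputs
    g_eq:   equivalent signed conductances of the nine differential pairs
    Each cycle applies a fixed-amplitude pulse only where the current bit
    is 1; the sampled SL currents are shifted and summed per formula (1).
    """
    i_sl = 0.0
    for k in range(n_bits):                       # k = 0 is the MSB cycle
        bits = (pixels >> (n_bits - 1 - k)) & 1   # k-th most significant bit
        i_k = np.dot(g_eq, bits)                  # current sampled this cycle
        i_sl += i_k * (1 << (n_bits - 1 - k))     # shift by the bit weight
    return i_sl

pixels = np.array([109] * 9)                      # e.g. x_{2-2} = 109 = 0b01101101
g_eq = np.full(9, 5.0)                            # pairs programmed to +5 uS
assert np.isclose(bit_serial_mac(pixels, g_eq), np.dot(g_eq, pixels))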
The inter-layer data compression method is shown in Fig. 5. Because multiplication increases the data bit width, the output of one layer must be compressed by a suitable method during inter-layer transmission so that it can still be fed to the next layer by the method described above. This can be done by quantization, whose basic operation is:
q = round(r / S) + Z,    (2)

where q is the low-precision fixed-point number obtained after quantization, r is the high-precision floating-point number before quantization, S is the scaling factor, i.e. the ratio between the floating-point range before quantization and the fixed-point range after quantization, and Z is the fixed-point zero point, i.e. the integer that corresponds after quantization to the real number 0. S and Z are calculated from formulas (3) and (4), respectively:

S = (r_max - r_min) / (q_max - q_min),    (3)

Z = round(q_max - r_max / S),    (4)

where r_max and r_min bound the floating-point range and q_max and q_min bound the fixed-point range.
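In software, formulas (2) to (4) take the following form (a generic uniform-quantization sketch, not code from the design; the 8-bit range and the value bounds are example assumptions):

import numpy as np

def make_quantizer(r_min, r_max, n_bits=8):
    """Uniform affine quantizer mapping [r_min, r_max] -> [0, 2**n_bits - 1]."""
    q_min, q_max = 0, 2 ** n_bits - 1
    s = (r_max - r_min) / (q_max - q_min)          # formula (3)
    z = int(round(q_max - r_max / s))              # formula (4)
    def quantize(r):
        q = np.round(r / s) + z                    # formula (2)
        return np.clip(q, q_min, q_max).astype(np.int32)
    return quantize, s, z

quantize, s, z = make_quantizer(-4.0, 4.0)
print(quantize(np.array([-4.0, 0.0, 4.0])))        # -> [0 128 255] (clipped)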
However, the computation in this design runs on an FPGA, where the division in the quantization operation would consume excessive resources, so a compression method better suited to the FPGA is designed: bit truncation. Taking one convolutional layer of a designed neural network as an example, the distribution of each layer's computed data is first profiled in software. If the statistics of the convolutional layer's output show that most of the data fall in the interval 0-4095 (for ease of shifting, the upper bound of the interval is a power of two minus one), then bits 4-11 of the computed data are extracted directly as the input of the next layer, which is equivalent to compressing the data by a factor of 16. FPGA prototype verification shows that even for a convolutional neural network of fairly simple structure, with a recognition accuracy of 88.6% after software training, the accuracy is 87.58% after the weights are replaced by conductances, a drop of only 1.02%; after the inter-layer data are further compressed, the accuracy becomes 83.63%, a drop of 3.95%. Compared with the resources saved, this small loss of accuracy is entirely acceptable. The prototype results demonstrate that representing weights by conductances and the inter-layer data compression method are both feasible.
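The truncation itself reduces to a shift and a mask, which is why it costs almost nothing on an FPGA. A sketch using the bit positions of the 0-4095 example above (the function name is ours):

def truncate(x: int, low: int = 4, high: int = 11) -> int:
    """Keep bits low..high of x as the compressed inter-layer value.

    For data concentrated in 0..4095 (12 bits), keeping bits 4..11
    drops the 4 LSBs: an 8-bit result, i.e. 16x compression of range.
    """
    width = high - low + 1
    return (x >> low) & ((1 << width) - 1)

assert truncate(4095) == 255      # full-scale 12-bit value -> full 8 bits
assert truncate(109) == 6         # 109 >> 4 = 6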
The method of further accelerating the neural network with a line buffer is shown in Fig. 6. Assume the input image of the neural network is n×n and the convolution kernel is k×k; a shift register of depth 2n+k can then be instantiated in the FPGA to buffer the input image, the bit depth of each register equalling the bit width of one pixel of the input image, forming a line buffer structure. The working logic of the line buffer is described below using the design of the first convolutional layer as an example. The padded input image is 30×30 and the convolution kernel is 3×3. Data flow into the line buffer from its tail until the shift register is full; at that moment the data in the registers at depths 0, 1, 2, 30, 31, 32, 60, 61, 62 are exactly the 9 data of the convolution window at the top-left corner of the input image, and these 9 data can be read out for the convolution computation, as shown in Fig. 6(a). After the computation completes, new data continue to flow in from the tail of the shift register, whereupon the registers at depths 0, 1, 2, 30, 31, 32, 60, 61, 62 are updated to the 9 data of the next convolution window, which can again be read out for the convolution computation; this is equivalent to sliding the convolution window over the input data with a stride of 1, as shown in Fig. 6(b). Proceeding in this way, the remaining convolutions are completed. Compared with a structure without a line buffer, resource consumption is greatly reduced, the input data are effectively reused, and the efficiency of the network is further improved on top of the memristor-based acceleration of the computation itself.
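A behavioural Python model of this line buffer follows (illustrative; on the FPGA it is a chain of shift registers, and the skipping of windows that would wrap across a row edge is our assumption for a plain stride-1 convolution):

from collections import deque
import numpy as np

def line_buffer_windows(image, k=3):
    """Emulate the Fig. 6 line buffer: stream pixels through a shift
    register of depth (k-1)*n + k and tap k*k cells to read out each
    stride-1 convolution window."""
    n = image.shape[0]
    depth = (k - 1) * n + k              # = 2n + 3 for a 3x3 kernel
    buf = deque(maxlen=depth)            # the shift register; buf[0] is
    count = 0                            # the oldest cell, i.e. depth 0
    for pixel in image.flatten():        # new data enter at the tail
        buf.append(pixel)
        count += 1
        if count < depth:
            continue                     # register not yet full
        col = (count - depth) % n        # column of the window origin
        if col > n - k:
            continue                     # window would wrap a row edge
        taps = [buf[r * n + c]           # depths r*n + c, per Fig. 6
                for r in range(k) for c in range(k)]
        yield np.array(taps).reshape(k, k)

img = np.arange(36).reshape(6, 6)        # toy 6x6 "image"
assert (next(line_buffer_windows(img)) == img[:3, :3]).all()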
Peripheral circuit module matched to the RRAM array. Driven by the FPGA, the peripheral circuit generates the programming voltages used to program the RRAM conductances, applies the pulse sequences that make the array perform the convolution operation during computation, reads back the computed results, and communicates with the FPGA through the ADC and DAC modules.
Role of the FPGA development board. The FPGA development board handles the interaction with the hardware circuit part of the model, which consists of the RRAM array and the peripheral circuits; it also generates the control signals that drive the orderly operation of each module of the hardware circuit, and performs the data compression introduced above together with the less resource-intensive computations such as basic pooling and padding.
Once all the above parts are configured, a complete convolutional neural network whose convolution computation is carried out in a hardware circuit composed of memristors is realized.

Claims (5)

1. A method for implementing a memristor-based hardware convolutional neural network model suitable for implementation on an FPGA, the model comprising: a memristor array module serving as the core of the whole model; a line buffer module for further accelerating the neural network; a peripheral circuit module serving the memristor array, for generating the specific voltages that drive the memristor array and for implementing the communication between the memristor array, the FPGA and the computer; and an FPGA module, for controlling the operation of the memristor array module and the peripheral circuit module and for implementing the information interaction between the peripheral circuits and the RRAM array as well as the basic computing functions.
2. The method for implementing the memristor-based hardware convolutional neural network model suitable for implementation on an FPGA according to claim 1, characterized in that: the hardware convolutional neural network formed by the memristors is further accelerated by means of the line buffer.
3. The method for implementing the memristor-based hardware convolutional neural network model suitable for implementation on an FPGA according to claim 1, characterized in that: the mapping between memristor conductances and neural network weights adopts a representation by differential pairs; the initial conductance states of the two memristors of a differential pair are identical, and during programming only one memristor of the pair needs to be programmed for the conductance of the differential pair to represent the neural network weight.
4. The method for implementing the memristor-based hardware convolutional neural network model suitable for implementation on an FPGA according to claim 1, characterized in that: a closed-loop programming strategy is adopted when programming the conductance of a memristor, so as to ensure that the target memristor can be programmed to the required conductance value on the premise that the programming scheme converges.
5. The method for implementing the memristor-based hardware convolutional neural network model suitable for implementation on an FPGA according to claim 1, characterized in that: the data of each layer are compressed by a bit-truncation method, which greatly reduces the amount of computation and improves computational efficiency at the cost of only a very small loss of precision, and which consumes very few resources and is therefore convenient to implement in an FPGA; the quantized data are fed into the memristor array in the form of digital pulse voltages for computation, and the resulting currents are weighted and summed bit by bit to obtain the convolution result, which matches the fact that present memristors can realize only a number of well-separated conductance states rather than a precise continuous adjustment of the conductance.
CN202310714956.7A 2023-06-16 2023-06-16 Memristor-based hardware convolutional neural network model suitable for implementation on FPGA Pending CN117010466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310714956.7A CN117010466A (en) 2023-06-16 2023-06-16 Memristor-based hardware convolutional neural network model suitable for implementation on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310714956.7A CN117010466A (en) 2023-06-16 2023-06-16 Memristor-based hardware convolutional neural network model suitable for implementation on FPGA

Publications (1)

Publication Number Publication Date
CN117010466A (en) 2023-11-07

Family

ID=88564404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310714956.7A Pending CN117010466A (en) 2023-06-16 2023-06-16 Memristor-based hardware convolutional neural network model suitable for implementation on FPGA

Country Status (1)

Country Link
CN (1) CN117010466A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination