CN109784483B - FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator - Google Patents

FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator

Info

Publication number
CN109784483B
Authority
CN
China
Prior art keywords
module
convolution
input
memory
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910068644.7A
Other languages
Chinese (zh)
Other versions
CN109784483A (en)
Inventor
胡绍刚 (Hu Shaogang)
刘爽 (Liu Shuang)
邓阳杰 (Deng Yangjie)
罗鑫 (Luo Xin)
于奇 (Yu Qi)
刘洋 (Liu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910068644.7A priority Critical patent/CN109784483B/en
Publication of CN109784483A publication Critical patent/CN109784483A/en
Application granted granted Critical
Publication of CN109784483B publication Critical patent/CN109784483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention belongs to the technical field of neural networks, and relates to a binarized convolutional neural network in-memory computing accelerator based on the FD-SOI (fully depleted silicon-on-insulator) process. The invention exploits the modulation of the threshold voltage of an FD-SOI-MOSFET by its back-gate voltage to realize exclusive-OR (XOR) processing of data. The convolution kernel parameters of the convolutional neural network are flattened to one dimension and stored in memory, and the convolution of the neural network is realized by performing XOR operations on the stored kernels with FD-SOI-MOSFETs. On the premise of in-memory computing, completing the convolution with XOR operations instead of the conventional convolution procedure maintains high accuracy, greatly increases the convolution processing speed of the neural network, saves storage space for the neural network parameters, reduces data transmission, and lowers the operating power consumption.

Description

FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator
Technical Field
The invention belongs to the technical field of neural networks, and relates to a binarized convolutional neural network in-memory computing accelerator based on the FD-SOI (fully depleted silicon-on-insulator) process.
Background
A Convolutional Neural Network (CNN) is a common deep learning architecture. It was inspired by the biological mechanism of natural visual cognition (cells in the animal visual cortex are responsible for detecting optical signals) and is a special multilayer feedforward neural network. Its artificial neurons respond to surrounding units within a local receptive field, which gives it excellent performance in large-scale image processing. A CNN mainly consists of convolutional layers (Convolutional Layer), pooling layers (Pooling Layer) and fully connected layers (Fully Connected Layer); the convolutional layers extract different input features, where the first convolutional layer may extract only low-level features such as edges, lines and corners, and deeper layers iteratively extract more complex features from these low-level features. In a conventional binarized convolutional neural network, the weights and the hidden-layer activation values are binarized to 1 or -1, and through binarization the parameters of the neural network occupy much less storage space.
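As a hedged illustration that is not part of the patent text, the binarization mentioned above is commonly obtained by taking the sign of the real-valued weights and activations; the function and variable names below are assumptions for illustration only.

```python
import numpy as np

def binarize(x):
    """Binarize real-valued weights or activations to +1/-1 by their sign."""
    return np.where(x >= 0, 1, -1)

# Example: a 3x3 real-valued kernel becomes a {-1, +1} kernel,
# which needs only one bit per weight instead of a full float.
w = np.array([[0.3, -1.2, 0.0],
              [0.7, -0.1, 2.4],
              [-0.5, 0.9, -0.8]])
w_bin = binarize(w)
```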
As semiconductor processes advanced to the 22 nm node, both FinFET and FD-SOI process technologies were developed to meet performance, cost and power-consumption requirements. FD-SOI is a planar technology; because a complete industrial ecosystem had not yet formed around it, its range of application was relatively narrow. In recent years, however, FD-SOI technology has attracted more and more attention from industry, its ecosystem is gradually taking shape, and its technical advantages and application prospects are becoming increasingly attractive.
Conventionally, data is stored on disk and has to be fetched into memory before any operation can be performed, a process that requires a large number of I/O accesses. With in-memory computing, the computation is instead sent to the data and executed locally, which greatly increases the computing speed, saves storage area, reduces data transmission and lowers the computing power consumption.
At present, there is no circuit that increases the CNN convolution speed by realizing XOR processing of data based on the FD-SOI process.
Disclosure of Invention
In view of the above problems, the invention provides a binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process.
The technical solution of the invention is as follows:
a binarization convolution neural network memory computing accelerator based on FD-SOI technology comprises the following steps:
the in-memory calculation module is used for storing convolution kernel parameters of the convolution neural network and completing convolution processing on input data;
a shift register module for storing convolution neural network input data and having a shift function;
a controller module for logically controlling the shift register module and the in-memory computing module;
a detection conversion module for converting the calculation result of the in-memory computing module into the conventional convolution calculation result;
the normalization module is used for adding the convolution results of different convolution kernels according to weights;
and an activation function module for adding a nonlinear factor to the neural network.
Furthermore, the in-memory computing module is constructed from an SRAM module, an input and inverting-input module, a pre-charge module and the like;
the SRAM module is a module constructed from 6 MOSFETs that can store one bit of data: two P-type MOSFETs and two N-type MOSFETs form two CMOS inverters that are cross-coupled (connected end to end), and this structure stores one bit of data; an FD-SOI-MOSFET is connected to the output of each of the two inverters, and the back-gate signals of these two FD-SOI-MOSFETs are connected respectively to the input signal and to the inverted input signal; because the back-gate signal of an FD-SOI-MOSFET modulates its threshold voltage, this modulation is used to complete a NAND operation between the input signal and the stored signal;
the input and inverting-input module provides the SRAM module with the input signal and the inverted version of the input signal;
the pre-charge module charges the pre-charge capacitors before the in-memory computing module performs the XOR operation.
Furthermore, the in-memory computing module stores the convolution kernel parameters of the convolutional neural network after they have been flattened to one dimension: the first (n × n) columns of each row can store one convolution kernel of n rows and n columns, further kernels can be stored in the following columns in the same way, each row can store multiple convolution kernel parameters, and the convolution kernel parameters of one convolutional layer can be stored in one or more rows; the output signals of the SRAM module are inverted by skewed inverters and then OR-ed, which realizes the XOR operation between the input signals and the signals stored in the SRAM module.
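For illustration, the kernel flattening described above can be sketched in software as follows; this is a minimal sketch under stated assumptions, and names such as flatten_kernels, row_width and the array shapes are illustrative, not part of the patented circuit.

```python
import numpy as np

def flatten_kernels(kernels, row_width):
    """Pack n x n binary kernels into memory rows, one kernel per (n*n)-column slot."""
    flat = [k.reshape(-1) for k in kernels]      # each kernel -> 1-D vector of length n*n
    slot = flat[0].size                          # n*n columns used per kernel
    per_row = row_width // slot                  # kernels that fit in one memory row
    rows = []
    for i in range(0, len(flat), per_row):
        row = np.zeros(row_width, dtype=np.uint8)
        for j, k in enumerate(flat[i:i + per_row]):
            row[j * slot:(j + 1) * slot] = k     # first n*n columns hold the first kernel, and so on
        rows.append(row)
    return np.stack(rows)

# Example: two 3x3 binary kernels packed into rows of 32 columns
k0 = np.random.randint(0, 2, (3, 3))
k1 = np.random.randint(0, 2, (3, 3))
memory_rows = flatten_kernels([k0, k1], row_width=32)
```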
Furthermore, the shift register module stores the input data of the convolutional neural network and can shift out the corresponding input data used for the convolution operation of the in-memory computing module.
Further, the controller module controls the shift register module to output the corresponding data; controls the enabling and disabling of the pre-charge module; and controls the XOR operation between the output data of the shift register and the convolution kernel parameters stored in the corresponding row of the in-memory computing module; this control function is implemented by a decoder.
Further, the detection conversion module detects the number of '1's in the output data of the in-memory computing module; the converted result is the bit width of that output data minus twice the detected count. The binarized convolutional neural network here binarizes the weights and hidden-layer activation values to 0 or 1, and after the in-memory computing module and the detection conversion module the realized function is equivalent to a binarized convolutional neural network that binarizes the weights and hidden-layer activation values to -1 or 1.
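As a hedged illustration of this conversion (the function and variable names below are assumptions, not the patent's notation): for vectors encoded as 0/1 in place of -1/+1, the ±1 dot product equals the bit width minus twice the population count of the XOR, which is exactly the quantity produced by the detection conversion module.

```python
import numpy as np

def xor_convolution_term(a_bits, w_bits):
    """Equivalent of a +/-1 dot product computed from 0/1 encodings via XOR.

    a_bits, w_bits: 0/1 arrays encoding activations and weights (0 -> -1, 1 -> +1).
    Returns bit_width - 2 * popcount(a XOR w), which equals the +/-1 dot product.
    """
    xor = np.bitwise_xor(a_bits, w_bits)
    return a_bits.size - 2 * int(xor.sum())

# Sanity check against the conventional +/-1 convolution term
a = np.random.randint(0, 2, 9)    # one 3x3 window, flattened
w = np.random.randint(0, 2, 9)    # one 3x3 kernel, flattened
conventional = int(np.dot(2 * a - 1, 2 * w - 1))
assert xor_convolution_term(a, w) == conventional
```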
Further, the normalization module adds the convolution results of different convolution kernels according to weights to obtain a normalized result.
Furthermore, the activation function module adds a nonlinear factor to the neural network, which overcomes the insufficient expressive and classification capability of a linear model.
Further, the invention also provides the convolution process of the binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process, which includes:
step 1, a controller module sends out a control instruction to control a shift register to output corresponding data to an input and inverting input module of a calculation module in a memory;
step 2, the input and inverting-input module transmits the data to the SRAM modules in the same column;
step 3, the controller module enables the pre-charge module, the pre-charge capacitors are charged so that the BL and BLB potentials are high, and the pre-charge module is then turned off;
step 4, the controller module selects one of the signal lines VWL1-VWLc and drives it to a high potential, i.e., the row of SRAM modules connected to that signal line is enabled, and a NAND operation is completed between the data transmitted in step 2 and the data stored in those SRAM modules;
step 5, the SRAM calculation results are transmitted to the skewed inverters, and the skewed inverter outputs are transmitted to the OR gates; steps 4-5 complete the XOR operation between the data transmitted in step 2 and the data stored in the SRAM modules;
step 6, the OR gate result of each in-memory computing module is transmitted to the detection conversion module; the binarized convolutional neural network binarizes the weights and hidden-layer activations to 0 or 1, and after the in-memory computing module and the detection conversion module the realized function is equivalent to a binarized convolutional neural network that binarizes the weights and hidden-layer activations to -1 or 1;
step 7, transmitting output results of all detection conversion modules to a normalization module;
step 8, transmitting the output result of the normalization module to an activation function module;
and step 9, the output result of the activation function module is transmitted to the shift register module for storage; if the convolution operation is not finished, jump to step 1; otherwise, the process ends.
The invention has the beneficial effects that:
the core of the method provided by the invention is that the exclusive OR operation of data is realized by utilizing the adjustment effect of the FD-SOI-MOSFET back gate voltage on the threshold voltage of the FD-SOI-MOSFET. On the premise of adopting calculation in the memory, compared with the convolution process of the traditional convolution neural network, the convolution process is completed by utilizing exclusive-or operation, the high precision is kept, the convolution processing speed of the neural network is greatly improved, the parameter storage space of the neural network is saved, data transmission is realized, and the operation power consumption is reduced.
Drawings
FIG. 1 is a schematic diagram of a memory computing accelerator for a binary convolution neural network based on FD-SOI technology according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of CNN in FIG. 1;
FIG. 3 is a schematic diagram of the SRAM module of FIG. 1;
FIG. 4 is a schematic diagram of the FD-SOI-MOSFET of FIG. 3;
FIG. 5 is a schematic diagram of the threshold of the FD-SOI-MOSFET of FIG. 3;
FIG. 6 is a timing diagram of the NAND operation performed on the data by the circuit of FIG. 3;
FIG. 7 is the truth table of the NAND operation performed on the data by the circuit of FIG. 3;
FIG. 8 is a schematic diagram of the skewed inverter of FIG. 1;
FIG. 9 is a schematic diagram of the input-output transfer curve of the skewed inverter of FIG. 8;
FIG. 10 is a convolution flow chart of a calculation accelerator in a binarization convolution neural network memory based on FD-SOI technology according to the present invention.
Detailed Description
The present invention is described in detail below with reference to the attached drawings so that those skilled in the art can better understand the present invention.
In studying existing binarized convolutional neural networks, it is found that the convolution process uses multiplications and additions; the multiplications in particular consume a large amount of storage area, reduce the operation speed and generate considerable power consumption, and these drawbacks greatly degrade the performance of the binarized convolutional neural network.
On the basis of the prior art, the invention proposes a convolution algorithm and implements it with a circuit, replacing the conventional convolution calculation with XOR and related operations.
In order to achieve the above purpose, the invention provides a binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process, comprising:
the in-memory computing module is used for storing convolution kernel parameters of the convolution neural network and completing convolution processing on input data;
a shift register module for storing convolution neural network input data and having a shift function;
a controller module for logically controlling the shift register module and the in-memory computing module;
a detection conversion module for converting the calculation result of the in-memory computing module into the conventional convolution calculation result;
the normalization module is used for adding the convolution results of different convolution kernels according to weights;
and an activation function module for adding a nonlinear factor to the neural network.
To make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail by specific embodiments with reference to the attached drawings, and it should be understood that the specific embodiments described herein are only for explaining the present invention and are not intended to limit the present invention.
As shown in fig. 1, the in-memory computing module 17 stores the convolution kernel parameters of the convolutional neural network after they have been flattened to one dimension: the first (n × n) columns of each row can store one convolution kernel of n rows and n columns, further kernels can be stored in the following columns in the same way, each row can store multiple convolution kernel parameters, and the convolution kernel parameters of one convolutional layer can be stored in one or more rows. The shift register module 3 stores the input data of the convolutional neural network and can shift out the corresponding input data 4 used for the convolution operation of the in-memory computing module 17. The controller module 1 controls the shift register module 3 to feed the corresponding CNN input data into the in-memory computing module 17; each column receives one input datum, and the input and inverting-input module 5 delivers the input data 6 and its inverted data 6 to the SRAM modules 12 of that column. Before the data operation, the pre-charge capacitors 10 are charged by the pre-charge module 8 so that the BL potential 9 and the BLB potential 9 are at a high potential, after which the pre-charge module 8 is turned off. The controller module 1 then enables one row of SRAM modules 12 by controlling the VWL signal 11 (this part of the control function is implemented by the decoder). The outputs 6 of the selected, enabled row of SRAM modules 12 pass through the skewed inverters 13 and the OR gates 15, which realizes the XOR operation between the input data and the stored data. In this example the CNN implemented by the circuit binarizes the weights and hidden-layer activation values to 0 or 1, whereas the conventional binarized CNN binarizes them to -1 or 1; the calculation result 18 of the in-memory computing module 17 is therefore input to the detection conversion module 19, so that the XOR-based convolution of this example becomes equivalent to the convolution of the conventional CNN, and the result 20 of the corresponding convolution kernel is obtained. The normalization module 21 adds the results 20 of the different convolution kernels according to weights to obtain the normalized result 22. Through the activation function module 23, the normalized result 22 adds a nonlinear factor to the neural network, which overcomes the insufficient expressive and classification capability of a linear model.
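To make the data flow described above concrete, the following is a minimal software sketch of one binarized convolution layer, modeling the shift register window, the XOR-based in-memory computation, the detection conversion, the normalization and the activation as plain array operations; all function and variable names are illustrative assumptions, not the circuit itself.

```python
import numpy as np

def binarized_conv_layer(inputs, kernels, weights):
    """Toy model of the accelerator data flow for one binarized convolution layer.

    inputs  : 2-D array of 0/1 activations (0 encodes -1, 1 encodes +1)
    kernels : list of n x n 0/1 convolution kernels stored "in memory"
    weights : per-kernel weights used by the normalization module
    """
    n = kernels[0].shape[0]
    h, w = inputs.shape
    out = np.zeros((h - n + 1, w - n + 1))
    flat_kernels = [k.reshape(-1) for k in kernels]          # kernels flattened to one dimension
    for i in range(h - n + 1):
        for j in range(w - n + 1):
            window = inputs[i:i + n, j:j + n].reshape(-1)    # shift register presents one window
            acc = 0.0
            for wk, k in zip(weights, flat_kernels):
                xor = np.bitwise_xor(window, k)               # in-memory XOR of input and stored kernel
                conv = window.size - 2 * int(xor.sum())       # detection conversion: bit width - 2 * popcount
                acc += wk * conv                              # normalization: weighted sum over kernels
            out[i, j] = 1 if acc >= 0 else 0                  # activation: binarize back to 0/1
    return out

# Example usage with a random 6x6 binary input and two 3x3 binary kernels
x = np.random.randint(0, 2, (6, 6))
ks = [np.random.randint(0, 2, (3, 3)) for _ in range(2)]
y = binarized_conv_layer(x, ks, weights=[0.5, 0.5])
```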
Fig. 2 shows an example of a CNN: the input is a picture, the network contains two convolutional layers and two pooling layers, and its fully connected layers are also shown.
As shown in fig. 3, two CMOS inverters 26-27 are connected end to end and thereby store one bit of data B, with B' being the inverse of B. The output of each of the two inverters 26-27 is connected to an FD-SOI-MOSFET 28, where A and A' are the input data 6 and its inverted data, respectively, VWL is the select-enable signal 11 that turns the FD-SOI-MOSFETs 28 on and off, and the BL potential 9 and the BLB potential 9 are the two outputs of the SRAM module 12. The NAND operation between the input signal and the stored signal (see fig. 6 and fig. 7) is completed through the modulation of the FD-SOI-MOSFET threshold voltage by its back-gate signal (see fig. 5).
Fig. 4 shows a cross-sectional view of an n-type FD-SOI-MOSFET. Compared with a conventional MOSFET, the doped regions and the channel (an extremely thin silicon film) are separated from the back gate by a buried oxide layer, and the threshold voltage is adjusted through the action of the back-gate voltage on the channel.
Fig. 5 shows the modulation of the threshold voltage by the back-gate voltage: the larger the back-gate voltage, the smaller the threshold voltage.
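As a hedged aside that is not stated in the patent text, the trend in fig. 5 is commonly summarized by a first-order back-gate (body-bias) model, Vth(VBG) ≈ Vth0 − γ·VBG, where Vth0 is the threshold voltage at zero back-gate bias and γ is an assumed device-dependent back-gate coupling coefficient; a larger back-gate voltage therefore gives a smaller threshold voltage, consistent with the figure.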
As shown in fig. 6, before the in-memory computing module 17 performs an operation, the pre-charge module 8 is enabled and the pre-charge capacitors 10 are charged so that the BL potential 9 and the BLB potential 9 are at a high potential; the pre-charge module 8 is then turned off. The controller module 1 then enables one row of SRAM modules 12 by driving the VWL signal 11, at which point the two FD-SOI-MOSFETs 28 of each selected SRAM module 12 are turned on, where A is the input data 6 and B is the stored data (see fig. 3). When A is 0 the threshold voltage is large, and when A is 1 the threshold voltage is small; with a large threshold voltage the pre-charge capacitor discharges more slowly than with a small threshold voltage. When B is 0 the pre-charge capacitor discharges quickly and its potential drops to a very low level, and when B is 1 the pre-charge capacitor discharges slowly and its potential drops only slightly. Combining these four cases, the potential BL 9 of the pre-charge capacitor 10 changes as shown in the figure: when resolved against the switching level 40, curves 32-34 read as '1' and curve 35 reads as '0'. The change of the potential BLB 9 of the pre-charge capacitor 10 is obtained in the same way: when resolved against the switching level 40, curves 36-38 read as '1' and curve 39 reads as '0'.
As shown in fig. 7, combining the above analysis of fig. 6, the truth table of the SRAM module 12 can be derived. From the truth table, the function realized by the SRAM module 12 of fig. 3 is BL = (A·B')' and BLB = (A'·B)'.
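The following brief sketch (an illustrative assumption, not the patent's circuit description) enumerates this truth table in software and checks that inverting BL and BLB with the skewed inverters and OR-ing the results yields A XOR B:

```python
# Enumerate the logic attributed to the SRAM module: BL = (A AND (NOT B))', BLB = ((NOT A) AND B)'.
# Inverting both with the skewed inverters and OR-ing the results recovers A XOR B.
for A in (0, 1):
    for B in (0, 1):
        BL = 1 - (A & (1 - B))          # (A . B')'
        BLB = 1 - ((1 - A) & B)         # (A' . B)'
        xor = (1 - BL) | (1 - BLB)      # skewed inverters followed by an OR gate
        assert xor == A ^ B
        print(A, B, BL, BLB, xor)
```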
Fig. 8 shows the circuit of the skewed inverter 13 of fig. 1: a CMOS inverter composed of two FD-SOI-MOSFETs 41-42 whose back-gate voltages are Vp and Vn, respectively. The key to realizing the function of fig. 6 is the switching level 40 shown there; by adjusting Vp and Vn, the switching level of the skewed inverter 13 can be set to the switching level 40 shown in fig. 6.
Fig. 9 shows the input curve 45 and output curve 46 of the skewed inverter 13 of fig. 8, together with its switching level 40.
Fig. 10 is the convolution flow chart of the binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process, which includes:
step S1, the controller module sends out control instruction to control the shift register to output corresponding data to the input and inverting input module of the in-memory computing module;
step S2, the input and inverting-input module transmits the data to the SRAM modules in the same column;
step S3, the controller module enables the pre-charge module, the pre-charge capacitors are charged so that the BL and BLB potentials are high, and the pre-charge module is then turned off;
step S4, the controller module selects one of the signal lines VWL1-VWLc and drives it to a high potential, i.e., the row of SRAM modules connected to that signal line is enabled, and a NAND operation is completed between the data transmitted in step S2 and the data stored in those SRAM modules;
step S5, the SRAM calculation results are transmitted to the skewed inverters, and the skewed inverter outputs are transmitted to the OR gates; steps S4-S5 complete the XOR operation between the data transmitted in step S2 and the data stored in the SRAM modules;
step S6, the OR gate result of each in-memory computing module is transmitted to the detection conversion module; the binarized convolutional neural network binarizes the weights and hidden-layer activations to 0 or 1, and after the in-memory computing module and the detection conversion module the realized function is equivalent to a binarized convolutional neural network that binarizes the weights and hidden-layer activations to -1 or 1;
step S7, transmitting the output results of all detection conversion modules to a normalization module;
step S8, the output result of the normalization module is transmitted to the activation function module;
and step S9, the output result of the activation function module is transmitted to the shift register module for storage; if the convolution operation is not finished, jump to step S1; otherwise, the process ends.

Claims (2)

1. A binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process, characterized by comprising an in-memory computing module, a shift register module, a controller module, a detection conversion module, a normalization module and an activation function module; wherein:
the shift register module is used for storing input data of the convolutional neural network and has a shift function, and the output of the shift register module is connected with the input of the in-memory computing module;
the in-memory computing module is used for storing the convolution kernel parameters of the convolutional neural network and completing the convolution processing of the input data, and the output of the in-memory computing module is connected with the input of the detection conversion module; the in-memory computing module consists of an SRAM module, an input and inverting-input module and a pre-charge module;
the SRAM module is a module constructed from 6 MOSFETs that stores one bit of data, and comprises two CMOS inverters composed of two P-type MOSFETs and two N-type MOSFETs, the two CMOS inverters being cross-coupled (connected end to end); this structure stores one bit of data; an FD-SOI-MOSFET is connected to the output of each of the two inverters, and the back-gate signals of the FD-SOI-MOSFETs are connected respectively to the input signal and to the inverted input signal; because the back-gate signal of an FD-SOI-MOSFET modulates its threshold voltage, this modulation is used to complete the NAND operation between the input signal and the stored signal;
the input and inverting input module provides an input signal and an inverting signal of the input signal for the SRAM module;
the pre-charging module charges a pre-charging capacitor before the calculation module in the memory performs exclusive-or operation;
the in-memory computing module is used for storing the convolution kernel parameters of the convolutional neural network after they have been flattened to one dimension: the first (n × n) columns of each row can store one convolution kernel of n rows and n columns, further kernels can be stored in the following columns in the same way, each row can store multiple convolution kernel parameters, and the convolution kernel parameters of one convolutional layer are stored in one or more rows; the output signals of the SRAM module are inverted by skewed inverters and then OR-ed, which realizes the XOR operation between the input signals and the signals stored in the SRAM module;
the detection conversion module is used for converting the calculation result of the in-memory computing module into a convolution calculation result, and the output of the detection conversion module is connected with the input of the normalization module; the specific working mode is as follows: the number of '1's in the output data of the in-memory computing module is detected, and the converted result is the bit width of that output data minus twice the detected count, which is equivalent to binarizing the weights and hidden-layer activation values to -1 or 1;
the normalization module is used for adding the convolution results of different convolution kernels according to weights, and the output of the normalization module is connected with the input of the activation function module;
the activation function module is used for adding a nonlinear factor to the neural network, and the output of the activation function module is respectively connected with the input of the controller module and the input of the shift register module;
the controller module is used for carrying out logic control on the shift register module and the in-memory computing module, and the specific control mode is as follows: controlling the shift register module to output corresponding data; controlling the enabling and closing of the pre-charging module; and controlling the output data of the shift register and convolution kernel parameters stored by the corresponding row of the calculation module in the memory to carry out exclusive OR operation.
2. The binarized convolutional neural network in-memory computing accelerator based on the FD-SOI process according to claim 1, wherein the convolution process of the in-memory computing accelerator is:
step 1, a controller module sends out a control instruction to control a shift register to output corresponding data to an input and inverting input module of a calculation module in a memory;
step 2, the input and inverting-input module transmits the data to the SRAM modules in the same column;
step 3, the controller module enables the pre-charge module, the pre-charge capacitors are charged so that the output ends of the SRAM modules are at a high potential, and the pre-charge module is then turned off;
step 4, the controller module selects one of the SRAM module enable signal lines and drives it to a high potential, i.e., the row of SRAM modules connected to that signal line is enabled, and a NAND operation is completed between the data transmitted in step 2 and the data stored in those SRAM modules;
step 5, the SRAM calculation results are transmitted to the skewed inverters, and the skewed inverter outputs are transmitted to the OR gates; steps 4-5 complete the XOR operation between the data transmitted in step 2 and the data stored in the SRAM modules;
step 6, the OR gate result of each in-memory computing module is transmitted to the detection conversion module; the binarized convolutional neural network binarizes the weights and hidden-layer activations to 0 or 1, and after the in-memory computing module and the detection conversion module the realized function is equivalent to a binarized convolutional neural network that binarizes the weights and hidden-layer activations to -1 or 1;
step 7, transmitting output results of all detection conversion modules to a normalization module;
step 8, transmitting the output result of the normalization module to an activation function module;
and step 9, the output result of the activation function module is transmitted to the shift register module for storage; if the convolution operation is not finished, jump to step 1; otherwise, the process ends.
CN201910068644.7A 2019-01-24 2019-01-24 FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator Active CN109784483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910068644.7A CN109784483B (en) 2019-01-24 2019-01-24 FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910068644.7A CN109784483B (en) 2019-01-24 2019-01-24 FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator

Publications (2)

Publication Number Publication Date
CN109784483A CN109784483A (en) 2019-05-21
CN109784483B true CN109784483B (en) 2022-09-09

Family

ID=66502341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910068644.7A Active CN109784483B (en) 2019-01-24 2019-01-24 FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator

Country Status (1)

Country Link
CN (1) CN109784483B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985602A (en) * 2019-05-24 2020-11-24 Huawei Technologies Co., Ltd. Neural network computing device, method and computing device
CN110277121B (en) * 2019-06-26 2020-11-27 University of Electronic Science and Technology of China Multi-bit memory integrated SRAM based on substrate bias effect and implementation method
CN110414677B (en) 2019-07-11 2021-09-03 Southeast University Memory computing circuit suitable for full-connection binarization neural network
CN110597555B (en) * 2019-08-02 2022-03-04 Beihang University Nonvolatile memory computing chip and operation control method thereof
CN110970071B (en) 2019-09-26 2022-07-05 ShanghaiTech University Memory cell of low-power consumption static random access memory and application
CN111126579B (en) * 2019-11-05 2023-06-27 Fudan University In-memory computing device suitable for binary convolutional neural network computation
CN110991623B (en) * 2019-12-20 2024-05-28 Institute of Automation, Chinese Academy of Sciences Neural network operation system based on digital-analog mixed neuron
CN113344170B (en) * 2020-02-18 2023-04-25 Hangzhou Zhicun Intelligent Technology Co., Ltd. Neural network weight matrix adjustment method, write-in control method and related device
CN111967586B (en) * 2020-07-15 2023-04-07 Peking University Chip for pulse neural network memory calculation and calculation method
CN112036552B (en) * 2020-10-16 2022-11-08 Suzhou Inspur Intelligent Technology Co., Ltd. Convolutional neural network operation method and device
CN112487750B (en) * 2020-11-30 2023-06-16 Xi'an Microelectronics Technology Institute Convolution acceleration computing system and method based on in-memory computing
CN113159276B (en) * 2021-03-09 2024-04-16 Peking University Model optimization deployment method, system, equipment and storage medium
CN113190208B (en) * 2021-05-07 2022-12-27 University of Electronic Science and Technology of China Storage and calculation integrated unit, state control method, integrated module, processor and equipment

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1577871A (en) * 2003-06-30 2005-02-09 Toshiba Corporation Semiconductor storage device and semiconductor integrated circuit
JP2005079127A (en) * 2003-08-29 2005-03-24 Foundation For The Promotion Of Industrial Science Soi-mosfet
CN102088027A (en) * 2009-12-08 2011-06-08 S.O.I.Tec Silicon On Insulator Technologies Circuit of uniform transistors on SeOI with buried back control gate beneath the insulating film
JP2013242960A (en) * 2013-07-01 2013-12-05 Hitachi Ltd Semiconductor device
CN104617925A (en) * 2013-11-01 2015-05-13 NXP B.V. Latch circuit
WO2016057973A1 (en) * 2014-10-10 2016-04-14 Schottky Lsi, Inc. Super cmos (scmostm) devices on a microelectronic system
CN105681628A (en) * 2016-01-05 2016-06-15 Xi'an Jiaotong University Convolution network arithmetic unit, reconfigurable convolutional neural network processor and image de-noising method of the reconfigurable convolutional neural network processor
CN106228240A (en) * 2016-07-30 2016-12-14 Fudan University Deep convolutional neural network implementation method based on FPGA
CN107239824A (en) * 2016-12-05 2017-10-10 Beijing Deephi Intelligent Technology Co., Ltd. Apparatus and method for realizing a sparse convolutional neural network accelerator
CN207440765U (en) * 2017-01-04 2018-06-01 STMicroelectronics S.r.l. System on chip and mobile computing device
CN108268942A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Configurable accelerator framework
CN108268940A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Tool for creating a reconfigurable interconnect framework
EP3346426A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Reconfigurable interconnect, corresponding system and method
EP3346425A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Hardware accelerator engine and method
EP3346427A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Configurable accelerator framework, system and method
CN108288616A (en) * 2016-12-14 2018-07-17 iCometrue Co., Ltd. Chip package
CN108805270A (en) * 2018-05-08 2018-11-13 Huazhong University of Science and Technology A convolutional neural network system based on memory
CN109032781A (en) * 2018-07-13 2018-12-18 Chongqing University of Posts and Telecommunications An FPGA parallel system for a convolutional neural network algorithm

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7759714B2 (en) * 2007-06-26 2010-07-20 Hitachi, Ltd. Semiconductor device
US9111638B2 (en) * 2012-07-13 2015-08-18 Freescale Semiconductor, Inc. SRAM bit cell with reduced bit line pre-charge voltage
US8947970B2 (en) * 2012-07-13 2015-02-03 Freescale Semiconductor, Inc. Word line driver circuits and methods for SRAM bit cell with reduced bit line pre-charge voltage
KR20140016482A (en) * 2012-07-30 2014-02-10 에스케이하이닉스 주식회사 Sense amplifier circuit and memory device oncluding the same
FR3024917B1 (en) * 2014-08-13 2016-09-09 St Microelectronics Sa METHOD FOR MINIMIZING THE OPERATING VOLTAGE OF A MEMORY POINT OF SRAM TYPE
US10418369B2 (en) * 2015-10-24 2019-09-17 Monolithic 3D Inc. Multi-level semiconductor memory device and structure
US10014318B2 (en) * 2015-10-24 2018-07-03 Monolithic 3D Inc Semiconductor memory device, structure and methods
US10469076B2 (en) * 2016-11-22 2019-11-05 The Curators Of The University Of Missouri Power gating circuit utilizing double-gate fully depleted silicon-on-insulator transistor

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1577871A (en) * 2003-06-30 2005-02-09 Toshiba Corporation Semiconductor storage device and semiconductor integrated circuit
JP2005079127A (en) * 2003-08-29 2005-03-24 Foundation For The Promotion Of Industrial Science Soi-mosfet
CN102088027A (en) * 2009-12-08 2011-06-08 S.O.I.Tec Silicon On Insulator Technologies Circuit of uniform transistors on SeOI with buried back control gate beneath the insulating film
JP2013242960A (en) * 2013-07-01 2013-12-05 Hitachi Ltd Semiconductor device
CN104617925A (en) * 2013-11-01 2015-05-13 NXP B.V. Latch circuit
WO2016057973A1 (en) * 2014-10-10 2016-04-14 Schottky Lsi, Inc. Super cmos (scmostm) devices on a microelectronic system
CN105681628A (en) * 2016-01-05 2016-06-15 Xi'an Jiaotong University Convolution network arithmetic unit, reconfigurable convolutional neural network processor and image de-noising method of the reconfigurable convolutional neural network processor
CN106228240A (en) * 2016-07-30 2016-12-14 Fudan University Deep convolutional neural network implementation method based on FPGA
CN107239824A (en) * 2016-12-05 2017-10-10 Beijing Deephi Intelligent Technology Co., Ltd. Apparatus and method for realizing a sparse convolutional neural network accelerator
CN108288616A (en) * 2016-12-14 2018-07-17 iCometrue Co., Ltd. Chip package
CN108268942A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Configurable accelerator framework
CN108268943A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Hardware accelerator engine
CN108268941A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Deep convolutional network heterogeneous architecture
CN108268940A (en) * 2017-01-04 2018-07-10 STMicroelectronics S.r.l. Tool for creating a reconfigurable interconnect framework
EP3346426A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Reconfigurable interconnect, corresponding system and method
EP3346425A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Hardware accelerator engine and method
EP3346427A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Configurable accelerator framework, system and method
CN207440765U (en) * 2017-01-04 2018-06-01 STMicroelectronics S.r.l. System on chip and mobile computing device
CN207731321U (en) * 2017-01-04 2018-08-14 STMicroelectronics S.r.l. Hardware accelerator engine
CN207993065U (en) * 2017-01-04 2018-10-19 STMicroelectronics S.r.l. Configurable accelerator framework device and system for a deep convolutional neural network
CN108805270A (en) * 2018-05-08 2018-11-13 Huazhong University of Science and Technology A convolutional neural network system based on memory
CN109032781A (en) * 2018-07-13 2018-12-18 Chongqing University of Posts and Telecommunications An FPGA parallel system for a convolutional neural network algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An in-memory VLSI architecture for convolutional neural networks; KANG M et al.; IEEE Journal on Emerging and Selected Topics in Circuits and Systems; 2018-12-31; Vol. 8, No. 3; pp. 494-505 *
Research on FPGA hardware acceleration platforms for deep learning; Hong Qifei; China Master's Theses Full-text Database, Information Science and Technology; 2018-09-15 (No. 9); pp. I135-287 *

Also Published As

Publication number Publication date
CN109784483A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN109784483B (en) FD-SOI (fully depleted silicon-on-insulator) process-based binary convolutional neural network in-memory computing accelerator
Deng et al. DrAcc: A DRAM based accelerator for accurate CNN inference
US11151439B2 (en) Computing in-memory system and method based on skyrmion racetrack memory
CN112151091B (en) 8T SRAM unit and memory computing device
US11270764B2 (en) Two-bit memory cell and circuit structure calculated in memory thereof
US9697877B2 (en) Compute memory
CN110942792B (en) Low-power-consumption low-leakage SRAM (static random Access memory) applied to storage and calculation integrated chip
Kang et al. An energy-efficient memory-based high-throughput VLSI architecture for convolutional networks
US20230196079A1 (en) Enhanced dynamic random access memory (edram)-based computing-in-memory (cim) convolutional neural network (cnn) accelerator
US11500960B2 (en) Memory cell for dot product operation in compute-in-memory chip
US11456030B2 (en) Static random access memory SRAM unit and related apparatus
CN110941185B (en) Double-word line 6TSRAM unit circuit for binary neural network
CN112885386A (en) Memory control method and device and ferroelectric memory
CN114743580B (en) Charge sharing memory computing device
Bose et al. A 75kb SRAM in 65nm CMOS for in-memory computing based neuromorphic image denoising
Tabrizchi et al. Appcip: Energy-efficient approximate convolution-in-pixel scheme for neural network acceleration
CN114999544A (en) Memory computing circuit based on SRAM
CN115691613B (en) Charge type memory internal calculation implementation method based on memristor and unit structure thereof
Tabrizchi et al. TizBin: A low-power image sensor with event and object detection using efficient processing-in-pixel schemes
Shin et al. A PVT-robust customized 4T embedded DRAM cell array for accelerating binary neural networks
CN114093394B (en) Rotatable internal computing circuit and implementation method thereof
US20230045840A1 (en) Computing device, memory controller, and method for performing an in-memory computation
KR20240035492A (en) Folding column adder architecture for in-memory digital computing.
Xia et al. Transformers only look once with nonlinear combination for real-time object detection
Lin et al. Ensemble cross‐stage partial attention network for image classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant