Neural network unit, convolution operation module and convolutional neural network
Technical Field
The invention relates to the field of integrated circuit design, and in particular to a neural network unit, a convolution operation module and a convolutional neural network.
Background
With the arrival of the big-data era, artificial intelligence has become a very important field, and dedicated neural network chips are key hardware for computing systems to perform neural network computation efficiently. Traditional computing systems adopt the von Neumann architecture, in which computation and storage are separate; under big-data workloads, memory bandwidth and memory power consumption have begun to dominate overall computing bandwidth and energy. A significant portion of this power is spent moving data between the memory and the compute unit. Memory-centric computing (compute-in-memory), which combines the neural network algorithm with the storage hardware architecture, largely eliminates the substantial time and power overhead caused by data movement.
A conventional compute-in-memory SRAM (CIM SRAM) for convolutional neural networks reuses the word-line and bit-line architecture of a conventional SRAM, so that when data is input, each row of word lines receives the same set of input data. Although different convolution kernels can perform multiply-accumulate operations on that same data, current simple convolutional neural network algorithms apply at most 16-20 groups of convolution kernels to the data of one convolution window. For the SRAM array, this means that only 16-20 columns of bit lines are actually used for computation, wasting resources.
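The resource waste described above can be illustrated with a back-of-the-envelope utilization figure. Both numbers below are assumptions chosen for illustration (the patent states only the 16-20 kernel bound, not the array width):

```python
# Illustrative utilization estimate for a conventional CIM SRAM array.
array_columns = 256      # bit-line columns available in the array (assumed)
kernels_in_use = 20      # upper bound of convolution kernels applied per window

utilization = kernels_in_use / array_columns
print(f"bit-line column utilization: {utilization:.1%}")  # prints "bit-line column utilization: 7.8%"
```

Even at the upper bound of 20 kernels, the overwhelming majority of bit-line columns sit idle during a convolution on such an array.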
Disclosure of Invention
The technical problem to be solved by the invention is to provide a neural network unit, a convolution operation module and a convolutional neural network that improve the resource utilization of the array cells and reduce power consumption.
In order to solve the above problem, the present invention provides a neural network unit comprising a static random access memory (SRAM) cell, a forward read isolation branch and a reverse read isolation branch. The SRAM cell comprises a first pass transistor and a second pass transistor electrically connected in series, and two cross-coupled inverters (a first inverter and a second inverter) connected in parallel between the first and second pass transistors. The forward read isolation branch is connected between the first pass transistor and the two cross-coupled inverters and converts an externally input digital voltage into an analog current output according to the control signal stored in the SRAM cell. The reverse read isolation branch is connected between the second pass transistor and the two cross-coupled inverters and likewise converts an externally input digital voltage into an analog current output according to the control signal stored in the SRAM cell.
Optionally, the forward read isolation branch comprises a first read transistor and a second read transistor connected in series. The gate of the first read transistor is connected between the first pass transistor and the two cross-coupled inverters and serves as the control terminal of the forward read isolation branch; the drain/source of the first read transistor is connected to the operating voltage. The gate of the second read transistor serves as the external input terminal, and the source/drain of the second read transistor serves as the output terminal of the forward read isolation branch.
Optionally, the reverse read isolation branch comprises a third read transistor and a fourth read transistor connected in series. The gate of the third read transistor is connected between the second pass transistor and the two cross-coupled inverters and serves as the control terminal of the reverse read isolation branch; the drain/source of the third read transistor is grounded. The gate of the fourth read transistor serves as the external input terminal, and the source/drain of the fourth read transistor serves as the output terminal of the reverse read isolation branch.
The invention also provides a convolution operation module comprising a convolution operation sub-block and a pulse frequency quantization unit. The convolution operation sub-block comprises an array of N × N neural network units, where N is a positive integer. In the array, the first and second pass transistors of each neural network unit are connected to a write word line; the drain/source of the first pass transistor and the source/drain of the second pass transistor are connected to a bit line and an inverted bit line, respectively; the external input of the forward isolation branch is connected to a row word line, and the external input of the reverse isolation branch is connected to a column word line; the output of the forward isolation branch is connected to a row current output, and the output of the reverse isolation branch is connected to a column current output. The pulse frequency quantization unit is connected to the output of the convolution operation sub-block; it converts the output currents of the sub-block into pulse signals, counts the pulse frequency, and thereby converts the result into a multi-bit digital output.
Optionally, the pulse frequency quantization unit comprises an amplifier, a comparator and a counter connected in series. The amplifier is connected in parallel with an integrating capacitor and in series with the comparator, forming the input of the pulse frequency quantization unit. The output of the comparator is further connected across the integrating capacitor through a discharge control switch, which discharges the integrating capacitor when the comparator outputs a high level. The analog input current charges the integrating capacitor, and the amplified capacitor voltage triggers pulse output at the comparator, so that the current value is equivalently converted into a pulse frequency. The counter, connected in series with the comparator, counts the pulses to form the multi-bit digital output.
The invention also provides a convolutional neural network comprising an array of N × N of the above convolution operation modules, where N is a positive integer.
According to the invention, each row of word lines receives two different groups of data and performs convolution operations on them separately, so that more groups of convolution kernels can be applied to the data of one convolution window, avoiding resource waste.
Drawings
Fig. 1 is a schematic circuit structure diagram of a neural network unit according to an embodiment of the present invention.
Fig. 2 is a schematic circuit diagram of a convolution operation sub-block in a convolution operation module according to an embodiment of the present invention.
Fig. 3 is a schematic circuit diagram of a pulse frequency quantization unit in a convolution operation module according to an embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the neural network unit, the convolution operation module, and the convolutional neural network of the present invention in detail with reference to the accompanying drawings.
In the following description, one of the two equivalent source/drain connection choices is described for each transistor; it should be noted that in other embodiments the source and drain connections may be swapped without affecting the actual electrical function.
Fig. 1 is a schematic circuit diagram of a neural network unit according to an embodiment of the present invention. The unit includes an SRAM cell M1, a forward read isolation branch B1, and a reverse read isolation branch B2. The SRAM cell M1 has a 6T structure comprising a first pass transistor T1 and a second pass transistor T2 electrically connected in series, and two cross-coupled inverters, a first inverter D1 and a second inverter D2, connected in parallel between T1 and T2.
In this particular embodiment, the forward read isolation branch B1 includes a first read transistor R1 and a second read transistor R2 connected in series. The gate of R1 is connected between the first pass transistor T1 and the two cross-coupled inverters and serves as the control terminal of the forward read isolation branch, and the drain of R1 is connected to the operating voltage VDD. The gate of R2 serves as the external input, and the source of R2 serves as the output of the forward read isolation branch. The branch B1 is thus connected between the first pass transistor and the two cross-coupled inverters and converts an externally input digital voltage into an analog current output according to the control signal stored in the SRAM cell M1.
In this embodiment, the reverse read isolation branch B2 includes a third read transistor R3 and a fourth read transistor R4 connected in series. The gate of R3 is connected between the second pass transistor T2 and the two cross-coupled inverters and serves as the control terminal of the reverse read isolation branch, and the drain of R3 is grounded. The gate of the fourth read transistor R4 serves as the external input, and the source of R4 serves as the output of the reverse read isolation branch. The branch B2 is thus connected between the second pass transistor T2 and the two cross-coupled inverters and converts an externally input digital voltage into an analog current output according to the control signal stored in the SRAM cell M1.
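A minimal behavioral sketch of the two isolation branches follows. The function names, the unit-current parameter, and the 1-bit weight/input encoding are illustrative assumptions, not part of the patent; the point is that each series transistor pair realizes a one-bit multiply between the stored bit and the digital input:

```python
def forward_branch_current(q: int, row_in: int, i_unit: float = 1.0) -> float:
    """Current sourced toward the row output.

    R1's gate sees the stored node (here called Q) and R1's drain ties to
    VDD, so the series pair R1-R2 conducts only when both the stored bit
    and the digital input on R2's gate are high: i = i_unit * (Q AND input).
    """
    return i_unit * (q & row_in)


def reverse_branch_current(q: int, col_in: int, i_unit: float = 1.0) -> float:
    """Current sunk from the column output.

    R3's gate sees the complementary storage node (QB), and R3's drain is
    grounded, so the series pair R3-R4 conducts when QB and the input on
    R4's gate are both high.
    """
    qb = 1 - q
    return i_unit * (qb & col_in)


# Truth-table check: each branch implements a 1-bit multiply.
assert forward_branch_current(1, 1) == 1.0
assert forward_branch_current(1, 0) == 0.0
assert reverse_branch_current(0, 1) == 1.0   # QB = 1 when Q = 0
```

Because the two branches are gated by complementary storage nodes, one cell can serve two independent data streams (row-wise and column-wise) at once.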
Next, a specific embodiment of the convolution operation module of the present invention is described. The module comprises a convolution operation sub-block and a pulse frequency quantization unit. Fig. 2 is a schematic circuit diagram of the convolution operation sub-block according to an embodiment of the present invention. The sub-block comprises an array of N × N neural network units; in this embodiment, a 3 × 3 array of units C11 to C33 is taken as an example. The first pass transistor T1 and the second pass transistor T2 of each neural network unit are connected to a write word line WWL of the array; the drain of T1 and the source of T2 are connected to a bit line BL and an inverted bit line BLB, respectively; the external input of the forward isolation branch is connected to a row word line LWL, and the external input of the reverse isolation branch is connected to a column word line CWL; the output of the forward isolation branch is connected to a row current output Lo, and the output of the reverse isolation branch is connected to a column current output Co. In three cycles, the array can complete the convolution of a 4 × 8 input matrix with a 3 × 3 convolution kernel.
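For reference, the multiply-accumulate result the sub-block produces corresponds to a standard valid (no-padding) 2-D convolution: each output element is the dot product of one convolution window with the stored kernel, which the hardware realizes by summing branch currents. The sketch below uses illustrative values (the input and kernel contents are assumptions):

```python
import numpy as np

def conv2d_valid(inp: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reference multiply-accumulate: slide the kernel over every valid
    window of the input and take the elementwise-product sum."""
    kh, kw = kernel.shape
    h, w = inp.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i + kh, j:j + kw] * kernel)
    return out

inp = np.arange(32).reshape(4, 8)   # 4 x 8 input matrix, illustrative values
kernel = np.ones((3, 3))            # 3 x 3 kernel, illustrative values
out = conv2d_valid(inp, kernel)
print(out.shape)                    # prints "(2, 6)" -- 12 windows in total
```

A 4 × 8 input convolved with a 3 × 3 kernel yields 2 × 6 = 12 windows, which the 3 × 3 unit array works through over the three cycles mentioned above.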
Fig. 3 is a schematic circuit diagram of the pulse frequency quantization unit in a convolution operation module according to an embodiment of the present invention. The pulse frequency quantization unit is connected to the output of the convolution operation sub-block; it converts the output currents of the sub-block into pulse signals, counts the pulse frequency, and thereby produces a multi-bit digital output. As one embodiment of the foregoing, as shown in Fig. 3, the pulse frequency quantization unit includes an amplifier 21, a comparator 22, and a counter 23 connected in series. The amplifier 21 is connected in parallel with an integrating capacitor C and in series with the comparator 22, forming the input of the pulse frequency quantization unit. The output of the comparator 22 is further connected across the integrating capacitor C through a discharge control switch K, which discharges the capacitor when the comparator 22 outputs a high level (1). The analog input current charges the integrating capacitor C, and the amplified capacitor voltage triggers a pulse at the output of the comparator 22, so that the current value is equivalently converted into a pulse frequency. Specifically, the inverted output of the comparator 22 serves as the discharge signal of the integrating capacitor C: each time the analog input current charges the capacitor to the trip voltage of the comparator 22, the comparator output flips 0 → 1; while the output is 1 the capacitor discharges, the output returns 1 → 0, and the comparator 22 thus produces a pulse train whose frequency increases with the analog input current. The counter 23, connected in series with the comparator 22, counts the pulses to form the multi-bit digital output.
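The charge-and-reset cycle above can be sketched as a simple behavioral model. The function name and all component values are illustrative assumptions in normalized units, not parameters from the patent:

```python
def pulse_count(i_in: float, c_int: float, v_trip: float, t_window: float) -> int:
    """Behavioral model of the pulse frequency quantization unit.

    The analog input current charges the integrating capacitor; each time
    the capacitor voltage reaches the comparator trip point, the comparator
    output flips 0 -> 1, the discharge switch resets the capacitor, the
    output returns 1 -> 0, and the counter registers one pulse.  The pulse
    count over a fixed window is therefore proportional to the current.
    """
    t_per_pulse = c_int * v_trip / i_in   # time to charge C to the trip voltage
    return int(t_window / t_per_pulse)

# Normalized, illustrative values: doubling the current doubles the code.
n_small = pulse_count(i_in=1.0, c_int=1.0, v_trip=0.5, t_window=100.0)
n_large = pulse_count(i_in=2.0, c_int=1.0, v_trip=0.5, t_window=100.0)
print(n_small, n_large)   # prints "200 400"
```

This linear current-to-count mapping is what lets the counter's tally serve directly as the multi-bit digital result of the analog multiply-accumulate.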
The convolution operation modules, each composed of the convolution operation sub-block of Fig. 2 and the pulse frequency quantization unit of Fig. 3, form an N × N array, i.e. a convolutional neural network, which can perform accelerated convolutional neural network computation.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.