WO2021056677A1

WO2021056677A1 - Dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural network

Info

Publication number: WO2021056677A1
Application number: PCT/CN2019/114107
Authority: WO
Inventors: 刘波; 沈泽昱; 孙煜昊; 黄乐朋; 朱文涛; 杨军
Original assignee: 东南大学
Priority date: 2019-09-27
Filing date: 2019-10-29
Publication date: 2021-04-01
Also published as: WO2021056980A1; CN110750231B; CN110750231A

Abstract

A dual-phase coefficient adjustable analog multiplication circuit for a convolutional neural network, which relates to the technical fields of calculating, estimating and counting. The multiplication calculation circuit comprises a current-type network digital-to-analog conversion module, a dual-phase coefficient adjustable analog multiplication array, a pipeline-type analog-to-digital conversion module and a calculation unit control module. Multiplication calculation of a neural network layer is achieved by using a discrete-time circuit structure, and a newly added signed multiplier design provides positive and negative control, so that signed multiplication is achieved and a wider range of voltage amplitude is provided.

Description

A dual-phase coefficient adjustable analog multiplication calculation circuit oriented to convolutional neural network

Technical field

The invention discloses a dual-phase coefficient adjustable analog multiplication calculation circuit oriented to a convolutional neural network, which relates to digital-analog hybrid integrated circuit technology and belongs to the technical field of calculating, estimating and counting.

Background technique

Many optimized designs already exist for the convolutional layers of today's convolutional neural networks, with significant gains in power consumption, area and energy efficiency. For example, in data storage, methods such as quantization and compression are used to binarize the convolutional neural network; in the calculation circuit, the exclusive-NOR gate is used as an approximate multiplier for convolution operations. As a result, further optimization of the convolutional layer by reducing network layers or by refining the digital circuit has reached a bottleneck. The calculation volume and parameter volume of a convolutional neural network are large, and the requirements on hardware accelerators are high. To further reduce the power consumption of convolution operations and save circuit cost, many studies have proposed combining analog circuits and digital circuits in one chip, for example by using analog multipliers instead of digital multipliers. However, compared with the digital multiplier, the traditional analog multiplier does not make full use of the scaling of the CMOS process, and in practice the multiplication circuit is more difficult to design. To achieve reliable accuracy, higher demands are placed on the digital-to-analog conversion circuit. Therefore, the advantages of analog circuits cannot be fully exploited, and the achievable reduction in computational power consumption is very limited.

Summary of the invention

To solve the problem that further optimization of existing neural network convolutional layers has reached a bottleneck, the present invention provides a neural-network-oriented dual-phase coefficient adjustable analog multiplication circuit, which converts the digital signals in the multiplication operation into analog signals and uses a discrete-time circuit design for the analog multiplication calculation circuit. This reduces the calculation power consumption of irregular network layers and achieves high linearity and robustness. The dual-phase coefficient switching circuit design provides a wide frequency response tuning range.

The present invention adopts the following technical solution to achieve the above purpose: a neural-network-oriented dual-phase coefficient adjustable analog multiplier includes a current-type network digital-to-analog conversion module, a dual-phase coefficient adjustable analog multiplication array, a pipelined analog-to-digital conversion module and a calculation unit control module. The current-type network digital-to-analog conversion module converts the characteristic data read from the storage module into an analog voltage, which serves as the input voltage of the dual-phase coefficient adjustable analog multiplication array. The calculation unit control module reads the weight data from the storage module and, in combination with the size of the convolution kernel, controls the switching state of the analog multiplication units in the dual-phase coefficient adjustable analog multiplication array, thereby setting the coefficient and the working state. The dual-phase coefficient adjustable analog multiplication array is an array of analog multiplication units that realizes the multiplication operations of the various network layers in the neural network. The pipelined analog-to-digital conversion module converts the output voltage of the dual-phase coefficient adjustable analog multiplication array into a digital signal. Finally, the digital signal output by the pipelined analog-to-digital conversion module is stored in the storage module.

The present invention adopts the above technical scheme and has the following beneficial effects:

(1) This patent proposes a dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural networks, which uses a discrete-time circuit structure to realize the multiplication calculation of the neural network layer and adds a signed multiplier design that provides positive control and negative control, so that multiplication with a sign bit is realized and a wider range of voltage amplitude is provided.

(2) The dual-phase coefficient adjustable analog multiplication array is composed of multiple coefficient adjustable analog multiplication circuit units, each of which is composed of a dual-phase sample and hold buffer circuit and a switch-controlled analog multiplication circuit. The sample and hold buffer circuit converts the input analog voltage into a signed multiplier, and the multiplication coefficient is adjusted by controlling the switch circuit structure in the analog multiplication circuit. The analog voltage representing the signed multiplier serves as the input voltage of the analog multiplication circuit, and the analog voltages output by the analog multiplication circuit are superimposed to realize multiplication with a coefficient between 0 and 1 and a precision of 2^-6. Low power consumption and high linearity are also maintained at low power supply voltages.

(3) The current-type network digital-to-analog conversion module precharges the output signal terminal to an analog voltage proportional to the input value, thereby achieving digital-to-analog conversion. It has good linearity and mismatch behavior, and it uses a multiplexer to generate an input pulse for each data item, which reduces area overhead and signal routing.

(4) The pipelined analog-to-digital conversion module adopts a parallel structure that can process multiple sampled data at the same time. Its signal processing speed is high, it requires low power consumption while maintaining high precision, and it has good linearity and low offset characteristics, so it can achieve high-speed and high-resolution conversion.

Description of the drawings

Figure 1 is a schematic diagram of the overall architecture of the present invention.

Figure 2 is the current-type network digital-to-analog conversion module of the present invention.

Figure 3 shows the analog multiplication unit and its two-phase circuit structure of the present invention.

Figure 4 shows the pipelined analog-to-digital conversion module of the present invention.

Detailed description

The present invention is further explained below with reference to specific examples. It should be understood that these examples only illustrate the present invention and do not limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

Under the control and scheduling of its internal modules, the dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural networks performs the multiplication operations in the depthwise convolution, pointwise convolution, activation, pooling and batch normalization layers of a neural network. As shown in Figure 1, the neural-network-oriented dual-phase coefficient adjustable analog multiplier includes a current-type network digital-to-analog conversion module, a dual-phase coefficient adjustable analog multiplication array, a pipelined analog-to-digital conversion module and a calculation unit control module.

As shown in Figure 2, the current-type network digital-to-analog conversion module is composed of an input pulse generation module and a cascaded PMOS constant current source. The input pulse generation module is composed of an 8:1 multiplexer driven by 8 timing signals; its purpose is to generate an input pulse for each input value while reducing area overhead and signal routing. The multiplexer can be designed according to the required precision, for example as a 6:1 or 10:1 multiplexer. The cascaded PMOS constant current source is composed of three PMOS transistors (M_P1, M_P2, M_P3) and one NMOS (N-channel metal-oxide-semiconductor field-effect) transistor M_N. The time for which the charging current flows into the output signal terminal is proportional to the input value. This digital-to-analog conversion module architecture has better linearity and mismatch behavior than a binary-weighted PMOS charging digital-to-analog conversion module. In addition, the pulse width of the timing signal usually varies less than a signal affected by the threshold-voltage mismatch of the PMOS transistors, so the module has good stability.

The input characteristic data is read from the storage module and enters the first-in-first-out memory of the current-type network digital-to-analog conversion module. When the input data is 6 bits wide, it is first passed through the 6:1 multiplexer: the 3 most significant bits of the input data select the first half of the charging pulse width, and the 3 least significant bits determine the second half. The charging pulse is then applied to the current-type network digital-to-analog conversion module, which converts it into an analog voltage. This analog voltage is transferred to the dual-phase coefficient adjustable analog multiplication array as its input voltage.
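The pulse-width conversion described above can be illustrated with a minimal behavioural sketch. It assumes an ideal constant current source charging an ideal capacitor, so that V = I * t / C. The function names, the unit pulse width, the charging current and the load capacitance are all hypothetical values chosen for illustration; none of them come from the patent.

```python
from typing import Tuple


def charging_pulse_units(value: int, bits: int = 6) -> Tuple[int, int]:
    """Split an input code into the two pulse segments, in unit pulse widths.

    The upper half of the bits sets the first (coarse) segment and the
    lower half sets the second (fine) segment.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    half = bits // 2
    msb = value >> half                 # 3 most significant bits for 6-bit data
    lsb = value & ((1 << half) - 1)     # 3 least significant bits
    coarse_units = msb << half          # first segment of the charging pulse
    fine_units = lsb                    # second segment of the charging pulse
    return coarse_units, fine_units


def dac_voltage(value: int,
                i_charge: float = 1e-6,   # hypothetical charging current (A)
                c_load: float = 1e-12,    # hypothetical load capacitance (F)
                t_unit: float = 1e-9      # hypothetical unit pulse width (s)
                ) -> float:
    """Ideal output voltage: constant current integrated over the pulse."""
    coarse, fine = charging_pulse_units(value)
    return i_charge * (coarse + fine) * t_unit / c_load
```

In this idealized model the total pulse width, and therefore the output voltage, is directly proportional to the input code, which is the property the text relies on.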

The weight data is read from the storage module and enters the calculation unit control module. The calculation unit control module combines the size of the convolution kernel and the weight data to configure each analog multiplication calculation unit with a 2-bit control signal and a 6-bit multiplication coefficient value. The 2-bit control signal controls switch 7 and switch 8 of each analog multiplication calculation unit to realize forward control or reverse control of the input signal, and the 6-bit multiplication coefficient value controls switches 1 to 6 to adjust the multiplication coefficient. Under normal circumstances, the size of the convolution kernel is 3×3, and the coefficient adjustable analog multiplication array then uses 3×3 computing units in the array to complete the task. When the size of the convolution kernel is 2×2, the coefficient adjustable analog multiplication array can perform 4 sets of operations at the same time. When the size of the convolution kernel is 1×1, the array can perform 16 sets of operations at the same time. When the size of the convolution kernel is 4×4, the array can perform one set of operations at a time. When the size of the convolution kernel is N×N and N is greater than 4, multiple coefficient adjustable analog multiplication arrays can be used for parallel calculation.
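The kernel-size cases above follow from tiling a 4×4 array with N×N groups. The sketch below reproduces the stated cases (16, 4 and 1 groups). The function name and the returned dictionary are hypothetical, and the number of arrays used for N greater than 4 is an assumption (a simple ceiling of N²/16); the patent only states that multiple arrays are used.

```python
import math

ARRAY_SIDE = 4  # the array is 4 x 4 = 16 multiplication units


def schedule(kernel_size: int) -> dict:
    """Return how one N x N kernel maps onto 4 x 4 multiplication arrays."""
    if kernel_size < 1:
        raise ValueError("kernel_size must be positive")
    units_per_group = kernel_size ** 2
    if kernel_size <= ARRAY_SIDE:
        # Number of non-overlapping N x N tiles that fit in a 4 x 4 array.
        groups = (ARRAY_SIDE // kernel_size) ** 2
        return {"arrays": 1,
                "parallel_groups": groups,
                "units_per_group": units_per_group}
    # Assumption: spread the N*N products over as few arrays as possible.
    arrays = math.ceil(units_per_group / ARRAY_SIDE ** 2)
    return {"arrays": arrays,
            "parallel_groups": 1,
            "units_per_group": units_per_group}
```

For a 3×3 kernel only one tile fits, so 9 of the 16 units are active and the remaining 7 would be left in the stop state under this model.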

The dual-phase coefficient adjustable analog multiplication array is composed of 4 × 4, a total of 16, coefficient adjustable analog multiplication circuit units. Each coefficient adjustable analog multiplication circuit unit is composed of a dual-phase sample and hold buffer circuit and a switch-controlled analog multiplication circuit. The dual-phase sample and hold buffer circuit is built from a common-source amplifier and controls the input of the signed multiplier. The signed multiplier serves as the input signal of the switch-controlled analog multiplication circuit and, combined with the adjustment of the multiplication coefficient, realizes multiplication with a coefficient between 0 and 1 and a precision of 2^-6. At the same time, low power consumption and high linearity are maintained under low power supply voltage.

As shown in Figure 3, after the input voltage of the adjustable analog multiplication array enters the analog multiplication calculation unit, the unit operates in positive control when switch 7 is closed and switch 8 is open, in negative control when switch 7 is open and switch 8 is closed, and stops when switches 7 and 8 are both open. After the operating mode is selected, the input voltage is stabilized by the sample and hold buffer circuit and then serves as the input voltage of the six parallel switch branches.

There are 6 branches in total, from switch 1 to switch 6, and the switch on each branch is connected in series with a capacitor corresponding to one bit of the multiplication coefficient value. Switch 1 is connected in series with a 10fF capacitor, switch 2 with a 30fF capacitor, switch 3 with a 40fF capacitor, switch 4 with a 10fF capacitor, switch 5 with a 20fF capacitor, and switch 6 with a 40fF capacitor. The opening and closing of switches 1 to 6 are controlled by the calculation unit control module. Switches 1 to 6 correspond to the lowest to the highest bit of the 6-bit coefficient value. If the corresponding bit is 1, the switch is closed and the corresponding capacitor is charged; if the corresponding bit is 0, the switch is opened and the corresponding capacitor is discharged. The switch 1 branch, switch 2 branch, switch 3 branch and a 10fF capacitor are connected in parallel to form main branch 1; the switch 4 branch, switch 5 branch and switch 6 branch are connected in parallel to form main branch 2; main branch 1, an 800/7fF capacitor and main branch 2 are then connected in series in sequence.

The terminal voltage of main branch 2 is the output voltage. If switch 1 is closed, the output voltage increases by 8/569 (about 1/64) of the input voltage; if switch 2 is closed, by 24/569 (about 1/32) of the input voltage; if switch 3 is closed, by 32/569 (about 1/16) of the input voltage; if switch 4 is closed, by 72/575 (about 1/8) of the input voltage; if switch 5 is closed, by 144/575 (about 1/4) of the input voltage; and if switch 6 is closed, by 288/575 (about 1/2) of the input voltage. It can be seen that the analog multiplication unit uses a discrete-time switched-capacitor circuit. Through the switch circuit, an adjustable high-order narrow-bandwidth programmable filter is realized. The digital circuit controls the closing of the 6 switches, and the branch voltages generated when the capacitors on the closed switch branches are charged are superimposed to obtain the output voltage of the analog multiplication unit.
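As a behavioural illustration of the superposition described above, the following sketch sums the per-switch contributions exactly as they are stated in the text and applies the sign chosen by switches 7 and 8. It is not a circuit simulation: the contributions are simply assumed to add linearly, the function names are hypothetical, and rejecting the case where switches 7 and 8 are both closed is an assumption, since the text does not define that state.

```python
from fractions import Fraction

# Per-switch contributions to the output, as fractions of the input voltage,
# copied from the description (switch 1 = lowest bit ... switch 6 = highest bit).
CONTRIBUTIONS = (
    Fraction(8, 569),     # switch 1
    Fraction(24, 569),    # switch 2
    Fraction(32, 569),    # switch 3
    Fraction(72, 575),    # switch 4
    Fraction(144, 575),   # switch 5
    Fraction(288, 575),   # switch 6
)


def coefficient(code: int) -> Fraction:
    """Sum the contributions of every closed switch (bit = 1) in a 6-bit code."""
    if not 0 <= code < 64:
        raise ValueError("code must be a 6-bit value")
    return sum((c for bit, c in enumerate(CONTRIBUTIONS) if (code >> bit) & 1),
               Fraction(0))


def multiply(v_in: float, code: int, switch7: bool, switch8: bool) -> float:
    """Ideal unit output: sign from switches 7/8, magnitude from switches 1-6."""
    if switch7 and switch8:
        # Assumption: this state is not described in the text, so reject it.
        raise ValueError("switches 7 and 8 must not both be closed")
    sign = 1 if switch7 else (-1 if switch8 else 0)   # 0 means the unit stops
    return sign * v_in * float(coefficient(code))
```

Calling `coefficient(code)` for any 6-bit code returns the exact fraction implied by the stated contributions, which can then be compared with the nominal weight `code / 64`.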

The pipelined analog-to-digital converter achieves high-speed and high-resolution conversion and meets the requirements of low-power, small-area chip design. As shown in Figure 4, the pipelined analog-to-digital converter is mainly composed of multiple cascaded stages, and each stage includes a sample/hold (S/H) amplifier, a low-precision ADC, a DAC and a summing circuit. The input analog quantity is converted into a 3-bit digital quantity by a 3-bit coarse-precision ADC; this is the high 3 bits of the output data, and the DAC converts the 3-bit digital quantity back into an analog quantity. The S/H amplifier samples and holds the input analog quantity, and the summing circuit performs a summation or difference operation between the held value and the analog quantity output by the DAC, thereby removing from the input signal the analog signal corresponding to the 3-bit digital quantity. The difference is amplified and sent to the next stage, which resolves the low 3 bits of the output data. The conversion is repeated according to the required precision of the digital quantity until high-precision n-bit output data is obtained. Pipelined ADCs use digital error correction to relax the accuracy requirements of the internal comparators. If the comparator of an earlier stage has a large offset and the input voltage lies at a comparison point, that stage generates an incorrect output value and therefore an incorrect difference; after the difference passes through the amplifier and is processed further, the correct ADC result can be restored. Compared with other analog-to-digital converters, the pipelined ADC has a parallel structure, can process multiple sampled data at the same time, offers high signal processing speed and low power consumption while maintaining high precision, and has good linearity and low offset characteristics, so it can achieve high-speed and high-resolution conversion.
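The stage-by-stage conversion described above can be sketched as an ideal two-stage, 3-bit-per-stage pipeline. This is a simplified model under stated assumptions: the comparators and the residue amplifier are ideal, the input range is 0 to a hypothetical reference voltage `v_ref`, and the digital error correction mentioned in the text is not modelled. The function name and parameters are illustrative only.

```python
def pipeline_adc(v_in: float,
                 v_ref: float = 1.0,       # hypothetical full-scale voltage
                 stages: int = 2,
                 bits_per_stage: int = 3) -> int:
    """Ideal pipelined conversion: coarse ADC, DAC subtraction, residue gain."""
    levels = 1 << bits_per_stage
    residue = v_in / v_ref                 # normalised to the range [0, 1)
    code = 0
    for _ in range(stages):
        # Coarse ADC of this stage, clamped to the valid code range.
        stage_code = min(levels - 1, max(0, int(residue * levels)))
        # Subtract the DAC value of the coarse code, then amplify the
        # difference back to full scale for the next stage.
        residue = (residue - stage_code / levels) * levels
        # Splice this stage's bits below the bits already resolved.
        code = (code << bits_per_stage) | stage_code
    return code
```

With two 3-bit stages the model produces a 6-bit result, the first stage supplying the high 3 bits and the second stage the low 3 bits.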

The dual-phase coefficient adjustable analog multiplication calculation circuit disclosed in the present application realizes its entire function through the following steps.

Step 1: The input characteristic data is read from the storage module and enters the first-in-first-out memory of the current-type network digital-to-analog conversion module. The digital-to-analog conversion module converts the characteristic data into an analog voltage and transmits it to the dual-phase coefficient adjustable analog multiplication array as the input voltage.

Step 2: The weight data is read from the storage module and enters the calculation unit control module. The calculation unit control module combines the size of the convolution kernel and the weight data to control the eight switches in each analog multiplication calculation unit, setting the multiplication coefficient and the working mode of the dual-phase sign selector (positive control, negative control, stop).

Step 3: The input voltage of the analog multiplication calculation unit passes through the dual-phase sign selector in positive control or negative control mode to complete the sign bit operation. The signed multiplier then passes through the sample and hold buffer circuit, which maintains the input voltage value so that its attenuation does not affect the calculation result. At the same time, switches 1 to 6 are switched to the closed or open state according to the six-bit digital signal of the coefficient value. The capacitor on a branch whose switch is closed is charged, and the capacitor on a branch whose switch is open is discharged. When the charge and discharge process is over, the multiplication result is determined by the capacitance values and the circuit structure. The capacitors of the different switch branches contribute differently to the output voltage: the contribution of switch 1 is about 1/64 of the input voltage, that of switch 2 about 1/32, that of switch 3 about 1/16, that of switch 4 about 1/8, that of switch 5 about 1/4, and that of switch 6 about 1/2 of the input voltage. Finally, the output voltage corresponding to the set of closed switches is obtained, which is the output of the analog multiplier.

Step 4: The output voltage of the analog multiplier is finally transferred to the pipeline analog-to-digital converter to obtain the output value, which is stored in the memory and waits for the next read instruction.
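Step 2 above maps each weight to a working state and a 6-bit coefficient. The sketch below shows one possible encoding. The patent does not specify how weights are quantized, so the weight range of -1 to 1, the round-to-nearest rule, the saturation at code 63 and the use of the stop state for a zero code are all assumptions, and `encode_weight` is a hypothetical name.

```python
from typing import Tuple


def encode_weight(weight: float, bits: int = 6) -> Tuple[bool, bool, int]:
    """Return (switch7_closed, switch8_closed, coefficient_code) for a weight.

    Assumes weights lie in [-1, 1] and are quantized to steps of 2**-bits.
    """
    full_scale = 1 << bits
    # Round the magnitude to the nearest step and saturate at the largest code.
    code = min(full_scale - 1, int(abs(weight) * full_scale + 0.5))
    if code == 0:
        return (False, False, 0)        # stop: switches 7 and 8 both open
    if weight > 0:
        return (True, False, code)      # positive control
    return (False, True, code)          # negative control
```

The returned code bits would drive switches 1 to 6 from the lowest to the highest bit, matching the bit order given in the description.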

Claims

1. A dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network, characterized in that it comprises:

a digital-to-analog conversion module, which converts the read characteristic data into an analog voltage and outputs it to the dual-phase coefficient adjustable multiplication array,

a calculation unit control module, which outputs the working state control signal and the multiplication coefficient control signal of the dual-phase coefficient adjustable multiplication array according to the read weight data in combination with the size of the convolution kernel,

a dual-phase coefficient adjustable multiplication array, in which each multiplication unit, under the action of its working state control signal and multiplication coefficient control signal, converts the input analog voltage into a signed multiplier, selects the circuit structure corresponding to the multiplication coefficient, and outputs the multiplication operation result, and

an analog-to-digital conversion module, which performs analog-to-digital conversion on the multiplication result output by the dual-phase coefficient adjustable multiplication array.
2. The dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network according to claim 1, wherein each multiplication unit comprises:

a dual-phase sample and hold buffer circuit, whose forward input terminal and reverse input terminal are each connected in series with a switch, the control terminals of the two switches being connected to the working state control signal output by the calculation unit control module, and which outputs an analog voltage representing a signed multiplier after forward control or reverse control of the input analog voltage, and,

a switch-controlled analog multiplication circuit, composed of multiple capacitor branches connected in parallel, each capacitor branch being connected in series with a switch controlled by the multiplication coefficient control signal; the input terminal formed by connecting the positive-plate ends of the capacitor branches is connected to the output terminal of the dual-phase sample and hold buffer circuit; each capacitor branch is switched to the charging state or the discharging state under the action of the multiplication coefficient control signal, and the analog voltage representing the signed multiplier acts on the capacitor branches in the charging state; the output terminal formed by connecting the negative-plate ends of the capacitor branches outputs an analog voltage representing the result of the multiplication operation.
3. The dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network according to claim 1, wherein the digital-to-analog conversion module comprises:

an input pulse generation module, which multiplexes the input characteristic data and maps it to the input terminal of the cascaded PMOS constant current source, and,

a cascaded PMOS constant current source, which converts the characteristic data mapped at its input terminal, in direct proportion, into the time for which the charging current flows into the analog voltage output terminal.
4. The dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network according to claim 1, wherein the analog-to-digital conversion module is composed of a plurality of cascaded modules, and each cascaded module includes:

a low-precision ADC, which performs analog-to-digital conversion on the multiplication result at its input to obtain the high-order data of the output digital signal,

a DAC, which performs digital-to-analog conversion on the high-order data of the output digital signal at its input and outputs the analog signal corresponding to the high-order data,

a sample/hold amplifier, which samples and holds the multiplication result at its input and outputs it, and,

a summing circuit, which performs a summation or difference operation on the multiplication result output by the sample/hold amplifier and the analog signal corresponding to the high-order data output by the DAC, and outputs the multiplication result with the high-order data removed to the next cascaded module.
5. The dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network according to claim 3, wherein the input pulse generation module is a multiplexer, and the multiplexer generates an input pulse for each item of characteristic data.
6. The dual-phase coefficient adjustable analog multiplication calculation circuit for a convolutional neural network according to claim 4, wherein the analog-to-digital conversion module further comprises a digital correction and alignment circuit that corrects the output signals of the cascaded modules and then splices them to obtain a digital quantity of the required precision.
7. A dual-phase coefficient adjustable analog multiplication method for a convolutional neural network, characterized in that the characteristic data is converted into an analog voltage serving as the input of the dual-phase coefficient adjustable multiplication array; the working state and multiplication coefficient of each multiplication unit in the dual-phase coefficient adjustable multiplication array are controlled according to the weight data in combination with the size of the convolution kernel; and analog-to-digital conversion is performed on the multiplication operation result output by the dual-phase coefficient adjustable multiplication array to obtain the calculation result.
8. The dual-phase coefficient adjustable analog multiplication method for a convolutional neural network according to claim 7, characterized in that the specific method of controlling the working state and multiplication coefficient of each multiplication unit in the dual-phase coefficient adjustable multiplication array according to the weight data in combination with the size of the convolution kernel is: generating, according to the weight data in combination with the size of the convolution kernel, a working state control signal for forward control or reverse control of the analog voltage at the input of each multiplication unit, and a multiplication coefficient control signal that selects the circuit structure of each multiplication unit to realize different multiplication coefficients.
9. The dual-phase coefficient adjustable analog multiplication method for a convolutional neural network according to claim 7, wherein the specific method of converting the characteristic data into an analog voltage serving as the input of the dual-phase coefficient adjustable multiplication array is: mapping the characteristic data, in direct proportion, to the time for which the charging current flows into the analog voltage output terminal of the cascaded PMOS constant current source.
10. The dual-phase coefficient adjustable analog multiplication method for a convolutional neural network according to claim 7, characterized in that a pipelined analog-to-digital conversion method is used to perform analog-to-digital conversion on the multiplication operation result output by the dual-phase coefficient adjustable multiplication array to obtain the calculation result.