CN109389212B - Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network - Google Patents


Info

Publication number
CN109389212B
CN109389212B (application CN201811646433.9A)
Authority
CN
China
Prior art keywords
quantization
activation
pooling
reconfigurable
unit
Prior art date
Legal status
Active
Application number
CN201811646433.9A
Other languages
Chinese (zh)
Other versions
CN109389212A (en)
Inventor
李丽
陈沁雨
傅玉祥
陈铠
何书专
陈辉
程开丰
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201811646433.9A
Publication of CN109389212A
Application granted
Publication of CN109389212B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Image Processing (AREA)

Abstract

The reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks comprises a plurality of reconfigurable activation-quantization-pooling processing units, which execute activation, quantization and pooling operations and can be reconfigured between an activation-quantization working mode and an activation-quantization-pooling working mode; a storage unit controller, which controls data transfer between the reconfigurable activation-quantization-pooling units and the storage unit under different configurations; and a storage unit, which temporarily stores the convolutional-layer result data required by the pooling operation. The software optimization merges the activation, quantization and related steps of the low-bit-width convolutional neural network into a single step, reducing redundant computation without changing the original function. Advantages: the three steps of activation, quantization and pooling are mapped onto the same hardware unit in a reconfigurable manner, reducing hardware area; the software-hardware co-optimization yields small area, low power consumption and high flexibility.

Description

Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network
Technical Field
The invention belongs to the field of hardware acceleration of artificial-intelligence algorithms, and in particular relates to a reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks.
Background
A low-bit-width convolutional neural network generally refers to a convolutional neural network quantized to 4 bits or fewer. Unlike a conventional convolutional neural network, its weights and image input data can be represented with only a few bits, as in binarized networks, ternary networks and other low-bit-width quantized neural networks. In a binarized network, both the weights and the image input data are represented only by 0 or 1; in a ternary network, the weights are represented only by 0 or 1 while the image input data takes the values -1, 0 or 1; and in many other low-bit-width quantized networks, a value is expressed by a particular bit combination, e.g. the 2-bit code "01" may represent the value 0.5.
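As a hedged illustration of such encodings, a k-bit code word can be decoded onto a uniform grid over [0, 1]. The uniform mapping below is an assumption for demonstration only; a given low-bit-width network may define its own code table (as in the "01" = 0.5 example above).

```python
def decode_uniform(bits: str) -> float:
    """Decode a k-bit code word onto a uniform grid over [0, 1].

    Illustrative assumption: a network may instead use its own code
    table (e.g. the 2-bit code "01" representing 0.5 in the text).
    """
    k = len(bits)
    return int(bits, 2) / ((1 << k) - 1)

# Binarized network: weights and inputs drawn from {0, 1} (1-bit codes)
# Ternary network: inputs drawn from {-1, 0, 1} (needs 2 bits per value)
print(decode_uniform("0"), decode_uniform("1"))    # 0.0 1.0
print(decode_uniform("01"), decode_uniform("11"))  # 1/3 and 1.0 on the 2-bit grid
```

With 2-bit codes the grid has four levels (0, 1/3, 2/3, 1), which is exactly the output set of the k = 2 quantizer discussed later in this document.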
In a low-bit-width convolutional neural network, besides the convolutional, activation and pooling layers of a conventional network, a dedicated quantization operation is added, which re-quantizes the generated image output data to the originally specified bit width.
In recent years, hardware designs for such low-bit-width convolutional neural networks have been increasing. A convolutional layer generally performs the following operations in order: convolution, batch normalization, activation, quantization and pooling (some convolutional layers have no pooling operation). A fully connected layer generally performs the following operations in order: full connection, batch normalization, activation and quantization. Executing these operations serially, however, reduces processing efficiency, incurs additional hardware overhead, and does not satisfy the requirements of practical applications well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks that improves the flexibility of the activation, quantization and pooling operations while reducing power consumption and hardware overhead. It is realized by the following technical scheme:
the reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network receives convolutional layer result data, and comprises:
the reconfigurable activation quantization pooling processing units are used for executing activation, quantization and pooling operations and executing reconfigurable operation of an activation-quantization working mode or an activation-quantization-pooling working mode;
the storage unit controller is used for controlling data transmission of the reconfigurable activation quantization pooling unit and the storage unit under different configurations;
and the storage unit is used for temporarily storing convolution layer result data required in the pooling operation.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that, in the activation-quantization working mode, data is transferred from the convolution processing unit to the reconfigurable activation-quantization-pooling processing unit and the result data is output directly after processing; in the activation-quantization-pooling working mode, the received convolutional-layer result data is first stored in the storage unit, transferred into the reconfigurable activation-quantization-pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the activation function in the reconfigurable activation-quantization-pooling processing unit is given by formula (1),
x_o = min(abs(x_i), 1) (1)
where x_i represents the data after convolution processing and x_o the activation value after activation.
The quantization function in the reconfigurable activation-quantization-pooling processing unit is given by formula (2),
x_o = round((2^k - 1) * x_i) / (2^k - 1) (2)
where k represents the bit width after quantization, x_i the activation value, and x_o the quantized value. The corresponding pooling kernel size is 2x2, as in formula (3):
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (3)
where i, j denote the coordinate position in the single-channel input image, x the quantized value, and x_o the pooled value after pooling.
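A minimal NumPy sketch of the three operations, assuming the quantization function is uniform round-to-nearest quantization of [0, 1] onto 2^k - 1 intervals (an assumption consistent with the thresholds 1/6, 1/2, 5/6 given later for k = 2):

```python
import numpy as np

def activate(x):
    # Formula (1): x_o = min(|x_i|, 1)
    return np.minimum(np.abs(x), 1.0)

def quantize(x, k):
    # Assumed form of formula (2): uniform k-bit quantization of [0, 1],
    # x_o = round((2^k - 1) * x_i) / (2^k - 1)
    levels = (1 << k) - 1
    return np.round(x * levels) / levels

def maxpool_2x2(x):
    # Formula (3): non-overlapping 2x2 max pooling of a single-channel image
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[0.7, -0.2], [1.5, 0.1]])
print(maxpool_2x2(quantize(activate(x), k=2)))  # [[1.]]
```

For k = 2 this quantizer produces exactly the four output levels 0, 1/3, 2/3 and 1 discussed in the embodiment.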
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that its working process comprises the following steps:
First, the working mode is determined. If the working mode is activation-quantization, the characteristics and parameters of the activation function and the quantization method are determined by analyzing the activation functions and quantization methods of different low-bit convolutional neural networks; the redundant overlap between the output range of the activation function and the input range of the quantization method is then identified and simplified away.
If the working mode is activation-quantization-pooling, the pooling kernel size is additionally analyzed on top of the activation-quantization optimization; after optimization, the pooling operation is merged into the activation-quantization operation to form a new fused activation-quantization-pooling operation.
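The soundness of merging pooling into the activation-quantization step can be checked numerically: because the quantization function is monotonically non-decreasing, it commutes with the max of the pooling window, so quantizing once per 2x2 region after pooling matches the serial activate-quantize-pool path. A sketch under the assumption of a uniform round-to-nearest quantizer:

```python
import numpy as np

def act(x):
    return np.minimum(np.abs(x), 1.0)  # activation in the style of formula (1)

def quant(x, k=2):
    levels = (1 << k) - 1              # assumed uniform k-bit quantizer
    return np.round(x * levels) / levels

rng = np.random.default_rng(seed=0)
x = rng.uniform(-2.0, 2.0, size=(4, 4))

pool = lambda a: a.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 max pool of a 4x4 map

serial = pool(quant(act(x)))   # activate -> quantize every pixel -> pool
fused = quant(pool(act(x)))    # activate -> pool -> quantize once per region

# Quantization is monotone non-decreasing, so it commutes with max:
assert np.allclose(serial, fused)
```

This is why the fused operation needs only one quantization per pooling window instead of four, which is the redundancy the software optimization removes.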
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the storage unit supports ping-pong operation: one part of the storage unit stores the data transferred from the convolutional layer, while the other part stores the data required by the reconfigurable activation-quantization-pooling processing unit.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the reconfigurable activation-quantization-pooling processing unit comprises three stage units (a first, a second and a third stage unit), each containing a comparator, gates and a register. The two inputs of the comparator in the first stage unit are the external image input data and threshold 3; the two inputs of the comparator in the second stage unit are the data output from the first stage unit and threshold 2; and the two inputs of the comparator in the third stage unit are the data output from the second stage unit and threshold 1.
The invention has the following advantages:
the reconfigurable activation quantization pooling system for the low bit width convolutional neural network mainly aims at the characteristics of the low bit width convolutional neural network to realize the software and hardware optimization of various types of activation quantization pooling; the design method has the characteristics of high flexibility, low calculation complexity, small area, low power consumption and the like.
Drawings
Fig. 1 is a block schematic diagram of a reconfigurable activation quantization pooling system oriented to a low bit width convolutional neural network.
Fig. 2 is a schematic diagram of a reconfigurable active quantization pooling processing unit.
Fig. 3 is a schematic diagram of a reconfigurable active quantization pooling processing unit configuration.
Fig. 4 is a schematic diagram of the unit operating in the activation-quantization-pooling working mode.
Fig. 5 is a schematic diagram of the unit operating in the activation-quantization working mode.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network of this embodiment includes a plurality of reconfigurable activation-quantization-pooling processing units, a storage unit controller, and a storage unit. The reconfigurable activation-quantization-pooling units execute the activation, quantization and pooling operations; the storage unit controller controls data transfer between the reconfigurable activation-quantization-pooling units and the storage unit under different configurations; and the storage unit temporarily stores the convolutional-layer result data required by the pooling operation.
In fig. 1, the dotted line indicates the data flow in the activation-quantization working mode: data is transferred from the convolution processing unit to the reconfigurable activation-quantization-pooling processing unit and, after processing, the result data is output directly. The solid line indicates the data flow in the activation-quantization-pooling working mode: the convolutional-layer result data is transferred to the module and first stored in the storage unit, then transferred into the reconfigurable activation-quantization-pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit. The storage unit supports ping-pong operation to ensure that execution is never interrupted.
The specific application of the design method is described below with reference to a concrete low-bit-width convolutional neural network, whose image input data has a bit width of 2 bits and whose weights are 1 bit. Its activation function and quantization function are respectively:
x_o = min(abs(x_i), 1) (4)
x_o = round((2^k - 1) * x_i) / (2^k - 1) (5)
and the pooling kernel size is 2x2:
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (6)
the system firstly optimizes a software algorithm, and k is known to be 2 according to parameters of a specific low bit width convolutional neural network. According to the analysis, the output range of the activation function is obtained to be [0, 1], and the input range of the quantization function is the output range of the activation function; the threshold values of the quantization function are 1/6, 1/2 and 5/6, and after the quantization is carried out by the quantization function, the output of the quantization function falls on four values of 0, 1/3, 2/3 and 1; the activation and quantization functions may be replaced by a series of comparisons, with the quantized value taking 1 if the input is greater than 5/6; if the input is greater than 1/2 and less than or equal to 5/6, the quantized value is 2/3; if the input is greater than 1/6 and less than or equal to 1/2, the quantized value is 1/3; if x is equal to or less than 1/6, the quantization value is 0.
The hardware part is then optimized: since the image input data of this network has a bit width of 2 bits, the multi-stage pipelined processing architecture has 3 stages, as shown in FIG. 2. Each stage unit comprises a comparator, several gates and a register. The two inputs of the comparator in stage unit 1 are the external image input data and threshold 3; the two inputs of the comparator in stage unit 2 are the data output from stage unit 1 and threshold 2; and the two inputs of the comparator in stage unit 3 are the data output from stage unit 2 and threshold 1. The configuration word in fig. 3 selects the working mode of the unit: when the configuration word is 1 the unit works in activation-quantization-pooling mode, and when it is 0 the unit works in activation-quantization mode; the specific value configuration is shown in fig. 3.
The detailed execution of the unit in the activation-quantization-pooling working mode is shown in fig. 4. When the operand above the comparator is larger than the operand below it, the comparator outputs 1, otherwise 0. Let (a, b, c, d) be the four pixel values of a 2x2 sub-region of the image, related to the three thresholds by a > threshold 3 > b > threshold 2 > c > threshold 1 > d. At time 1, pixel value a is compared with threshold 3; comparison 1 is set to 1 and stored in enable 1, and quantized value 4, the quantization result of a, is stored in register 1. At time 2, pixel b enters stage unit 1; because enable 1 was already set at the previous time step, comparison 1 remains 1 regardless of the relation between b and threshold 3, so the value in register 1 does not change. Meanwhile, quantized value 4 in register 1 is passed into stage unit 2 and compared with threshold 2; although it exceeds threshold 2, gate 22 is controlled by enable 1 of stage unit 1 and remains 0, so the number stored in register 2 remains the comparison result passed down from the previous stage unit. By analogy, when the output enable signal is set high, the value held in register 3 is output as the result of the sub-region pooling operation. The gray background in fig. 4 marks stage units that are turned off.
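A simplified behavioral model of this shutoff scheme (an assumption sketched from the description above, not the actual RTL): each stage owns one threshold, and once any pixel of the 2x2 sub-region passes a stage's comparison, that stage's enable latches and gates off the work for later, smaller pixels, yet the final result still equals the quantized maximum of the region.

```python
THRESHOLDS = [5 / 6, 1 / 2, 1 / 6]   # stage units 1..3 hold threshold 3, 2, 1
VALUES = [1.0, 2 / 3, 1 / 3, 0.0]    # quantized output if stage 1/2/3 fires, else 0

def pool_region(pixels):
    """Quantized max of a 2x2 sub-region of activated pixels, with stage shutoff."""
    enables = [False, False, False]
    for p in pixels:                  # one pixel streams in per time step
        for s, thr in enumerate(THRESHOLDS):
            if p > thr:
                enables[s] = True     # latch the enable; deeper stages stay idle
                break
            if enables[s]:
                # a larger earlier pixel already passed this stage, so this
                # smaller pixel cannot change the result: gate it off (power saving)
                break
    for s, latched in enumerate(enables):
        if latched:
            return VALUES[s]          # highest latched stage determines the output
    return VALUES[3]                  # no threshold exceeded: quantized value 0

# a > threshold 3 > b > threshold 2 > c > threshold 1 > d, as in the example
print(pool_region([0.9, 0.7, 0.3, 0.1]))  # prints 1.0, the quantized max
```

Note the model returns the same answer regardless of arrival order, since a later, larger pixel is never gated off from latching a higher stage.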
The detailed execution of the unit in the activation-quantization working mode is shown in fig. 5. After the configuration word is set to 0, the comparison signal is no longer influenced by its own stage unit's enable signal from the previous time step, though it is still controlled by the enable signal of the preceding stage unit. This is because, unlike the activation-quantization-pooling mode, this mode must output a quantized value for every input and must not shut off smaller subsequent inputs in the sub-region.
If the input exceeds the threshold in some stage unit, the enable signal is set to 1 to shut off subsequent operations in both the vertical and horizontal directions: the vertical direction represents the remaining image input data, while the horizontal direction represents the operations in the remaining stage units processing the current image input data.
The system is optimized from both the software and the hardware side: built on a multi-stage pipelined processing architecture, supported by reconfigurable techniques, and guided by a stage turn-off low-power technique, it reduces the power consumption and area of the activation-quantization-pooling module while improving its flexibility.
The above description is only a preferred embodiment of the present invention and is not intended to limit it in any way; any person skilled in the art may use the technical content disclosed above to derive equivalent embodiments through modification or equivalent variation. Nevertheless, any simple modification, equivalent change or refinement of the above embodiments that remains within the technical essence of the present invention falls within the protection scope of the present technical solution.

Claims (5)

1. A reconfigurable activation quantization pooling system for a low-bit-width convolutional neural network, characterized in that the system receives convolutional-layer result data and comprises:
a plurality of reconfigurable activation quantization pooling processing units for executing activation, quantization and pooling operations, reconfigurable between an activation-quantization working mode and an activation-quantization-pooling working mode;
a storage unit controller for controlling data transfer between the reconfigurable activation quantization pooling units and the storage unit under different configurations;
a storage unit for temporarily storing the convolutional-layer result data required by the pooling operation;
wherein the activation function in the reconfigurable activation quantization pooling processing unit is given by formula (1),
x_a = min(abs(x_c), 1) (1)
where x_c represents the data after convolution processing and x_a the activation value after activation;
the quantization function in the reconfigurable activation quantization pooling processing unit is given by formula (2),
x = round((2^k - 1) * x_a) / (2^k - 1) (2)
where k represents the quantized bit width and x the quantized value;
and the corresponding pooling kernel size is 2x2, as in formula (3):
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (3)
where i, j denote the coordinate position in the single-channel input image and x_o the pooled value after pooling.
2. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: in the activation-quantization working mode, data is transferred from the convolution processing unit to the reconfigurable activation quantization pooling processing unit, and the result data is output directly after processing; in the activation-quantization-pooling working mode, the received convolutional-layer result data is first stored in the storage unit, transferred into the reconfigurable activation quantization pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit.
3. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein the workflow of the system comprises the following steps:
first, determining the working mode; if the working mode is activation-quantization, determining the characteristics and parameters of the activation function and the quantization method by analyzing the activation functions and quantization methods of different low-bit convolutional neural networks, then identifying and simplifying away the redundant overlap between the output range of the activation function and the input range of the quantization method;
if the working mode is activation-quantization-pooling, additionally analyzing the pooling kernel size on top of the activation-quantization optimization, and, after optimization, merging the pooling operation into the activation-quantization operation to form a new fused activation-quantization-pooling operation.
4. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 3, wherein: the storage unit supports ping-pong operation; one part of the storage unit stores the data transferred from the convolutional layer, while the other part stores the data required by the reconfigurable activation quantization pooling processing unit.
5. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: the reconfigurable activation quantization pooling processing unit comprises three stage units (a first, a second and a third stage unit), each comprising a comparator, a gate and a register; the two inputs of the comparator in the first stage unit are the external image input data and threshold 3, the two inputs of the comparator in the second stage unit are the data output from the first stage unit and threshold 2, and the two inputs of the comparator in the third stage unit are the data output from the second stage unit and threshold 1.
CN201811646433.9A 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network Active CN109389212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811646433.9A CN109389212B (en) 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network


Publications (2)

Publication Number Publication Date
CN109389212A CN109389212A (en) 2019-02-26
CN109389212B true CN109389212B (en) 2022-03-25

Family

ID=65430886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646433.9A Active CN109389212B (en) 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network

Country Status (1)

Country Link
CN (1) CN109389212B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220121936A1 (en) * 2019-02-27 2022-04-21 Huawei Technologies Co., Ltd. Neural Network Model Processing Method and Apparatus
CN110222815B (en) * 2019-04-26 2021-09-07 上海酷芯微电子有限公司 Configurable activation function device and method suitable for deep learning hardware accelerator
CN110390385B (en) * 2019-06-28 2021-09-28 东南大学 BNRP-based configurable parallel general convolutional neural network accelerator
CN110718211B (en) * 2019-09-26 2021-12-21 东南大学 Keyword recognition system based on hybrid compressed convolutional neural network
CN113762496B (en) * 2020-06-04 2024-05-03 合肥君正科技有限公司 Method for reducing low-bit convolutional neural network reasoning operation complexity
WO2023004800A1 (en) * 2021-07-30 2023-02-02 华为技术有限公司 Neural network post-processing method and apparatus, chip, electronic device, and storage medium
CN114169513B (en) * 2022-02-11 2022-05-24 深圳比特微电子科技有限公司 Neural network quantization method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017124645A1 (en) * 2016-01-20 2017-07-27 北京中科寒武纪科技有限公司 Apparatus for processing floating point number
CN108364061A (en) * 2018-02-13 2018-08-03 北京旷视科技有限公司 Arithmetic unit, operation execute equipment and operation executes method
CN108510067A (en) * 2018-04-11 2018-09-07 西安电子科技大学 The convolutional neural networks quantization method realized based on engineering
CN108647779A (en) * 2018-04-11 2018-10-12 复旦大学 A kind of low-bit width convolutional neural networks Reconfigurable Computation unit


Also Published As

Publication number Publication date
CN109389212A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109389212B (en) Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network
US10402725B2 (en) Apparatus and method for compression coding for artificial neural network
WO2020258529A1 (en) Bnrp-based configurable parallel general convolutional neural network accelerator
CN109102065B (en) Convolutional neural network accelerator based on PSoC
WO2021036905A1 (en) Data processing method and apparatus, computer equipment, and storage medium
KR102592721B1 (en) Convolutional neural network system having binary parameter and operation method thereof
US11625587B2 (en) Artificial intelligence integrated circuit
US20190243755A1 (en) Dynamic memory mapping for neural networks
US20210110269A1 (en) Neural network dense layer sparsification and matrix compression
KR101950786B1 (en) Acceleration Method for Artificial Neural Network System
JP2022532432A (en) Data compression methods and computing devices
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
US11263530B2 (en) Apparatus for operations at maxout layer of neural networks
CN117751366A (en) Neural network accelerator and data processing method thereof
CN110874627A (en) Data processing method, data processing apparatus, and computer readable medium
CN112884146A (en) Method and system for training model based on data quantization and hardware acceleration
CN115423084A (en) Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium
CN111831358A (en) Weight precision configuration method, device, equipment and storage medium
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN114626516A (en) Neural network acceleration system based on floating point quantization of logarithmic block
WO2023109748A1 (en) Neural network adjustment method and corresponding apparatus
CN116325737A (en) Quantification of tree-based machine learning models
US20200334013A1 (en) Processing element and processing system
US20220398442A1 (en) Deep learning computational storage drive
US20030018672A1 (en) System and method for fast median filters, with a predetermined number of elements, in processors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant