CN109389212B - Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network - Google Patents
- Publication number
- CN109389212B CN109389212B CN201811646433.9A CN201811646433A CN109389212B CN 109389212 B CN109389212 B CN 109389212B CN 201811646433 A CN201811646433 A CN 201811646433A CN 109389212 B CN109389212 B CN 109389212B
- Authority
- CN
- China
- Prior art keywords
- quantization
- activation
- pooling
- reconfigurable
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Image Processing (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
Abstract
The reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network comprises a plurality of reconfigurable activation quantization pooling processing units, which execute the activation, quantization and pooling operations and can be reconfigured between an activation-quantization working mode and an activation-quantization-pooling working mode; a storage unit controller, which controls data transmission between the reconfigurable activation quantization pooling units and the storage unit under different configurations; and a storage unit, which temporarily stores the convolutional-layer result data required by the pooling operation. On the software side, the design merges the several steps of activation, quantization and so on of the low-bit-width convolutional neural network into one step, reducing redundant computation without changing the original function. Advantageous effects: the three steps of activation, quantization and pooling are mapped onto the same hardware unit in a reconfigurable manner, reducing the hardware resource area; the software-hardware co-optimization method yields small area, low power consumption and high flexibility.
Description
Technical Field
The invention belongs to the field of artificial intelligence algorithm hardware acceleration, and particularly relates to a reconfigurable activation quantization pooling system for a low bit width convolutional neural network.
Background
A low-bit-width convolutional neural network generally refers to a convolutional neural network quantized to 4 bits or fewer. Unlike a conventional convolutional neural network, its weights and image input data can be expressed with only a few bits, as in binarized networks, ternary networks and other low-bit-width quantized neural networks. The weights and image input data of a binarized network are represented only by 0 or 1; the weights of a ternary network are represented only by 0 or 1, while its image input data take the values -1, 0 or 1; in many other low-bit-width quantized neural networks, a number is expressed by some bit combination, e.g., the 2-bit code "01" denotes the value 0.5.
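The bit-combination idea can be sketched in a few lines. The only concrete mapping the text gives is that the 2-bit code "01" denotes 0.5; the uniform 0.5-step scheme around it is an assumption for illustration:

```python
# Hypothetical decoding of a 2-bit code to a real value. Only "01" -> 0.5 is
# taken from the text; the uniform 0.5-step scheme is an assumption.

def decode_2bit(code: str) -> float:
    """Map a 2-bit code to a value in steps of 0.5 ("01" -> 0.5)."""
    assert len(code) == 2 and set(code) <= {"0", "1"}
    return int(code, 2) / 2

print(decode_2bit("01"))  # 0.5
```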
In a low-bit-width convolutional neural network, besides the convolutional layers, activation layers and pooling layers found in a traditional network, a quantization operation is specially designed: the generated image output data is re-quantized back to the originally specified bit width.
In recent years, hardware designs for such low-bit-width convolutional neural networks have been increasing. A convolutional layer generally executes the following operations in order: convolution, batch normalization, activation, quantization and pooling (some convolutional layers have no pooling operation); a fully connected layer generally executes the following operations in order: full connection, batch normalization, activation and quantization. However, such serial operation reduces processing efficiency, brings additional hardware overhead, and may not meet the requirements of practical applications well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network, which effectively improves the flexibility of the activation, quantization and pooling operations and reduces power consumption and hardware overhead. It is specifically realized by the following technical scheme:
the reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network receives convolutional layer result data, and comprises:
the reconfigurable activation quantization pooling processing units are used for executing activation, quantization and pooling operations and executing reconfigurable operation of an activation-quantization working mode or an activation-quantization-pooling working mode;
the storage unit controller is used for controlling data transmission of the reconfigurable activation quantization pooling unit and the storage unit under different configurations;
and the storage unit is used for temporarily storing convolution layer result data required in the pooling operation.
The reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network is further designed in that, in the activation-quantization working mode, data is transmitted from the convolution processing unit to the reconfigurable activation quantization pooling processing unit and, after processing, the result data is output directly; in the activation-quantization-pooling working mode, the convolutional-layer result data is received and first stored into the storage unit, then transferred into the reconfigurable activation quantization pooling processing unit for processing under the control of the storage unit controller, and the processing result is stored back into the storage unit.
The reconfigurable activation quantization pooling system facing the low bit width convolutional neural network is further designed in that an activation function in the reconfigurable activation quantization pooling processing unit is as shown in formula (1),
xo=min(abs(xi),1) (1)
where xi represents the data after convolution processing and xo represents the activation value after activation.
The quantization function in the reconfigurable activation quantization pooling processing unit is as shown in formula (2),

xo=round((2^k-1)·xi)/(2^k-1) (2)

where k represents the bit width after quantization, xi denotes the activation value, and xo represents the quantized value after quantization. The corresponding pooling kernel size is 2x2, as in formula (3):
xo(i,j)=max(x(2i,2j),x(2i,2j+1),x(2i+1,2j),x(2i+1,2j+1)) (3)
where i, j respectively denote the coordinate position in the single-channel input image, x represents the quantized value, and xo represents the pooled value after pooling.
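Formulas (1)-(3) can be sketched in plain Python. The round-based form of the k-bit quantizer is an assumption, reconstructed from the thresholds (1/6, 1/2, 5/6) and output levels (0, 1/3, 2/3, 1) given later for k = 2:

```python
# Illustrative sketch of formulas (1)-(3); the quantizer form is assumed.

def activate(xi):
    return min(abs(xi), 1.0)                 # formula (1): xo = min(|xi|, 1)

def quantize(xi, k):
    levels = (1 << k) - 1                    # 2^k - 1 uniform steps
    return round(xi * levels) / levels       # formula (2), assumed form

def max_pool_2x2(img):
    """Formula (3): 2x2 max pooling on a single-channel image (list of rows)."""
    return [[max(img[2*i][2*j], img[2*i][2*j+1],
                 img[2*i+1][2*j], img[2*i+1][2*j+1])
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]

img = [[0.2, -1.4], [0.9, 0.4]]
quantized = [[quantize(activate(v), 2) for v in row] for row in img]
print(max_pool_2x2(quantized))  # [[1.0]]
```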
The reconfigurable activation quantization pooling system for the low bit width convolutional neural network is further designed in that the working process of the system comprises the following steps:
Firstly, the working mode is determined. If the working mode is activation-quantization, a series of characteristics or parameters of the activation function and the quantization method are determined by analyzing the activation functions and quantization methods of different low-bit-width convolutional neural networks; then the redundant overlap between the output range of the activation function and the input range of the quantization method is determined and simplified away.
if the working mode is activation-quantization-pooling, the size of a pooling kernel needs to be analyzed on the basis of the optimization of an activation-quantization algorithm; after optimization, the pooling operation is merged into the activation-quantization operation to form a new activation-quantization-pooling operation.
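The merging of pooling into activation-quantization can be checked in a short sketch: because the quantization step (here the k = 2 comparison chain from the embodiment below) is monotone non-decreasing, quantizing each activated pixel and taking the 2x2 maximum equals quantizing the maximum of the activated pixels, so one fused pass per window suffices. The fused helper below is illustrative, not the patent's hardware:

```python
# Fused activation-quantization-pooling vs. the three separate steps.

def activate(xi):
    return min(abs(xi), 1.0)

def quantize2(x):  # k = 2: thresholds 1/6, 1/2, 5/6 (from the embodiment)
    if x > 5/6:
        return 1.0
    if x > 1/2:
        return 2/3
    if x > 1/6:
        return 1/3
    return 0.0

def fused_act_quant_pool(window):
    """One 2x2 window of raw convolution outputs -> one pooled quantized value."""
    return quantize2(max(activate(v) for v in window))

window = [0.2, -1.4, 0.9, 0.4]
separate = max(quantize2(activate(v)) for v in window)
print(fused_act_quant_pool(window) == separate)  # True
```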
The reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network is further designed in that the storage units support ping-pong operation, one part of the storage units store data transmitted by the convolutional layer, and the other part of the storage units store data required by the reconfigurable activation quantization pooling processing unit.
The reconfigurable activation quantization pooling system facing the low-bit-width convolutional neural network is further designed in that the reconfigurable activation quantization pooling processing unit comprises three stage units, namely a first stage unit, a second stage unit and a third stage unit, each stage unit comprises a comparator, a gate and a register, two inputs of the comparator in the first stage unit are external image input data and a threshold 3, two inputs of the comparator in the second stage unit are data output from the first stage unit and a threshold 2, and two inputs of the comparator in the third stage unit are data output from the second stage unit and a threshold 1.
The invention has the following advantages:
the reconfigurable activation quantization pooling system for the low bit width convolutional neural network mainly aims at the characteristics of the low bit width convolutional neural network to realize the software and hardware optimization of various types of activation quantization pooling; the design method has the characteristics of high flexibility, low calculation complexity, small area, low power consumption and the like.
Drawings
Fig. 1 is a block schematic diagram of a reconfigurable activation quantization pooling system oriented to a low bit width convolutional neural network.
Fig. 2 is a schematic diagram of a reconfigurable active quantization pooling processing unit.
Fig. 3 is a schematic diagram of a reconfigurable active quantization pooling processing unit configuration.
Fig. 4 is a schematic diagram of the system operating in the activation-quantization-pooling working mode.

Fig. 5 is a schematic diagram of the system operating in the activation-quantization working mode.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the reconfigurable activation quantization pooling system for the low bit width convolutional neural network of the present embodiment includes a plurality of reconfigurable activation quantization pooling processing units, a storage unit controller, and a storage unit; the reconfigurable activation quantization pooling unit is used for executing activation, quantization and pooling operations; the storage unit controller is used for controlling data transmission of the reconfigurable activation quantization pooling unit and the storage unit under different configurations; the storage unit is used for temporarily storing convolution layer result data required in the pooling operation.
In fig. 1, the dotted line indicates the data flow direction in the active-quantization operating mode, and the data is transmitted from the convolution processing unit to the reconfigurable active quantization pooling processing unit, and after being processed, the result data is directly output; the solid line represents the data flow direction in the active-quantization-pooling working mode, the convolution layer result data is transmitted to the module and stored in the storage unit firstly, and is transmitted to the reconfigurable active-quantization pooling processing unit to be processed under the control of the storage unit controller, and the processing result is still stored back to the storage unit; the storage unit supports ping-pong operation to ensure that the execution process is not interrupted.
The specific application method of the design method is described below with reference to a specific low bit width convolutional neural network. The bit width of the low bit width convolution neural network image input data is 2 bits, and the weight is 1 bit; the activation function and the quantization function are respectively as follows:
xo=min(abs(xi),1) (4)
and the quantization function (with k = 2) is

xo=round(3·xi)/3 (5)

and the pooling kernel size is 2x2, as follows:
xo(i,j)=max(x(2i,2j),x(2i,2j+1),x(2i+1,2j),x(2i+1,2j+1)) (6)
the system firstly optimizes a software algorithm, and k is known to be 2 according to parameters of a specific low bit width convolutional neural network. According to the analysis, the output range of the activation function is obtained to be [0, 1], and the input range of the quantization function is the output range of the activation function; the threshold values of the quantization function are 1/6, 1/2 and 5/6, and after the quantization is carried out by the quantization function, the output of the quantization function falls on four values of 0, 1/3, 2/3 and 1; the activation and quantization functions may be replaced by a series of comparisons, with the quantized value taking 1 if the input is greater than 5/6; if the input is greater than 1/2 and less than or equal to 5/6, the quantized value is 2/3; if the input is greater than 1/6 and less than or equal to 1/2, the quantized value is 1/3; if x is equal to or less than 1/6, the quantization value is 0.
The system also optimizes the hardware design. Because the bit width of the image input data of this network is 2 bits, the number of stages of the multi-stage-unit pipeline processing architecture is 3, as shown in fig. 2; each stage unit comprises a comparator, several gates and a register. The two inputs of the comparator in stage unit 1 are the external image input data and threshold 3, the two inputs of the comparator in stage unit 2 are the data output from stage unit 1 and threshold 2, and the two inputs of the comparator in stage unit 3 are the data output from stage unit 2 and threshold 1. The configuration word in fig. 3 configures the working mode of the unit: when the configuration word is 1 the unit works in the activation-quantization-pooling mode, and when it is 0 the unit works in the activation-quantization mode; the specific value configuration is shown in fig. 3.
The detailed execution of the unit in the activation-quantization-pooling working mode is shown in fig. 4. When the operand above the comparator is larger than the operand below, the comparator outputs 1; otherwise it outputs 0. Let (a, b, c, d) be the four pixel values of a 2x2 sub-region of the image, with a > threshold 3 > b > threshold 2 > c > threshold 1 > d. At time 1, pixel value a is compared with threshold 3; comparison 1 is set to 1 and stored in enable 1, and quantized value 4 — the quantization result of a — is stored in register 1. At time 2, pixel b enters stage unit 1; because enable 1 was already set to 1 at the previous time, comparison 1 remains 1 regardless of the relation between b and threshold 3, so the value in register 1 does not change. Meanwhile, quantized value 4 in register 1 is passed into stage unit 2 and compared with threshold 2; although it is greater than threshold 2, strobe 22 is controlled by enable 1 from stage unit 1 and remains 0, so the number stored in register 2 is still the result passed down from the previous stage unit. By analogy, when the output-enable signal is set high, the value stored in register 3 is output as the result of the sub-region pooling operation. The gray background in fig. 4 marks stage units that are turned off.
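The sticky-enable behavior above can be modeled in a few lines. This is a simplified behavioral interpretation of fig. 4, not the exact hardware: each stage compares against one threshold and its enable latch sticks once any pixel of the 2x2 window exceeds that threshold, so later, smaller pixels cannot lower the stored result; after the window has streamed through, the highest latched stage gives the pooled quantized value:

```python
# Behavioral model of the three-stage pipeline in act-quant-pool mode.

THRESHOLDS = (5/6, 1/2, 1/6)   # stage units 1..3 hold thresholds 3, 2, 1
LEVELS = (1.0, 2/3, 1/3)       # quantized value associated with each stage

def pipeline_pool(window):
    """Stream one 2x2 window of activated pixels; return the pooled quantized value."""
    enables = [False, False, False]      # sticky per-stage enable latches
    for pixel in window:
        for stage, thr in enumerate(THRESHOLDS):
            if pixel > thr:
                enables[stage] = True    # latch stays set for the rest of the window
    for stage, latched in enumerate(enables):
        if latched:                      # highest stage fired wins
            return LEVELS[stage]
    return 0.0

# a > threshold 3 > b > threshold 2 > c > threshold 1 > d, as in fig. 4
print(pipeline_pool([0.9, 0.6, 0.3, 0.1]))  # 1.0
```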
The detailed execution of the unit in the activation-quantization working mode is shown in fig. 5. After the configuration word is set to 0, the comparison signal is no longer influenced by the enable signal of the same stage unit at the previous moment, but is still controlled by the enable signal of the previous stage unit. This is because, compared with the activation-quantization-pooling mode, this mode must output a quantized number for every input and must not shut off smaller subsequent inputs within a sub-region.
If, in some stage unit, the input is larger than the threshold, the enable signal is set to 1 so as to shut off the subsequent operations in both the vertical and the horizontal direction, where the vertical direction represents the remaining image input data and the horizontal direction represents the operations in the remaining stage units that would process the current image input data.
The system is optimized from both the software and the hardware side: based on a multi-stage-unit pipeline processing architecture, supported by reconfigurable techniques and guided by a stage turn-off low-power technique, it reduces the power consumption and area of the activation quantization pooling module while improving its flexibility.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any person skilled in the art may use the technical content disclosed above to derive equivalent embodiments with equivalent variations. However, any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.
Claims (5)
1. A reconfigurable activation quantization pooling system for a low bit width convolutional neural network is characterized in that: the system receives convolutional layer result data, comprising:
the reconfigurable activation quantization pooling processing units are used for executing activation, quantization and pooling operations and executing reconfigurable operation of an activation-quantization working mode or an activation-quantization-pooling working mode;
the storage unit controller is used for controlling data transmission of the reconfigurable activation quantization pooling unit and the storage unit under different configurations;
the storage unit is used for temporarily storing convolution layer result data required in the pooling operation;
the activation function in the reconfigurable activation quantization pooling processing unit is as formula (1),
xa=min(abs(xc),1) (1)
where xc represents the data after convolution processing and xa represents the activation value after activation;
the quantization function in the reconfigurable activation quantization pooling processing unit is as shown in formula (2),

x=round((2^k-1)·xa)/(2^k-1) (2)

where k represents the quantized bit width and x represents the quantized value,
and the corresponding pooled kernel size is 2x2, as in formula (3):
xo(i,j)=max(x(2i,2j),x(2i,2j+1),x(2i+1,2j),x(2i+1,2j+1)) (3)
where i, j respectively denote the coordinate position in the single-channel input image and xo represents the pooled value after pooling.
2. The low bit width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: under the active-quantization working mode, the data of the reconfigurable active quantization pooling processing unit is transmitted to the reconfigurable active quantization pooling processing unit from the convolution processing unit, and after being processed, the result data is directly output; and receiving the convolution layer result data in an activation-quantization-pooling working mode, firstly storing the convolution layer result data into a storage unit, transmitting the convolution layer result data into a reconfigurable activation-quantization pooling processing unit for processing under the control of a storage unit controller, and still storing the processing result into the storage unit.
3. The low bit width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: the workflow of the system comprises the following steps:
firstly, determining a working mode; if the working mode is activation-quantization, determining a series of characteristics or parameters of an activation function and a quantization method by analyzing the activation function and the quantization method of different low-bit convolutional neural networks; then determining the cross redundancy part of the output range of the activation function and the quantization method to realize simplification;
if the working mode is activation-quantization-pooling, the size of a pooling kernel needs to be analyzed on the basis of the optimization of an activation-quantization algorithm; after optimization, the pooling operation is merged into the activation-quantization operation to form a new activation-quantization-pooling operation.
4. The low bit width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 3, wherein: the storage units support ping-pong operation, one part of the storage units stores data transmitted by the convolutional layers, and the other part of the storage units stores data required by the reconfigurable activation quantization pooling processing unit.
5. The low bit width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: the reconfigurable activation quantization pooling processing unit comprises three stage units, namely a first stage unit, a second stage unit and a third stage unit, wherein each stage unit comprises a comparator, a gate and a register, two inputs of the comparator in the first stage unit are external image input data and a threshold 3, two inputs of the comparator in the second stage unit are data output from the first stage unit and a threshold 2, and two inputs of the comparator in the third stage unit are data output from the second stage unit and a threshold 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811646433.9A CN109389212B (en) | 2018-12-30 | 2018-12-30 | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811646433.9A CN109389212B (en) | 2018-12-30 | 2018-12-30 | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109389212A CN109389212A (en) | 2019-02-26 |
CN109389212B true CN109389212B (en) | 2022-03-25 |
Family
ID=65430886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811646433.9A Active CN109389212B (en) | 2018-12-30 | 2018-12-30 | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109389212B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3907662A4 (en) * | 2019-02-27 | 2022-01-19 | Huawei Technologies Co., Ltd. | Method and apparatus for processing neural network model |
CN111767204B (en) * | 2019-04-02 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Spill risk detection method, device and equipment |
CN110222815B (en) * | 2019-04-26 | 2021-09-07 | 上海酷芯微电子有限公司 | Configurable activation function device and method suitable for deep learning hardware accelerator |
CN110390385B (en) * | 2019-06-28 | 2021-09-28 | 东南大学 | BNRP-based configurable parallel general convolutional neural network accelerator |
CN110718211B (en) * | 2019-09-26 | 2021-12-21 | 东南大学 | Keyword recognition system based on hybrid compressed convolutional neural network |
CN113762496B (en) * | 2020-06-04 | 2024-05-03 | 合肥君正科技有限公司 | Method for reducing low-bit convolutional neural network reasoning operation complexity |
EP4375875A4 (en) * | 2021-07-30 | 2024-09-25 | Huawei Tech Co Ltd | Neural network post-processing method and apparatus, chip, electronic device, and storage medium |
CN114169513B (en) * | 2022-02-11 | 2022-05-24 | 深圳比特微电子科技有限公司 | Neural network quantization method and device, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017124645A1 (en) * | 2016-01-20 | 2017-07-27 | 北京中科寒武纪科技有限公司 | Apparatus for processing floating point number |
CN108364061A (en) * | 2018-02-13 | 2018-08-03 | 北京旷视科技有限公司 | Arithmetic unit, operation execute equipment and operation executes method |
CN108510067A (en) * | 2018-04-11 | 2018-09-07 | 西安电子科技大学 | The convolutional neural networks quantization method realized based on engineering |
CN108647779A (en) * | 2018-04-11 | 2018-10-12 | 复旦大学 | A kind of low-bit width convolutional neural networks Reconfigurable Computation unit |
- 2018-12-30: CN201811646433.9A patent/CN109389212B/en — Active
Also Published As
Publication number | Publication date |
---|---|
CN109389212A (en) | 2019-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109389212B (en) | Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network | |
WO2020258529A1 (en) | Bnrp-based configurable parallel general convolutional neural network accelerator | |
US20180330239A1 (en) | Apparatus and method for compression coding for artificial neural network | |
WO2021036905A1 (en) | Data processing method and apparatus, computer equipment, and storage medium | |
CN109102065B (en) | Convolutional neural network accelerator based on PSoC | |
WO2021129445A1 (en) | Data compression method and computing device | |
US20210209451A1 (en) | Artificial intelligence integrated circuit | |
US20210110269A1 (en) | Neural network dense layer sparsification and matrix compression | |
CN111583094B (en) | Image pulse coding method and system based on FPGA | |
CN110874627B (en) | Data processing method, data processing device and computer readable medium | |
US11263530B2 (en) | Apparatus for operations at maxout layer of neural networks | |
CN117751366A (en) | Neural network accelerator and data processing method thereof | |
CN111831358A (en) | Weight precision configuration method, device, equipment and storage medium | |
CN112884146A (en) | Method and system for training model based on data quantization and hardware acceleration | |
CN108647780B (en) | Reconfigurable pooling operation module structure facing neural network and implementation method thereof | |
CN108830379B (en) | Neural morphology processor based on parameter quantification sharing | |
WO2023109748A1 (en) | Neural network adjustment method and corresponding apparatus | |
CN111898752B (en) | Apparatus and method for performing LSTM neural network operations | |
CN117319373A (en) | Data transmission method, device, electronic equipment and computer readable storage medium | |
CN115291813A (en) | Data storage method and device, data reading method and device, and equipment | |
US20030018672A1 (en) | System and method for fast median filters, with a predetermined number of elements, in processors | |
Liu et al. | Tcp-net: Minimizing operation counts of binarized neural network inference | |
CN111381875B (en) | Data comparator, data processing method, chip and electronic equipment | |
CN113222121A (en) | Data processing method, device and equipment | |
CN116113926A (en) | Neural network circuit and control method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||