CN109389212B - Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network - Google Patents


Info

Publication number
CN109389212B
CN109389212B (application CN201811646433.9A)
Authority
CN
China
Prior art keywords
quantization
activation
pooling
reconfigurable
unit
Prior art date
Legal status
Active
Application number
CN201811646433.9A
Other languages
Chinese (zh)
Other versions
CN109389212A (en)
Inventor
李丽
陈沁雨
傅玉祥
陈铠
何书专
陈辉
程开丰
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201811646433.9A
Publication of CN109389212A
Application granted
Publication of CN109389212B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Image Processing (AREA)

Abstract

The reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks comprises a plurality of reconfigurable activation-quantization-pooling processing units, which execute activation, quantization and pooling operations and can be reconfigured between an activation-quantization working mode and an activation-quantization-pooling working mode; a storage unit controller, which controls data transfer between the reconfigurable activation-quantization-pooling units and the storage unit under different configurations; and a storage unit, which temporarily stores the convolutional-layer result data required by the pooling operation. The software optimization merges the activation, quantization and related steps of the low-bit-width convolutional neural network into a single step, reducing redundant computation without changing the original function. Advantages: the three steps of activation, quantization and pooling are mapped onto the same hardware unit in a reconfigurable manner, reducing hardware area; the software-hardware co-optimization yields small area, low power consumption and high flexibility.

Description

Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network
Technical Field
The invention belongs to the field of hardware acceleration of artificial-intelligence algorithms, and in particular relates to a reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks.
Background
A low-bit-width convolutional neural network generally refers to a convolutional neural network quantized to 4 bits or fewer. Unlike a conventional convolutional neural network, its weights and image input data can be represented with only a few bits, as in binarized networks, ternary networks and other low-bit-width quantized neural networks. In a binarized network, both the weights and the image input data are represented only by 0 or 1; in a ternary network, the weights are represented only by 0 or 1 while the image input data takes the values -1, 0 or 1; and in many other low-bit-width quantized networks, a value is expressed by a particular bit combination, e.g. the 2-bit code "01" may represent the value 0.5.
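As a hedged illustration of such encodings, a k-bit code word can be decoded onto a uniform grid over [0, 1]. The uniform mapping below is an assumption for demonstration only; a given low-bit-width network may define its own code table (as in the "01" = 0.5 example above).

```python
def decode_uniform(bits: str) -> float:
    """Decode a k-bit code word onto a uniform grid over [0, 1].

    Illustrative assumption: a network may instead use its own code
    table (e.g. the 2-bit code "01" representing 0.5 in the text).
    """
    k = len(bits)
    return int(bits, 2) / ((1 << k) - 1)

# Binarized network: weights and inputs drawn from {0, 1} (1-bit codes)
# Ternary network: inputs drawn from {-1, 0, 1} (needs 2 bits per value)
print(decode_uniform("0"), decode_uniform("1"))    # 0.0 1.0
print(decode_uniform("01"), decode_uniform("11"))  # 1/3 and 1.0 on the 2-bit grid
```

With 2-bit codes the grid has four levels (0, 1/3, 2/3, 1), which is exactly the output set of the k = 2 quantizer discussed later in this document.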
In a low-bit-width convolutional neural network, besides the convolutional, activation and pooling layers of a conventional network, a dedicated quantization operation is added, which re-quantizes the generated image output data to the originally specified bit width.
In recent years, hardware designs for such low-bit-width convolutional neural networks have been increasing. A convolutional layer generally performs the following operations in order: convolution, batch normalization, activation, quantization and pooling (some convolutional layers have no pooling operation). A fully connected layer generally performs the following operations in order: full connection, batch normalization, activation and quantization. Executing these operations serially, however, reduces processing efficiency, incurs additional hardware overhead, and does not satisfy the requirements of practical applications well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks that improves the flexibility of the activation, quantization and pooling operations while reducing power consumption and hardware overhead. It is realized by the following technical scheme:
the reconfigurable activation quantization pooling system for the low-bit-width convolutional neural network receives convolutional layer result data, and comprises:
the reconfigurable activation quantization pooling processing units are used for executing activation, quantization and pooling operations and executing reconfigurable operation of an activation-quantization working mode or an activation-quantization-pooling working mode;
the storage unit controller is used for controlling data transmission of the reconfigurable activation quantization pooling unit and the storage unit under different configurations;
and the storage unit is used for temporarily storing convolution layer result data required in the pooling operation.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that, in the activation-quantization working mode, data is transferred from the convolution processing unit to the reconfigurable activation-quantization-pooling processing unit and the result data is output directly after processing; in the activation-quantization-pooling working mode, the received convolutional-layer result data is first stored in the storage unit, transferred into the reconfigurable activation-quantization-pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the activation function in the reconfigurable activation-quantization-pooling processing unit is given by formula (1),
x_o = min(abs(x_i), 1) (1)
where x_i represents the data after convolution processing and x_o the activation value after activation.
The quantization function in the reconfigurable activation-quantization-pooling processing unit is given by formula (2),
x_o = round((2^k - 1) * x_i) / (2^k - 1) (2)
where k represents the bit width after quantization, x_i the activation value, and x_o the quantized value. The corresponding pooling kernel size is 2x2, as in formula (3):
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (3)
where i, j denote the coordinate position in the single-channel input image, x the quantized value, and x_o the pooled value after pooling.
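A minimal NumPy sketch of the three operations, assuming the quantization function is uniform round-to-nearest quantization of [0, 1] onto 2^k - 1 intervals (an assumption consistent with the thresholds 1/6, 1/2, 5/6 given later for k = 2):

```python
import numpy as np

def activate(x):
    # Formula (1): x_o = min(|x_i|, 1)
    return np.minimum(np.abs(x), 1.0)

def quantize(x, k):
    # Assumed form of formula (2): uniform k-bit quantization of [0, 1],
    # x_o = round((2^k - 1) * x_i) / (2^k - 1)
    levels = (1 << k) - 1
    return np.round(x * levels) / levels

def maxpool_2x2(x):
    # Formula (3): non-overlapping 2x2 max pooling of a single-channel image
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[0.7, -0.2], [1.5, 0.1]])
print(maxpool_2x2(quantize(activate(x), k=2)))  # [[1.]]
```

For k = 2 this quantizer produces exactly the four output levels 0, 1/3, 2/3 and 1 discussed in the embodiment.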
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that its working process comprises the following steps:
First, the working mode is determined. If the working mode is activation-quantization, the characteristics and parameters of the activation function and the quantization method are determined by analyzing the activation functions and quantization methods of different low-bit convolutional neural networks; the redundant overlap between the output range of the activation function and the input range of the quantization method is then identified and simplified away.
If the working mode is activation-quantization-pooling, the pooling kernel size is additionally analyzed on top of the activation-quantization optimization; after optimization, the pooling operation is merged into the activation-quantization operation to form a new fused activation-quantization-pooling operation.
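The soundness of merging pooling into the activation-quantization step can be checked numerically: because the quantization function is monotonically non-decreasing, it commutes with the max of the pooling window, so quantizing once per 2x2 region after pooling matches the serial activate-quantize-pool path. A sketch under the assumption of a uniform round-to-nearest quantizer:

```python
import numpy as np

def act(x):
    return np.minimum(np.abs(x), 1.0)  # activation in the style of formula (1)

def quant(x, k=2):
    levels = (1 << k) - 1              # assumed uniform k-bit quantizer
    return np.round(x * levels) / levels

rng = np.random.default_rng(seed=0)
x = rng.uniform(-2.0, 2.0, size=(4, 4))

pool = lambda a: a.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 max pool of a 4x4 map

serial = pool(quant(act(x)))   # activate -> quantize every pixel -> pool
fused = quant(pool(act(x)))    # activate -> pool -> quantize once per region

# Quantization is monotone non-decreasing, so it commutes with max:
assert np.allclose(serial, fused)
```

This is why the fused operation needs only one quantization per pooling window instead of four, which is the redundancy the software optimization removes.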
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the storage unit supports ping-pong operation: one part of the storage unit stores the data transferred from the convolutional layer, while the other part stores the data required by the reconfigurable activation-quantization-pooling processing unit.
The reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network is further designed such that the reconfigurable activation-quantization-pooling processing unit comprises three stage units (a first, a second and a third stage unit), each containing a comparator, gates and a register. The two inputs of the comparator in the first stage unit are the external image input data and threshold 3; the two inputs of the comparator in the second stage unit are the data output from the first stage unit and threshold 2; and the two inputs of the comparator in the third stage unit are the data output from the second stage unit and threshold 1.
The invention has the following advantages:
the reconfigurable activation quantization pooling system for the low bit width convolutional neural network mainly aims at the characteristics of the low bit width convolutional neural network to realize the software and hardware optimization of various types of activation quantization pooling; the design method has the characteristics of high flexibility, low calculation complexity, small area, low power consumption and the like.
Drawings
Fig. 1 is a block schematic diagram of a reconfigurable activation quantization pooling system oriented to a low bit width convolutional neural network.
Fig. 2 is a schematic diagram of a reconfigurable active quantization pooling processing unit.
Fig. 3 is a schematic diagram of a reconfigurable active quantization pooling processing unit configuration.
Fig. 4 is a schematic diagram of the unit operating in the activation-quantization-pooling working mode.
Fig. 5 is a schematic diagram of the unit operating in the activation-quantization working mode.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the reconfigurable activation-quantization-pooling system for the low-bit-width convolutional neural network of this embodiment includes a plurality of reconfigurable activation-quantization-pooling processing units, a storage unit controller, and a storage unit. The reconfigurable activation-quantization-pooling units execute the activation, quantization and pooling operations; the storage unit controller controls data transfer between the reconfigurable activation-quantization-pooling units and the storage unit under different configurations; and the storage unit temporarily stores the convolutional-layer result data required by the pooling operation.
In fig. 1, the dotted line indicates the data flow in the activation-quantization working mode: data is transferred from the convolution processing unit to the reconfigurable activation-quantization-pooling processing unit and, after processing, the result data is output directly. The solid line indicates the data flow in the activation-quantization-pooling working mode: the convolutional-layer result data is transferred to the module and first stored in the storage unit, then transferred into the reconfigurable activation-quantization-pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit. The storage unit supports ping-pong operation to ensure that execution is never interrupted.
The specific application of the design method is described below with reference to a concrete low-bit-width convolutional neural network, whose image input data has a bit width of 2 bits and whose weights are 1 bit. Its activation function and quantization function are respectively:
x_o = min(abs(x_i), 1) (4)
x_o = round((2^k - 1) * x_i) / (2^k - 1) (5)
and the pooling kernel size is 2x2:
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (6)
the system firstly optimizes a software algorithm, and k is known to be 2 according to parameters of a specific low bit width convolutional neural network. According to the analysis, the output range of the activation function is obtained to be [0, 1], and the input range of the quantization function is the output range of the activation function; the threshold values of the quantization function are 1/6, 1/2 and 5/6, and after the quantization is carried out by the quantization function, the output of the quantization function falls on four values of 0, 1/3, 2/3 and 1; the activation and quantization functions may be replaced by a series of comparisons, with the quantized value taking 1 if the input is greater than 5/6; if the input is greater than 1/2 and less than or equal to 5/6, the quantized value is 2/3; if the input is greater than 1/6 and less than or equal to 1/2, the quantized value is 1/3; if x is equal to or less than 1/6, the quantization value is 0.
The hardware part is then optimized: since the image input data of this network has a bit width of 2 bits, the multi-stage pipelined processing architecture has 3 stages, as shown in FIG. 2. Each stage unit comprises a comparator, several gates and a register. The two inputs of the comparator in stage unit 1 are the external image input data and threshold 3; the two inputs of the comparator in stage unit 2 are the data output from stage unit 1 and threshold 2; and the two inputs of the comparator in stage unit 3 are the data output from stage unit 2 and threshold 1. The configuration word in fig. 3 selects the working mode of the unit: when the configuration word is 1 the unit works in activation-quantization-pooling mode, and when it is 0 the unit works in activation-quantization mode; the specific value configuration is shown in fig. 3.
The detailed execution of the unit in the activation-quantization-pooling working mode is shown in fig. 4. When the operand above the comparator is larger than the operand below it, the comparator outputs 1, otherwise 0. Let (a, b, c, d) be the four pixel values of a 2x2 sub-region of the image, related to the three thresholds by a > threshold 3 > b > threshold 2 > c > threshold 1 > d. At time 1, pixel value a is compared with threshold 3; comparison 1 is set to 1 and stored in enable 1, and quantized value 4, the quantization result of a, is stored in register 1. At time 2, pixel b enters stage unit 1; because enable 1 was already set at the previous time step, comparison 1 remains 1 regardless of the relation between b and threshold 3, so the value in register 1 does not change. Meanwhile, quantized value 4 in register 1 is passed into stage unit 2 and compared with threshold 2; although it exceeds threshold 2, gate 22 is controlled by enable 1 of stage unit 1 and remains 0, so the number stored in register 2 remains the comparison result passed down from the previous stage unit. By analogy, when the output enable signal is set high, the value held in register 3 is output as the result of the sub-region pooling operation. The gray background in fig. 4 marks stage units that are turned off.
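A simplified behavioral model of this shutoff scheme (an assumption sketched from the description above, not the actual RTL): each stage owns one threshold, and once any pixel of the 2x2 sub-region passes a stage's comparison, that stage's enable latches and gates off the work for later, smaller pixels, yet the final result still equals the quantized maximum of the region.

```python
THRESHOLDS = [5 / 6, 1 / 2, 1 / 6]   # stage units 1..3 hold threshold 3, 2, 1
VALUES = [1.0, 2 / 3, 1 / 3, 0.0]    # quantized output if stage 1/2/3 fires, else 0

def pool_region(pixels):
    """Quantized max of a 2x2 sub-region of activated pixels, with stage shutoff."""
    enables = [False, False, False]
    for p in pixels:                  # one pixel streams in per time step
        for s, thr in enumerate(THRESHOLDS):
            if p > thr:
                enables[s] = True     # latch the enable; deeper stages stay idle
                break
            if enables[s]:
                # a larger earlier pixel already passed this stage, so this
                # smaller pixel cannot change the result: gate it off (power saving)
                break
    for s, latched in enumerate(enables):
        if latched:
            return VALUES[s]          # highest latched stage determines the output
    return VALUES[3]                  # no threshold exceeded: quantized value 0

# a > threshold 3 > b > threshold 2 > c > threshold 1 > d, as in the example
print(pool_region([0.9, 0.7, 0.3, 0.1]))  # prints 1.0, the quantized max
```

Note the model returns the same answer regardless of arrival order, since a later, larger pixel is never gated off from latching a higher stage.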
The detailed execution of the unit in the activation-quantization working mode is shown in fig. 5. After the configuration word is set to 0, the comparison signal is no longer influenced by its own stage unit's enable signal from the previous time step, though it is still controlled by the enable signal of the preceding stage unit. This is because, unlike the activation-quantization-pooling mode, this mode must output a quantized value for every input and must not shut off smaller subsequent inputs in the sub-region.
If the input exceeds the threshold in some stage unit, the enable signal is set to 1 to shut off subsequent operations in both the vertical and horizontal directions: the vertical direction represents the remaining image input data, while the horizontal direction represents the operations in the remaining stage units processing the current image input data.
The system is optimized from both the software and the hardware side: built on a multi-stage pipelined processing architecture, supported by reconfigurable techniques, and guided by a stage turn-off low-power technique, it reduces the power consumption and area of the activation-quantization-pooling module while improving its flexibility.
The above description is only a preferred embodiment of the present invention and is not intended to limit it in any way; any person skilled in the art may use the technical content disclosed above to derive equivalent embodiments through modification or equivalent variation. Nevertheless, any simple modification, equivalent change or refinement of the above embodiments that remains within the technical essence of the present invention falls within the protection scope of the present technical solution.

Claims (5)

1. A reconfigurable activation quantization pooling system for a low-bit-width convolutional neural network, characterized in that the system receives convolutional-layer result data and comprises:
a plurality of reconfigurable activation quantization pooling processing units for executing activation, quantization and pooling operations, reconfigurable between an activation-quantization working mode and an activation-quantization-pooling working mode;
a storage unit controller for controlling data transfer between the reconfigurable activation quantization pooling units and the storage unit under different configurations;
a storage unit for temporarily storing the convolutional-layer result data required by the pooling operation;
wherein the activation function in the reconfigurable activation quantization pooling processing unit is given by formula (1),
x_a = min(abs(x_c), 1) (1)
where x_c represents the data after convolution processing and x_a the activation value after activation;
the quantization function in the reconfigurable activation quantization pooling processing unit is given by formula (2),
x = round((2^k - 1) * x_a) / (2^k - 1) (2)
where k represents the quantized bit width and x the quantized value;
and the corresponding pooling kernel size is 2x2, as in formula (3):
x_o(i,j) = max(x(2i,2j), x(2i,2j+1), x(2i+1,2j), x(2i+1,2j+1)) (3)
where i, j denote the coordinate position in the single-channel input image and x_o the pooled value after pooling.
2. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: in the activation-quantization working mode, data is transferred from the convolution processing unit to the reconfigurable activation quantization pooling processing unit, and the result data is output directly after processing; in the activation-quantization-pooling working mode, the received convolutional-layer result data is first stored in the storage unit, transferred into the reconfigurable activation quantization pooling processing unit for processing under control of the storage unit controller, and the processing result is stored back into the storage unit.
3. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein the workflow of the system comprises the following steps:
first, determining the working mode; if the working mode is activation-quantization, determining the characteristics and parameters of the activation function and the quantization method by analyzing the activation functions and quantization methods of different low-bit convolutional neural networks, then identifying and simplifying away the redundant overlap between the output range of the activation function and the input range of the quantization method;
if the working mode is activation-quantization-pooling, additionally analyzing the pooling kernel size on top of the activation-quantization optimization, and, after optimization, merging the pooling operation into the activation-quantization operation to form a new fused activation-quantization-pooling operation.
4. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 3, wherein: the storage unit supports ping-pong operation; one part of the storage unit stores the data transferred from the convolutional layer, while the other part stores the data required by the reconfigurable activation quantization pooling processing unit.
5. The low-bit-width convolutional neural network-oriented reconfigurable activation quantization pooling system of claim 1, wherein: the reconfigurable activation quantization pooling processing unit comprises three stage units (a first, a second and a third stage unit), each comprising a comparator, a gate and a register; the two inputs of the comparator in the first stage unit are the external image input data and threshold 3, the two inputs of the comparator in the second stage unit are the data output from the first stage unit and threshold 2, and the two inputs of the comparator in the third stage unit are the data output from the second stage unit and threshold 1.
CN201811646433.9A 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network Active CN109389212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811646433.9A CN109389212B (en) 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network


Publications (2)

Publication Number Publication Date
CN109389212A CN109389212A (en) 2019-02-26
CN109389212B true CN109389212B (en) 2022-03-25

Family

ID=65430886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646433.9A Active CN109389212B (en) 2018-12-30 2018-12-30 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network

Country Status (1)

Country Link
CN (1) CN109389212B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220121936A1 (en) * 2019-02-27 2022-04-21 Huawei Technologies Co., Ltd. Neural Network Model Processing Method and Apparatus
CN110222815B (en) * 2019-04-26 2021-09-07 上海酷芯微电子有限公司 Configurable activation function device and method suitable for deep learning hardware accelerator
CN110390385B (en) * 2019-06-28 2021-09-28 东南大学 BNRP-based configurable parallel general convolutional neural network accelerator
CN110718211B (en) * 2019-09-26 2021-12-21 东南大学 Keyword recognition system based on hybrid compressed convolutional neural network
CN113762496B (en) * 2020-06-04 2024-05-03 合肥君正科技有限公司 Method for reducing low-bit convolutional neural network reasoning operation complexity
WO2023004800A1 (en) * 2021-07-30 2023-02-02 华为技术有限公司 Neural network post-processing method and apparatus, chip, electronic device, and storage medium
CN114169513B (en) * 2022-02-11 2022-05-24 深圳比特微电子科技有限公司 Neural network quantization method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017124645A1 (en) * 2016-01-20 2017-07-27 北京中科寒武纪科技有限公司 Apparatus for processing floating point number
CN108364061A (en) * 2018-02-13 2018-08-03 北京旷视科技有限公司 Arithmetic unit, operation execute equipment and operation executes method
CN108510067A (en) * 2018-04-11 2018-09-07 西安电子科技大学 The convolutional neural networks quantization method realized based on engineering
CN108647779A (en) * 2018-04-11 2018-10-12 复旦大学 A kind of low-bit width convolutional neural networks Reconfigurable Computation unit


Also Published As

Publication number Publication date
CN109389212A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109389212B (en) Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network
US10402725B2 (en) Apparatus and method for compression coding for artificial neural network
WO2020258529A1 (en) Bnrp-based configurable parallel general convolutional neural network accelerator
CN109102065B (en) Convolutional neural network accelerator based on PSoC
WO2021036905A1 (en) Data processing method and apparatus, computer equipment, and storage medium
KR102592721B1 (en) Convolutional neural network system having binary parameter and operation method thereof
US11625587B2 (en) Artificial intelligence integrated circuit
US20190243755A1 (en) Dynamic memory mapping for neural networks
US20210110269A1 (en) Neural network dense layer sparsification and matrix compression
KR101950786B1 (en) Acceleration Method for Artificial Neural Network System
JP2022532432A (en) Data compression methods and computing devices
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
US11263530B2 (en) Apparatus for operations at maxout layer of neural networks
CN117751366A (en) Neural network accelerator and data processing method thereof
CN110874627A (en) Data processing method, data processing apparatus, and computer readable medium
CN112884146A (en) Method and system for training model based on data quantization and hardware acceleration
CN115423084A (en) Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium
CN111831358A (en) Weight precision configuration method, device, equipment and storage medium
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN114626516A (en) Neural network acceleration system based on floating point quantization of logarithmic block
WO2023109748A1 (en) Neural network adjustment method and corresponding apparatus
CN116325737A (en) Quantification of tree-based machine learning models
US20200334013A1 (en) Processing element and processing system
US20220398442A1 (en) Deep learning computational storage drive
US20030018672A1 (en) System and method for fast median filters, with a predetermined number of elements, in processors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant