CN110135568B - Full-integer neural network method applying bounded linear rectification unit - Google Patents


Info

Publication number
CN110135568B
CN110135568B (application CN201910453798.8A)
Authority
CN
China
Prior art keywords
network
integer
floating
point number
blu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910453798.8A
Other languages
Chinese (zh)
Other versions
CN110135568A (en)
Inventor
赵恒锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201910453798.8A priority Critical patent/CN110135568B/en
Publication of CN110135568A publication Critical patent/CN110135568A/en
Application granted granted Critical
Publication of CN110135568B publication Critical patent/CN110135568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, in particular to a full-integer neural network method applying a bounded linear rectification unit, which comprises the following steps: step one, change the format of the input data with a new normalization method to suit a full-integer network; step two, after training, calculate a discrete step for each convolution kernel, and for convolution kernels with batch normalization layers, calculate the discrete step independently for the W_f and b_f of each channel; step three, discretize the floating-point network parameters; step four, after discretizing W_f, train the floating-point network with an accumulative gradient update algorithm; step five, replace the traditional linear rectification unit with a bounded linear rectification unit; step six, apply the N-variance principle to find a suitable BLU parameter h, first collecting the distribution of the ReLU output data before adopting the BLU. The technical scheme provided by the invention effectively overcomes the defects that the prior art struggles to balance speed, storage space, performance and practicality, and that its performance is low.

Description

Full-integer neural network method applying bounded linear rectification unit
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a full-integer neural network method applying a bounded linear rectification unit.
Background
With the development of science and technology, research into artificial intelligence (AI) is intensifying. AI is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Current neural network compression algorithms mainly reduce the bit width of parameters or reduce the number of parameters, so that the neural network occupies less storage and runs faster; however, existing neural network optimization methods struggle to balance speed, storage space, performance and practicality. Although existing full-integer neural networks can perform full-integer operations, their performance is low. Therefore, developing a full-integer neural network method applying a bounded linear rectification unit is key to solving these problems.
An effective convolutional neural network compression method (CNNpack) is disclosed in patent application CN 106557812 A, published 2017.04.05. Deep convolutional neural networks are widely used as a basic deep learning structure in many computer vision tasks. However, most convolutional neural networks are limited in their application to mobile devices due to their large memory footprint and high computational load. That invention addresses this problem by compressing the network in the frequency domain: by treating each convolution kernel as an image, its frequency-domain representation is decomposed into a common part (cluster center) and a private part (residual), and the low-energy coefficients are then discarded without affecting network accuracy. Furthermore, by linearly combining the convolution responses of the DCT bases, the network's computational cost can be reduced. That invention achieves a high compression ratio and a high speed-up ratio, and can be used to compress common deep convolutional networks.
However, this convolutional neural network compression method struggles to balance speed, storage space, performance and practicality, and its performance is low.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects in the prior art, the invention provides a full-integer neural network method applying a bounded linear rectification unit, which effectively overcomes the prior art's difficulty in balancing speed, storage space, performance and practicality, as well as its low performance.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
a full integer neural network method employing bounded linear rectification units, comprising the steps of:
step one, subtract a fixed value from every pixel of each image using a new normalization method, converting the pixels into signed integers of the same bit width; for the floating-point network, multiply all images by a fixed floating-point number, while the integer network's input is left unchanged and fed directly into the network; train the floating-point network with the conventional stochastic gradient descent algorithm;
step two, after training, calculate a discrete step for each convolution kernel; for convolution kernels with batch normalization layers, calculate the discrete step independently for the W_f and b_f of each channel;
step three, discretize the floating-point network parameters; when discretizing W_f, use a fixed (unchanged) zero point to calculate the discrete step of W_f;
step four, after discretizing W_f, train the floating-point network with an accumulative gradient update algorithm;
step five, replace the traditional linear rectification unit with a bounded linear rectification unit;
step six, apply the N-variance principle to find a suitable BLU parameter h: first collect the distribution of the ReLU output data before adopting the BLU, and according to the chosen N, use the minimum value among the largest corresponding proportion of the output data as the initial BLU parameter h;
step seven, quantize and dequantize h to calculate the final parameters of the floating-point network and the integer network; for cross-layer connections or concatenation layers, introduce a multiplication mul and a right shift on the feature map, then modify the discrete step;
step eight, after the BLU parameters of every layer of the floating-point and integer networks are determined, train the floating-point network again with the accumulative gradient update algorithm;
step nine, calculate the integer network parameters W_i and b_i to obtain the final integer neural network.
Preferably, in step three the maximum of the absolute values of the W_f parameters is calculated: maxabs = max(abs(W_f)).
Preferably, in step three the discrete step from W_f to W_i is calculated as:
step = maxabs / (2^(n-1) - 1)
For convolutional layers with batch normalization, step must be recalculated per channel.
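The per-kernel discrete step above can be sketched as follows (an illustrative Python sketch; the (out_channels, in_channels, kh, kw) kernel layout is an assumption, not stated in the text):

```python
import numpy as np

def discrete_step(w_f, n_bits=8):
    """Discrete step of one convolution kernel: maxabs / (2**(n-1) - 1).

    Using 2**(n-1) - 1 rather than 2**(n-1) in the denominator keeps the
    quantized weights inside the signed n-bit range (no overflow).
    """
    maxabs = np.max(np.abs(w_f))
    return maxabs / (2 ** (n_bits - 1) - 1)

def per_channel_steps(w_f, n_bits=8):
    """One step per output channel, for convolutions followed by batch
    normalization (assumed layout: (out_channels, in_channels, kh, kw))."""
    return np.array([discrete_step(w_f[c], n_bits) for c in range(w_f.shape[0])])
```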
Preferably, the specific algorithm of step seven is:
h_i = [h_f * ratio_Y]
mul, shift = mul_shift(h_i, m)
ratio_V = ratio_Y * mul * 2^(-shift), h_f = (2^(m-1) - 1) / ratio_V
h_i = [h_f * ratio_Y].
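The quantize/dequantize cycle can be sketched as follows (illustrative Python; the body of mul_shift is an assumed form, since the text only states that an existing algorithm is used):

```python
def mul_shift(h_i, m, mul_bits=16):
    """Assumed form of the existing mul/shift search: find integer mul and
    shift with h_i * mul * 2**-shift ~= 2**(m-1) - 1, i.e. requantize the
    k-bit bound h_i down to the m-bit range."""
    target = (2 ** (m - 1) - 1) / h_i
    shift = 0
    while target * (1 << (shift + 1)) < (1 << mul_bits):
        shift += 1
    return int(round(target * (1 << shift))), shift

def refine_blu_params(h_f, ratio_y, m):
    """Quantize-dequantize cycle for the BLU bound (step seven): quantize
    h_f, derive the layer's mul/shift, then recompute h_f so that the
    integer bound 2**(m-1) - 1 maps back exactly through ratio_V."""
    h_i = int(round(h_f * ratio_y))
    mul, shift = mul_shift(h_i, m)
    ratio_v = ratio_y * mul * 2.0 ** (-shift)
    h_f = (2 ** (m - 1) - 1) / ratio_v
    h_i = int(round(h_f * ratio_y))
    return h_f, h_i, mul, shift, ratio_v
```

The second rounding of h_i simply re-quantizes the corrected h_f, so the integer bound and the floating-point bound stay in exact correspondence.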
Preferably, in step nine:
W_i = [W_f / step]
b_i = [b_f * ratio_W * ratio_X]
(III) advantageous effects
Compared with the prior art, the invention provides a full integer neural network method applying a bounded linear rectification unit, which has the following beneficial effects:
firstly, changing a processing method of input data of a neural network to make the input data suitable for integer network operation;
secondly, discretizing the weight parameters of the floating-point number network, keeping the zero point unchanged, and reducing the operation amount of the target integer network;
third, the floating-point number network is trained using the cumulative gradient descent algorithm, using different cumulative step sizes for the batch normalization layer. Therefore, the accuracy rate after training can be improved;
fourth, replace the ReLU with BLU. Therefore, the accuracy of the target integer network is improved;
fifthly, a method for searching for a BLU parameter suitable for a floating point network is provided, so that the accuracy of the floating point network is ensured, and the accuracy of the integer network is improved;
sixthly, a quantization and inverse quantization method of the BLU parameter is provided, so that the result of the integer network is closer to a floating point network;
seventh, handling of special network structures: cross-layer connections and concatenation layers. When several layers are added or concatenated, adjusting the three parameters mul, shift and step ensures that the ratios of the layers remain equal.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a single layer convolution according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating BLU definition (left) and effects (right) according to an embodiment of the present invention;
FIG. 4 is a process of adjusting BLU parameters (left) and the result of the adjustment (right) in an embodiment of the present invention;
FIG. 5 is a diagram of an integer cross-layer connection (left) and a floating point cross-layer connection (right) according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A full integer neural network method employing bounded linear rectification units, as shown in fig. 1-5, comprising the steps of:
step one, subtract a fixed value from every pixel of each image using a new normalization method, converting the pixels into signed integers of the same bit width (for example, subtract 128 from each pixel of an 8-bit image and 32768 from each pixel of a 16-bit image); for the floating-point network, multiply all images by a fixed floating-point number, while the integer network's input is left unchanged and fed directly into the network; train the floating-point network with the conventional stochastic gradient descent algorithm. Because the input data of the floating-point and integer networks differ only by a fixed factor, the floating-point network is easy to train;
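The normalization of step one can be sketched as follows (an illustrative Python sketch; the 1/128 scale for the floating-point branch is an assumed example of the fixed floating-point multiplier, which the text does not specify):

```python
import numpy as np

def normalize_for_integer_net(images_u8):
    """Shift unsigned 8-bit pixels to signed 8-bit by subtracting 128.

    The integer network consumes the shifted values directly; the
    floating-point twin multiplies by a fixed scale (here 1/128, an
    assumed choice) so the two inputs differ only by a constant ratio.
    """
    shifted = images_u8.astype(np.int16) - 128          # now in [-128, 127]
    x_int = shifted.astype(np.int8)                     # integer-network input
    x_float = shifted.astype(np.float32) * (1.0 / 128)  # floating-point input
    return x_int, x_float
```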
step two, after training, calculate a discrete step for each convolution kernel; for convolution kernels with batch normalization layers, calculate the discrete step independently for the W_f and b_f of each channel, so that the trained neural network achieves higher accuracy;
step three, discretize the floating-point network parameters; when discretizing W_f, use a fixed (unchanged) zero point to calculate the discrete step of W_f, which reduces the amount of computation compared with existing methods that move the zero point;
step four, after discretizing W_f, train the floating-point network with an accumulative gradient update algorithm;
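The text does not spell out the accumulative gradient update. One plausible reading, accumulating floating-point updates and committing them to the discretized weights only in whole multiples of the discrete step, can be sketched as follows (an interpretation, not the patent's literal algorithm):

```python
def accumulated_update(w_disc, grad_acc, grad, lr, step):
    """One SGD step under an assumed accumulative update rule: sum raw
    updates in a floating-point accumulator and commit them to the
    discretized weight only in whole multiples of the discrete step."""
    grad_acc = grad_acc - lr * grad         # accumulate the raw SGD update
    commit = round(grad_acc / step) * step  # whole discrete steps to apply
    w_disc = w_disc + commit                # weight stays on the discrete grid
    grad_acc = grad_acc - commit            # carry the uncommitted remainder
    return w_disc, grad_acc
```

Under this reading, small gradients are not lost to rounding: they keep accumulating until they amount to at least one discrete step.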
step five, replace the traditional linear rectification unit with a bounded linear rectification unit, so that the output range of the convolutional layer is fixed, the learning capability of the floating-point network is preserved, and the corresponding integer network achieves higher precision;
step six, apply the N-variance principle to find a suitable BLU parameter h: first collect the distribution of the ReLU output data before adopting the BLU, and according to the chosen N, use the minimum value among the largest corresponding proportion of the output data as the initial BLU parameter h. For example, when N is 3, select the minimum value among the largest 0.15% of all ReLU output data of one layer as that layer's initial h. By choosing N reasonably, most data fall below h while h is not too large;
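The selection of the initial h can be sketched as follows (illustrative Python; the 0.15% fraction corresponds to the N = 3 example in the text):

```python
import numpy as np

def initial_blu_h(relu_outputs, top_fraction=0.0015):
    """Initial BLU bound h: the smallest value among the largest
    `top_fraction` of a layer's ReLU outputs (0.15% when N = 3)."""
    flat = np.sort(relu_outputs.ravel())
    k = max(1, int(np.ceil(flat.size * top_fraction)))
    return flat[-k]  # minimum of the top-k values
```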
step seven, quantize and dequantize h to calculate the final parameters of the floating-point and integer networks; for cross-layer connections or concatenation layers, introduce a multiplication mul and a right shift on the feature map, then modify the discrete step. Taking one of the convolutional layers as the reference layer, multiply the other layers by mul_d and right-shift them by shift_d, and modify the corresponding step. Taking a direct addition across two layers as an example, as shown in fig. 5, the specific algorithm is as follows:
first, from the known ratio_Y1, recurse layer by layer to obtain ratio_V2 and ratio_Y3;
second, calculate mul_d, shift_d = mul_shift_f(ratio_Y1, ratio_Y3), so that
ratio_Y1 * mul_d * 2^(-shift_d) ≈ ratio_Y3;
third, modify the step of the latter convolutional layer:
[formula shown only as an image in the original]
For concatenation layers, e.g. where Y_1i and Y_2i are concatenated, select the one with the smallest ratio; suppose Y_1i quantizes to V_1i with corresponding ratio_V1. Adjust the step, mul and shift corresponding to every other Y so that its ratio_V equals ratio_V1;
step eight, after the BLU parameters of every layer of the floating-point and integer networks are determined, train the floating-point network again with the accumulative gradient update algorithm;
step nine, calculate the integer network parameters W_i and b_i to obtain the final integer neural network.
Specifically, in step three the maximum of the absolute values of the W_f parameters is calculated: maxabs = max(abs(W_f)); for convolutional layers with batch normalization, step must be recalculated per channel. Step three then calculates the discrete step from W_f to W_i:
step = maxabs / (2^(n-1) - 1)
The denominator is 2^(n-1) - 1 rather than 2^(n-1) to avoid overflow. The specific algorithm of step seven is:
h_i = [h_f * ratio_Y]
mul, shift = mul_shift(h_i, m)
ratio_V = ratio_Y * mul * 2^(-shift), h_f = (2^(m-1) - 1) / ratio_V
h_i = [h_f * ratio_Y]
where [·] denotes rounding. The advantage is that the integer network maps 2^(m-1) - 1 through ratio_V exactly to the h_f of the floating-point network, thereby reducing error. In step nine:
W_i = [W_f / step]
b_i = [b_f * ratio_W * ratio_X]
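Step nine can be sketched as follows (illustrative Python; the original formulas appear only as images, so dividing the weights by their step and scaling the biases by ratio_W * ratio_X are assumptions consistent with the relation X_i ≈ X_f * ratio_X):

```python
import numpy as np

def final_integer_params(w_f, b_f, step_w, ratio_x):
    """Final integer parameters (assumed form): W_i = round(W_f / step);
    since the bias adds to the convolution output whose ratio is
    ratio_W * ratio_X = ratio_X / step, b_i = round(b_f * ratio_X / step)."""
    w_i = np.round(w_f / step_w).astype(np.int32)
    b_i = np.round(b_f * ratio_x / step_w).astype(np.int32)
    return w_i, b_i
```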
The purpose of this scheme is to convert a given floating-point number network (bottom) into a full-integer network (top; hereinafter simply referred to as an integer network), as shown in fig. 2. FIG. 2 shows the structure of one convolutional layer of a full-integer network; a network is formed by connecting many such layers;
given a floating-point number network, by setting the parameter W of the floating-point number networkfAnd bfThe quantization can obtain the parameter W corresponding to the integer networkiAnd bi. All data in a floating point network is of the float32 type. For integer networks, the weight W is defined in this schemeiAnd input data XiN, output of integer convolution operation UiAnd parameter biAre all k-bit integers, which are added to give YiAlso a k-bit integer. Y isiObtaining m-bit V through Bounded Linear rectification Unit (BLU) and quantizationiDefinition of the scheme ViThe number of bits of (d) is m. In general Vi> 0, so V can also be represented by unsigned integersiThus, one bit of data can be saved;
the bounded linear rectification unit may be represented in the form:
Figure GDA0003316597660000071
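In floating point, the BLU can be sketched as follows (the zero lower bound is reconstructed from the description, since the original definition appears only as an image):

```python
def blu(x, h):
    """Bounded linear rectification unit: zero below 0, identity on [0, h],
    clamped at h above."""
    return min(max(x, 0.0), h)
```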
for a certain layer of floating-point number network, the scheme defines h as hfThe corresponding integer network is defined as hi. It can be seen that all data output is limited to between 1 and h, thus being output data YiQuantization to ViBringing great convenience. To quantize YiTo ViThe scheme adopts a method of first multiplying and then moving to the right proposed by Google;
V_i = [Y_i * mul * 2^(-shift)]
by multiplying and right shifting, the effective data bit width will be reduced. The method for calculating mul and shift adopts the existing algorithm, and the scheme is defined as follows:
mul,shift=mul_shift(Yi,m)
and
mul,shift=mul_shift_f(x,y)
in the scheme, all integer network parameters and data have corresponding objects in a floating-point network, and the difference between the integer network parameters and the floating-point network data is a multiple. Parameters or data X for an integer networkiAnd object X in a corresponding floating-point number networkfThe relationship between the two is as follows:
Xi≈Xf*ratrioX
obviously, ratioXIs propagated layer by layer, e.g. ratioU=ratioX*ratioW
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (3)

1. A full-integer neural network method applying bounded linear rectification units, characterized in that it comprises the following steps:

step one, subtract a fixed value from every pixel of each image using a new normalization method, converting the pixels into signed integers of the same bit width; for the floating-point network, multiply all images by a fixed floating-point number, while the integer network's input is left unchanged and fed directly into the network; train the floating-point network with the conventional stochastic gradient descent algorithm;

step two, after training, calculate a discrete step for each convolution kernel; for convolution kernels with batch normalization layers, calculate the discrete step independently for the W_f and b_f of each channel;
step three, discretize the floating-point network parameters; when discretizing W_f, use a fixed (unchanged) zero point to calculate the discrete step of W_f; calculate the maximum of the absolute values of the W_f parameters:

maxabs = max(abs(W_f));

calculate the discrete step from W_f to W_i:

step = maxabs / (2^(n-1) - 1)

for convolutional layers with batch normalization, step must be recalculated per channel;
step four, after discretizing W_f, train the floating-point network with an accumulative gradient update algorithm;

step five, replace the traditional linear rectification unit with a bounded linear rectification unit;

step six, apply the N-variance principle to find a suitable BLU parameter h: first collect the distribution of the ReLU output data before adopting the BLU, and according to the chosen N, use the minimum value among the largest corresponding proportion of the output data as the initial BLU parameter h;

step seven, quantize and dequantize h to calculate the final parameters of the floating-point network and the integer network; for cross-layer connections or concatenation layers, introduce a multiplication mul and a right shift on the feature map, then modify the discrete step;

step eight, after the BLU parameters of every layer of the floating-point and integer networks are determined, train the floating-point network again with the accumulative gradient update algorithm;

step nine, calculate the integer network parameters W_i and b_i to obtain the final integer neural network.
2. The full-integer neural network method applying a bounded linear rectification unit of claim 1, characterized in that the specific algorithm of step seven is:

h_i = [h_f * ratio_Y]
mul, shift = mul_shift(h_i, m)
ratio_V = ratio_Y * mul * 2^(-shift), h_f = (2^(m-1) - 1) / ratio_V
h_i = [h_f * ratio_Y].
3. The full-integer neural network method applying a bounded linear rectification unit of claim 1, characterized in that in step nine:

W_i = [W_f / step]
b_i = [b_f * ratio_W * ratio_X]
CN201910453798.8A 2019-05-28 2019-05-28 Full-integer neural network method applying bounded linear rectification unit Active CN110135568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910453798.8A CN110135568B (en) 2019-05-28 2019-05-28 Full-integer neural network method applying bounded linear rectification unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910453798.8A CN110135568B (en) 2019-05-28 2019-05-28 Full-integer neural network method applying bounded linear rectification unit

Publications (2)

Publication Number Publication Date
CN110135568A CN110135568A (en) 2019-08-16
CN110135568B true CN110135568B (en) 2022-03-04

Family

ID=67582563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910453798.8A Active CN110135568B (en) 2019-05-28 2019-05-28 Full-integer neural network method applying bounded linear rectification unit

Country Status (1)

Country Link
CN (1) CN110135568B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107451A1 (en) * 2016-10-14 2018-04-19 International Business Machines Corporation Automatic scaling for fixed point implementation of deep neural networks
CN108053028B (en) * 2017-12-21 2021-09-14 深圳励飞科技有限公司 Data fixed-point processing method and device, electronic equipment and computer storage medium
CN108304786A (en) * 2018-01-17 2018-07-20 东南大学 A kind of pedestrian detection method based on binaryzation convolutional neural networks
CN109800877B (en) * 2019-02-20 2022-12-30 腾讯科技(深圳)有限公司 Parameter adjustment method, device and equipment of neural network

Also Published As

Publication number Publication date
CN110135568A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN107516129B (en) Dimension self-adaptive Tucker decomposition-based deep network compression method
CN109063825B (en) Convolutional neural network accelerator
CN110363279A (en) Image processing method and device based on convolutional neural networks model
US20190012559A1 (en) Dynamic quantization for deep neural network inference system and method
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
US20200117981A1 (en) Data representation for dynamic precision in neural network cores
WO2021022685A1 (en) Neural network training method and apparatus, and terminal device
US11483560B2 (en) Point cloud partition methods, encoder, and decoder
JP7408799B2 (en) Neural network model compression
WO2022067790A1 (en) Point cloud layering method, decoder, encoder, and storage medium
CN114239798B (en) Multiplication-free deep neural network model compression method based on parity logarithm quantization
CN113033448B (en) Remote sensing image cloud-removing residual error neural network system, method and equipment based on multi-scale convolution and attention and storage medium
US20050069035A1 (en) Low-complexity 2-power transform for image/video compression
CN114222129A (en) Image compression encoding method, image compression encoding device, computer equipment and storage medium
US20060215917A1 (en) Decoding apparatus, dequantizing method, and program thereof
CN112836823B (en) Convolutional neural network back propagation mapping method based on cyclic recombination and blocking
CN110135568B (en) Full-integer neural network method applying bounded linear rectification unit
KR20200022386A (en) Information processing device and information processing method
CN115866253B (en) Inter-channel conversion method, device, terminal and medium based on self-modulation
CN113283591B (en) Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
De Silva et al. Exploring the Implementation of JPEG Compression on FPGA
CN113177627A (en) Optimization system, retraining system, and method thereof, and processor and readable medium
JP2017525185A (en) High precision and quantization method applicable to wavelet transform matrix
CN114528101B (en) Structured dynamic quantization method of neural network applied to power edge calculation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant