CN110852416A - CNN accelerated computing method and system based on low-precision floating-point data expression form - Google Patents

CNN accelerated computing method and system based on low-precision floating-point data expression form

Info

Publication number
CN110852416A
Authority
CN
China
Prior art keywords
low-precision floating point number
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910940659.8A
Other languages
Chinese (zh)
Other versions
CN110852416B (en)
Inventor
吴晨
王铭宇
徐世平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Biong Core Technology Co ltd
Original Assignee
Chengdu Star Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Star Innovation Technology Co ltd filed Critical Chengdu Star Innovation Technology Co ltd
Priority to CN201910940659.8A priority Critical patent/CN110852416B/en
Publication of CN110852416A publication Critical patent/CN110852416A/en
Application granted granted Critical
Publication of CN110852416B publication Critical patent/CN110852416B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention discloses a CNN accelerated calculation method and a CNN accelerated calculation system based on a low-precision floating-point data representation form, and relates to the field of CNN accelerated calculation. The accelerated calculation method comprises the following steps: the floating-point number functional module receives input activation values and weights from the storage system according to a control signal, and distributes them to different processing units PE for convolution calculation to complete the CNN accelerated calculation; the convolution calculation comprises forward calculation of the convolutional layers, completed by performing dot-product calculation on MaEb floating point numbers quantized through the low-precision floating-point representation form. By using the low-precision floating-point representation form MaEb, the invention preserves the accuracy of the quantized CNN without retraining; by performing low-precision floating-point multiplication, Nm MaEb floating-point multipliers are realized through DSP, which greatly improves the acceleration performance of a customized circuit or a non-customized circuit while ensuring accuracy, wherein the customized circuit is an ASIC (application-specific integrated circuit) or an SOC (system on chip), and the non-customized circuit comprises an FPGA (field-programmable gate array).

Description

CNN accelerated computing method and system based on low-precision floating-point data expression form
Technical Field
The invention relates to the field of deep convolutional neural network quantization, in particular to a CNN (convolutional neural network) accelerated calculation method and system based on a low-precision floating point data representation form.
Background
In recent years, the application of AI (Artificial Intelligence) has penetrated many fields such as face recognition, game playing, image processing and simulation. Although processing accuracy has improved, a neural network contains many layers and a large number of parameters, so it requires very large computational cost and storage space. In this regard, technicians have proposed neural network compression schemes, that is, the parameters or storage space of the network are reduced by changing the network structure or by using quantization and approximation methods, reducing network cost and storage space without greatly affecting the performance of the neural network.
Prior art patent number: CN109740737A, patent name: convolutional neural network quantization processing method, apparatus and computer device. The method comprises the following steps: acquiring the maximum weight and the maximum deviation of each convolutional layer in the convolutional neural network; calculating a first dynamic bit-precision value for the maximum weight and a second dynamic bit-precision value for the maximum deviation, the first dynamic bit-precision value being different from the second dynamic bit-precision value; quantizing the weights and deviations of the corresponding convolutional layer using the first and second dynamic bit-precision values of each convolutional layer; and obtaining the convolution result of the convolutional neural network based on the quantized weights and deviations in each convolutional layer. This scheme adopts a double-precision quantization processing method to improve the accuracy after quantization: specifically, the maximum weight and maximum deviation of a convolutional layer are obtained, the dynamic bit-precision values of the maximum weight and the maximum deviation are calculated respectively, and convolution calculation is then realized using the two dynamic bit-precision values.
Although the prior art improves quantization accuracy, several limitations remain: 1) for deep convolutional neural networks (more than 100 convolutional/fully-connected layers), retraining is required after quantization to ensure accuracy; 2) quantization requires 16-bit floating point numbers or 8-bit fixed point numbers to ensure accuracy; 3) without retraining and while ensuring accuracy, the prior art can realize at most two multiplication operations in one DSP, resulting in low acceleration performance on an FPGA.
Therefore, a CNN accelerated computation method and system based on a low-precision floating-point data representation form are needed that overcome the above problems, find the optimal data representation form MaEb without retraining, and realize Nm MaEb floating-point multipliers through DSP, thereby ensuring the accuracy of the quantized convolutional neural network and improving the acceleration performance of a customized circuit or a non-customized circuit.
Disclosure of Invention
The invention aims to: the invention provides a CNN (convolutional neural network) accelerated calculation method and system based on a low-precision floating-point data representation form, which uses the low-precision floating-point representation form MaEb to ensure the accuracy of the quantized convolutional neural network without retraining, performs low-precision floating-point multiplication on the MaEb floating point numbers, and realizes Nm MaEb floating-point multipliers through a DSP (digital signal processor), thereby improving the acceleration performance of a customized circuit or a non-customized circuit.
The technical scheme adopted by the invention is as follows:
the CNN accelerated calculation method based on the low-precision floating point data expression form comprises the following steps:
the central control module generates a control signal to arbitrate the floating-point number functional module and the storage system;
the floating-point number functional module receives an input activation value and a weight from a storage system according to a control signal, and distributes the input activation value and the weight to different processing units PE to carry out convolution calculation of each convolution layer, so that CNN accelerated calculation is completed;
the convolution calculation includes forward calculation of convolution layers completed by performing dot product calculation on MaEb floating point numbers quantized through low-precision floating point number representation forms, wherein a and b are positive integers.
Preferably, the forward calculation of the convolution layer completed by the dot product calculation of the MaEb floating point number quantized by the low-precision floating point number representation form includes the following steps:
step a: quantizing input data of the single-precision floating point number into a floating point number of MaEb in a low-precision floating point number expression form, wherein the input data comprises an input activation value, a weight and a bias, and a + b is more than 0 and less than or equal to 31;
step b: distributing the MaEb floating point numbers to the Nm parallel low-precision floating-point multipliers in the floating-point number functional module for forward computation to obtain full-precision floating-point products, wherein Nm represents the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module;
step c: transmitting the full-precision floating point number product to a data conversion module to obtain a fixed point number result without precision loss;
step d: after distributing the fixed-point results to 4T parallel fixed-point addition trees, sequentially accumulating, pooling and activating the addition-tree results together with the bias in the input data through a post-processing unit to finish the calculation of the convolutional layer, wherein T is a positive integer.
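The per-branch data path of steps a to d can be pictured with the following minimal sketch, written in ordinary Python rather than hardware: it emulates one branch of a processing unit PE (low-precision multiplications, conversion to fixed point, a fixed-point addition tree, then accumulation of the bias and activation). The function names (pe_branch, to_fixed, adder_tree, post_process) and the fraction-bit count are hypothetical and only illustrate the order of operations, not the patented circuit.

```python
# Illustrative sketch (not the patented hardware): one PE branch of the
# pipeline described in steps a-d, emulated with ordinary Python arithmetic.
# Names (pe_branch, to_fixed, adder_tree, post_process) are hypothetical.

def to_fixed(x, frac_bits=18):
    """Convert a full-precision product to a fixed-point integer (step c)."""
    return int(round(x * (1 << frac_bits)))

def adder_tree(values):
    """Fixed-point addition tree (step d): sum without precision loss."""
    return sum(values)

def post_process(acc, bias, frac_bits=18, relu=True):
    """Accumulate the bias, then apply the activation (pooling omitted)."""
    result = acc / (1 << frac_bits) + bias
    return max(result, 0.0) if relu else result

def pe_branch(activations, weights, bias):
    # Step b: parallel low-precision multiplications (emulated as float mults)
    products = [a * w for a, w in zip(activations, weights)]
    # Step c: convert each product to a fixed-point number
    fixed = [to_fixed(p) for p in products]
    # Step d: adder tree, then accumulate bias / pool / activate
    return post_process(adder_tree(fixed), bias)

print(pe_branch([0.5, -0.25, 1.0], [0.125, 0.5, -0.75], 0.1))
```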
Preferably, the steps a, b and c comprise the following steps:
the original picture and the weights are quantized into MaEb floating point numbers through the low-precision floating-point representation form, the bias is quantized into a 16-bit fixed point number, and the quantized original picture, weights and bias are input into the network and stored in an external memory;
after the quantized picture and weights are subjected to low-precision floating-point multiplication to obtain a (2a + b + 4)-bit floating point number, the (2a + b + 4)-bit floating point number is converted into a (2a + 2^(b+1) − 1)-bit fixed point number and then accumulated, and the accumulation result is added to the 16-bit fixed point number of the quantized bias to obtain a 32-bit fixed point number;
and converting the 32-bit fixed point number into a MaEb floating point number as the input of the next layer of the network, and storing the MaEb floating point number into an external memory.
Preferably, quantizing the original picture and the weights into MaEb floating point numbers includes the following steps:
defining a low precision floating point number representation MaEb of the network, the low precision floating point number representation comprising a sign bit, a mantissa, and an exponent;
in the process of optimizing the low-precision floating-point representation form, simultaneously changing the combination of the scale factor, a and b and calculating the mean square error of the weights and activation values of each layer of the network before and after quantization, and obtaining the optimal low-precision floating-point representation form and the optimal scale factor under that representation form according to the minimum value of the mean square error of the weights and activation values before and after quantization;
based on the low-precision floating point number representation form and the optimal scale factor, the single-precision floating point number of the original picture and the weight is quantized into a floating point number represented by a low-precision floating point number representation form MaEb;
when a is 4 or 5, the network quantized in the low-precision floating-point number representation is the optimal result.
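A minimal sketch of the quantization step described above is given below: it maps a single-precision value into the MaEb dynamic range after applying the scale factor 2^sf, rounds the mantissa to a bits and saturates values outside the range. It assumes normalized numbers only and an exponent bias of 2^(b−1) − 1, which is one plausible reading of the representation form rather than the patented procedure; the helper name quantize_maeb is hypothetical.

```python
import math

def quantize_maeb(x, a, b, sf):
    """Quantize a single-precision value to MaEb (sign bit, a-bit mantissa,
    b-bit exponent).  Assumes normalized numbers only and an exponent bias
    of 2**(b-1) - 1; this is an illustrative reading of the format, not the
    patented procedure."""
    eb = 2 ** (b - 1) - 1
    v = x * 2 ** sf                                # apply the scale factor
    if v == 0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    max_mag = (2 - 2 ** -a) * 2 ** (2 ** b - 1 - eb)
    min_mag = 2 ** -eb
    mag = min(max(abs(v), min_mag), max_mag)       # clamp to the dynamic range
    e = math.floor(math.log2(mag))                 # unbiased exponent
    mant = round(mag / 2 ** e * 2 ** a) / 2 ** a   # round mantissa to a bits
    if mant == 2.0:                                # rounding carried into the exponent
        mant, e = 1.0, e + 1
    return sign * min(mant * 2 ** e, max_mag)

# Example: a weight of 0.3712 with a = 4, b = 3 (M4E3) and scale factor 0
print(quantize_maeb(0.3712, 4, 3, 0))              # 0.375
```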
Preferably, performing the low-precision floating point number multiplication operation on the MaEb floating point numbers includes the following steps:
the MaEb floating point number multiplication is split into an a-bit multiplier-adder and a b-bit adder, and the calculation formula is as follows:
X × Y = (−1)^(Sx ⊕ Sy) × [0.Mx × 0.My + (1.Mx + 0.My)] × 2^(Ex + Ey − 2Eb)
wherein Mx, My, Ex and Ey denote the mantissas and exponents of X and Y respectively, and Sx and Sy denote their sign bits; the term 0.Mx × 0.My + (1.Mx + 0.My) is realized by an a-bit unsigned fixed-point multiplier-adder, and the term Ex + Ey can be realized by a b-bit unsigned fixed-point adder;
based on the multiplier-adder P = A × B + C implemented by the DSP, blank bits are added at the input ports to implement several a-bit multiplier-adders, where A, B and C denote the three input ports of the DSP.
Preferably, the maximum bit widths of A, B and C are 25, 18 and 48, respectively.
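To make the decomposition concrete, the sketch below multiplies two MaEb numbers on their raw fields: the exponents go through a b-bit addition and the mantissas through the a-bit multiply-add identity 1.Mx × 1.My = 0.Mx × 0.My + (1.Mx + 0.My), computed on integers. Bias handling and normalization of the result are deferred, as the description suggests; the function name maeb_mul is hypothetical.

```python
A_BITS, B_BITS = 4, 3          # MaEb with a = 4, b = 3 (M4E3)

def maeb_mul(sx, mx, ex, sy, my, ey):
    """Multiply two MaEb numbers given as raw fields (sign, a-bit mantissa,
    b-bit exponent).  Illustrative only: the exponent bias and normalization
    of the result are deferred to a later step, as in the description."""
    assert mx < (1 << A_BITS) and my < (1 << A_BITS)
    assert ex < (1 << B_BITS) and ey < (1 << B_BITS)
    e_sum = ex + ey                     # the b-bit adder: exponents are added
    # the a-bit multiplier-adder: 1.Mx * 1.My = 0.Mx*0.My + (1.Mx + 0.My),
    # computed on integers scaled by 2**(2*A_BITS)
    mant = mx * my + (((1 << A_BITS) + mx + my) << A_BITS)
    sign = sx ^ sy                      # the sign bit is an XOR
    return sign, mant, e_sum            # full-precision (unnormalized) product

# 1.5 * 1.25 = 1.875 -> mantissa fields 8 (binary 1000) and 4 (binary 0100)
s, m, e = maeb_mul(0, 8, 3, 0, 4, 3)
print(m / (1 << 2 * A_BITS))            # 1.875
```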
A system comprises a customized circuit or a non-customized circuit, wherein the customized circuit or the non-customized circuit comprises a floating-point number functional module, the floating-point number functional module is used for receiving input activation values and weights from a storage system according to a control signal, distributing them to different processing units PE, and computing in parallel the convolution of data quantized into MaEb floating point numbers through the low-precision floating-point representation form, wherein a and b are positive integers;
the storage system is used for caching the input characteristic diagram, the weight and the output characteristic diagram;
the central control module is used for arbitrating the floating point number functional module and the storage system after decoding the instruction into a control signal;
the floating-point number functional module comprises N parallel processing units PE, and each processing unit PE realizes Nm MaEb floating-point multipliers through DSP, wherein N is a positive integer and Nm indicates the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module.
Preferably, each processing element PE comprises 4T parallel branches, each of which contains Nm/(4T) multipliers (Nm being an integer multiple of 4T); the multipliers, the data conversion module, the fixed-point addition tree and the post-processing unit are connected in sequence, wherein T is a positive integer.
Preferably, the storage system comprises an input feature map caching module IFMB with a ping-pong architecture, a weight caching module WB and an output feature map caching module OFMB.
Preferably, the post-processing unit comprises an accumulator, a pooling layer and an activation function connected in sequence.
Preferably, a and b satisfy 0 < a + b ≦ 31, and when a is 4 or 5, the network quantized in the low precision floating point number representation is the optimal result.
Preferably, the custom circuit comprises an ASIC or SOC and the off-the-shelf circuit comprises an FPGA.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. the invention uses the low-precision floating-point representation form MaEb and can find the optimal data representation form without retraining; only a 4-bit or 5-bit mantissa is needed, the top-1/top-5 accuracy loss is negligible, and the top-1/top-5 accuracy reductions are within 0.5%/0.3% respectively;
2. the invention realizes 8-bit low-precision floating-point multiplication using a 4-bit multiplier-adder and a 3-bit adder, and realizes 4 such low-precision floating-point multiplications in one DSP, which is equivalent to realizing the multiplications of four convolution operations in one DSP; compared with the prior art, in which one DSP can realize at most two multiplications, the invention greatly improves the acceleration performance on a customized circuit or a non-customized circuit while ensuring accuracy, wherein the customized circuit includes an ASIC (application-specific integrated circuit) or an SOC (system on chip), and the non-customized circuit includes an FPGA (field-programmable gate array);
3. compared with an Intel i9 CPU, the throughput of the invention is improved by 64.5 times, and compared with the existing FPGA accelerator, the throughput of the invention is improved by 1.5 times; for VGG16 and a YOLO convolutional neural network, compared with the existing six FPGA accelerators, the throughput is respectively improved by 3.5 times and 27.5 times, and the throughput of a single DSP is respectively improved by 4.1 times and 5 times;
4. the data representation form of the invention can also be applied to ASICs; in ASIC design, the number of standard cells needed is less than that of an 8-bit fixed-point multiplier;
5. when the forward calculation of the convolutional layer is carried out based on the quantization method, converting the fixed-point accumulation result into a floating point number saves storage resources, and converting floating-point accumulation into fixed-point accumulation saves a large amount of customized-circuit or non-customized-circuit resources, thereby improving the throughput of the customized circuit or the non-customized circuit.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a flow chart of a quantization method of the present invention;
FIG. 3 is a schematic diagram of the forward computational data flow of the quantized convolutional neural network of the present invention;
FIG. 4 is a schematic diagram of a full pipeline architecture of the floating-point function module of the present invention;
FIG. 5 is a schematic diagram of the convolution calculation of the present invention;
FIG. 6 is a diagram of the input form of the DSP port according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The features and properties of the present invention are described in further detail below with reference to examples.
Example 1
The embodiment provides a CNN accelerated calculation method and system based on a low-precision floating-point data representation form, wherein the low-precision floating-point representation form MaEb is used, the accuracy of the quantized convolutional neural network is guaranteed without retraining, low-precision floating-point multiplication is performed on the MaEb floating point numbers, and Nm MaEb floating-point multipliers are realized through DSP, improving the acceleration performance of a customized circuit or a non-customized circuit. The specific steps are as follows:
the CNN accelerated calculation method based on the low-precision floating point data expression form comprises the following steps:
the central control module generates a control signal to arbitrate the floating-point number functional module and the storage system;
the floating-point number functional module receives an input activation value and a weight from a storage system according to a control signal, and distributes the input activation value and the weight to different processing units PE to carry out convolution calculation of each convolution layer, so that CNN accelerated calculation is completed;
the convolution calculation includes forward calculation of convolution layers completed by performing dot product calculation on MaEb floating point numbers quantized through low-precision floating point number representation forms, wherein a and b are positive integers.
As shown in fig. 4, the forward calculation of the convolution layer completed by the dot product calculation through the MaEb floating point number quantized by the low-precision floating point number representation includes the steps of:
step a: quantizing input data of the single-precision floating point number into a floating point number of MaEb in a low-precision floating point number expression form, wherein the input data comprise an input activation value, weight and bias, and a + b is more than 0 and less than or equal to 31;
step b: distributing the MaEb floating point numbers to the Nm parallel low-precision floating-point multipliers in the floating-point number functional module for forward computation to obtain full-precision floating-point products, wherein Nm represents the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module;
step c: transmitting the full-precision floating point number product to a data conversion module to obtain a fixed point number result without precision loss;
step d: after distributing the fixed-point results to 4T parallel fixed-point addition trees, sequentially accumulating, pooling and activating the addition-tree results together with the bias in the input data through a post-processing unit to finish the calculation of the convolutional layer, wherein T is a positive integer.
The steps a, b and c comprise the following steps:
as shown in fig. 3, the original picture and the weight are quantized into a floating point number of MaEb in a low precision floating point number representation form, the bias is quantized into a fixed point number of 16 bits, and the quantized original picture, the weight and the bias are input into the network and stored in an external memory, wherein a + b is more than 0 and less than or equal to 31, and a and b are positive integers;
after the quantized picture and weights are subjected to low-precision floating-point multiplication to obtain a (2a + b + 4)-bit floating point number, the (2a + b + 4)-bit floating point number is converted into a (2a + 2^(b+1) − 1)-bit fixed point number and then accumulated, and the accumulation result is added to the 16-bit fixed point number of the quantized bias to obtain a 32-bit fixed point number;
and converting the 32-bit fixed point number into a MaEb floating point number as the input of the next layer of the network, and storing the MaEb floating point number into an external memory.
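As a quick sanity check of the bit-width formulas above, the following lines evaluate them for the M4E3 case (a = 4, b = 3); the results match the 15-bit product and the 23-bit fixed point number used in the data flow described later.

```python
# Quick check of the bit-width formulas above for the M4E3 case (a = 4, b = 3).
a, b = 4, 3
product_bits = 2 * a + b + 4             # width of the low-precision product
fixed_bits = 2 * a + 2 ** (b + 1) - 1    # width after conversion to fixed point
print(product_bits, fixed_bits)          # 15 23 -> the M10E4 product and the
                                         # 23-bit fixed point number of fig. 3
```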
As shown in fig. 2, quantizing the original picture and the weights into MaEb floating point numbers includes the following steps:
defining a low-precision floating point number representation MaEb of the network, wherein the low-precision floating point number representation comprises a sign bit, a mantissa and an exponent, a + b is more than 0 and less than or equal to 31, and a and b are positive integers;
in the process of optimizing the low-precision floating-point representation form, simultaneously changing the combination of the scale factor, a and b and calculating the mean square error of the weights and activation values of each layer of the network before and after quantization, and obtaining the optimal low-precision floating-point representation form and the optimal scale factor under that representation form according to the minimum value of the mean square error of the weights and activation values before and after quantization;
based on the low-precision floating point number representation form and the optimal scale factor, the single-precision floating point number of the original picture and the weight is quantized into a floating point number represented by a low-precision floating point number representation form MaEb;
when a is 4 or 5, the network quantized in the low-precision floating-point number representation is the optimal result.
The low-precision floating point number multiplication operation on the MaEb floating point numbers comprises the following steps:
the MaEb floating point number multiplication is split into an a-bit multiplier-adder and a b-bit adder, and the calculation formula is as follows:
X × Y = (−1)^(Sx ⊕ Sy) × [0.Mx × 0.My + (1.Mx + 0.My)] × 2^(Ex + Ey − 2Eb)
wherein Mx, My, Ex and Ey denote the mantissas and exponents of X and Y respectively, and Sx and Sy denote their sign bits; the term 0.Mx × 0.My + (1.Mx + 0.My) is realized by an a-bit unsigned fixed-point multiplier-adder, and the term Ex + Ey can be realized by a b-bit unsigned fixed-point adder;
based on the multiplier-adder P = A × B + C implemented by the DSP, blank bits are added at the input ports to implement several a-bit multiplier-adders, where A, B and C denote the three input ports of the DSP.
The maximum bit widths of A, B and C are 25, 18 and 48, respectively.
Quantification details:
defining a low-precision floating point number representation MaEb of the network, wherein the low-precision floating point number representation comprises a sign bit, a mantissa and an exponent, and a and b are positive integers;
in the process of optimizing the low-precision floating-point representation form, simultaneously changing the combination of the scale factor, a and b and calculating the mean square error of the weights and activation values of each layer of the network before and after quantization, and obtaining the optimal low-precision floating-point representation form and the optimal scale factor under that representation form according to the minimum value of the mean square error of the weights and activation values before and after quantization;
and based on the low-precision floating point number representation form and the optimal scale factor, the single-precision floating point number is quantized into the low-precision floating point number.
The decimal value of the low-precision floating-point representation form defined above is calculated as follows:
Vdec = (−1)^S × 1.M × 2^(E − Eb)
wherein Vdec denotes the decimal value of the low-precision floating-point representation form; S, M and E denote the sign bit, mantissa and exponent respectively, all being unsigned values; Eb denotes the bias of the exponent, which is used to introduce positive and negative values for the exponent and is expressed as:
Eb = 2^(DWE − 1) − 1
wherein DWE denotes the bit width of the exponent; the bit widths of the mantissa and the exponent are not fixed.
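A small sketch of the decoding rule just given: it turns raw (S, M, E) fields back into a decimal value, using the exponent bias 2^(b−1) − 1 assumed above for MaEb (where DWE = b); the helper name maeb_to_decimal is hypothetical.

```python
def maeb_to_decimal(s, m, e, a, b):
    """Decode raw MaEb fields to a decimal value:
    Vdec = (-1)**S * 1.M * 2**(E - Eb), with Eb = 2**(b-1) - 1.
    The bias expression is an assumption consistent with the description."""
    eb = 2 ** (b - 1) - 1
    mantissa = 1 + m / 2 ** a          # implicit leading one (normalized number)
    return (-1) ** s * mantissa * 2 ** (e - eb)

print(maeb_to_decimal(0, 8, 3, 4, 3))  # 1.5 * 2**0 = 1.5
```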
The optimization in the quantization comprises the following steps:
step aa: mapping the single-precision floating point number multiplied by a scale factor into the dynamic range that the low-precision floating point number can represent, rounding the mapped number to the nearest low-precision floating point number, and keeping data beyond the dynamic range at the maximum or minimum value; the calculation formula is as follows:
Vlfp = quan(Vfp32 × 2^sf, MINlfp, MAXlfp)
quan(x, MIN, MAX) = MAX if x > MAX; MIN if x < MIN; round(x) otherwise
wherein Vlfp and Vfp32 denote the decimal values expressed in the low-precision and single-precision floating-point forms respectively, MINlfp and MAXlfp denote the minimum and maximum values the low-precision floating point number can represent, sf denotes the scale factor, quan(x, MIN, MAX) denotes quantizing any floating point number x into the range MIN to MAX, and round(x) denotes rounding any floating point number x;
step bb: calculating the mean square error (MSE) of the weights and activation values before and after quantization, which represents the quantization error:
MSE = (1/N) × Σ_{i=1..N} (xi − x̂i)²
wherein xi and x̂i denote a weight or activation value before and after quantization, and N denotes the number of weights and activation values;
step cc: changing the scale factor, and repeating the steps aa and bb;
step dd: changing the representation form of the low-precision floating point number, namely the combination of a and b in MaEb, and repeating the steps aa, bb and cc;
step ee: and taking the low-precision representation form and the scale factor corresponding to the minimum value of the mean square error of the weight and the activation value as an optimal result.
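Steps aa to ee amount to a brute-force search over the representation form and the scale factor. The sketch below illustrates that loop for a fixed total bit budget: for every (a, b, sf) combination it quantizes the data, measures the mean square error, and keeps the combination with the smallest error. It reuses the hypothetical quantize_maeb helper from the earlier sketch, and the bit budget and scale-factor range are assumptions.

```python
def mse(original, quantized, sf):
    """Mean square error before/after quantization (step bb); quantized
    values are mapped back through the scale factor before comparing."""
    return sum((o - q / 2 ** sf) ** 2
               for o, q in zip(original, quantized)) / len(original)

def search_best_format(values, total_bits=8, sf_range=range(-8, 9)):
    """Steps aa-ee: try every (a, b, sf) combination and keep the one with
    the smallest MSE.  Requires quantize_maeb from the earlier sketch."""
    best = None
    for a in range(1, total_bits):
        b = total_bits - 1 - a                 # one bit is reserved for the sign
        if b < 1:
            continue
        for sf in sf_range:
            q = [quantize_maeb(v, a, b, sf) for v in values]
            err = mse(values, q, sf)
            if best is None or err < best[0]:
                best = (err, a, b, sf)
    return best                                # (mse, a, b, sf)

print(search_best_format([0.37, -0.05, 0.91, 0.002, -0.48]))
```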
As shown in fig. 2, for each convolutional neural network an optimal low-precision floating-point data representation form (a bit-width combination of mantissa and exponent) is found, thereby ensuring that the quantization error is minimal. In the quantization process of the CNN, each layer can be quantized or left unquantized, and the low-precision floating-point representation form may differ from layer to layer; a and b only need to satisfy 0 < a + b ≤ 31. Specifically, in the process of optimizing the low-precision floating-point representation form for each convolutional neural network to be quantized (the optimization can use traversal or other search methods), the optimal scale factor under that representation form is searched for the weights and activation values of each layer, ensuring that the mean square error of the weights and activation values before and after quantization is minimal. The reason the quantization method of the application ensures accuracy without retraining is as follows: a convolutional neural network before quantization has an accuracy result of its own, and this result is usually taken as the standard value; the aim is to quantize the convolutional neural network on the premise of preserving this standard accuracy. The weights and activation values of the network before quantization are close to non-uniform distributions such as the Gaussian or gamma distribution, that is, the values are concentrated in a certain range and the probability of values appearing outside that range is small. Quantizing the weights and activation values means approximating the original data with lower-precision numbers; low-precision floating point numbers are well suited for this because they can represent more values near zero and fewer values towards the two sides, that is, their characteristic is close to the distribution of the weights and activation values before quantization. Comparing the data before and after quantization, the closer the quantized data is to the data before quantization, the smaller the accuracy loss caused by the quantized network. The mean square error expresses the difference between the data before and after quantization, and the smaller the mean square error, the closer the quantized data is to the original data. Therefore, minimizing the mean square error ensures that the accuracy loss is minimal, so that retraining is not needed. The optimal data representation form can be found through this quantization method; only a 4-bit or 5-bit mantissa is needed, the top-1/top-5 accuracy loss is negligible, and the top-1/top-5 accuracy reductions are within 0.5%/0.3% respectively.
The forward computation data flow of the quantized neural network is shown in fig. 3. To explain the data flow clearly, the data bit width of each step is listed using the low-precision floating-point representation form M4E3 of fig. 3 as an example, that is, a is 4 and b is 3; all input pictures, weights and biases are initially represented by single-precision floating point numbers. First, the original picture and the weights are quantized in the M4E3 data representation form, while the bias is quantized to a 16-bit fixed point number in order to reduce quantization error, and the quantized input picture, weights and bias are stored in an external memory. Next, low-precision floating-point multiplication is performed on the quantized picture and weights, and the product is stored as a 15-bit floating point number M10E4. The 15-bit floating-point product is then converted into a 23-bit fixed point number and accumulated together with the 16-bit fixed point number of the quantized bias, and the final accumulation result is stored as a 32-bit fixed point number. These operations have two advantages: 1. there is no precision loss in the whole process, which guarantees the accuracy of the final inference result; 2. converting floating-point accumulation into fixed-point accumulation saves a large amount of resources of a customized circuit (such as an ASIC/SOC) or a non-customized circuit (such as an FPGA), thereby improving the throughput of the customized or non-customized implementation. Finally, before being used by the next CNN layer, the final output result is converted back into an M4E3 floating point number and stored in the external memory, which saves storage space. Only this last data conversion step in the whole data flow causes a reduction of bit width and a loss of precision; this precision loss does not affect the final accuracy, which can be verified experimentally.
The multipliers in each PE are designed for low-precision floating point numbers. According to the low-precision floating-point representation form, the multiplication of two low-precision floating point numbers can be divided into three parts: 1) XOR of the sign bits; 2) multiplication of the mantissas; 3) addition of the exponents. Take the form MaEb as an example: an a-bit unsigned multiplier-adder and a b-bit unsigned adder are needed to implement the multiplication of the two numbers. Although the mantissa multiplication should use an (a+1)-bit multiplier once the leading hidden bit is considered (the hidden bit is 1 for a normalized number and 0 for a denormalized number), the application designs it as an a-bit multiplier-adder in order to improve DSP efficiency. Meanwhile, the exponent bias is not included in the adder, because in the embodiment of the application all data representation forms are the same and the exponent bias is therefore also the same, so it can be handled in the last step, which simplifies the design of the adder.
As shown in fig. 5, in the convolution calculation process, each pixel point of the output channel is calculated by the following formula:
y = Σ_{ic=1..IC} Σ_{kh=1..KH} Σ_{kw=1..KW} x(ic, kh, kw) × w(ic, kh, kw) + b
where IC denotes the number of input channels, KW and KH denote the width and height of the convolution kernel, and x, y, w and b denote the input activation value, the output activation value, the weight and the bias, respectively. Since 4 low-precision floating-point multiplications are implemented in one DSP and calculated as (a + b) × (c + d) = ac + bc + ad + bd, each PE is designed to compute two output channels simultaneously, and on each output channel two convolution results can be computed simultaneously, as shown in fig. 5. Specifically, in the first cycle, the values of the first pixel and of the corresponding first convolution kernel on the IC input channels are fed into the PE for calculation, labeled a and c in fig. 5, respectively. To follow the parallel computation pattern of the four multipliers, the second pixel point on the IC input channels (labeled b in fig. 5) and the value of the corresponding convolution kernel used to compute another output channel (labeled d in fig. 5) are also fed into the PE. Thus, a and b are reused to calculate values at different locations on the same output channel, while c and d are shared to calculate values on different output channels. In the same manner, the data for the second location is input in the second cycle. Thus, after KW × KH cycles, one PE can calculate four convolution results.
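The PE schedule just described can be summarized by the following sketch: in each of the KW × KH cycles two pixels (a, b) and two kernel values (c, d) enter the PE, and the four partial results a·c, b·c, a·d and b·d are accumulated separately, yielding two pixel positions on each of two output channels. The function name and the flattening of the kernel window into flat lists are simplifications for illustration only.

```python
# Illustrative sketch of the PE schedule described above: per cycle, two input
# pixels (a, b) and two kernel values (c, d) enter the PE, and four partial
# results are accumulated, giving two pixels on each of two output channels.

def pe_four_results(pixels_a, pixels_b, kernel_c, kernel_d):
    acc = [0.0, 0.0, 0.0, 0.0]            # [a*c, b*c, a*d, b*d]
    for a, b, c, d in zip(pixels_a, pixels_b, kernel_c, kernel_d):
        acc[0] += a * c                   # output channel 0, pixel 0
        acc[1] += b * c                   # output channel 0, pixel 1
        acc[2] += a * d                   # output channel 1, pixel 0
        acc[3] += b * d                   # output channel 1, pixel 1
    return acc

# a 3x3 kernel on a single input channel for brevity (KW*KH = 9 cycles)
print(pe_four_results([1.0] * 9, [0.5] * 9, [0.25] * 9, [-0.125] * 9))
```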
In the present application, Nm multipliers are used in each PE, so the value of IC is designed to be Nm/4; therefore Nm/4 input channels are computed in parallel within each PE. Using the corresponding weights and bias, two output channels are calculated in parallel, with two pixel points calculated on each output channel. When the number of input channels is larger than Nm/4, or the number of pixels per output channel is greater than 2, or the number of output channels is greater than 2, multiple rounds of calculation are required to complete one convolution operation. Because of the relative scales of the PE and the CNN convolutional layer, a convolutional layer often cannot obtain its final result from a single pass of calculation on the PE; the calculation therefore divides the convolutional layer into several parts, one part is placed on the PE for calculation, and the result of that calculation is an intermediate result. This intermediate result is stored in the OFMB and is retrieved from the OFMB when the next part is calculated. To improve parallelism, Np PEs are used in this design; different PEs can be fed pixel points from different input feature maps and different weights so as to perform parallel computation in different dimensions. For example, all PEs may share the same input feature map and use different weights to compute different output channels in parallel, or all PEs may share the same parameters and use different input feature maps to compute the input channels in parallel. The parameters Nm and Np are determined by considering the CNN network structure, the throughput and the bandwidth requirements.
According to the calculation mode in the PE, both the IFMB and the WB are set to provide Nm/2 input activation values and Nm/2 weights to each PE per cycle, while the OFMB needs to save four output activation values per cycle. Although each pixel point on the output feature map is finally saved as a low-precision floating point number, an intermediate result is saved as 16 bits to reduce precision loss; thus the bit width of the OFMB needs to be set to 64 bits for each PE. Since the input activation values or weights may be shared by different PEs in different parallel computation modes, two parameters Pifm and Pofm (Pifm × Pofm = Np) are defined to represent the number of PEs used to compute the input and output feature maps in parallel, respectively. Thus, Pifm PEs share the same weights, and Pofm PEs share the same input activation values. The bit widths of the IFMB, WB and OFMB need to be set as:
(Nm/2) × BW × Pifm for the IFMB, (Nm/2) × BW × Pofm for the WB,
and 64Np for the OFMB, where BW represents the bit width of the low-precision floating point number. The parameters Nm, Pifm and Pofm are determined by balancing throughput, bandwidth requirements and resource usage. The sizes of the three on-chip buffers are likewise determined by jointly considering throughput and resource usage. In the design of the processor, throughput, bandwidth requirements, resource utilization and scalability are considered in a balanced manner, so the buffer sizes are chosen to be large enough to hide the DMA transfer time. In a non-customized circuit, such as an FPGA implementation, the IFMB and OFMB are implemented with block RAM and the WB is implemented with distributed RAM, because distributed memory can provide larger bandwidth. In the CNN forward calculation process, the external memory is accessed to read new input feature maps or weights, or to save the output feature map, only when all the input feature maps have been used, or all the weights have been used, or the OFMB is full.
The specific implementation of 4 multipliers in one DSP is as follows. In the FPGA implementation, the data representation form M4E3 is used. To explain clearly how four low-precision floating-point multipliers are implemented in one DSP, the multiplication of two normalized numbers is used as an example. The mantissa of the product of the two numbers may be expressed as:
1.Mx × 1.My = 0.Mx × 0.My + (1.Mx + 0.My)
where Mx, My, Ex and Ey denote the mantissas and exponents of X and Y, respectively. The term 0.Mx × 0.My + (1.Mx + 0.My) can be realized by a 4-bit unsigned fixed-point multiplier-adder, and the term Ex + Ey can be realized by a 3-bit unsigned fixed-point adder. Since the DSPs in the Xilinx 7-series FPGAs can implement a multiplier-adder P = A × B + C (where the maximum bit widths of A, B and C are 25, 18 and 48, respectively), blank bits are added to each input port so that the DSP is fully used to implement four 4-bit multiplier-adders; the specific input form of each DSP port is shown in fig. 6. During the calculation, the decimal point is placed at the rightmost position, that is, 0.Mx and 0.My are converted into 4-bit positive numbers and 1.Mx + 0.My into a 10-bit positive number, to ensure that no overlap occurs during the calculation. In this way, with the exponent additions and the additions of the term 1.Mx + 0.My implemented using a small number of look-up tables (LUTs) and flip-flops (FFs), one DSP can be used to implement the multiplication of 4 numbers in the M4E3 data representation form, thereby greatly increasing the throughput of a single DSP.
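One way to picture the packing (an illustration of the general idea, not necessarily the exact port assignment of fig. 6) is the following sketch: two 4-bit mantissas are packed into port A and two into port B with blank guard bits between them, so a single wide multiplication P = A × B yields four 4-bit × 4-bit products in disjoint bit fields of P; the additive C term is omitted here for clarity, and the chosen guard spacing is an assumption that simply keeps the 25/18/48-bit port limits satisfied.

```python
# One possible packing (illustration only): four 4-bit x 4-bit mantissa
# products from one wide multiplication, with blank (guard) bits keeping
# the four results in disjoint bit fields of P = A * B + C (C = 0 here).

GUARD = 9                      # field spacing: 8-bit product + 1 blank bit

def pack_two(hi, lo, shift):
    return (hi << shift) | lo

def dsp_four_mults(a1, a0, b1, b0):
    """Emulate P = A*B with a1,a0 packed into port A and b1,b0 packed into
    port B, then slice the four products out of disjoint fields of P."""
    A = pack_two(a1, a0, 2 * GUARD)        # uses 22 bits, within the 25-bit port
    B = pack_two(b1, b0, GUARD)            # uses 13 bits, within the 18-bit port
    P = A * B                              # fits well within the 48-bit output
    mask = (1 << GUARD) - 1
    return [(P >> s) & mask for s in (0, GUARD, 2 * GUARD, 3 * GUARD)]
    # fields, low to high: a0*b0, a0*b1, a1*b0, a1*b1

print(dsp_four_mults(15, 9, 7, 3))         # [27, 63, 45, 105]
```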
To sum up, this embodiment quantizes the single-precision floating point numbers of the original picture and the weights into floating point numbers in the low-precision representation form MaEb, based on the low-precision floating-point representation form and the optimal scale factor. The low-precision floating-point multiplication of MaEb floating point numbers is split into an a-bit multiplier-adder and a b-bit adder; based on the multiplier-adder P = A × B + C realized by the DSP, several a-bit multiplier-adders are realized by adding blank bits at the input ports. For example, a 4-bit multiplier-adder and a 3-bit adder realize the multiplication of 8-bit low-precision floating point numbers, and 4 such low-precision floating-point multiplications are realized in one DSP, which is equivalent to realizing the multiplications of four convolution operations in one DSP; compared with the existing methods, in which one DSP can realize at most two multiplications, the acceleration performance on a customized or non-customized circuit is greatly improved while ensuring accuracy. The throughput is improved by 64.5 times compared with an Intel i9 CPU and by 1.5 times compared with the existing FPGA accelerator; for the VGG16 and YOLO convolutional neural networks, compared with six existing FPGA accelerators, the throughput is improved by 3.5 times and 27.5 times respectively, and the throughput of a single DSP is improved by 4.1 times and 5 times respectively. Meanwhile, when the forward calculation of the convolutional layer is performed based on the quantization method, converting the fixed-point accumulation result into a floating point number saves storage resources, and converting floating-point accumulation into fixed-point accumulation saves a large amount of customized-circuit or non-customized-circuit resources, thereby improving the throughput of the customized or non-customized circuit.
Example 2
Based on embodiment 1, a system includes a customized circuit or an un-customized circuit, the customized circuit includes an ASIC or an SOC, the un-customized circuit includes an FPGA, as shown in fig. 1, the customized circuit or the un-customized circuit includes a floating point function module, the floating point function module is configured to receive an input activation value and a weight from a storage system according to a control signal, distribute the input activation value and the weight to different processing units PE, and calculate a convolution quantized to a MaEb floating point number by a low precision floating point number representation in parallel, where 0 < a + b < 31, and a and b are positive integers;
the storage system is used for caching the input characteristic diagram, the weight and the output characteristic diagram;
the central control module is used for arbitrating the floating point number functional module and the storage system after decoding the instruction into a control signal;
the floating-point number functional module comprises N parallel processing units PE, and each processing unit PE realizes Nm MaEb floating-point multipliers through DSP, wherein N is a positive integer and Nm indicates the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module.
Each processing element PE comprises 4T parallel branches, each of which contains Nm/(4T) multipliers (Nm being an integer multiple of 4T); the multipliers, the data conversion module, the fixed-point addition tree and the post-processing unit are connected in sequence, wherein T is a positive integer.
The storage system comprises an input characteristic diagram caching module IFMB, a weight caching module WB and an output characteristic diagram caching module OFMB with a ping-pong structure.
The post-processing unit comprises an accumulator, a pooling layer and an activation function which are connected in sequence.
In MaEb, the values of a and b are taken as 4 and 3, namely M4E3; T is 1 and Nm is 8; each processing unit PE includes 4 parallel branches, each of which includes 2 multipliers, 2 data conversion modules, 1 fixed-point addition tree and 1 post-processing unit PPM.
The MaEb floating point numbers are distributed to the Nm parallel low-precision floating-point multipliers in the floating-point number functional module for forward computation to obtain full-precision floating-point products, wherein Nm represents the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module; the full-precision floating-point products are transmitted to the data conversion module to obtain fixed-point results without precision loss; after distributing the fixed-point results to the four parallel (T = 1) fixed-point addition trees, the addition-tree results and the bias in the input data are sequentially accumulated, pooled and activated through the post-processing unit to finish the calculation of the convolutional layer.
In summary, the above modules calculate one convolutional layer; for acceleration of a convolutional neural network CNN, each layer is calculated by the above modules. In combination with the central control module and the storage system, the floating-point number functional module receives input activation values and weights from the storage system according to the control signal, distributes them to different processing units PE, and computes in parallel the convolution of data quantized into MaEb floating point numbers through the low-precision floating-point representation form. Based on the MaEb floating point numbers, the method and device ensure the accuracy of the quantized convolutional neural network without retraining, and each processing unit PE realizes Nm MaEb floating-point multipliers through DSP, greatly improving the acceleration performance on a customized or non-customized circuit while ensuring accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (12)

1. The CNN accelerated calculation method based on the low-precision floating point data expression form is characterized in that: the method comprises the following steps:
the central control module generates a control signal to arbitrate the floating-point number functional module and the storage system;
the floating-point number functional module receives an input activation value and a weight from a storage system according to a control signal, and distributes the input activation value and the weight to different processing units PE to carry out convolution calculation of each convolution layer, so that CNN accelerated calculation is completed;
the convolution calculation includes forward calculation of convolution layers completed by performing dot product calculation on MaEb floating point numbers quantized through low-precision floating point number representation forms, wherein a and b are positive integers.
2. The CNN accelerated computation method based on a representation of low-precision floating-point data according to claim 1, wherein: the forward calculation of the convolution layer completed by performing dot product calculation through the MaEb floating point number quantized by the low-precision floating point number representation form comprises the following steps:
step a: quantizing input data of the single-precision floating point number into a floating point number of MaEb in a low-precision floating point number expression form, wherein the input data comprises an input activation value, a weight and a bias, and a + b is more than 0 and less than or equal to 31;
step b: distributing the MaEb floating point numbers to the Nm parallel low-precision floating-point multipliers in the floating-point number functional module for forward computation to obtain full-precision floating-point products, wherein Nm represents the number of low-precision floating-point multipliers of one processing unit PE in the floating-point number functional module;
step c: transmitting the full-precision floating point number product to a data conversion module to obtain a fixed point number result without precision loss;
step d: after distributing the fixed-point results to 4T parallel fixed-point addition trees, sequentially accumulating, pooling and activating the addition-tree results together with the bias in the input data through a post-processing unit to finish the calculation of the convolutional layer, wherein T is a positive integer.
3. The CNN accelerated computation method based on a representation of low-precision floating-point data according to claim 2, wherein: the steps a, b and c comprise the following steps:
the original picture and the weight are quantized into a MaEb floating point number through a low-precision floating point number expression form, the bias is quantized into a 16-bit fixed point number, and the quantized original picture, the weight and the bias are input into the network and stored in an external memory;
after the quantized picture and weights are subjected to low-precision floating-point multiplication to obtain a (2a + b + 4)-bit floating point number, the (2a + b + 4)-bit floating point number is converted into a (2a + 2^(b+1) − 1)-bit fixed point number and then accumulated, and the accumulation result is added to the 16-bit fixed point number of the quantized bias to obtain a 32-bit fixed point number;
and converting the 32-bit fixed point number into a MaEb floating point number as the input of the next layer of the network, and storing the MaEb floating point number into an external memory.
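For concreteness, a short sketch of the bit-width bookkeeping in claim 3, reading the intermediate fixed-point width as 2a + 2^(b+1) − 1 bits as reconstructed above; the (a, b) pairs below are illustrative choices only, not values fixed by the claim.

```python
# Bit-width bookkeeping for an MaEb configuration (illustrative pairs only).
def maeb_widths(a, b):
    product_bits = 2 * a + b + 4           # sign + (2a+2)-bit mantissa product + (b+1)-bit exponent sum
    fixed_bits = 2 * a + 2 ** (b + 1) - 1  # width after shifting out the exponent
    return product_bits, fixed_bits

for a, b in [(4, 3), (5, 2)]:
    p, f = maeb_widths(a, b)
    print(f"M{a}E{b}: {p}-bit product -> {f}-bit fixed point -> "
          f"accumulated and combined with a 16-bit bias into a 32-bit result")
```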
4. The CNN accelerated computation method based on a representation of low-precision floating-point data according to claim 3, wherein quantizing the original picture and the weights into MaEb floating point numbers comprises the following steps:
defining a low precision floating point number representation MaEb of the network, the low precision floating point number representation comprising a sign bit, a mantissa, and an exponent;
in the process of optimizing the low-precision floating-point representation, simultaneously varying the scale factor and the combination of a and b, calculating the mean square error between the weights and activation values of each network layer before and after quantization, and selecting the optimal low-precision floating-point representation and the optimal scale factor under that representation according to the minimum of the mean square errors of the weights and activation values before and after quantization;
based on the optimal low-precision floating-point representation and the optimal scale factor, quantizing the single-precision floating point numbers of the original picture and the weights into floating point numbers in the low-precision floating-point representation MaEb;
when a is 4 or 5, the network quantized in this low-precision floating-point representation gives the optimal result.
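A rough software sketch of the mean-square-error search in claim 4 follows. The quantizer, the candidate scale factors, the exponent-bias convention and the 8-bit budget are all assumptions introduced for illustration; the claim itself only fixes the criterion (minimum MSE of weights and activations before and after quantization).

```python
import itertools
import numpy as np

def quantize_maeb(x, a, b, scale, exp_bias=0):
    """Hypothetical MaEb quantizer: a-bit mantissa with an implicit leading 1
    and an unsigned b-bit exponent with an assumed bias. Values are scaled
    before quantization and rescaled afterwards; subnormals flush to zero."""
    y = np.asarray(x, dtype=np.float64) * scale
    sign, mag = np.sign(y), np.abs(y)
    out = np.zeros_like(mag)
    nz = mag >= 2.0 ** (-exp_bias)                       # below min normal -> 0
    e = np.clip(np.floor(np.log2(mag[nz])), -exp_bias, (2 ** b - 1) - exp_bias)
    frac = np.clip(mag[nz] / 2.0 ** e, 1.0, 2.0 - 2.0 ** -a)
    out[nz] = np.round(frac * 2 ** a) / 2 ** a * 2.0 ** e
    return sign * out / scale

def search_representation(weights, acts, max_bits=8):
    """Return the (mse, a, b, scale) minimising the layer's quantization MSE."""
    best = None
    for a, b in itertools.product(range(1, max_bits), repeat=2):
        if a + b + 1 > max_bits:                         # sign + mantissa + exponent
            continue
        for scale in (2.0 ** s for s in range(-8, 9)):   # assumed candidate scales
            mse = (np.mean((weights - quantize_maeb(weights, a, b, scale)) ** 2) +
                   np.mean((acts - quantize_maeb(acts, a, b, scale)) ** 2))
            if best is None or mse < best[0]:
                best = (mse, a, b, scale)
    return best

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, 1000)                           # toy layer weights
x = np.abs(rng.normal(0.0, 1.0, 1000))                   # toy activations
print(search_representation(w, x))
```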
5. The CNN accelerated computation method based on a representation of low-precision floating-point data according to claim 3, wherein: the low-precision floating point number multiplication operation of the MaEb floating point number comprises the following steps:
the multiplication of MaEb floating point numbers is decomposed into an a-bit multiplier-adder and a b-bit adder, and the calculation formula is as follows:
X × Y = (-1)^(Sx⊕Sy) × (0.Mx × 0.My + (1.Mx + 0.My)) × 2^(Ex+Ey)
wherein Mx, My, Ex and Ey denote the mantissas and exponents of X and Y, respectively; the term 0.Mx × 0.My + (1.Mx + 0.My) is realized by an a-bit unsigned fixed-point multiply-adder, and the term Ex + Ey is realized by a b-bit unsigned fixed-point adder;
based on the DSP implemented multiplier-adder P ═ a × B + C, the blank bits added at the input ports implement a number of a-bit multiplier-adders, where A, B, C denotes the three input ports of the DSP.
6. The CNN accelerated computation method based on a representation of low-precision floating-point data according to claim 5, wherein: the maximum bit widths of A, B and C are 25, 18 and 48, respectively.
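The claimed trick of fitting several a-bit multiply-adds into one wide DSP multiply-adder by inserting blank guard bits can be emulated in plain integer arithmetic, as in the sketch below. The 4-bit unsigned operands, the gap width and the choice of which operands share a port are assumptions for illustration; the sketch only shows why the packed fields stay separable, not how a specific DSP primitive behaves.

```python
def packed_two_macs(w1, w2, x, c, a=4, gap=4):
    """Emulate two a-bit unsigned multiply-adds with one wide P = A*B + C.

    w1 and w2 share the multiplicand x; w1 is shifted up by enough blank
    bits that w1*x cannot spill into the field holding w2*x + c."""
    field = 2 * a + gap                      # bits reserved for the low result
    assert w2 * x + c < (1 << field)         # guard bits keep the fields apart
    A = (w1 << field) | w2                   # two operands packed into port A
    P = A * x + c                            # one wide multiply-add
    low = P & ((1 << field) - 1)             # recovers w2*x + c
    high = P >> field                        # recovers w1*x
    return high, low

# With a = 4 and gap = 4 the packed operand still fits a 25-bit port.
print(packed_two_macs(w1=13, w2=9, x=15, c=7))   # prints (195, 142)
```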
7. A system based on the method of claim 1, characterized in that: the system comprises a custom circuit or a non-custom circuit, wherein the custom circuit or the non-custom circuit comprises a floating-point functional module; the floating-point functional module is used for receiving input activation values and weights from a storage system according to a control signal, distributing them to different processing units PE for parallel calculation, and computing the convolution of the input activation values and weights quantized into MaEb floating point numbers through the low-precision floating-point representation, wherein a and b are positive integers;
the storage system is used for caching the input characteristic diagram, the weight and the output characteristic diagram;
the central control module is used for arbitrating the floating point number functional module and the storage system after decoding the instruction into a control signal;
the floating-point functional module comprises N parallel processing units PE, and each processing unit PE realizes Nm MaEb floating-point multipliers through DSPs, wherein N is a positive integer and Nm denotes the number of low-precision floating-point multipliers in one processing unit PE of the floating-point functional module.
8. The system of claim 7, wherein: each processing unit PE comprises 4T parallel branches, each of which comprises Nm/(4T) multipliers, Nm being divisible by 4T; the multipliers, the data conversion module, the fixed-point addition tree and the post-processing unit are connected in sequence, wherein T is a positive integer.
9. The system of claim 7, wherein: the storage system comprises an input feature map buffer module IFMB, a weight buffer module WB and an output feature map buffer module OFMB with a ping-pong structure.
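A ping-pong (double-buffered) structure of the kind recited for the feature-map buffers can be pictured with the small software analogy below; the class and method names are hypothetical and only illustrate how one bank is written while the other is read, with the roles swapped between layers.

```python
class PingPongBuffer:
    """Software analogy of a ping-pong buffer: two banks, one written by the
    producer while the other is read by the consumer; swap() flips the roles."""

    def __init__(self, depth):
        self.banks = [[0] * depth, [0] * depth]
        self.write_bank = 0                   # index of the bank being written

    def write(self, addr, value):
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        return self.banks[1 - self.write_bank][addr]

    def swap(self):
        self.write_bank = 1 - self.write_bank

buf = PingPongBuffer(depth=4)
buf.write(0, 42)          # results of the current layer fill one bank
buf.swap()
print(buf.read(0))        # the next layer reads 42 while the other bank fills
```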
10. The system of claim 8, wherein: the post-processing unit comprises an accumulator, a pooling layer and an activation function which are connected in sequence.
11. The system of claim 7, wherein: a and b satisfy 0 < a + b ≤ 31, and when a is 4 or 5, the network quantized in the low-precision floating-point representation gives the optimal result.
12. The system of claim 7, wherein: the custom circuit comprises an ASIC or an SoC, and the non-custom circuit comprises an FPGA.
CN201910940659.8A 2019-09-30 2019-09-30 CNN hardware acceleration computing method and system based on low-precision floating point data representation form Active CN110852416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940659.8A CN110852416B (en) 2019-09-30 2019-09-30 CNN hardware acceleration computing method and system based on low-precision floating point data representation form

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940659.8A CN110852416B (en) 2019-09-30 2019-09-30 CNN hardware acceleration computing method and system based on low-precision floating point data representation form

Publications (2)

Publication Number Publication Date
CN110852416A true CN110852416A (en) 2020-02-28
CN110852416B CN110852416B (en) 2022-10-04

Family

ID=69596180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940659.8A Active CN110852416B (en) 2019-09-30 2019-09-30 CNN hardware acceleration computing method and system based on low-precision floating point data representation form

Country Status (1)

Country Link
CN (1) CN110852416B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414994A (en) * 2020-03-03 2020-07-14 哈尔滨工业大学 FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN111507465A (en) * 2020-06-16 2020-08-07 电子科技大学 Configurable convolutional neural network processor circuit
CN111696149A (en) * 2020-06-18 2020-09-22 中国科学技术大学 Quantization method for stereo matching algorithm based on CNN
CN112148249A (en) * 2020-09-18 2020-12-29 北京百度网讯科技有限公司 Dot product operation implementation method and device, electronic equipment and storage medium
CN112541583A (en) * 2020-12-16 2021-03-23 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network accelerator
CN112598078A (en) * 2020-12-28 2021-04-02 北京达佳互联信息技术有限公司 Hybrid precision training method and device, electronic equipment and storage medium
CN112632878A (en) * 2020-12-10 2021-04-09 中山大学 High-speed low-resource binary convolution unit based on FPGA
CN112734020A (en) * 2020-12-28 2021-04-30 中国电子科技集团公司第十五研究所 Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN113222130A (en) * 2021-04-09 2021-08-06 广东工业大学 Reconfigurable convolution neural network accelerator based on FPGA
CN113762498A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Method for quantizing RoiAlign operator
CN114580628A (en) * 2022-03-14 2022-06-03 北京宏景智驾科技有限公司 Efficient quantization acceleration method and hardware circuit for neural network convolution layer
CN116127255A (en) * 2022-12-14 2023-05-16 北京登临科技有限公司 Convolution operation circuit and related circuit or device with same
CN112598078B (en) * 2020-12-28 2024-04-19 北京达佳互联信息技术有限公司 Hybrid precision training method and device, electronic equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570559A (en) * 2015-10-09 2017-04-19 阿里巴巴集团控股有限公司 Data processing method and device based on neural network
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A kind of method of optimized artificial neural network
CN108133262A (en) * 2016-12-01 2018-06-08 上海兆芯集成电路有限公司 With for perform it is efficient 3 dimension convolution memory layouts neural network unit
US20180307980A1 (en) * 2017-04-24 2018-10-25 Intel Corporation Specialized fixed function hardware for efficient convolution
CN108647184A (en) * 2018-05-10 2018-10-12 杭州雄迈集成电路技术有限公司 A kind of Dynamic High-accuracy bit convolution multiplication Fast implementation
CN109800877A (en) * 2019-02-20 2019-05-24 腾讯科技(深圳)有限公司 Parameter regulation means, device and the equipment of neural network
CN109902745A (en) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 A kind of low precision training based on CNN and 8 integers quantization inference methods
CN110058883A (en) * 2019-03-14 2019-07-26 成都恒创新星科技有限公司 A kind of CNN accelerated method and system based on OPU

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴焕 (Wu Huan) et al., "Accelerating Convolutional Neural Network Forward Inference Based on Caffe", Computer Engineering and Design *
王慧丽 (Wang Huili) et al., "Deep Learning Hardware Acceleration Technology Based on General-Purpose Vector DSP", Scientia Sinica *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414994A (en) * 2020-03-03 2020-07-14 哈尔滨工业大学 FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN113762498A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Method for quantizing RoiAlign operator
CN113762498B (en) * 2020-06-04 2024-01-23 合肥君正科技有限公司 Method for quantizing RoiAlign operator
CN111507465A (en) * 2020-06-16 2020-08-07 电子科技大学 Configurable convolutional neural network processor circuit
CN111507465B (en) * 2020-06-16 2020-10-23 电子科技大学 Configurable convolutional neural network processor circuit
CN111696149A (en) * 2020-06-18 2020-09-22 中国科学技术大学 Quantization method for stereo matching algorithm based on CNN
CN112148249B (en) * 2020-09-18 2023-08-18 北京百度网讯科技有限公司 Dot product operation realization method and device, electronic equipment and storage medium
JP2022552046A (en) * 2020-09-18 2022-12-15 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Dot product operation implementation method, device, electronic device, and storage medium
CN112148249A (en) * 2020-09-18 2020-12-29 北京百度网讯科技有限公司 Dot product operation implementation method and device, electronic equipment and storage medium
WO2022057502A1 (en) * 2020-09-18 2022-03-24 北京百度网讯科技有限公司 Method and device for implementing dot product operation, electronic device, and storage medium
CN112632878B (en) * 2020-12-10 2023-11-24 中山大学 High-speed low-resource binary convolution unit based on FPGA
CN112632878A (en) * 2020-12-10 2021-04-09 中山大学 High-speed low-resource binary convolution unit based on FPGA
CN112541583A (en) * 2020-12-16 2021-03-23 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network accelerator
CN112598078A (en) * 2020-12-28 2021-04-02 北京达佳互联信息技术有限公司 Hybrid precision training method and device, electronic equipment and storage medium
CN112734020A (en) * 2020-12-28 2021-04-30 中国电子科技集团公司第十五研究所 Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN112598078B (en) * 2020-12-28 2024-04-19 北京达佳互联信息技术有限公司 Hybrid precision training method and device, electronic equipment and storage medium
CN113222130A (en) * 2021-04-09 2021-08-06 广东工业大学 Reconfigurable convolution neural network accelerator based on FPGA
CN114580628A (en) * 2022-03-14 2022-06-03 北京宏景智驾科技有限公司 Efficient quantization acceleration method and hardware circuit for neural network convolution layer
CN116127255A (en) * 2022-12-14 2023-05-16 北京登临科技有限公司 Convolution operation circuit and related circuit or device with same
CN116127255B (en) * 2022-12-14 2023-10-03 北京登临科技有限公司 Convolution operation circuit and related circuit or device with same

Also Published As

Publication number Publication date
CN110852416B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN110852416B (en) CNN hardware acceleration computing method and system based on low-precision floating point data representation form
CN110852434B (en) CNN quantization method, forward calculation method and hardware device based on low-precision floating point number
KR102647858B1 (en) Low-power hardware acceleration method and system for convolution neural network computation
Ko et al. Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation, and rounding
JP6528893B1 (en) Learning program, learning method, information processing apparatus
EP3853713A1 (en) Multiply and accumulate circuit
US10491239B1 (en) Large-scale computations using an adaptive numerical format
US20200401873A1 (en) Hardware architecture and processing method for neural network activation function
US10872295B1 (en) Residual quantization of bit-shift weights in an artificial neural network
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN109165006B (en) Design optimization and hardware implementation method and system of Softmax function
CN111091183B (en) Neural network acceleration system and method
Wu et al. Efficient dynamic fixed-point quantization of CNN inference accelerators for edge devices
WO2022170811A1 (en) Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network
US20230376274A1 (en) Floating-point multiply-accumulate unit facilitating variable data precisions
US20200311545A1 (en) Information processor, information processing method, and storage medium
CN111492369A (en) Residual quantization of shift weights in artificial neural networks
CN114860193A (en) Hardware operation circuit for calculating Power function and data processing method
US20220334802A1 (en) Information processing apparatus, information processing system, and information processing method
CN111258545B (en) Multiplier, data processing method, chip and electronic equipment
Vinh et al. FPGA Implementation of Trigonometric Function Using Loop-Optimized Radix-4 CORDIC
CN113419779B (en) Scalable multi-precision data pipeline system and method
JP7247418B2 (en) Computing unit, method and computer program for multiplication
WO2023004799A1 (en) Electronic device and neural network quantization method
CN117632079A (en) Hardware accelerator for floating point operations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200609

Address after: Room 305, building 9, meizhuang new village, 25 Yangzi Jiangbei Road, Weiyang District, Yangzhou City, Jiangsu Province 225000

Applicant after: Liang Lei

Address before: 610094 China (Sichuan) Free Trade Pilot Area, Chengdu City, Sichuan Province, 1402, Block 199, Tianfu Fourth Street, Chengdu High-tech Zone

Applicant before: Chengdu Star Innovation Technology Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221220

Address after: 518017 1110, Building 3, Northwest Shenjiu Science and Technology Pioneer Park, the intersection of Taohua Road and Binglang Road, Fubao Community, Fubao Street, Shenzhen, Guangdong

Patentee after: Shenzhen biong core technology Co.,Ltd.

Address before: Room 305, Building 9, Meizhuang New Village, No. 25, Yangzijiang North Road, Weiyang District, Yangzhou City, Jiangsu Province, 225000

Patentee before: Liang Lei