CN110598839A - Convolutional neural network system and method for quantizing convolutional neural network


Info

Publication number
CN110598839A
CN110598839A
Authority
CN
China
Prior art keywords: convolutional layer, quantization, quantized, convolution, layer
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN201810603231.XA
Other languages
Chinese (zh)
Inventor
郭鑫 (Guo Xin)
罗龙强 (Luo Longqiang)
余国生 (Yu Guosheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810603231.XA
Priority to PCT/CN2019/090660 (published as WO2019238029A1)
Publication of CN110598839A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The application provides a convolutional neural network system and a method for quantizing a convolutional neural network. The system comprises: a quantization module, configured to respectively quantize the input data of the i-th convolutional layer of the system, the weight of the i-th convolutional layer, and the bias, where i is a positive integer; and a convolution module, configured to perform convolution calculation on the quantized input data, quantized weight, and quantized bias of the i-th convolutional layer to obtain the convolution result of the i-th convolutional layer. The system quantizes the weights and biases of the convolutional layers to be quantized, together with the input data fed into those layers, and performs the convolution calculation with the quantized values to obtain the calculation result of each convolutional layer. This reduces the computation of the convolutional neural network and improves its quantization precision.

Description

Convolutional neural network system and method for quantizing convolutional neural network
Technical Field
The present application relates to the field of convolutional neural networks, and more particularly, to a convolutional neural network system and a method for quantizing a convolutional neural network.
Background
A deep convolutional neural network has millions or even tens of millions of parameters after training, for example the weight and bias parameters included in the model parameters and the feature map parameters of each convolutional layer, and these model and feature map parameters are stored as 32-bit values. Because of the large number of parameters and the large amount of data, the whole convolution calculation process consumes a large amount of memory and computing resources. Deep convolutional neural networks are developing toward deeper, larger, and more complex models; at such sizes, a model cannot be ported to a mobile phone or an embedded chip at all, and even when transmitted over a network, the high bandwidth occupancy is often a difficult engineering problem.
At present, reducing the complexity of a convolutional neural network without reducing its accuracy is mainly achieved by quantizing the parameters of the convolutional neural network. However, current quantization methods reduce the precision of the convolutional neural network, which affects the user experience.
Disclosure of Invention
The application provides a convolutional neural network system and a convolutional neural network quantization method. The weights and biases of the convolutional layers that need to be quantized, together with the input data fed into those layers, are quantized, and convolution is computed with the quantized input data, quantized weights, and quantized biases to obtain the calculation result of each convolutional layer. This reduces the computation of the convolutional neural network and improves its quantization precision.
In a first aspect, a convolutional neural network system is provided, including: a quantization module, configured to quantize input data of an i-th convolutional layer of the system, a weight of the i-th convolutional layer, and a bias, respectively, where i is a positive integer; and the convolution module is used for performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the quantized bias to obtain a convolution result of the ith convolutional layer.
In the convolutional neural network system provided by the first aspect, the quantization module quantizes the weights and biases of the convolutional layers to be quantized and the input data fed into those layers, and the convolution module performs the convolution calculation with the quantized input data, quantized weights, and quantized biases to obtain the calculation result of each convolutional layer. The system makes the calculation results more accurate, reduces the computation of the convolutional neural network, reduces the amount of data stored for the model and the convolution results, and improves the quantization precision of the convolutional neural network. It is also convenient to implement in hardware, and in the target detection stage of model compression it improves the precision and efficiency of target detection.
In one possible implementation manner of the first aspect, the convolution module includes: a multiplier for multiplying the quantized input data of the i-th convolutional layer by the quantized weight; and the adder is used for adding the output result of the multiplier and the quantized offset to obtain the convolution result of the ith convolution layer.
In a possible implementation manner of the first aspect, when i is equal to 1, the input data of the i-th convolutional layer is the original input picture; or, when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
In a possible implementation manner of the first aspect, when the data for the (i+1)-th layer convolution calculation is data to be quantized, the quantization module is further configured to: perform, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias, where the inversely quantized convolution result of the i-th convolutional layer is the input data of the (i+1)-th convolutional layer. This implementation maintains the reversibility of quantization and the accuracy of the model, further improving the accuracy and precision of the convolutional neural network.
In a possible implementation manner of the first aspect, when the data for the (i+1)-th layer convolution calculation is not data to be quantized, the quantization module is further configured to: perform, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias; and perform feature-map inverse quantization on the result obtained after that inverse quantization. The convolution module is further configured to: perform convolution calculation on the result of the feature-map inverse quantization, the weight of the (i+1)-th convolutional layer, and the bias of the (i+1)-th convolutional layer to obtain the convolution result of the (i+1)-th convolutional layer. This implementation maintains the reversibility of quantization and further improves the accuracy and precision of the convolutional neural network.
In a possible implementation manner of the first aspect, the quantization module is further configured to: correcting the quantized offset; the convolution module is specifically configured to: and performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the corrected bias to obtain a convolution result of the ith convolutional layer. In this implementation, the accuracy of quantization is further improved while ensuring the reversibility of quantization. The precision and the accuracy of the model are ensured.
In a possible implementation manner of the first aspect, the system further includes: a quantization parameter obtaining module, configured to obtain a quantization parameter of input data of the ith convolutional layer, a quantization parameter of the weight of the ith convolutional layer, and a quantization parameter of the offset; the quantization module is specifically configured to: quantizing the input data of the i-th convolutional layer according to the quantization parameter of the input data of the i-th convolutional layer, quantizing the weight according to the quantization parameter of the weight of the i-th convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer.
In a second aspect, a convolutional neural network quantization method is provided, including: quantizing the input data of the ith convolutional layer of the convolutional neural network, the weight of the ith convolutional layer and the bias respectively, wherein i is a positive integer; and performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the quantized offset to obtain a convolution result of the ith convolutional layer.
In the convolutional neural network quantization method provided in the second aspect, the weights and biases of the convolutional layers to be quantized and the input data fed into those layers are quantized, and convolution is computed with the quantized input data, quantized weights, and quantized biases to obtain the calculation result of each convolutional layer. The calculation results are more accurate, the computation of the convolutional neural network is reduced, the amount of data stored for the model and the convolution results can be reduced, and the quantization precision of the convolutional neural network is improved. The method is also convenient to implement in hardware.
In a possible implementation manner of the second aspect, the performing convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight, and the quantized offset to obtain a convolution result of the i-th convolutional layer includes: multiplying the quantized input data of the i-th convolutional layer by the quantized weight; and adding the result of the multiplication and the quantized offset to obtain a convolution result of the i-th convolutional layer.
In a possible implementation manner of the second aspect, when i is equal to 1, the input data of the i-th convolutional layer is the original input picture; or, when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
In a possible implementation manner of the second aspect, when the data for the (i+1)-th layer convolution calculation is data to be quantized, the method further includes: performing, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias, where the inversely quantized convolution result of the i-th convolutional layer is the input data of the (i+1)-th convolutional layer.
In a possible implementation manner of the second aspect, when the data for the (i+1)-th layer convolution calculation is not data to be quantized, the method further includes: performing, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias; performing feature-map inverse quantization on the result obtained after that inverse quantization; and performing convolution calculation on the result of the feature-map inverse quantization, the weight of the (i+1)-th convolutional layer, and the bias of the (i+1)-th convolutional layer to obtain the convolution result of the (i+1)-th convolutional layer.
In one possible implementation manner of the second aspect, the method further includes: correcting the quantized offset; performing convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight and the quantized offset, including: and performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the corrected bias to obtain a convolution result of the ith convolutional layer.
In one possible implementation manner of the second aspect, the method further includes: obtaining a quantization parameter of input data of the ith convolutional layer, a quantization parameter of the weight of the ith convolutional layer and a quantization parameter of the offset; the quantizing the input data of the i-th convolutional layer of the convolutional neural network, the weight of the i-th convolutional layer and the bias respectively comprises: quantizing the input data of the i-th convolutional layer according to the quantization parameter of the input data of the i-th convolutional layer, quantizing the weight according to the quantization parameter of the weight of the i-th convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer.
In a third aspect, a chip is provided. The chip includes a quantization module and a convolution module, and is configured to support the chip in performing the corresponding functions of the above method.
In a fourth aspect, a computer system is provided. The computer system comprises a quantization module and a convolution module, and is configured to support the computer system in executing the corresponding functions of the above method.
In a fifth aspect, a computer-readable storage medium is provided for storing a computer program comprising instructions for performing the method of the second aspect or any one of the possible implementations of the second aspect.
In a sixth aspect, a computer program product is provided, comprising instructions for carrying out the method of the second aspect or any one of its possible implementations.
Drawings
FIG. 1 is a schematic block diagram of a convolutional neural network system architecture of one embodiment of the present application.
Fig. 2 is a schematic block diagram of a convolutional neural network system structure of another embodiment of the present application.
Fig. 3 is a schematic block diagram of a convolutional neural network system structure of yet another embodiment of the present application.
Fig. 4 is a schematic block diagram of a convolutional neural network system structure of another embodiment of the present application.
FIG. 5 is a schematic flow chart diagram of a method of convolutional neural network quantization of one embodiment of the present application.
FIG. 6 is a schematic flow chart diagram of a method of convolutional neural network quantization of another embodiment of the present application.
FIG. 7 is a schematic flow chart diagram of a method of convolutional neural network quantization of yet another embodiment of the present application.
FIG. 8 is a schematic flow chart diagram of a method of convolutional neural network quantization of another embodiment of the present application.
FIG. 9 is a schematic flow chart diagram of a method of convolutional neural network quantization of another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
First, some terms related to the present application are explained.
Quantization: the process of mapping a set of numbers in an original value range into another target value range through a mathematical transformation. Methods such as table lookup, shifting, and truncation may be employed. A linear transformation is often used, and this transformation is usually implemented by multiplication.
Inverse quantization: transforming a quantized number back into the original value range, based on the previous linear transformation (the quantization process). Inverse quantization ensures that when the system computes with quantized data according to some calculation rule, the result after inverse quantization still stays in a value range very close to that of the result computed, under the same rule, with data in the original value range, so the loss of precision in the convolutional neural network is small.
Reversibility: in the quantization and inverse quantization processes, quantization and inverse quantization are required to be mutually inverse transformations; that is, quantized data, after inverse quantization, remains approximately equal to the original data.
Reversibility of the quantization calculation: after the data is quantized, each layer's data carries an amplification multiplier, and the same amplification multiplier (the quantization parameter) must be removed from the multiply-accumulate output of the convolution, so that the value range over the whole calculation process is reversible and approximately preserved. The reversible calculation is premised on the convolution's multiply-accumulate being a linear process.
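As a concrete illustration of these terms, the following sketch shows a quantization / inverse-quantization round trip in which the dequantized values remain approximately equal to the originals; the function names, the power-of-2 scale, and the 8-bit saturation are illustrative assumptions consistent with the formulas later in the text, not code from the patent.

```python
# A round trip through quantization and inverse quantization: after mapping into
# the target range and back, the data stays approximately equal to the original.
# The power-of-2 scale and 8-bit saturation are illustrative assumptions.
import numpy as np

def quantize(x, shift, bits=8):
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x * (2.0 ** shift)), lo, hi)  # map, round, protect overflow

def dequantize(q, shift):
    return q / (2.0 ** shift)  # inverse transform back to the original value range

x = np.array([0.031, -0.270, 0.118])
q = quantize(x, shift=8)          # e.g. 0.031 * 256 = 7.936 -> 8
print(dequantize(q, shift=8))     # [ 0.03125 -0.26953  0.11719], approximately x
```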
Retraining (retrain): the process of training again on the basis of an already trained convolutional neural network model (hereinafter simply "model"); it amounts to fine-tuning the network training, starting from the existing model after slightly modifying some network characteristics that need adjustment.
Hyper-parameter: a general concept in deep learning, referring to the configuration parameters of the model training process.
Convolutional layers in a deep convolutional neural network generally comprise multiple layers. A model obtained after the neural network finishes training contains millions or even tens of millions of parameters, which may include the weight and bias parameters of each convolutional layer in the model as well as the feature map parameters of each convolutional layer. Because of the large number of parameters and the large amount of data, the whole convolution calculation process consumes a large amount of memory and computing resources. At present, reducing the complexity of a convolutional neural network without reducing its accuracy is mainly achieved by quantizing the parameters of the convolutional neural network.
After quantization, the result calculated with quantized data may deviate from the result calculated with the original data, reducing the accuracy of the calculation result. An inverse quantization process is therefore needed; that is, the reversibility of the quantization process must be satisfied. At present, the reversibility of model quantization is calculated as shown in formula (1):
∑ OP(round(α·W)) × FM / α + OP(round(β·Bias)) / β ≈ ∑ W × FM + Bias    (1)
In formula (1), OP denotes overflow protection (OP), round denotes rounding, FM denotes the feature map (FM), W denotes the weight, Bias denotes the bias, and α and β are the quantization parameters of the weight and the bias, respectively.
It can be seen that in formula (1), the multiply-accumulate term ∑ OP(round(α·W)) × FM and the quantized bias OP(round(β·Bias)) each have the quantization parameter of the current convolutional layer (α and β, respectively) divided out. The equality is only approximate because rounding and overflow protection (e.g., 8-bit fixed-point overflow protection) are applied while quantizing the weights and the bias.
This approximately equivalent form is referred to as the quantization satisfying mathematical reversibility. The subsequent quantization design is based on this concept.
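The approximate reversibility of formula (1) can be checked numerically. The sketch below, with illustrative values and the feature map left unquantized as in formula (1), divides the quantized multiply-accumulate and the quantized bias by their respective quantization parameters and compares against the unquantized result:

```python
# Numeric check of the approximate reversibility in formula (1): dividing the
# quantized multiply-accumulate and the quantized bias by their own quantization
# parameters approximately recovers the unquantized result. Values illustrative.
import numpy as np

alpha, beta = 2.0 ** 8, 2.0 ** 6     # quantization parameters of weight and bias
W    = np.array([0.12, -0.05, 0.33])
FM   = np.array([1.5, 2.0, -0.7])    # feature map, left unquantized as in formula (1)
bias = 0.85

exact  = np.dot(W, FM) + bias
approx = np.dot(np.round(alpha * W), FM) / alpha + np.round(beta * bias) / beta
print(exact, approx)                 # 0.699 vs about 0.694: approximately equal
```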
In the prior art, schemes for quantizing a model are mainly quantization designs for the weights and the feature maps. These schemes do not involve quantization of the bias, yet the bias is a very important parameter in the model and greatly affects the accuracy of the entire model. Consider model compression in a convolutional neural network, whose purpose is mainly to reduce the complexity and size of the model. Model compression mainly involves a classification process and a target detection process: classification determines what the object in a picture is, while target detection finds the position of the object in the picture and then determines what it is. If quantization follows the prior-art flow, the influence of the bias is relatively small for classification, but its influence on the result is very large for target detection, and that flow cannot handle quantization in the target detection process. Moreover, zero-mean quantization introduces additions and subtractions into the actual feature-map quantization, which is unsuitable for efficient hardware design.
Other quantization methods exist, but none involves quantizing the bias. The accuracy of the quantized convolutional neural network is therefore reduced, which affects the user experience.
In view of the above problems, the present application provides a convolutional neural network system that supports quantization of the bias, improves the calculation accuracy of the system, reduces the computation of the convolutional neural network, reduces the amount of data stored for the convolutional neural network model and the convolution results, and improves the user experience.
Fig. 1 is a schematic block diagram of a convolutional neural network system provided herein, and as shown in fig. 1, the system 100 includes a quantization module 110 and a convolution module 120.
A quantization module 110, configured to respectively quantize the input data of the i-th convolutional layer of the system (the input data of the i-th convolution calculation in the system), the weight of the i-th convolutional layer, and the bias, where i is a positive integer and the data of the i-th layer convolution calculation is data to be quantized.
A convolution module 120, configured to perform convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight, and the quantized offset to obtain a convolution result of the ith convolutional layer.
The convolutional neural network system quantizes, through the quantization module, the weights and biases of the convolutional layers that need quantization and the input data fed into those layers, and the convolution module performs convolution calculation with the quantized input data, quantized weights, and quantized biases to obtain the calculation result of each convolutional layer. The system makes the calculation results more accurate, reduces the computation of the convolutional neural network, reduces the amount of data stored for the model and the convolution results, and improves the quantization precision of the convolutional neural network. It is also convenient to implement in hardware, and in the target detection stage of model compression it improves the precision and efficiency of target detection.
Specifically, for a convolutional neural network, the convolutional layers logically comprise multiple layers, and each convolutional layer has its own convolution model, that is, its own weight and bias values; a convolution model can be understood as a computational model. For example, assume that the convolution model of the i-th convolutional layer (the i-th convolutional layer being a logical convolutional layer) is the model shown in formula (2):
y = ∑ a·x + b    (2)
In formula (2), a may be understood as the weight, b as the bias, and x as the input data of the i-th convolutional layer. The convolution result of the i-th convolutional layer is obtained by the calculation in formula (2).
The quantization module 110 may perform quantization processing on multiple convolutional layers in the convolutional neural network: the weight, bias, and input data of the i-th convolutional layer are quantized, the data of the i-th layer convolution calculation being the data to be quantized. In the embodiment of the present application, the i-th convolutional layer may also be referred to as a convolutional layer to be quantized. The data of the i-th convolutional layer includes the input data of the i-th convolutional layer, the weight of the i-th convolutional layer, and the bias. The quantization module 110 quantizes the data of the i-th layer convolution calculation to obtain the quantized input data, quantized weight, and quantized bias of the i-th convolutional layer. The quantization of the weight and bias may be viewed as quantization of the convolution model of the i-th convolutional layer. The convolution module 120 may perform convolution calculation on each quantized convolutional layer to obtain the calculation result of each convolutional layer.
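The per-layer flow just described can be sketched as follows; a 1-D convolution stands in for the real 2-D case, and all names, scale values, and the use of numpy are illustrative rather than taken from the patent:

```python
# Sketch of one quantized layer pass: quantize the layer's input data, weight
# and bias, then run the convolution on the quantized values. A 1-D convolution
# stands in for the 2-D case; names, scales and numpy usage are illustrative.
import numpy as np

def conv_layer_quantized(x, w, b, gamma, alpha, beta, bits=8):
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = lambda v, s: np.clip(np.round(v * (2.0 ** s)), lo, hi)
    qx, qw, qb = q(x, gamma), q(w, alpha), q(b, beta)  # quantization module
    return np.convolve(qx, qw, mode="valid") + qb      # convolution module

x = np.array([0.2, -0.1, 0.4, 0.05])   # input data of the layer
w = np.array([0.5, -0.25])             # weight
result = conv_layer_quantized(x, w, b=0.1, gamma=4, alpha=6, beta=10)
# beta = gamma + alpha puts the bias on the products' scale (cf. bias correction)
```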
It should be understood that, although the convolutional layers of a convolutional neural network generally comprise multiple layers, this does not mean that every convolutional layer needs to be quantized. That is, in the convolutional neural network provided by the present application, only some of the convolutional layers may need quantization, and the convolutional layers needing quantization may be consecutive or non-consecutive. For example, if there are 10 convolutional layers in total, layers 2 to 6 may require quantization, or layers 2, 4, 7, and 9 may require quantization; the embodiment of the present application is not limited herein.
It should also be understood that, in addition to performing convolution calculation on quantized convolutional layers, the convolution module 120 may also perform convolution calculation on non-quantized convolutional layers to obtain a convolution result. The embodiments of the present application are not limited thereto.
It should also be understood that equation (2) above is merely exemplary, and is only for the purpose of illustrating the weights, biases, and input data of the i-th convolutional layer, and should not impose any limitations on the convolutional model of the i-th convolutional layer.
It should also be understood that the system 100 may further include other modules, such as an input module and a pooling module, to support the system in completing the other functions of the convolutional neural network. The embodiments of the present application are not limited thereto.
It is also understood that the system 100 may be a chip or an apparatus, which may include the quantization module 110 and the convolution module 120, etc. The embodiments of the present application are not limited thereto.
Optionally, as an embodiment, when i is equal to 1, the input data of the i-th convolutional layer is an original input picture.
Specifically, when the i-th convolutional layer is the first convolutional layer, the input data of the 1 st convolutional layer is the original input picture input into the system. That is, the quantization module 110 needs to quantize the original input picture input to the system, and needs to quantize the weight and offset of the layer 1 convolutional layer. The convolution module 120 performs convolution calculation on the quantized original input picture, the quantized weight, and the quantized offset of the layer 1 convolutional layer to obtain a convolution calculation result of the layer 1 convolutional layer.
Optionally, as an embodiment, when the quantization module 110 quantizes an original input picture (image) input to the layer 1 convolutional layer of the system, the quantization module may perform quantization by using a quantization formula shown in formula (3).
In formula (3), IMG denotes the original input picture input to the layer-1 convolutional layer, and input-IMG is the matrix representing the original input picture IMG. Q(IMG) denotes the quantized input data of the layer-1 convolutional layer. γ_1 denotes the quantization parameter of the layer-1 feature map data; a quantization parameter is a parameter used in the quantization process and corresponds to an amplification multiplier. α_1 denotes the quantization parameter of the weight of the layer-1 convolutional layer, W_1 denotes the weight of the layer-1 convolutional layer, and Bias_1 denotes the bias of the layer-1 convolutional layer.
It should be understood that the quantization module 110 may quantize the original input picture of the 1 st convolutional layer by using other formulas besides the above formula (3). For example, any modified formula using formula (3), and the like. The embodiments of the present application are not limited thereto.
Optionally, as an embodiment, when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
Specifically, the convolutional layers of the convolutional neural network comprise multiple layers, and when i is greater than 1, the input data of the i-th convolutional layer is feature map data. Feature map data represents the calculation result of a convolutional layer, which is an intermediate result for the whole convolutional neural network. The input data of the i-th convolutional layer is the feature map data of the (i-1)-th convolutional layer, i.e., the convolution result of the (i-1)-th convolutional layer. For example, assume i is 5; the input data of the 5th convolutional layer is feature map data, namely the convolution result of the 4th convolutional layer, which the convolution module 120 computes. The 4th convolutional layer may be a quantized convolutional layer or an unquantized convolutional layer.
Optionally, as an embodiment, when i is greater than 1 and the quantization module 110 quantizes the input data fed into the i-th convolutional layer (i.e., quantizes the feature map of the (i-1)-th convolutional layer), the quantization formula shown in formula (4) may be used.
In formula (4), Q(FM_i) denotes the quantized input data of the i-th convolutional layer, FM_(i-1) denotes the input data of the i-th convolutional layer (the convolution result of the (i-1)-th convolutional layer), γ_i denotes the quantization parameter of the feature map data of the i-th convolutional layer, and γ_j denotes the quantization parameter of the feature map data of the j-th convolutional layer; a quantization parameter is a parameter used in the quantization process and corresponds to an amplification multiplier. α_i denotes the quantization parameter of the weight of the i-th convolutional layer, W_i denotes the weight of the i-th convolutional layer, and Bias_i denotes the bias of the i-th convolutional layer.
It should be understood that the quantization module 110 may use formulas other than formula (4) above, for example any variant of formula (4), to quantize the input data of the i-th convolutional layer (the feature map of the (i-1)-th convolutional layer). The embodiments of the present application are not limited thereto.
Optionally, as an embodiment, as shown in fig. 2, the convolution module 120 includes:
a multiplier 121, configured to multiply the quantized input data of the i-th convolutional layer and the quantized weight.
An adder 122 for adding the output result of the multiplier and the quantized offset to obtain the convolution result of the i-th convolutional layer.
Specifically, the convolution performed by the convolution module 120 is essentially a multiply-accumulate process. The convolution module 120 therefore includes a multiplier 121 for multiplying the quantized input data of the i-th convolutional layer by the quantized weight, and an adder 122 for adding the output of the multiplier 121 to the quantized bias to obtain the convolution result of the i-th convolutional layer. Taking formula (2) as an example: the multiplier 121 first multiplies the quantized weight a by the quantized input data x, and the adder 122 then adds the multiplier's result to the quantized bias b to obtain the convolution result of the i-th convolutional layer.
It should be understood that when the ith convolutional layer is the last convolutional layer, the convolution result of the ith convolutional layer may be used as the output of the entire convolutional layer, and the output result may be used as the input of other processing layers (e.g., activation function layer) of the convolutional neural network, so that the convolutional neural network performs subsequent calculations, etc. The embodiments of the present application are not limited thereto.
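A minimal sketch of the multiplier/adder split inside the convolution module, with illustrative quantized values:

```python
# The multiplier/adder split inside the convolution module: the multiplier 121
# forms products of quantized input data and quantized weights; the adder 122
# accumulates them and adds the quantized bias. Values are illustrative.
import numpy as np

def convolve_quantized(q_input, q_weight, q_bias):
    products = q_input * q_weight      # multiplier: quantized input x quantized weight
    return np.sum(products) + q_bias   # adder: accumulate, then add the quantized bias

q_input  = np.array([ 51, -26, 102], dtype=np.int32)  # quantized input window
q_weight = np.array([ 31, -13,  84], dtype=np.int32)  # quantized weights
print(convolve_quantized(q_input, q_weight, q_bias=54))
```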
Optionally, as an embodiment, when the data for the (i+1)-th layer convolution calculation is data to be quantized, that is, when the (i+1)-th convolutional layer is a convolutional layer to be quantized, the quantization module 110 is further configured to:
perform, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias, where the inversely quantized convolution result of the i-th convolutional layer is the input data of the (i+1)-th convolutional layer.
Specifically, before the convolution calculation of the i-th convolutional layer, the convolution model of the i-th convolutional layer is quantized (quantization of the weight and the bias), and the convolution is computed with the quantized model. Therefore, after the convolution result of the i-th convolutional layer is obtained, in order to ensure that the value range of this convolution result (the feature map data) is consistent with the value range of the result that would be obtained without quantization, and to maintain the reversibility of quantization and the accuracy of the model, model inverse quantization must be performed on the convolution result of the i-th convolutional layer to ensure the accuracy and precision of that result. The inversely quantized convolution result of the i-th convolutional layer is the input data of the (i+1)-th convolutional layer. The data of the (i+1)-th convolutional layer includes the input data, weight, and bias of the (i+1)-th convolutional layer. Since the data of the (i+1)-th convolution calculation is data to be quantized (the (i+1)-th convolutional layer is also a convolutional layer to be quantized), the processing flow is similar to that of the i-th convolutional layer: the quantization module 110 quantizes the input data as well as the weight and bias of the (i+1)-th convolutional layer, and the convolution module 120 performs convolution calculation on the quantized input data, quantized weight, and quantized bias of the (i+1)-th convolutional layer to obtain its convolution result. This improves the accuracy and precision of the convolutional neural network.
Optionally, when the quantization module 110 performs model inverse quantization (the inverse quantization corresponding to the quantization of the weight and the quantization of the bias) on the convolution result of the i-th convolutional layer, the formula shown in formula (5) may be used:
In formula (5), MODEL_quantize_reverse denotes the result of performing model inverse quantization on the convolution result of the i-th convolutional layer, α_i denotes the quantization parameter of the weight of the i-th convolutional layer, W_i denotes the weight of the i-th convolutional layer, Q(IMG) denotes the quantized original input picture of the layer-1 convolutional layer, Bias_i denotes the bias of the i-th convolutional layer, Q(FM_(i-1)) denotes the quantized feature map data of the (i-1)-th convolutional layer (which may also be referred to as the quantized convolution result of the (i-1)-th convolutional layer), γ_i denotes the quantization parameter of the feature map of the i-th convolutional layer, and γ_j denotes the quantization parameter of the feature map data of the j-th convolutional layer. The key to model inverse quantization is to divide out the quantization parameter (which may also be referred to as an amplification multiplier) of the model while maintaining the accuracy of the model's calculation result.
After model inverse quantization is performed on the convolution result of the i-th convolutional layer, the input data of the (i+1)-th convolutional layer is obtained. The feature map data after model inverse quantization often has an uneven numerical range, so the input data after model inverse quantization needs to be quantized; this is the process by which the quantization module 110 quantizes the input data of the (i+1)-th convolutional layer.
When quantizing the input data of the (i+1)-th convolutional layer, formula (6) below, combined with formulas (4) and (5) above, can be used for the (i+1)-th convolutional layer.
In formula (6), Q(FM_(i+1)) denotes the quantized input data of the (i+1)-th layer, α_i denotes the quantization parameter of the weight of the i-th convolutional layer, W_i denotes the weight of the i-th convolutional layer, Q(IMG) denotes the quantized input data of the layer-1 convolutional layer, Bias_i denotes the bias of the i-th convolutional layer, Q(FM_i) denotes the quantized feature map data of the i-th convolutional layer (the quantized convolution result of the i-th convolutional layer), γ_i denotes the quantization parameter of the feature map of the i-th convolutional layer, γ_j denotes the quantization parameter of the feature map data of the j-th convolutional layer, and n denotes the total number of convolutional layers.
Substituting the model inverse quantization of formula (5) above yields formula (7):
Formula (7) is the formula for quantizing the input data of the (i+1)-th convolutional layer, where MODEL_quantize_reverse denotes the result of the model inverse quantization performed on the convolution result of the i-th convolutional layer, Q(FM_(i+1)) denotes the quantized input data of the (i+1)-th convolutional layer, and γ_i denotes the quantization parameter of the feature map of the i-th convolutional layer.
It should be understood that the quantization module 110 may quantize the input data of the (i + 1) th convolutional layer by using other formulas besides the above formula (7). The embodiments of the present application are not limited thereto.
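The flow for a quantized layer followed by another layer to be quantized can be sketched as follows; collapsing the model scale to a single power-of-2 shift is a simplification of formulas (5) to (7), and the names and values are illustrative:

```python
# Flow when layer i+1 is also to be quantized: model-dequantize the convolution
# result of layer i (remove the amplification from quantizing W_i and Bias_i,
# cf. formula (5)), then requantize the feature map as the input data of layer
# i+1 (cf. formulas (6) and (7)). Single power-of-2 shifts are a simplification.
import numpy as np

def model_dequantize(conv_result, alpha_i):
    return conv_result / (2.0 ** alpha_i)   # divide out the model's amplification

def quantize_fm(fm, gamma_next, bits=8):
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(fm * (2.0 ** gamma_next)), lo, hi)

conv_i    = np.array([1523.0, -807.0, 299.0])  # convolution result of layer i
fm_i      = model_dequantize(conv_i, alpha_i=6)
q_in_next = quantize_fm(fm_i, gamma_next=2)    # input data of layer i+1
```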
Optionally, as an embodiment, when the data for the (i+1)-th layer convolution calculation is not data to be quantized, that is, when the (i+1)-th convolutional layer is not a convolutional layer to be quantized, the quantization module 110 is further configured to:
perform, on the convolution result of the i-th convolutional layer, the inverse quantization corresponding to the quantization of the weight and the quantization of the bias; and
perform feature-map inverse quantization on the result obtained after that inverse quantization;
the convolution module 120 is further configured to: perform convolution calculation on the result of the feature-map inverse quantization, the weight of the (i+1)-th convolutional layer, and the bias of the (i+1)-th convolutional layer to obtain the convolution result of the (i+1)-th convolutional layer.
Specifically, before the convolution calculation of the i-th convolutional layer, the convolution model of the i-th convolutional layer is quantized (quantization of the weight and the bias), and the convolution is computed with the quantized model. Therefore, after the convolution result of the i-th convolutional layer is obtained, in order to ensure that the value range of the output result (the feature map data) of the i-th convolutional layer is consistent with the value range of the result that would be obtained without quantization, and to maintain the reversibility of quantization, model inverse quantization must be performed on the output result of the i-th convolutional layer. This inverse quantization is similar to the inverse quantization performed when the (i+1)-th convolutional layer is a convolutional layer to be quantized, and is not described here again.
After the convolution result of the i-th convolutional layer is model-inverse-quantized, the data of the (i+1)-th layer convolution calculation is not data to be quantized (the (i+1)-th convolutional layer is not a convolutional layer to be quantized), so the (i+1)-th convolutional layer does not need quantization. However, when the convolution of the i-th convolutional layer was calculated, the input data of the i-th convolutional layer was also quantized. Therefore, in order to ensure that the value range of the output result of the i-th convolutional layer (its feature map data) is consistent with the value range of the result that would be obtained with unquantized input data, and to maintain the reversibility of quantization, feature-map inverse quantization (inverse quantization of the input data) must be performed on the result of the model inverse quantization of the i-th convolutional layer. The convolution module 120 then performs convolution calculation on the result of the feature-map inverse quantization, the weight of the (i+1)-th convolutional layer, and the bias of the (i+1)-th convolutional layer to obtain the convolution result of the (i+1)-th convolutional layer.
Optionally, when the quantization module 110 performs feature-map inverse quantization on the result of the model inverse quantization of the i-th convolutional layer, the feature map may be inversely quantized by using formula (8):
In formula (8), α_i denotes the quantization parameter of the weight of the i-th convolutional layer, W_i denotes the weight of the i-th convolutional layer, Bias_i denotes the bias of the i-th convolutional layer, and FM_i denotes the result of the feature-map inverse quantization for the i-th convolutional layer. Q(FM_(i-1)) denotes the quantized feature map data of the (i-1)-th convolutional layer, and γ_j denotes the quantization parameter of the feature map data of the j-th convolutional layer, where the j-th convolutional layer is a convolutional layer that is consecutive with the i-th convolutional layer and requires quantization before the i-th convolutional layer.
Specifically, when feature-map inverse quantization is performed on the result of the model inverse quantization of the i-th convolutional layer, the quantization parameters used to quantize the input data of the consecutive convolutional layers requiring quantization before the i-th convolutional layer must be taken into account. For example, assume the i-th convolutional layer is the 5th convolutional layer, and both the 4th and the 3rd convolutional layers are convolutional layers requiring quantization. When the feature map of the 5th convolutional layer is inversely quantized, the quantization parameters of the input data of the 4th and 3rd convolutional layers must be combined, i.e., j takes the values 4 and 3, respectively. If the 4th convolutional layer requires quantization but the 3rd convolutional layer does not, then when performing feature-map inverse quantization for the 5th convolutional layer, only the quantization parameter of the input data of the 4th convolutional layer needs to be combined, i.e., j is 4.
It should also be understood that the quantization module 110 may use formulas other than formula (8) to perform feature-map inverse quantization on the result of the model inverse quantization of the i-th convolutional layer. The embodiments of the present application are not limited thereto.
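A sketch of the corresponding flow when the (i+1)-th layer is not to be quantized, removing both the model amplification and the accumulated feature-map amplifications of the consecutive quantized layers, in the spirit of formula (8); the names and values are illustrative:

```python
# Flow when layer i+1 is not to be quantized: remove both the model
# amplification alpha_i and the accumulated feature-map amplifications gamma_j
# of the consecutive quantized layers before layer i (cf. formula (8)), so that
# layer i+1 convolves on data in the original value range. Values illustrative.
import numpy as np

def feature_map_dequantize(conv_result, alpha_i, gammas):
    total_shift = alpha_i + sum(gammas)   # amplification still carried by the result
    return conv_result / (2.0 ** total_shift)

conv_5 = np.array([99072.0, -51904.0])   # convolution result of (quantized) layer 5
fm_5 = feature_map_dequantize(conv_5, alpha_i=6, gammas=[2, 3])  # layers 3, 4 quantized
```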
Optionally, as an embodiment, as shown in fig. 3, the system 100 further includes:
a quantization parameter obtaining module 130, configured to obtain a quantization parameter of the input data of the i-th convolutional layer, a quantization parameter of the weight of the i-th convolutional layer, and a quantization parameter of the offset;
the quantization module 110 is specifically configured to: quantizing the input data of the i-th convolutional layer according to the quantization parameter of the input data of the i-th convolutional layer, quantizing the weight according to the quantization parameter of the weight of the i-th convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer.
Specifically, during model training of the convolutional neural network, statistical analysis can be performed on all the weights and biases in the model. For example, the maximum value of the weights or biases can be found, and a power of 2 determined such that the weight or bias multiplied by that power of 2 comes as close as possible to the preset value range. For instance, for 8-bit quantization the value range is -128 to 127; the corresponding maximum shift (Max Shift) value, i.e., the shift length in binary, is calculated, and the quantization parameter is then determined from the Max Shift value. The principle behind this approach is that, because a normalization layer is added during training, all the weights are 32-bit floating-point numbers less than 1, so these decimals can be multiplied by a relatively large power of 2 to map them into the preset quantization range, for example between -128 and 127, i.e., a fixed 8-bit representation.
Therefore, the quantization parameter obtaining module 130 is configured to obtain a quantization parameter of the input data of the i-th convolutional layer, a quantization parameter of the weight of the i-th convolutional layer, and a quantization parameter of the offset. The quantization module 110 quantizes the input data of the i-th convolutional layer according to the quantization parameter of the input data of the i-th convolutional layer, quantizes the weight according to the quantization parameter of the weight of the i-th convolutional layer, and quantizes the offset according to the quantization parameter of the offset of the i-th convolutional layer.
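A sketch of the Max Shift computation just described; mapping into the signed range via 2^(k-1), so that the scaled maximum stays within -128..127, is our assumption, whereas the formulas below are stated in terms of the value range 2^k:

```python
# A sketch of the Max Shift idea: the largest binary shift that scales the
# largest-magnitude value toward the fixed-point range. Using 2**(k-1) for the
# signed 8-bit range is an assumption; the patent's formulas use the range 2**k.
import numpy as np

def max_shift(values, k=8):
    max_abs = np.max(np.abs(values))
    return int(np.floor(np.log2((2 ** (k - 1)) / max_abs)))

weights = np.array([0.12, -0.05, 0.33])
shift = max_shift(weights)                      # floor(log2(128 / 0.33)) = 8
print(shift, np.round(weights * 2.0 ** shift))  # [ 31. -13.  84.], within -128..127
```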
The acquisition of the quantization parameter of the input data of the i-th convolutional layer (the input data of the i-th convolution calculation), the quantization parameter of the weight of the i-th convolutional layer, and the quantization parameter of the bias is described in detail below.
1. Obtaining a weight quantization parameter:
and performing data statistical analysis on all the weights in the ith layer of convolution model. Maximum value of weights Max (abs (W) for i-th convolutional layeri) Based on the target bit width k of the weight quantization), a given value range 2 is obtainedk. First, the quantization multiplier of the weight of the i-th convolutional layer is calculated according to the formula (9)
Max stands for maximum value, abs for absolute value, WiRepresents the weight of the i-th convolutional layer, k represents the target bit width,the multipliers are quantized for the weights.
Since quantization replaces the whole floating-point flow with integer calculation, and the quantization multiplier is a real number not suited to the shift operations used in the calculation (shifts in a computer perform power-of-2 multiplication and division), it is better to reduce the quantization multiplier to the largest power of 2 not exceeding it. The quantization parameter α_i of the i-th convolutional layer weight is then calculated by formula (10):
α_i = floor(log2(α̂_i))    (10)
where floor denotes rounding down and α_i denotes the quantization parameter of the weight of the i-th convolutional layer.
Expanding the formula, the quantization parameter α_i of the i-th convolutional layer weight is calculated as formula (11):
α_i = floor(log2(2^k / Max(abs(W_i))))    (11)
2. Acquiring the bias quantization parameter:
Based on the maximum absolute value of the bias of the i-th convolutional layer, Max(abs(Bias_i)), and the target bit width k of the quantization, the given value range 2^k is obtained. First, the quantization multiplier of the bias of the i-th convolutional layer is calculated according to formula (12):
β̂_i = 2^k / Max(abs(Bias_i))    (12)
where Max denotes the maximum value, abs denotes the absolute value, Bias_i denotes the bias of the i-th layer, k denotes the target bit width, and β̂_i is the quantization multiplier of the bias.
The procedure is similar to calculating the quantization parameter of the weight. The quantization parameter β_i of the bias of the current i-th convolutional layer is then calculated by formula (13):
β_i = floor(log2(β̂_i))    (13)
where floor denotes rounding down and β_i denotes the quantization parameter of the i-th convolutional layer bias.
Expanding formula (13), the quantization parameter β_i of the i-th convolutional layer bias is calculated as formula (14):
β_i = floor(log2(2^k / Max(abs(Bias_i))))    (14)
3. Acquiring the input data (feature map) quantization parameter:
Statistical data analysis is performed on all the feature maps in the i-th convolutional layer. Based on the maximum absolute value of the feature map of the i-th convolutional layer, Max(abs(FM_i)), and the quantized target bit width k, the given value range 2^k is obtained. The quantization parameter γ_i of the i-th convolutional layer feature map is shown in formula (15):
γ_i = floor(log2(2^k / Max(abs(FM_i))))    (15)
In formula (15), γ_i denotes the quantization parameter of the i-th convolutional layer feature map, floor denotes rounding down, and FM_i denotes the feature map of the i-th convolutional layer.
The process of quantizing the weight of the i-th convolutional layer according to the quantization parameter of the weight and quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer will be specifically described below.
4. Quantizing the weight of the i-th convolutional layer according to the quantization parameter of the weight:
Based on the weight quantization parameter and shift and rounding operations, the weight-quantized data of the i-th convolutional layer is Q(W_i) (taking 8-bit fixed-point quantization as an example), which can be calculated by formula (16):
Q(W_i) = OP(round(W_i << α_i))    (16)
where Q(W_i) denotes the quantized weight data of the i-th convolutional layer, round denotes rounding, and W_i << α_i denotes a left-shift calculation, meaning that W_i is shifted left by α_i bits.
5. Quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer:
quantizing the offset of the i-th convolutional layer is similar to quantizing the weight. The offset quantized data of the i-th convolutional layer is Q (beta) based on the offset quantization parameter and shift and rounding operationi) (taking an 8-bit fixed-point quantization as an example), it can be calculated by equation (17):
in formula (17), Q (Bias)i) Represents the quantized Bias data of the i-th convolutional layer, round represents rounding, Biasi<<βiRepresenting the calculation of left shift, meaning represents the calculation of BiasiLeft shift by betaiA bit.
It should be understood that the above-described acquisition of quantization parameters and quantization of weights and biases may be performed in a training process (off-line process). The convolution module 120 performs the convolution calculation in the actual process of analyzing the input data (on-line process).
It should also be understood that, besides the manners described above, the quantization parameters of the input data, of the weights and of the offsets may also be obtained, and the weights and offsets quantized, according to other manners or formulas. The embodiments of the present application are not limited thereto.
Optionally, as an embodiment, the quantization module 110 is further configured to:
correcting the quantized bias;
The convolution module 120 is specifically configured to perform the convolution calculation with the corrected bias to obtain the convolution result.
In particular, in the actual convolution calculation, some characteristics of the convolutional neural network need to be fine-tuned to ensure its accuracy and the reversibility of the quantization process. Convolution calculation is in essence a multiply-accumulate process: the weights are multiplied by the feature map and the products are then accumulated with the bias. To satisfy quantization reversibility, the quantized offset needs to be corrected so that the multiplication term and the addition term carry the same multiplier factor; that factor can then be divided out during inverse quantization, which further improves the quantization precision while guaranteeing reversibility and preserves the precision and accuracy of the model. After the quantized offset is corrected, the convolution module 120 may perform convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight and the corrected offset to obtain the convolution result of the i-th convolutional layer.
The quantization module 110 may correct the quantized offset using the quantization parameter of the weight, the quantization parameter of the offset and the quantization parameter of the feature map. Specifically, the quantization module 110 may correct the quantized offset of the current layer according to the quantization parameter of the current layer's weight, the quantization parameter of the current layer's offset, and the feature map quantization parameters of the one or more consecutive convolutional layers requiring quantization that precede the current layer. "One or more consecutive convolutional layers requiring quantization before the current layer" can be understood as follows: assuming the current layer is the 5th convolutional layer, if the 4th and 3rd convolutional layers both require quantization and the 2nd convolutional layer does not, the consecutive convolutional layers requiring quantization before the current layer are the 3rd and 4th convolutional layers; if the 4th and 2nd convolutional layers require quantization but the 3rd does not, the consecutive convolutional layer requiring quantization before the current layer is the 4th convolutional layer only. The correction of the quantized offset by the quantization module 110 comprises correction for the quantization parameter of the weight and correction for the quantization parameter of the feature map.
The correction of the quantization parameter of the weight and the correction of the quantization parameter of the feature map will be specifically described below.
Correcting the quantization parameter of the weight:
After the quantization process, the convolution expression of the i-th convolutional layer is shown in formula (18) (the rounding calculation is omitted):

FM_i = ∑Q(W_i)*FM_(i-1) + Q(Bias_i) = ∑(W_i << α_i)*FM_(i-1) + (Bias_i << β_i) (18)
Q(Wi) Weight data, FM, representing quantized i-th convolutional layersi-1And (3) representing characteristic diagram data of the i-1 th convolutional layer, wherein the i-1 st convolutional layer can be a convolutional layer needing quantization or a convolutional layer not needing quantization. The value of i is greater than 1. FM in equation (18) when i equals 1i-1To input-img, Q (Bias)i) Bias data, W, representing quantized i-th convolutional layeriRepresenting the weight, Bias, of the i-th convolutional layeriRepresenting the offset, α, of the ith convolutional layeriQuantization parameter, β, representing the weight of the ith convolutional layeriA quantization parameter representing the i-th convolutional layer bias. Wherein, Q (W)i) And Q (Bias)i) Are not identical, which is not feasible for uniformly linearly transforming the entire convolution, requiring alignment of the multiplier factors in their quantization process.
Thus, after the quantized Bias_i enters the calculation, it is corrected once using 2^(α_i − β_i), as shown in formula (19):

FM_i = ∑(W_i << α_i)*FM_(i-1) + ((Bias_i << β_i) << (α_i − β_i)) = ∑(W_i << α_i)*FM_(i-1) + (Bias_i << α_i) (19)
It can be seen that the corrected bias and the weight share the multiplier factor 2^α_i. Therefore, during inverse quantization this multiplier factor can be divided out, so that the convolution process satisfies reversibility and the precision and accuracy of the model are ensured.
In some cases, for example where the bit width of the offset input is not constrained (or only loosely constrained), the offset may be quantized directly with the quantization parameter of the weight; that is, the weight and the offset share one quantization parameter, satisfying β_i = α_i.
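A sketch of the single correction of formula (19) on integer bias data; the handling of the case β_i > α_i by a right shift is an assumption of this sketch, since the text only states the correction factor 2^(α_i − β_i):

```python
def correct_bias(q_bias, alpha, beta):
    # Re-scale the quantized bias from the factor 2**beta to 2**alpha so
    # that the multiply term and the add term share the factor 2**alpha.
    shift = alpha - beta
    return q_bias << shift if shift >= 0 else q_bias >> -shift
```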
Correcting the quantization parameter of the feature map:
For the quantization of the feature map, the offset also needs to be corrected. Take the 1st and 2nd convolutional layers as an example, both being convolutional layers that require quantization.
For the 1st convolutional layer, the convolution formula is shown in formula (20):
FM_1 = ∑W_1*input-img + Bias_1 (20)
In formula (20), input-img (IMG) represents the original input picture input to the 1st convolutional layer, and FM_1 represents the feature map data of the 1st convolutional layer. After input data (original input picture) quantization, weight quantization and offset quantization, the convolution formula is as shown in formula (21):

Conv_1 = ∑(W_1 << α_1)*Q(IMG) + (Bias_1 << α_1) (21)

where Conv_1 denotes the convolution result of the 1st convolutional layer in the quantized domain.
In formula (21), α_1 represents the quantization parameter of the weight of the 1st convolutional layer, W_1 the weight of the 1st convolutional layer, Bias_1 the offset of the 1st convolutional layer, and Q(IMG) the quantized input data of the 1st convolutional layer.
After model inverse quantization and feature map quantization, the quantized Q(FM_1) is obtained. Q(FM_1), the quantized feature map data of the 1st convolutional layer (the quantized convolution result of the 1st convolutional layer), is given by formula (22):

Q(FM_1) = round(FM_1 << γ_1) (22)
In formula (22), γ_1 represents the quantization parameter of the feature map data of the 1st convolutional layer, and FM_1 represents the feature map data of the 1st convolutional layer.
For the 2nd convolutional layer, after model parameter quantization and offset quantization parameter correction, the convolution expression is shown in formula (23):

Conv_2 = ∑(W_2 << α_2)*Q(FM_1) + (Bias_2 << (α_2 + γ_1)) (23)
In formula (23), α_2 represents the quantization parameter of the weight of the 2nd convolutional layer, W_2 the weight of the 2nd convolutional layer, Bias_2 the offset of the 2nd convolutional layer, FM_1 the feature map data of the 1st convolutional layer, Q(FM_1) the quantized feature map data of the 1st convolutional layer, and γ_1 the quantization parameter of the feature map of the 1st convolutional layer. To keep the multiplier factor of the quantized offset consistent with that of the quantized feature map after the multiply-accumulation, the offset quantization parameter is corrected again on the basis of the first correction, combining the weight quantization parameter α_2 with the feature map quantization parameter γ_1. After model inverse quantization recovers FM_2, the quantized Q(FM_2) is obtained as shown in formula (24):

Q(FM_2) = round(FM_2 << γ_2) (24)
In formula (24), γ_2 represents the quantization parameter of the feature map data of the 2nd convolutional layer, and Q(FM_2) represents the quantized feature map data of the 2nd convolutional layer.
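The factor bookkeeping of formulas (21) to (24) may be illustrated with the following sketch, which reduces the convolution to an elementwise multiply-accumulate for brevity and emulates left shifts by power-of-2 multiplications; the function and argument names are assumptions of this sketch:

```python
import numpy as np

def quantized_layer(q_in, in_shift, w, bias, alpha, gamma):
    # q_in carries the factor 2**in_shift (the picture quantizer's shift
    # for layer 1, or gamma of the previous layer otherwise).
    q_w = np.round(w * 2.0 ** alpha)                  # weight, factor 2**alpha
    q_b = np.round(bias * 2.0 ** (alpha + in_shift))  # bias corrected to match
    acc = q_w * q_in + q_b                            # multiply-accumulate on integer-valued data
    fm = acc / 2.0 ** (alpha + in_shift)              # inverse quantization
    return np.round(fm * 2.0 ** gamma)                # feature map quantization
```

Called with in_shift = γ_1 for the 2nd layer, the bias carries the factor 2^(α_2 + γ_1) as in formula (23), and the return value corresponds to Q(FM_2) of formula (24).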
It should be understood that the quantization module 110 may also correct the quantization parameter of the offset using formulas other than those given above. The embodiments of the present application are not limited thereto.
It should be understood that, in the embodiments of the present application, inverse quantization of the feature map may be performed by a feature map inverse quantizer, feature map quantization by a feature map quantizer, model inverse quantization by a model inverse quantizer, and quantization parameter correction by a quantization parameter modifier. The feature map inverse quantizer, feature map quantizer, model inverse quantizer and quantization parameter modifier may be integrated in the quantization module or may be separately provided. The quantization module may further include a model quantizer for quantizing the model, a quantization parameter obtaining module, and the like; the embodiments of the present application are not limited herein.
The convolutional neural network system 200 provided by the present application will be described below with reference to fig. 4. As shown in fig. 4, the system 200 includes a picture quantizer 211, a model quantizer 212, a multiplier 213, an adder 214, a model inverse quantizer 215, a feature map inverse quantizer 216, and a feature map quantizer 217. The picture quantizer 211, the model quantizer 212 and the feature map quantizer 217 may be integrated into one quantization module or may be separately provided. The model inverse quantizer 215 and the feature map inverse quantizer 216 may be integrated into one inverse quantization module or may be separately provided. The multiplier 213 and the adder 214 may likewise be integrated into one convolution module or may be separately provided; the embodiments of the present application are not limited herein.
Fig. 4 shows the processing flow of the system 200 for the case where the current convolutional layer is determined to be a convolutional layer to be quantized. During training, the model quantizer 212 quantizes the weight and the offset of each convolutional layer to be quantized, obtaining the quantized weight and offset of each such layer. Assuming that the data of the layer-1 convolution calculation are data to be quantized, for an original input picture input to the 1st convolutional layer of the convolutional neural network, the picture quantizer 211 quantizes the original input picture to obtain a quantized picture, and the convolution calculation is performed with the quantized weight and offset of the 1st convolutional layer: the multiplier 213 multiplies the quantized picture by the quantized weight of the 1st convolutional layer, and the adder 214 adds the output of the multiplier and the quantized offset of the 1st convolutional layer to obtain the convolution result of the 1st convolutional layer. The model inverse quantizer 215 then performs model inverse quantization on the convolution result of layer 1, and the subsequent processing flow is determined by whether the 2nd convolutional layer is a convolutional layer requiring quantization.
If the 2nd convolutional layer is a convolutional layer to be quantized, that is, if the data of the layer-2 convolution calculation are data to be quantized, the feature map quantizer 217 performs feature map quantization on the inverse quantization result of the model inverse quantizer 215 and inputs the quantized feature map into the multiplier 213; the multiplier 213 multiplies the quantized feature map by the quantized weight of the 2nd convolutional layer, and the adder 214 adds the quantized offset of the 2nd convolutional layer to the output of the multiplier 213 to obtain the feature map data of the 2nd convolutional layer. Subsequent convolutional layers requiring quantization are processed with the same flow.
If the 2nd convolutional layer is a convolutional layer that does not require quantization, that is, if the data of the layer-2 convolution calculation are not data to be quantized, the feature map inverse quantizer 216 performs feature map inverse quantization on the inverse quantization result of the model inverse quantizer 215 and inputs the result into the multiplier 213; the multiplier 213 multiplies the feature map inverse quantization result by the weight of the 2nd convolutional layer, and the adder 214 adds the offset of the 2nd convolutional layer to the output of the multiplier 213 to obtain the feature map data of the 2nd convolutional layer. Subsequent convolutional layers that do not require quantization are processed with the same flow.
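By way of illustration, the branching of fig. 4 may be sketched as follows; the layer attribute names, the folding of the model and feature map inverse quantizations into single rescalings, and the assumption that the 1st convolutional layer requires quantization are choices of this sketch, not prescriptions of the system:

```python
import numpy as np

def conv(w, x):
    return w * x  # stand-in for the real convolution

def forward(img, layers, img_shift):
    s = img_shift
    x = np.round(img * 2.0 ** s)                   # picture quantizer 211
    for i, layer in enumerate(layers):
        if layer.needs_quantization:
            # multiplier 213 and adder 214 on quantized data; layer.q_w is
            # assumed pre-quantized offline as round(w * 2**alpha), and the
            # bias is corrected to carry the same factor 2**(alpha + s)
            acc = conv(layer.q_w, x) + np.round(layer.bias * 2.0 ** (layer.alpha + s))
            fm = acc / 2.0 ** (layer.alpha + s)    # inverse quantization (215/216 folded)
        else:
            fm = conv(layer.w, x) + layer.bias     # unquantized layer
        nxt = layers[i + 1] if i + 1 < len(layers) else None
        if nxt is not None and nxt.needs_quantization:
            s = layer.gamma                        # feature map quantizer 217
            x = np.round(fm * 2.0 ** s)
        else:
            s = 0                                  # feature map inverse quantization path
            x = fm
    return x
```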
Optionally, the system further includes a quantization modifier 218, where the quantization modifier 218 may modify the quantized offset, and the offset input to the adder 214 may also be the offset modified by the quantization modifier 218. The quantization modifier 218 may be integrated into the model quantizer 212 or may be separately provided.
For the case where the current layer is the last convolutional layer: if the last convolutional layer is a convolutional layer to be quantized, the model inverse quantizer 215 first performs model inverse quantization on the convolution result of the last convolutional layer, and the result obtained after the feature map inverse quantizer 216 performs feature map inverse quantization on the output of the model inverse quantizer 215 is the output of all convolutional layers. If the last convolutional layer is a convolutional layer that does not require quantization, the feature map data input to the last layer from the penultimate layer is determined by whether the penultimate layer is a convolutional layer requiring quantization.
For the 1st convolutional layer, the quantized picture is input to the multiplier 213; for the other convolutional layers, the output of the feature map quantizer 217 or of the feature map inverse quantizer 216 is input to the multiplier 213. That is, the picture quantizer 211 produces output only once, for the convolution calculation on the original picture: the quantized picture is input for the 1st convolutional layer, and for the other convolutional layers the picture quantizer 211 produces no output. The dotted line in fig. 4 indicates that the picture quantizer 211 inputs data to the multiplier 213 only for the 1st convolutional layer, and not for the other convolutional layers.
For example, assuming 10 convolutional layers in total: if the 10th convolutional layer is a convolutional layer to be quantized, the adder 214 adds the quantized offset of the 10th convolutional layer to the output of the multiplier 213 to obtain the feature map data (convolution result) of the 10th convolutional layer, the model inverse quantizer 215 performs model inverse quantization on the convolution result of the 10th layer, and the feature map inverse quantizer 216 performs feature map inverse quantization on the result of the model inverse quantizer 215 to obtain the convolution result of all convolutional layers.
Similarly, the process flow for the ith convolutional layer requiring quantization is similar to that described above. For brevity, no further description is provided herein.
The convolutional neural network system provided by the application quantizes, through the quantization module, the weight and the offset of each convolutional layer requiring quantization and the input data input into that layer; the convolution module performs the convolution calculation with the quantized input data, quantized weight and quantized offset to obtain the calculation result of each convolutional layer, and inverse quantization is performed on the quantized model and feature map so that the quantization satisfies reversibility. This guarantees the quantization precision, makes the calculation results of the system more accurate, and improves the precision and accuracy of the convolutional neural network system, while reducing the calculation amount of the convolutional neural network and the amount of data stored for the model and the convolution results.
The present application further provides a method for quantizing a convolutional neural network. The method 300 of convolutional neural network quantization can be applied to a convolutional neural network system (apparatus), which may be the convolutional neural network system provided above or an existing convolutional neural network system; the embodiments of the present application are not limited herein. Fig. 5 shows a schematic flow diagram of the method 300 of convolutional neural network quantization provided herein. The method 300 may be performed by a chip, which may include a quantization module, a convolution module and the like, or by a computer system, which may likewise include a quantization module, a convolution module and the like; the chip or the computer system may be the convolutional neural network system (apparatus) provided in the present application. The application is not limited herein. As shown in fig. 5, the method 300 includes:
S310, quantizing the input data of the i-th convolutional layer of the convolutional neural network, the weight of the i-th convolutional layer and the bias respectively, where i is a positive integer. The data on which the i-th layer convolution calculation is performed are data to be quantized.
S320, carrying out convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the quantized bias to obtain a convolution result of the ith convolutional layer.
The convolutional neural network quantization method provided by the application quantizes the weight and the offset of each convolutional layer to be quantized, together with the input data input into that layer, and performs the convolution calculation with the quantized input data, quantized weight and quantized offset to obtain the calculation result of each convolutional layer. The obtained calculation results are more accurate, the calculation amount of the convolutional neural network is reduced, the amount of data stored for the model and the convolution results can be reduced, and the quantization precision of the convolutional neural network is improved. The method is also convenient to implement in hardware design.
Optionally, as an embodiment, as shown in fig. 6, in S320, performing convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight, and the quantized offset to obtain a convolution result of the i-th convolutional layer includes:
S321, multiplying the quantized input data of the i-th convolutional layer by the quantized weight.
S322, adding the result of the multiplication and the quantized offset to obtain the convolution result of the i-th convolutional layer.
Optionally, as an embodiment, when i is equal to 1, the input data of the i-th convolutional layer is an original input picture; or, when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
Optionally, as an embodiment, as shown in fig. 7, when the data to be subjected to the i +1 th layer convolution calculation is data to be quantized, the method 300 further includes:
S330, performing, on the convolution result of the i-th convolutional layer, inverse quantization corresponding to the quantization of the weight and the quantization of the offset, where the inverse-quantized convolution result of the i-th convolutional layer is the input data of the (i + 1)-th convolutional layer.
Optionally, as an embodiment, as shown in fig. 8, when the data to be subjected to the i +1 th layer convolution calculation is not the data to be quantized, the method 300 further includes:
S340, performing, on the convolution result of the i-th convolutional layer, inverse quantization corresponding to the quantization of the weight and the quantization of the offset.
S350, performing feature map inverse quantization on the result obtained after the inverse quantization.
S360, performing convolution calculation on the result of the feature map inverse quantization, the weight of the (i + 1)-th convolutional layer and the bias of the (i + 1)-th convolutional layer to obtain the convolution result of the (i + 1)-th convolutional layer.
Optionally, as an embodiment, as shown in fig. 9, the method 300 further includes:
S311, correcting the quantized offset.
In S320, the performing convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight, and the quantized offset includes:
and performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the corrected bias to obtain a convolution result of the ith convolutional layer.
Optionally, as an embodiment, the method 300 further includes:
obtaining a quantization parameter of input data of the ith convolutional layer, a quantization parameter of the weight of the ith convolutional layer and a quantization parameter of the offset;
In S310, the quantizing of the input data of the i-th convolutional layer of the convolutional neural network, the weight of the i-th convolutional layer and the bias respectively includes:
quantizing the input data of the i-th convolutional layer according to the quantization parameter of the input data of the i-th convolutional layer, quantizing the weight according to the quantization parameter of the weight of the i-th convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the i-th convolutional layer.
It should be understood that the specific steps of each implementation of method 300 may be described with reference to the corresponding description of the convolutional neural network system described above. For example, the formulas for quantization or dequantization of various embodiments in method 300 may utilize the corresponding formulas of the convolutional neural network system described above, and so on. To avoid repetition, further description is omitted here.
It should also be understood that the above description is only intended to help those skilled in the art better understand the embodiments of the present application, and is not intended to limit their scope. It will be apparent to those skilled in the art that various equivalent modifications or changes may be made based on the examples given above, that certain steps may be added, or that any two or more of the above embodiments may be combined; such modifications, variations or combinations also fall within the scope of the embodiments of the present application.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic thereof, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar elements that are not mentioned may be referred to one another and, for brevity, are not repeated herein.
Embodiments of the present application also provide a computer readable medium storing computer program code, the computer program comprising instructions for performing the method 300 of convolutional neural network quantization described above. The readable medium may be a read-only memory (ROM) or a random access memory (RAM), which is not limited in the embodiments of the present application.
The present application also provides a computer program product comprising instructions that, when executed, cause an apparatus to perform operations corresponding to the above-described methods.
The present application further provides a computer system including a chip or an apparatus for performing the method for convolutional neural network quantization of the embodiments of the present application. The chip or the device may be the convolutional neural network system provided in the present application.
An embodiment of the present application further provides a system chip, where the system chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, a pin or a circuit, etc. The processing unit can execute computer instructions to cause a chip in the communication device to execute any one of the above-mentioned methods for convolutional neural network quantization provided by the embodiments of the present application.
Optionally, the computer instructions are stored in a storage unit.
Alternatively, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the terminal, such as a ROM or other types of static storage devices that can store static information and instructions, a RAM, and the like. The processor mentioned in any of the above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for executing programs for controlling the above method for quantizing a convolutional neural network. The processing unit and the storage unit may be decoupled, and are respectively disposed on different physical devices, and are connected in a wired or wireless manner to implement respective functions of the processing unit and the storage unit, so as to support the system chip to implement various functions in the foregoing embodiments. Alternatively, the processing unit and the memory may be coupled to the same device.
It should be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar parts that are not mentioned may be referred to one another, and thus, for brevity, will not be described again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. A convolutional neural network system, comprising:
a quantization module, configured to quantize input data of an ith convolutional layer of the system, a weight of the ith convolutional layer, and a bias, respectively, where i is a positive integer;
and the convolution module is used for performing convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the quantized bias to obtain a convolution result of the ith convolutional layer.
2. The system of claim 1, wherein the convolution module comprises:
a multiplier for multiplying the quantized input data of the i-th convolutional layer by the quantized weight;
and the adder is used for adding the output result of the multiplier and the quantized offset to obtain the convolution result of the ith convolution layer.
3. The system according to claim 1 or 2, characterized in that:
when i is equal to 1, the input data of the i-th convolutional layer is an original input picture; or,
when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
4. The system according to any one of claims 1 to 3, wherein when the data on which the (i + 1) th layer convolution calculation is to be performed is the data to be quantized,
the quantization module is further to:
and performing inverse quantization corresponding to the quantization of the weight and the quantization of the offset on the convolution result of the ith convolution layer, wherein the convolution result of the ith convolution layer after the inverse quantization is input data of the (i + 1) th convolution layer.
5. The system of any of claims 1 to 3, wherein when the data on which the (i + 1) th layer convolution calculation is to be performed is not data to be quantized, the quantization module is further configured to:
performing inverse quantization corresponding to quantization of the weight and quantization of the offset on the convolution result of the i-th convolutional layer;
carrying out feature map inverse quantization on the result obtained after the inverse quantization;
the convolution module is further configured to: and performing convolution calculation on the result of the feature map inverse quantization, the weight of the (i + 1)-th convolutional layer and the bias of the (i + 1)-th convolutional layer to obtain a convolution result of the (i + 1)-th convolutional layer.
6. The system of any of claims 1-5, wherein the quantization module is further configured to:
correcting the quantized offset;
the convolution module is specifically configured to: and carrying out convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the corrected bias to obtain a convolution result of the ith convolutional layer.
7. The system of claim 6, further comprising:
a quantization parameter obtaining module, configured to obtain a quantization parameter of input data of the ith convolutional layer, a quantization parameter of the weight of the ith convolutional layer, and a quantization parameter of the offset;
the quantization module is specifically configured to: quantizing the input data of the ith convolutional layer according to the quantization parameter of the input data of the ith convolutional layer, quantizing the weight according to the quantization parameter of the weight of the ith convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the ith convolutional layer.
8. A method of convolutional neural network quantization, comprising:
quantizing input data of the ith convolutional layer of the convolutional neural network, the weight of the ith convolutional layer and the bias respectively, wherein i is a positive integer;
and carrying out convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the quantized bias to obtain a convolution result of the ith convolutional layer.
9. The method of claim 8, wherein the convolving the quantized input data of the i-th convolutional layer, the quantized weights, and the quantized offsets to obtain the convolution result of the i-th convolutional layer comprises:
multiplying the quantized input data of the i-th convolutional layer by the quantized weight;
and adding the result of the multiplication operation and the quantized offset to obtain a convolution result of the ith convolution layer.
10. The method according to claim 8 or 9, characterized in that:
when i is equal to 1, the input data of the i-th convolutional layer is an original input picture; or,
when i is greater than 1, the input data of the i-th convolutional layer is feature map data.
11. The method according to any one of claims 8 to 10, wherein when the data on which the (i + 1) th layer convolution calculation is to be performed is data to be quantized, the method further comprises:
and performing inverse quantization corresponding to the quantization of the weight and the quantization of the offset on the convolution result of the ith convolution layer, wherein the convolution result of the ith convolution layer after the inverse quantization is input data of the (i + 1) th convolution layer.
12. The method according to any one of claims 8 to 10, wherein when the data on which the (i + 1) th layer convolution calculation is to be performed is not the data to be quantized, the method further comprises:
performing inverse quantization corresponding to quantization of the weight and quantization of the offset on the convolution result of the i-th convolutional layer;
carrying out feature map inverse quantization on the result obtained after the inverse quantization;
and performing convolution calculation on the result of the feature map inverse quantization, the weight of the (i + 1)-th convolutional layer and the bias of the (i + 1)-th convolutional layer to obtain a convolution result of the (i + 1)-th convolutional layer.
13. The method according to any one of claims 8 to 12, further comprising:
correcting the quantized offset;
performing convolution calculation on the quantized input data of the i-th convolutional layer, the quantized weight, and the quantized bias, includes:
and carrying out convolution calculation on the quantized input data of the ith convolutional layer, the quantized weight and the corrected bias to obtain a convolution result of the ith convolutional layer.
14. The method of claim 13, further comprising:
obtaining a quantization parameter of input data of the ith convolutional layer, a quantization parameter of the weight of the ith convolutional layer and a quantization parameter of the bias;
the quantizing the input data of the ith convolutional layer of the convolutional neural network, the weight of the ith convolutional layer, and the bias, respectively, includes:
quantizing the input data of the ith convolutional layer according to the quantization parameter of the input data of the ith convolutional layer, quantizing the weight according to the quantization parameter of the weight of the ith convolutional layer, and quantizing the offset according to the quantization parameter of the offset of the ith convolutional layer.
15. A computer-readable storage medium storing a computer program for executing the method of convolutional neural network quantization of any of claims 8-14.
CN201810603231.XA 2018-06-12 2018-06-12 Convolutional neural network system and method for quantizing convolutional neural network Pending CN110598839A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810603231.XA CN110598839A (en) 2018-06-12 2018-06-12 Convolutional neural network system and method for quantizing convolutional neural network
PCT/CN2019/090660 WO2019238029A1 (en) 2018-06-12 2019-06-11 Convolutional neural network system, and method for quantifying convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810603231.XA CN110598839A (en) 2018-06-12 2018-06-12 Convolutional neural network system and method for quantizing convolutional neural network

Publications (1)

Publication Number Publication Date
CN110598839A true CN110598839A (en) 2019-12-20

Family

ID=68842786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810603231.XA Pending CN110598839A (en) 2018-06-12 2018-06-12 Convolutional neural network system and method for quantizing convolutional neural network

Country Status (2)

Country Link
CN (1) CN110598839A (en)
WO (1) WO2019238029A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258839A (en) * 2020-02-16 2020-06-09 苏州浪潮智能科技有限公司 AI accelerator card simulation test system based on ResNet50 network and working method thereof
CN111368972A (en) * 2020-02-21 2020-07-03 华为技术有限公司 Convolution layer quantization method and device thereof
CN111914996A (en) * 2020-06-30 2020-11-10 华为技术有限公司 Method for extracting data features and related device
CN112580492A (en) * 2020-12-15 2021-03-30 深兰人工智能(深圳)有限公司 Vehicle detection method and device
CN112990438A (en) * 2021-03-24 2021-06-18 中国科学院自动化研究所 Full-fixed-point convolution calculation method, system and equipment based on shift quantization operation
CN113420788A (en) * 2020-10-12 2021-09-21 黑芝麻智能科技(上海)有限公司 Integer-based fusion convolution layer in convolutional neural network and fusion convolution method
CN113762500A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Training method for improving model precision of convolutional neural network during quantification
CN113762497A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Low-bit reasoning optimization method of convolutional neural network model
CN113850374A (en) * 2021-10-14 2021-12-28 安谋科技(中国)有限公司 Neural network model quantization method, electronic device, and medium
WO2022088063A1 (en) * 2020-10-30 2022-05-05 华为技术有限公司 Method and apparatus for quantizing neural network model, and method and apparatus for processing data

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468935B (en) * 2020-05-08 2024-04-02 上海齐感电子信息科技有限公司 Face recognition method
CN113780513B (en) * 2020-06-10 2024-05-03 杭州海康威视数字技术股份有限公司 Network model quantization and reasoning method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
US20160328646A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN106951962A (en) * 2017-03-22 2017-07-14 北京地平线信息技术有限公司 Compound operation unit, method and electronic equipment for neutral net
CN107239826A (en) * 2017-06-06 2017-10-10 上海兆芯集成电路有限公司 Computational methods and device in convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203624B (en) * 2016-06-23 2019-06-21 上海交通大学 Vector Quantization and method based on deep neural network
CN107688855B (en) * 2016-08-12 2021-04-13 赛灵思公司 Hierarchical quantization method and device for complex neural network
CN115841137A (en) * 2017-06-06 2023-03-24 格兰菲智能科技有限公司 Method and computing device for fixed-point processing of data to be quantized

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328646A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
CN106951962A (en) * 2017-03-22 2017-07-14 北京地平线信息技术有限公司 Compound operation unit, method and electronic equipment for neutral net
CN107239826A (en) * 2017-06-06 2017-10-10 上海兆芯集成电路有限公司 Computational methods and device in convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHU-CHANG ZHOU; YU-ZHI WANG; HE WEN; QIN-YAO HE; YU-HENG ZOU: "Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks", 《JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY》 *
CAI Ruichu et al.: "Convolutional neural network quantization and compression methods for 'edge' applications", 《计算机应用》 (Journal of Computer Applications) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258839A (en) * 2020-02-16 2020-06-09 苏州浪潮智能科技有限公司 AI accelerator card simulation test system based on ResNet50 network and working method thereof
CN111258839B (en) * 2020-02-16 2022-11-29 苏州浪潮智能科技有限公司 AI accelerator card simulation test system based on ResNet50 network and working method thereof
CN111368972A (en) * 2020-02-21 2020-07-03 华为技术有限公司 Convolution layer quantization method and device thereof
CN111368972B (en) * 2020-02-21 2023-11-10 华为技术有限公司 Convolutional layer quantization method and device
CN113762500B (en) * 2020-06-04 2024-04-02 合肥君正科技有限公司 Training method for improving model precision during quantization of convolutional neural network
CN113762500A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Training method for improving model precision of convolutional neural network during quantification
CN113762497A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Low-bit reasoning optimization method of convolutional neural network model
CN113762497B (en) * 2020-06-04 2024-05-03 合肥君正科技有限公司 Low-bit reasoning optimization method for convolutional neural network model
CN113919479A (en) * 2020-06-30 2022-01-11 华为技术有限公司 Method for extracting data features and related device
CN111914996A (en) * 2020-06-30 2020-11-10 华为技术有限公司 Method for extracting data features and related device
CN113420788A (en) * 2020-10-12 2021-09-21 黑芝麻智能科技(上海)有限公司 Integer-based fusion convolution layer in convolutional neural network and fusion convolution method
WO2022088063A1 (en) * 2020-10-30 2022-05-05 华为技术有限公司 Method and apparatus for quantizing neural network model, and method and apparatus for processing data
CN112580492A (en) * 2020-12-15 2021-03-30 深兰人工智能(深圳)有限公司 Vehicle detection method and device
CN112990438A (en) * 2021-03-24 2021-06-18 中国科学院自动化研究所 Full-fixed-point convolution calculation method, system and equipment based on shift quantization operation
CN112990438B (en) * 2021-03-24 2022-01-04 中国科学院自动化研究所 Full-fixed-point convolution calculation method, system and equipment based on shift quantization operation
CN113850374A (en) * 2021-10-14 2021-12-28 安谋科技(中国)有限公司 Neural network model quantization method, electronic device, and medium

Also Published As

Publication number Publication date
WO2019238029A1 (en) 2019-12-19

Similar Documents

Publication Publication Date Title
CN110598839A (en) Convolutional neural network system and method for quantizing convolutional neural network
CN110363279B (en) Image processing method and device based on convolutional neural network model
CN110610237A (en) Quantitative training method and device of model and storage medium
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN111488986A (en) Model compression method, image processing method and device
Hong et al. Daq: Channel-wise distribution-aware quantization for deep image super-resolution networks
EP4087239A1 (en) Image compression method and apparatus
CN110826685A (en) Method and device for convolution calculation of neural network
CN111105017B (en) Neural network quantization method and device and electronic equipment
CN109284761B (en) Image feature extraction method, device and equipment and readable storage medium
WO2020001401A1 (en) Operation method and apparatus for network layer in deep neural network
CN113780549A (en) Quantitative model training method, device, medium and terminal equipment for overflow perception
WO2022111002A1 (en) Method and apparatus for training neural network, and computer readable storage medium
Peric et al. Floating point and fixed point 32-bits quantizers for quantization of weights of neural networks
CN112990438A (en) Full-fixed-point convolution calculation method, system and equipment based on shift quantization operation
CN111937011A (en) Method and equipment for determining weight parameters of neural network model
CN112561050B (en) Neural network model training method and device
CN115062777B (en) Quantization method, quantization device, equipment and storage medium of convolutional neural network
CN115705486A (en) Method and device for training quantitative model, electronic equipment and readable storage medium
CN117348837A (en) Quantization method and device for floating point precision model, electronic equipment and storage medium
CN111614358B (en) Feature extraction method, system, equipment and storage medium based on multichannel quantization
CN113850374A (en) Neural network model quantization method, electronic device, and medium
CN114065913A (en) Model quantization method and device and terminal equipment
CN114222997A (en) Method and apparatus for post-training quantization of neural networks
CN114386469A (en) Method and device for quantizing convolutional neural network model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220
