CN110929862B - Fixed-point neural network model quantization device and method - Google Patents

Fixed-point neural network model quantization device and method

Info

Publication number
CN110929862B
Authority
CN
China
Prior art keywords
operator
model
processor
input
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911174616.XA
Other languages
Chinese (zh)
Other versions
CN110929862A (en)
Inventor
陈子祺
田甲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201911174616.XA
Publication of CN110929862A
Application granted
Publication of CN110929862B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The application relates to a fixed-point neural network model quantization method and device. The method comprises the following stages. Checking stage: verify that the graph model is a directed acyclic graph and convert a multi-input graph model into a single-input model. Preparation stage: apply equivalent transformations to the graph model to facilitate subsequent quantization. Calibration stage: feed all samples through the floating-point model, collect statistics on the output of every operator in the model, and predict the likely output threshold of each operator over all samples from the characteristics of the output data. Quantization stage: convert the operators to fixed point in the topological order of the model. The method and device can effectively reduce the storage and computation cost of the model, eliminate the uncertainty caused by rounding errors in floating-point arithmetic, and improve the efficiency, transparency, and security of deep neural network models.

Description

Fixed-point neural network model quantization device and method
Technical Field
The application relates to a fixed-point neural network model quantization device and method, and belongs to the technical field of artificial intelligence.
Background
Deep neural network models are widely applied to machine vision tasks such as image classification and object detection, and have achieved great success. However, storing and running neural network models on embedded chips and purpose-built neural network chips remains a significant challenge because of memory and power constraints. Moreover, existing neural network models are designed with only accuracy in mind, not reproducibility and consistency of computation, so results may differ across architectures or even within the same computing environment. This greatly limits the application of neural network algorithms in fields with high security requirements such as finance, trusted computing, blockchain, and smart contracts.
Model fixed-pointing, i.e., converting the floating-point operations of a deep neural network into integer operations, addresses these problems in two respects. First, as one of the most widely adopted model compression methods in deep learning, it reduces the storage and computation cost of the model. Second, fixed-point integer arithmetic avoids the rounding errors of floating-point operations and eliminates uncertainty in the computation.
Existing mainstream quantization methods map the parameter domain onto a discrete integer domain, for example mapping convolution kernel parameters onto the INT8 integer domain. They can be divided into symmetric and asymmetric quantization according to whether the discrete integer domain is symmetric, and may additionally quantize per channel or add a zero-point offset to improve quantization capability. However, on the one hand, existing model quantization techniques are not mature enough to preserve model accuracy effectively while improving performance. On the other hand, existing model quantization devices only accelerate certain specific operators (such as convolution and matrix multiplication), a large number of floating-point intermediate values remain in the computation, and such semi-integer quantization still cannot completely avoid rounding errors during model execution.
Disclosure of Invention
The purpose of the application is to provide a fixed-point neural network model quantization device and method that can effectively reduce the storage and computation cost of the model, eliminate the uncertainty caused by rounding errors in floating-point operations, and improve the efficiency, transparency, and security of deep neural network models.
The application relates to a fixed-point neural network model quantization device, comprising:
a model memory: configured to store at least one model;
a data memory: configured to store data;
an operator fixed-point processor: configured to execute at least one program to convert the operators in the neural network to fixed point;
and a central processor: configured to read the model and the data from the model memory and the data memory, invoke the corresponding operators in the operator fixed-point processor, and collect statistics on the output of each operator while actually executing the sample data.
Preferably, the central processor comprises the following program units:
a reading program unit, which reads the model from the model memory and the sample data from the data memory;
a checking program unit, which topologically sorts the operators of the model and, in that order, invokes the checking configuration of the corresponding operators in the processor;
a preparation program unit, which topologically sorts the operators of the model and, in that order, invokes the preparation configuration of the corresponding operators in the processor;
a calibration program unit, which, based on the read sample data, collects statistics on the output of each internal operator while actually executing the sample data;
and a quantization program unit, which topologically sorts the operators of the model and, in that order, invokes the quantization configuration of the corresponding operators in the operator fixed-point processor.
Preferably, the device further comprises a re-quantization device configured to perform an integer data precision re-quantization procedure; each operator is configured as a pluggable operator execution processor; the tanh, sigmoid, and exp operator processors configure the quantization stage to map the original floating-point numbers corresponding to the input integer data onto the discrete domain INT16 by a table look-up method and to build an index table, and the fixed-point processor transforms the operator into an index instruction plus the index table; more preferably, the tanh, sigmoid, and exp operator processors configure the maximum input precision to be 16.
Preferably, the softmax operator processor configures the quantization stage to combine the table look-up method with integer arithmetic, i.e., quantization is first performed by the operator's table look-up method and is then followed by integer addition and division; since the floating-point division in the original expression is expected to become round-to-nearest after fixed-pointing, half of the denominator must be added to the numerator when converting to integer division; more preferably, the softmax operator processor configures the maximum precision to be 16.
Preferably, the convolution and matrix multiplication processors configure the checking stage to support only the 2D-NCHW input format; the convolution and matrix multiplication processors configure the preparation stage so that, when the result of a matrix multiplication would overflow, a matrix decomposition operation is performed and the large matrix multiplication operator is converted into several small matrix operators whose results are merged by addition; the convolution and matrix multiplication processors configure the quantization stage so that the original floating-point convolution and matrix multiplication operations are equivalently converted into integer equivalent operators; more preferably, the convolution and matrix multiplication processors configure the maximum precision to be 8.
Preferably, the normalization operator processor configures the preparation stage so that the normalization operation is equivalently converted into matrix multiplication and addition; or the normalization operator processor configures the preparation stage so that, if the input data is the result of a convolution operation, the normalization operation is merged into the convolution operation.
Preferably, the relu operator processor does not limit the input precision; in the preparation stage, if the child node is a transpose operation, the node and the child node are swapped in order; other stages use the default operation.
Preferably, the auto-broadcast matrix multiplication operator processor configures the input precision to be 16, and other stages use the default operation; or
the dimension addition (sum over axis) operator processor configures the input precision to be 8, and other stages use the default operation; or
the matrix addition and subtraction operator processors configure the input precision to be 16, and other stages use the default operation; or
the auto-broadcast matrix addition and subtraction operator processors configure the input precision to be 16, and other stages use the default operation; or
the matrix concatenation operator processor does not limit the input precision, and other stages use the default operation; or
the embedding operator processor does not limit the input precision, and other stages use the default operation; or
the maximum-value pooling operator processor does not limit the input precision, and other stages use the default operation; or
the average-value pooling operator processor configures the checking stage so that, when the pooling kernel window slides, the computed average always includes the peripheral padding cells; or
the matrix slice and clip operator processors do not limit the input precision, and other stages use the default operation; or
the matrix negation, dimension repetition, and tiling operator processors do not limit the input precision, and other stages use the default operation; or
the dimension expansion, dimension squeeze, and reshape operator processors do not limit the input precision, and other stages use the default operation; or
the transpose operator processor does not limit the input precision; in the preparation stage, if the input data is the result of a transpose operation, the two transposes can be merged; other stages use the default operation; or
the flatten, maximum, minimum, and upsampling operator processors do not limit the input precision, and other stages use the default operation.
The application also relates to a fixed-point neural network model quantization method that uses the above neural network model quantization device and comprises the following stages:
checking stage: verify that the graph model is a directed acyclic graph and convert a multi-input graph model into a single-input model;
preparation stage: apply equivalent transformations to the graph model to facilitate subsequent quantization;
calibration stage: feed all samples through the floating-point model, collect statistics on the output of every operator in the model, and predict the likely output threshold of each operator over all samples from the characteristics of the output data;
quantization stage: convert the operators to fixed point in the topological order of the model.
Preferably, the method further comprises a re-quantization stage in which a maximum input-data precision is set for each operator, and the input data is reduced in precision when the precision of the operator's input data is greater than the set maximum precision.
The application also relates to a computer-readable medium storing instructions that cause a computer to:
(1) verify that the graph model is a directed acyclic graph and convert a multi-input graph model into a single-input model;
(2) apply equivalent transformations to the graph model to facilitate subsequent quantization;
(3) feed all samples through the floating-point model, collect statistics on the output of every operator in the model, and predict the likely output threshold of each operator over all samples from the characteristics of the output data;
(4) convert the operators to fixed point in the topological order of the model.
Preferably, the instructions further cause the computer to: set a maximum input-data precision for each operator, and reduce the precision of the input data when the precision of the operator's input data is greater than the set maximum precision.
The fixed-point neural network model quantization device and method have the following technical advantages:
(1) the various equivalent graph transformations effectively reduce the amount of graph computation and improve the execution efficiency and transparency of the deep neural network model, so that it can be better deployed on embedded chips and neural network inference chips;
(2) the device interfaces with the operator protocol of the fixed-point model and realizes the conversion from an ordinary floating-point model to a fixed-point model;
(3) fully integer quantization is achieved in the quantization process, no floating-point data exists during execution, and the rounding errors of floating-point operations are eliminated, so that model computation is deterministic;
(4) better precision is obtained during quantization, no further complex fine-tuning of the model is needed, and the method is convenient to use.
Drawings
Fig. 1 is a schematic diagram of a data discretization method in the present application.
Fig. 2 is a schematic diagram of the fixed-point neural network model quantization device of the present application.
Fig. 3 is a schematic diagram of a processing flow module of the central processing unit of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
According to a first aspect of the present application, the process of a fixed-point neural network model quantization method includes:
checking stage: verify that the graph model is a directed acyclic graph and convert a multi-input graph model into a single-input model; also verify that the graph model contains no duplicate names and remove useless model parameters; if the graph model fails the check, error information is returned to the user; operator-specific checks may also be performed, for which refer to the configuration of the operator fixed-point processors described below;
preparation stage: apply equivalent transformations to the graph model to facilitate subsequent quantization; for operator-specific equivalent transformations refer to the configuration of the operator fixed-point processors described below;
calibration stage: feed all samples through the floating-point model, collect statistics on the output of every operator in the model, and predict the likely output threshold of each operator over all samples from characteristics of the output data such as the maximum, minimum, mean, and variance. Considering practical factors and time cost, a small subset of samples (16 samples were used in testing) is collected to approximate the prediction over all data; the resulting quantization is no worse than using all of the sample data;
quantization stage: convert the operators to fixed point in topological order. Fixed-pointing changes an operator from accepting floating-point data to accepting integer data with a fixed scaling factor m. Thus, when an operator is processed, it can be assumed that its input data X has already been fixed-pointed, i.e., mapped onto the integer domain INTp, where p is defined as the precision of the input X and m is its fixed scaling factor; the input that the operator would have received in the original floating-point model is x = X / m. An obvious example is an operator whose logical abstraction is a homogeneous operation, i.e., one satisfying m * f(x) = f(m * x); such an operator can be fed the integer data directly, and its output data carries the same scaling factor m;
re-quantization stage: after an operator manipulates the data, the numeric range grows; an obvious example is matrix multiplication, whose result theoretically has twice the precision of the input data. The memory of the quantization device uses INT32 storage by default, and the precision of the model's data grows step by step as it passes through successive operators until it exceeds INT32 and overflows, which makes the result non-deterministic. Therefore a maximum input precision q can be set for each operator to prevent data overflow during its execution: when the precision p of the operator's input data is greater than q, the input must be reduced in precision. This step is defined as the re-quantization operation and is a preferred operation.
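To make the scaling-factor convention and the precision-reduction step concrete, the following minimal Python sketch is offered (the function names, the symmetric max-absolute-value threshold, and the round-to-nearest shift are assumptions made for illustration, not the patented procedure):

    import numpy as np

    def quantize(x, precision):
        """Map floating-point data x onto the integer domain INT(precision).
        Returns the integer data X and the scaling factor m, so that x ~ X / m."""
        bound = 2 ** (precision - 1) - 1           # e.g. 127 for INT8
        threshold = np.max(np.abs(x))              # calibrated output threshold
        m = bound / threshold                      # fixed scaling factor
        X = np.clip(np.round(x * m), -bound, bound).astype(np.int64)
        return X, m

    def requantize(X, p, q):
        """Reduce integer data X from precision p to precision q by a right shift.
        The scaling factor shrinks by the same power of two."""
        shift = p - q
        if shift <= 0:
            return X, 0
        # round-to-nearest right shift: add half of the discarded range first
        Xq = (X + (1 << (shift - 1))) >> shift
        return Xq, shift

    # usage: quantize a small tensor to INT8, then reduce it to 6-bit precision
    x = np.array([0.5, -1.25, 3.0, -0.75])
    X, m = quantize(x, 8)
    Xq, shift = requantize(X, 8, 6)
    print(X, m, Xq, shift)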
Part of the idea of mainstream quantization methods is borrowed here: the input and output data are mapped onto a discrete integer domain. The discretization of the data is illustrated in Fig. 1, where the slanted line is the actual data and the horizontal steps are the quantized data distribution; in strict mathematical terms this is floating-point rounding.
According to a second aspect of the present application, there is provided a fixed-point neural network model quantization device, as shown in Fig. 2, comprising:
a model memory: configured to store at least one model;
a data memory: configured to store data, including sample data, intermediate results, final results, and the like;
an operator fixed-point processor: configured to execute at least one program to convert the operators in the neural network to fixed point; for each operator present in the model, the operator execution processor is configured to be pluggable;
and a re-quantization device: configured to perform a reshaped data precision re-quantization procedure, calculating simulated parameters for floating point operations using reshaping; given the shaping domain M to which the floating point input maps: INT (p), with scaling factor sp, expects the addition of data to map to the shaping domain N: on INT (q), the scaling factor is sq, and the expression n=m (sq/sp), i.e. M times the scaling factor (sq/sp), is given. After the re-quantization device executes the floating point data division, the floating point result hardware binary representation is directly extracted to have sq/sp=a/2^b according to the IEEE floating point data representation standard, and an operation instruction (M.smata) > > b is returned.
and a central processor: reads the model and the data from the model memory and the data memory, invokes the corresponding operators in the operator fixed-point processor, collects statistics on the output of each operator while actually executing the sample data, and re-quantizes the data; the specific configuration is shown in Fig. 3.
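As an illustration of the a / 2^b decomposition used by the re-quantization device, here is a minimal Python sketch; the frexp-based extraction and the fixed number of fractional bits are assumptions made for illustration rather than the patented IEEE-754 bit-extraction procedure:

    import math

    def requant_params(sp, sq, frac_bits=24):
        """Approximate sq/sp as a / 2**b with integers a and b.
        Assumes sq/sp is positive and small enough that b stays non-negative."""
        ratio = sq / sp
        mant, exp = math.frexp(ratio)       # ratio = mant * 2**exp, 0.5 <= mant < 1
        a = int(round(mant * (1 << frac_bits)))
        b = frac_bits - exp
        return a, b

    def requant(M, a, b):
        """Integer-only re-quantization: N = M * (sq/sp) ~ (M * a) >> b."""
        return (M * a) >> b

    # usage: map integer data M with scaling factor sp onto a domain with factor sq
    sp, sq = 42.0, 10.5
    a, b = requant_params(sp, sq)
    print(requant(168, a, b), 168 * sq / sp)   # 42 both ways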
Fig. 3 shows the processing flow modules of the central processor, which coordinates the other components to process the neural network and carry out the fixed-point conversion of the model; internally it consists of a series of program units.
The reading program unit 31 reads the model from the model memory and the sample data from the data memory.
The checking program unit 32 implements the checking stage: it topologically sorts the operators of the model and, in that order, invokes the checking configuration of the corresponding operators in the processor.
The preparation program unit 33 implements the preparation stage: it topologically sorts the operators of the model and, in that order, invokes the preparation configuration of the corresponding operators in the processor.
The calibration program unit 34 implements the calibration stage: it collects statistics on the output of each internal operator while actually executing the sample data and summarizes characteristics such as the maximum, minimum, mean, and variance, so that scaling factors can be computed when re-quantization operations are executed later.
The quantization program unit 35 implements the quantization stage: it topologically sorts the operators of the model and, in that order, invokes the quantization configuration of the corresponding operators in the operator fixed-point processor. As described above, the maximum input-data precision is first obtained from the operator processor; if the precision of the input data is found to be greater than this maximum, the re-quantization device is invoked to reduce the data precision, and the operator processor's quantization configuration is then invoked to process the reduced-precision data.
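The cooperation of program units 31 to 35 can be summarized with a heavily simplified, runnable Python sketch; the dict-based operator representation, the hook names, and the max-absolute-value threshold rule are assumptions made only to show the order of the stages, not the actual device:

    import numpy as np

    def quantize_model(ops, processors, samples, max_precision=8):
        for op in ops:                                  # checking stage (unit 32)
            processors[op['kind']]['check'](op)
        for op in ops:                                  # preparation stage (unit 33)
            processors[op['kind']]['prepare'](op)
        stats = {}                                      # calibration stage (unit 34)
        for sample in samples:
            out = sample
            for op in ops:
                out = op['fn'](out)
                stats.setdefault(op['name'], []).append(float(np.max(np.abs(out))))
        for op in ops:                                  # quantization stage (unit 35)
            threshold = max(stats[op['name']])          # predicted output threshold
            op['scale'] = (2 ** (max_precision - 1) - 1) / threshold
        return ops

    # usage: a one-operator "model" calibrated on 16 random samples
    relu_processor = {'check': lambda op: None, 'prepare': lambda op: None}
    ops = [{'name': 'relu0', 'kind': 'relu', 'fn': lambda x: np.maximum(x, 0)}]
    rng = np.random.default_rng(0)
    samples = [rng.standard_normal(8) for _ in range(16)]
    print(quantize_model(ops, {'relu': relu_processor}, samples)[0]['scale'])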
Referring to Fig. 2, the neural network fixed-pointing device quantizes the model into a fully integer network through the interaction of the central processor with the model memory and the data memory, executing the software methods configured in the central processor together with the pluggable operator fixed-point processors. The operator fixed-point processors are configured to be pluggable for two reasons. First, the design is concise: all operators expose the same interface, which improves the extensibility of the device, allows more operators to be configured, and allows newer models to be supported continuously. Second, hot-pluggable operator processors are easy to install and deploy, and different fixed-point conversion devices can be configured for different application scenarios.
Every operator needs to be configured with the three-stage software methods described above, namely the checking stage, the preparation stage, and the quantization stage. Some concepts involved in the program methods configured in the existing operator fixed-point processors are described first:
constant cancellation: in the preparation stage, three types of nodes exist in the neural network, namely input, parameters and operators, wherein the parameters are known variables, and the operators are logic abstractions of data operation. Assuming that the inputs of an operator are parameters or that there is no input, the operator can be calculated ahead of time on the processor as a result, i.e. the operator can be constant eliminated.
Transpose elimination: a transpose transforms a matrix of dimensions (M1, M2, M3, …, Mk) into one of dimensions (N1, N2, N3, …, Nk), where the N dimensions are a rearrangement of the M dimensions. For some specific neural networks a large number of transpose operations can be eliminated; for example, two consecutive transposes can be merged into a single transpose.
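A short check of merging two consecutive transposes into one, assuming each transpose is represented by an axis permutation as in numpy:

    import numpy as np

    def merge_transposes(perm1, perm2):
        """Composing transpose(perm1) followed by transpose(perm2) equals a single
        transpose whose i-th axis is perm1[perm2[i]]."""
        return [perm1[p] for p in perm2]

    x = np.arange(24).reshape(2, 3, 4)
    p1, p2 = (1, 2, 0), (2, 0, 1)
    merged = merge_transposes(p1, p2)
    assert np.array_equal(x.transpose(p1).transpose(p2), x.transpose(merged))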
Constant elimination and transpose elimination are common concepts in neural network fixed-pointing methods. The configuration methods in the operator fixed-point processors of the fixed-pointing device are described in detail below. The default operation assumes that the operator is the homogeneous operation described above: the output scaling factor equals the input scaling factor, and the operator's logic is left unchanged. Note that the operator settings in this application can be freely combined and selected according to the objects being processed and the work at hand; they do not all need to be configured at the same time.
The relu operator processor does not limit the input precision; in the preparation stage, if the child node is a transpose operation, the node and the child node are swapped in order; other stages use the default operation.
The tanh, sigmoid, and exp operator processors configure the maximum input precision to be 16.
The tanh, sigmoid, and exp operator processors configure the quantization stage. These operators are nonlinear functions whose floating-point computation cannot be simulated with ordinary integer operators, so a table look-up method is used: the original floating-point numbers corresponding to the input integer data are mapped onto the discrete domain INT16, an index table TABLE is built, and the fixed-point processor transforms the operator into an index instruction plus the index table; other stages use the default operation. For example, suppose the input integer data lies in the INT8 range. Each input INT8 value corresponds to an original floating-point input, the original floating-point operator computes the corresponding floating-point output, and that output is by default mapped onto the INT16 discrete domain to give the output integer value. In short, a lookup table can be built that maps each input INT8 value directly to its integer result; this is called the table look-up method.
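A hedged sketch of the table look-up method for an INT8 input domain; the input scaling factor, the INT16 output bound, and the helper names are assumptions made for illustration:

    import numpy as np

    def build_table(float_op, in_scale, out_bits=16):
        """For every INT8 input value, evaluate the original floating-point operator
        and map the result onto the INT(out_bits) discrete domain."""
        out_bound = 2 ** (out_bits - 1) - 1                 # 32767 for INT16
        ints = np.arange(-127, 128)                         # INT8 input domain
        floats = float_op(ints / in_scale)                  # original floating-point outputs
        out_scale = out_bound / np.max(np.abs(floats))
        table = np.clip(np.round(floats * out_scale), -out_bound, out_bound).astype(np.int16)
        return table, out_scale

    # usage: the quantized operator becomes a single index instruction into TABLE
    TABLE, out_scale = build_table(np.tanh, in_scale=32.0)
    x_int8 = 40                                             # represents 40 / 32.0 = 1.25
    y_int16 = TABLE[x_int8 + 127]                           # index instruction
    print(y_int16 / out_scale, np.tanh(1.25))               # ~0.848 both ways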
The softmax operator processor configures the maximum precision to be 16.
The softmax operator processor configures the quantization stage to combine the table look-up method with integer arithmetic. The logical abstraction of the operator is Y[i] = exp(X[i]) / sum(exp(X[j]), j in X); the exp term is first quantized by the exp operator's table look-up method, after which integer addition and division are applied. Because the floating-point division in the original expression is expected to become round-to-nearest after fixed-pointing, half of the denominator must be added to the numerator when converting to integer division, so the quantized expression is Y[i] = round(TABLE(X[i]) / TOTAL) = (TABLE(X[i]) + TOTAL/2) / TOTAL, where TOTAL = sum(TABLE(X[j]), j in X); other stages use the default operation.
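The add-half-the-denominator trick above turns integer division into round-to-nearest division. A short self-contained check follows; the three-entry table and the extra x1000 output scale are illustrative assumptions, not values from the patent:

    # integer round-to-nearest division as used in the quantized softmax:
    # round(n / d) == (n + d // 2) // d   for non-negative integers n and d > 0
    def div_round_nearest(n, d):
        return (n + d // 2) // d

    # usage with a toy exp lookup table
    table = [12, 90, 665]                 # TABLE(X[i]) for three inputs
    total = sum(table)                    # TOTAL = 767
    probs = [div_round_nearest(t * 1000, total) for t in table]
    print(probs)                          # [16, 117, 867] ~ softmax probabilities * 1000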
The convolution (Convolution) and matrix multiplication operator processors configure the maximum precision to be 8.
The convolution and matrix multiplication processors configure the checking stage to support only the 2D-NCHW input format.
The convolution and matrix multiplication processors configure the preparation stage and perform the matrix decomposition method: vector multiplication produces a large number of multiply-accumulate operations; the product of two INT8 inputs is INT16, and under a 32-bit storage representation, accumulating more than K > 65536 such products can theoretically overflow. When the K dimension of a matrix product A x B meets this condition, a matrix decomposition operation is required to convert the large matrix multiplication operator into several small matrix operators whose results are merged by addition.
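A hedged sketch of splitting the reduction dimension K so that every partial product fits in an INT32 accumulator; the chunk size, helper names, and the use of numpy to model integer matrix products are assumptions made for illustration:

    import numpy as np

    def split_matmul(A, B, max_k=65536):
        """Compute A @ B with INT8 inputs by splitting the K dimension so that each
        partial product stays inside INT32; the partial results are then merged by
        addition (in a wider type here, or after re-quantization in the device)."""
        K = A.shape[1]
        parts = []
        for s in range(0, K, max_k):
            part = A[:, s:s + max_k].astype(np.int32) @ B[s:s + max_k, :].astype(np.int32)
            parts.append(part.astype(np.int64))
        return sum(parts)

    # usage: INT8 matrices whose inner dimension exceeds 65536
    rng = np.random.default_rng(0)
    A = rng.integers(-127, 128, size=(4, 100000), dtype=np.int8)
    B = rng.integers(-127, 128, size=(100000, 3), dtype=np.int8)
    ref = A.astype(np.int64) @ B.astype(np.int64)
    print(np.array_equal(split_matmul(A, B), ref))   # True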
The convolution and matrix multiplication processors configure the quantization stage. The original mathematical expression is Y = X x W + B. Suppose X = Xi x sx and W = Wi x sw, where Xi and Wi are integer data and sx and sw are scaling factors. The original expression is then equivalent to Y = (Xi x Wi) x (sx x sw) + B; letting the scaling factor of the bias B be sx x sw, i.e., B = Bi x (sx x sw), the first term is an integer convolution and sx x sw is the scaling factor carried by the convolution output. The original floating-point convolution and matrix multiplication operations can thus be equivalently converted into integer equivalent operators, where the scaling factors of the inputs X, W, and B are sx, sw, and sx x sw respectively, and the output scaling factor is sx x sw.
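A numerical check of the equivalence Y = (Xi x Wi) x (sx x sw) + Bi x (sx x sw); the example scaling factors and random integer data are assumptions made for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    sx, sw = 25.0, 50.0                         # assumed scaling factors of X and W
    Xi = rng.integers(-127, 128, size=(2, 3))   # integer data
    Wi = rng.integers(-127, 128, size=(3, 4))
    Bi = rng.integers(-1000, 1000, size=(4,))   # bias quantized with factor sx*sw
    X, W, B = Xi / sx, Wi / sw, Bi / (sx * sw)

    Y_float = X @ W + B                         # original floating-point operator
    Y_int = (Xi @ Wi + Bi) / (sx * sw)          # integer operator plus output scaling factor
    print(np.allclose(Y_float, Y_int))          # True: the two forms are identical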
The normalization (BatchNorm) operator processor configures the preparation stage. The logical abstraction of the operator is Y = (X - mean) / var x gamma + beta = X x (gamma / var) + (beta - mean x gamma / var) = X x A + B, i.e., the normalization operation can be equivalently converted into matrix multiplication and addition; other stages need no configuration.
The normalization (BatchNorm) operator processor configures the preparation stage. If the input data is the result of a convolution operation, the expression above can be written as Y = X x A + B = (D x W + b) x A + B = D x (W x A) + (b x A + B), which is a convolution with weight W x A and bias b x A + B; that is, the normalization operation can be merged into the convolution operation.
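A numerical check of folding the normalization into the preceding convolution; a dense (1x1) form is assumed for brevity, and the variance term is used exactly as written above, without the epsilon or square root that concrete frameworks may add:

    import numpy as np

    rng = np.random.default_rng(2)
    D = rng.standard_normal((5, 3))           # input to the convolution/dense layer
    W = rng.standard_normal((3, 4))           # convolution weight
    b = rng.standard_normal(4)                # convolution bias
    gamma, beta = rng.standard_normal(4), rng.standard_normal(4)
    mean, var = rng.standard_normal(4), rng.random(4) + 0.5

    # unfused: convolution followed by normalization Y = (X - mean)/var * gamma + beta
    X = D @ W + b
    Y_ref = (X - mean) / var * gamma + beta

    # fused: a single convolution with weight W*A and bias b*A + B
    A = gamma / var
    B = beta - mean * A
    Y_fused = D @ (W * A) + (b * A + B)
    print(np.allclose(Y_ref, Y_fused))        # True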
The auto-broadcast matrix multiplication (broadcast multiply) operator processor configures the input precision to be 16; other stages use the default operation.
The dimension addition (sum over axis) operator processor configures the input precision to be 8; other stages use the default operation.
The matrix addition and subtraction operator processors configure the input precision to be 16; other stages use the default operation.
The auto-broadcast matrix addition and subtraction (broadcast_add, broadcast_sub) operator processors configure the input precision to be 16; other stages use the default operation.
The matrix concatenation (concatenate) operator processor does not limit the input precision; other stages use the default operation.
The embedding (Embedding) operator processor does not limit the input precision; other stages use the default operation.
The maximum-value pooling (max pooling) operator processor does not limit the input precision; other stages use the default operation.
The average-value pooling (average pooling) operator processor configures the checking stage so that, when the pooling kernel window slides, the computed average always includes the peripheral padding cells.
The average-value pooling (average pooling) operator processor configures the preparation stage. The logical abstraction of the operator is Y = sum{kernel(X)} / size_of_kernel, which is equivalently converted into a convolution whose kernel has the same size and whose values are all 1 / size_of_kernel; other stages need no configuration.
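A numerical check that average pooling equals a convolution whose kernel values are all 1 / size_of_kernel; a plain 2-D case with stride equal to the kernel size and no padding is assumed for brevity:

    import numpy as np

    def avg_pool2d(x, k):
        h, w = x.shape[0] // k, x.shape[1] // k
        return x[:h * k, :w * k].reshape(h, k, w, k).mean(axis=(1, 3))

    def conv_as_avg_pool(x, k):
        kernel = np.full((k, k), 1.0 / (k * k))      # all values are 1 / size_of_kernel
        h, w = x.shape[0] // k, x.shape[1] // k
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = np.sum(x[i * k:(i + 1) * k, j * k:(j + 1) * k] * kernel)
        return out

    x = np.arange(36, dtype=float).reshape(6, 6)
    print(np.allclose(avg_pool2d(x, 2), conv_as_avg_pool(x, 2)))   # True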
The matrix slicing (slice) and clipping (clip) operator processors do not limit the input precision; other stages use the default operation.
The matrix negation (negative), dimension repetition (repeat), and tiling (tile) operator processors do not limit the input precision; other stages use the default operation.
The dimension expansion (expand dims), squeeze (squeeze), and reshape (reshape) operator processors do not limit the input precision; other stages use the default operation.
The transpose (transpose) operator processor does not limit the input precision; in the preparation stage, if the input data is the result of a transpose operation, the two transposes can be merged; other stages use the default operation.
The flatten (flatten), maximum (max), minimum (min), and upsampling (upsampling) operator processors do not limit the input precision; other stages use the default operation.
Although embodiments of the present application are described above, the description is intended only to aid understanding of the present application and is not intended to limit it. Any person skilled in the art to which this application pertains may make modifications and variations in the form and details of implementation without departing from the spirit and scope of the disclosure, but the scope of protection of this application shall remain subject to the scope of the appended claims.

Claims (12)

1. A fixed-point neural network model quantization apparatus, comprising:
a model memory: configured to store at least one model;
a data memory: configured to store data;
an operator fixed-point processor: configured to execute at least one program to convert operators in the neural network to fixed point;
and a central processor: configured to read the model and the data from the model memory and the data memory, invoke the corresponding operators in the operator fixed-point processor, and collect statistics on the output of each operator while actually executing the sample data; wherein the central processor comprises the following program units:
a reading program unit, which reads the model from the model memory and the sample data from the data memory;
a checking program unit, which topologically sorts the operators of the model and, in that order, invokes the checking configuration of the corresponding operators in the processor;
a preparation program unit, which topologically sorts the operators of the model and, in that order, invokes the preparation configuration of the corresponding operators in the processor;
a calibration program unit, which, based on the read sample data, collects statistics on the output of each internal operator while actually executing the sample data;
and a quantization program unit, which topologically sorts the operators of the model and, in that order, invokes the quantization configuration of the corresponding operators in the operator fixed-point processor.
2. The neural network model quantization device of claim 1, further comprising a re-quantization device configured to perform an integer data precision re-quantization procedure.
3. The neural network model quantization apparatus of claim 1 or 2, wherein each operator is configured as a pluggable operator execution processor.
4. The neural network model quantization device according to claim 1 or 2, wherein the tanh, sigmoid, and exp operator processors configure the quantization stage to map the original floating-point numbers corresponding to the input integer data onto the discrete domain INT16 by a table look-up method and to build an index table, and the fixed-point processor transforms the operator into an index instruction plus the index table.
5. The neural network model quantization device according to claim 1 or 2, wherein the softmax operator processor configures the quantization stage to combine the table look-up method with integer arithmetic, i.e., quantization is first performed by the operator's table look-up method and is then followed by integer addition and division; since the floating-point division in the original expression is expected to become round-to-nearest after fixed-pointing, half of the denominator must be added to the numerator when converting to integer division.
6. The neural network model quantization device of claim 1 or 2, wherein the convolution and matrix multiplication processors configure the checking stage to support only the 2D-NCHW input format; the convolution and matrix multiplication processors configure the preparation stage so that, when the result of a matrix multiplication would overflow, a matrix decomposition operation is performed and the large matrix multiplication operator is converted into several small matrix operators whose results are merged by addition; and the convolution and matrix multiplication processors configure the quantization stage so that the original floating-point convolution and matrix multiplication operations are equivalently converted into integer equivalent operators.
7. The neural network model quantization apparatus of claim 1 or 2, wherein the normalization operator processor configures the preparation stage so that the normalization operation is equivalently converted into matrix multiplication and addition; or
the normalization operator processor configures the preparation stage so that, if the input data is the result of a convolution operation, the normalization operation is merged into the convolution operation.
8. The neural network model quantization device of claim 1 or 2, wherein the relu operator processor does not limit the input precision; in the preparation stage, if the child node is a transpose operation, the node and the child node are swapped in order; other stages use the default operation.
9. The neural network model quantization apparatus of claim 1 or 2, wherein the auto-broadcast matrix multiplication operator processor configures the input precision to be 16, and other stages use the default operation; or
the tanh, sigmoid, and exp operator processors configure the maximum input precision to be 16; or
the softmax operator processor configures the maximum precision to be 16; or
the convolution and matrix multiplication processors configure the maximum precision to be 8; or
the dimension addition operator processor configures the input precision to be 8, and other stages use the default operation; or
the matrix addition and subtraction operator processors configure the input precision to be 16, and other stages use the default operation; or
the auto-broadcast matrix addition and subtraction operator processors configure the input precision to be 16, and other stages use the default operation; or
the matrix concatenation operator processor does not limit the input precision, and other stages use the default operation; or
the embedding operator processor does not limit the input precision, and other stages use the default operation; or
the maximum-value pooling operator processor does not limit the input precision, and other stages use the default operation; or
the average-value pooling operator processor configures the checking stage so that, when the pooling kernel window slides, the computed average always includes the peripheral padding cells; or
the matrix slice and clip operator processors do not limit the input precision, and other stages use the default operation; or
the matrix negation, dimension repetition, and tiling operator processors do not limit the input precision, and other stages use the default operation; or
the dimension expansion, dimension squeeze, and reshape operator processors do not limit the input precision, and other stages use the default operation; or
the transpose operator processor does not limit the input precision; in the preparation stage, if the input data is the result of a transpose operation, the two transposes can be merged; other stages use the default operation; or
the flatten, maximum, minimum, and upsampling operator processors do not limit the input precision, and other stages use the default operation.
10. A fixed-point neural network model quantization method using the neural network model quantization apparatus according to any one of claims 1 to 9, comprising the following stages:
checking stage: verify that the graph model is a directed acyclic graph and convert a multi-input graph model into a single-input model;
preparation stage: apply equivalent transformations to the graph model to facilitate subsequent quantization;
calibration stage: feed all samples through the floating-point model, collect statistics on the output of every operator in the model, and predict the likely output threshold of each operator over all samples from the characteristics of the output data;
quantization stage: convert the operators to fixed point in the topological order of the model.
11. The neural network model quantization method of claim 10, further comprising a re-quantization stage in which a maximum input-data precision is set for each operator, and the input data is reduced in precision when the precision of the operator's input data is greater than the set maximum precision.
12. A computer-readable medium storing instructions for causing a computer to perform the neural network model quantization method of claim 10 or 11.
CN201911174616.XA 2019-11-26 2019-11-26 Fixed-point neural network model quantification device and method Active CN110929862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911174616.XA CN110929862B (en) 2019-11-26 2019-11-26 Fixed-point neural network model quantification device and method


Publications (2)

Publication Number Publication Date
CN110929862A CN110929862A (en) 2020-03-27
CN110929862B (en) 2023-08-01

Family

ID=69852012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911174616.XA Active CN110929862B (en) 2019-11-26 2019-11-26 Fixed-point neural network model quantification device and method

Country Status (1)

Country Link
CN (1) CN110929862B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468935B (en) * 2020-05-08 2024-04-02 上海齐感电子信息科技有限公司 Face recognition method
CN112200296B (en) * 2020-07-31 2024-04-05 星宸科技股份有限公司 Network model quantization method and device, storage medium and electronic equipment
CN114492778A (en) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 Operation method of neural network model, readable medium and electronic device
CN115019150B (en) * 2022-08-03 2022-11-04 深圳比特微电子科技有限公司 Target detection fixed point model establishing method and device and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
CN109697083A (en) * 2018-12-27 2019-04-30 深圳云天励飞技术有限公司 Fixed point accelerated method, device, electronic equipment and the storage medium of data
CN109902745A (en) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 A kind of low precision training based on CNN and 8 integers quantization inference methods
CN110084739A (en) * 2019-03-28 2019-08-02 东南大学 A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN110135580A (en) * 2019-04-26 2019-08-16 华中科技大学 A kind of full integer quantization method and its application method of convolutional network
CN110163359A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562208B2 (en) * 2018-05-17 2023-01-24 Qualcomm Incorporated Continuous relaxation of quantization for discretized deep neural networks


Also Published As

Publication number Publication date
CN110929862A (en) 2020-03-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant