CN112085186B - Method for determining quantization parameter of neural network and related product - Google Patents

Method for determining quantization parameter of neural network and related product

Info

Publication number
CN112085186B
CN112085186B CN201910888626.3A CN201910888626A
Authority
CN
China
Prior art keywords
data
quantization
quantized
bit width
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910888626.3A
Other languages
Chinese (zh)
Other versions
CN112085186A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Publication of CN112085186A publication Critical patent/CN112085186A/en
Application granted granted Critical
Publication of CN112085186B publication Critical patent/CN112085186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1476Error detection or correction of the data by redundancy in operation in neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/81Threshold
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865Monitoring of software

Abstract

The embodiments of the present application disclose a method for determining quantization parameters of a neural network and related products. A board card in the related products comprises: a storage device, an interface device, a control device, and an artificial intelligence chip, wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is used for storing data; the interface device is used for implementing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip. The board card may be used to perform artificial intelligence operations.

Description

Method for determining quantization parameter of neural network and related product
Related applications:
This application claims the priority of the application entitled "A method for determining quantization parameters of a neural network and related products", filed on June 27, 2019, with application number 201910570125.0.
This application claims the priority of the application entitled "A method and device for quantizing a neural network and related products", filed on June 12, 2019, with application number 201910505239.7.
This application claims the priority of the application entitled "A quantization parameter adjustment method and device and related products", filed on June 18, 2019, with application number 201910528537.8.
This application claims the priority of the application entitled "A neural network operation method and device and related products", filed on June 14, 2019, with application number 201910515355.7.
Technical Field
Embodiments of the present disclosure relate to a method for determining quantization parameters of a neural network and related products.
Background
Neural Networks (NNs) are mathematical or computational models that mimic the structure and function of biological neural networks. Through training on sample data, a neural network continuously corrects its network weights and thresholds so that the error function descends along the negative gradient direction and the output approaches the expected output. It is a widely used recognition and classification model, applied to function approximation, pattern recognition and classification, data compression, time-series prediction, and the like.
In practical applications, the data of a neural network is commonly represented with 32 bits. Data occupying this many bits ensures accuracy, but requires large storage space and high processing bandwidth, which increases cost.
Disclosure of Invention
In order to solve the above-mentioned technical problems, the disclosure provides a method for determining quantization parameters of a neural network and related products.
To achieve the above object, the present disclosure provides a quantization parameter determining method of a neural network, the method including:
traversing operators in a computational graph corresponding to the neural network, and selecting a current operator and an operator to be fused from the computational graph;
determining a split size according to the available storage capacity of the on-chip memory of the artificial intelligence processor;
splitting the output data of the operator to be fused into a plurality of data blocks according to the splitting size;
mapping to obtain the size of the data block of the input data of the current operator and the size of the data block of the intermediate data between the current operator and the operator to be fused based on the size of the data block of the output data of the operator to be fused;
using, as the data to be quantized, the data blocks of the output data of the operator to be fused, the corresponding data blocks of the input data of the current operator, and the data blocks of the intermediate data between the current operator and the operator to be fused, and obtaining a statistical result of each type of data to be quantized; the data to be quantized comprises at least one of the neurons, weights, gradients, and biases of the neural network;
determining corresponding quantization parameters by using the statistical result of each type of data to be quantized and the data bit width; the quantization parameters are used by the artificial intelligence processor to correspondingly quantize the data in the operation process of the neural network; the quantization parameter is a point location parameter.
To achieve the above object, the present disclosure provides a quantization parameter determining device of a neural network, including a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor implements the steps of the method when executing the computer program.
To achieve the above object, the present disclosure provides a computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method described above.
In the neural network operation process, the technical solution disclosed in this disclosure is used to determine the quantization parameter for quantization, and the quantization parameter is used by the artificial intelligence processor to quantize the data involved in the neural network operation process, converting high-precision data into low-precision fixed-point numbers, which can reduce the storage space of all data involved in the neural network operation process. For example, converting float32 to fix8 can reduce the model parameters by a factor of 4. Because the data storage space is reduced, a smaller space is used when the neural network is deployed, the on-chip memory of the artificial intelligence processor chip can hold more data, data access by the artificial intelligence processor chip is reduced, and the computing performance is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic diagram of a neural network architecture;
FIG. 2 is a flowchart of a method for determining quantization parameters of a neural network according to the present application;
FIG. 3 is a schematic representation of a symmetric fixed point number;
FIG. 4 is a schematic representation of a fixed point number for introducing an offset;
FIG. 5a is a graph of the magnitude of the variation of the weight data of the neural network during training;
FIG. 5b is a graph showing the magnitude of the variation of the weight data of the neural network during training;
FIG. 6 is one of the flow charts of a method of determining a target iteration interval;
FIG. 7 is a second flowchart of a method for determining a target iteration interval;
FIG. 8 is a third flowchart of a method for determining a target iteration interval;
fig. 9 is a block diagram of a hardware configuration of a quantization parameter determination apparatus of a neural network proposed in the present application;
FIG. 10 is a schematic diagram illustrating an application of the device for determining quantization parameters of a neural network in an artificial intelligence processor chip;
FIG. 11 is a functional block diagram of a quantization parameter determination apparatus of a neural network proposed in the present application;
FIG. 12 is a block diagram of a board card according to an embodiment of the present application;
FIG. 13 is a division of output data of layers to be fused;
FIG. 14 is a schematic diagram of obtaining a data block size of input data of the current layer corresponding to the output block and a data block size of intermediate data between the current layer and a layer to be fused based on the output block mapping according to one embodiment of the present application;
FIG. 15 is a process flow diagram of the determination of data to be quantized.
Detailed Description
The following description of the embodiments of the present disclosure will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the disclosure. Based on the embodiments in this disclosure, all other embodiments that may be made by those skilled in the art without the inventive effort are within the scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting the [described condition or event]", or "in response to detecting the [described condition or event]".
Definition of technical terms:
Floating point number: the IEEE floating-point standard represents a number in the form V = (-1)^sign × mantissa × 2^E. Here sign is the sign bit, where 0 represents a positive number and 1 represents a negative number; E is the exponent (step code), which weights the floating-point number by a (possibly negative) power of 2; and mantissa is the significand, a binary fraction in the range 1 to 2-ε, or 0 to 1-ε. The representation of a floating-point number in a computer is divided into three fields that are encoded separately:
(1) A single sign bit s directly encodes the sign s.
(2) A k-bit exponent field, exp = e(k-1)...e(1)e(0), encodes the exponent E.
(3) An n-bit fraction field encodes the mantissa, but the encoded value also depends on whether the exponent field is all zeros.
Fixed point number: consists of three parts, namely a shared exponent (exponent), a sign bit (sign), and a mantissa (mantissa). The shared exponent means that the exponent is shared within the set of real numbers to be quantized; the sign bit marks whether the fixed-point number is positive or negative; and the mantissa determines the number of significant digits, i.e., the precision, of the fixed-point number. Taking an 8-bit fixed-point number type as an example, its value is calculated as:
value = (-1)^sign × mantissa × 2^(exponent-127)
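For illustration only, the following is a minimal sketch of evaluating the formula above; the function and its decoded integer fields are assumptions for illustration and are not part of the claimed format:

```python
def fixed_point_value(sign: int, mantissa: int, exponent: int) -> float:
    """Evaluate value = (-1)^sign * mantissa * 2^(exponent - 127).

    sign, mantissa, and exponent are the decoded integer fields of the
    8-bit fixed-point type described above (field widths are assumptions).
    """
    return ((-1) ** sign) * mantissa * (2.0 ** (exponent - 127))

# Example: sign = 0, mantissa = 3, shared exponent = 125 -> 3 * 2^-2 = 0.75
print(fixed_point_value(0, 3, 125))  # 0.75
```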
Binary fraction: any decimal number can be expressed by the formula Σ_i j_i × 10^i. For example, the decimal number 12.34 can be written as 12.34 = 1×10^1 + 2×10^0 + 3×10^-1 + 4×10^-2, where digits to the left of the decimal point are counted with positive powers of 10 and digits to the right of the decimal point with negative powers of 10. Similarly, a binary fraction can be expressed in the same way, with positive powers of 2 to the left of the point and negative powers of 2 to the right; the decimal fraction 5.75 can be expressed as the binary fraction 101.11, i.e., 5.75 = 1×2^2 + 0×2^1 + 1×2^0 + 1×2^-1 + 1×2^-2.
Overflow: in a fixed-point arithmetic unit, the number representation has a limited range. During an operation, if a number exceeds the range that the fixed-point number can represent, this is called "overflow".
KL (Kullback-Leibler) divergence: also known as relative entropy, information divergence, or information gain. KL divergence is a measure of the asymmetry of the difference between two probability distributions P and Q. KL divergence measures the average number of additional bits required to encode samples from P using a code based on Q. Typically, P represents the true distribution of the data, and Q represents the theoretical distribution, the model distribution, or an approximation of P.
Data bit width: the number of bits used to represent the data.
Quantization: the process of converting high-precision numbers, usually represented with 32 or 64 bits, into fixed-point numbers that occupy less memory space; the conversion from high-precision numbers to fixed-point numbers causes a certain loss of precision.
Detailed descriptions of a method for determining quantization parameters of a neural network and specific implementations of related products according to embodiments of the present disclosure are provided below with reference to the accompanying drawings.
Neural Networks (NNs) are mathematical models that mimic the structure and function of biological neural networks and whose computation is carried out through a large number of neuron connections. A neural network is thus a computational model consisting of a large number of nodes (or "neurons") connected to one another. Each node represents a specific output function, called an activation function. The connection between every two neurons represents a weighting of the signal passing through that connection, called a weight, which corresponds to the memory of the neural network. The output of the neural network varies according to the connections between neurons as well as the weights and activation functions. In a neural network, the neuron is the fundamental unit. It takes a certain number of inputs and a bias, and each input is multiplied by a weight when the signal (value) arrives. A connection links one neuron to a neuron in another layer or in the same layer, and each connection is accompanied by an associated weight. In addition, the bias is an extra input to the neuron, which is always 1 and has its own connection weight. This ensures that the neuron will activate even if all other inputs are empty (all 0s).
In application, if a nonlinear function is not applied to the neurons in a neural network, the neural network is simply a linear function and is then no more powerful than a single neuron. Suppose the output of a neural network is made to lie between 0 and 1; then, for example, in the case of distinguishing cats from dogs, an output close to 0 can be regarded as a cat and an output close to 1 as a dog. To accomplish this, an activation function is introduced into the neural network, for example the sigmoid activation function. Regarding this activation function, it is only necessary to know that its return value is a number between 0 and 1. Thus, the activation function is used to introduce nonlinearity into the neural network and confines the results of the neural network operation to a smaller range. In practice, how the activation function is expressed is not important; what matters is that a nonlinear function is parameterized by weights, and it can be changed by changing these weights.
As shown in fig. 1, a schematic diagram of a neural network is presented. The neural network shown in fig. 1 includes three kinds of layers, namely an input layer, hidden layers, and an output layer, and the hidden part shown in fig. 1 has 5 layers. The leftmost layer of the neural network is called the input layer, and the neurons of the input layer are called input neurons. The input layer, as the first layer of the neural network, accepts the input signals (values) and passes them on to the next layer. It generally does not operate on the input signals (values) and has no associated weights or biases. In the neural network shown in fig. 1, there are 4 input signals x1, x2, x3, x4.
The hidden layers contain neurons (nodes). In the neural network shown in fig. 1, there are 5 hidden layers. The first hidden layer has 4 neurons (nodes), layer 2 has 5 neurons, layer 3 has 6 neurons, layer 4 has 4 neurons, and layer 5 has 3 neurons. Finally, the hidden layers pass the operation values of the neurons to the output layer. In the neural network shown in fig. 1, the 5 hidden layers are fully connected, i.e., each neuron in each hidden layer is connected to each neuron in the next layer. It should be noted that the hidden layers of every neural network are not necessarily fully connected.
The rightmost layer of the neural network of fig. 1 is referred to as the output layer, and the neurons of the output layer are referred to as output neurons. The output layer receives the output from the last hidden layer. In the neural network shown in fig. 1, the output layer has 3 neurons and 3 output signals y1, y2, y3.
In practical applications, the initial neural network is pre-trained with a large amount of sample data (including inputs and outputs), and after training is completed, the trained neural network is obtained. The trained neural network can then give a correct output for future inputs from the real environment.
Before beginning the discussion of training of neural networks, a loss function needs to be defined. The loss function is a performance function that measures the performance of a neural network in performing a particular task. In some embodiments, the loss function may be obtained as follows: in the process of training a certain neural network, each sample data is transmitted along the neural network to obtain an output value, then the difference between the output value and an expected value is squared, the calculated loss function is the distance between the predicted value and the actual value, and the aim of training the neural network is to reduce the distance or the value of the loss function. In some embodiments, the loss function may be expressed as:
in the above formula, y represents an expected value,referring to the actual result obtained by the neural network for each sample data in the sample data set, i is the index of each sample data in the sample data set. />Representing the expected value y and the actual result +.>And the error value m is the number of sample data in the sample data set, or the identification of cats and dogs is taken as an example. There is a dataset consisting of pictures of cats and dogs, corresponding tags being 1 if the picture is a dog and 0 if the picture is a cat. The label corresponds to the expected value y in the formula, and when each sample picture is transmitted to the neural network, the identification result is actually obtained through the neural network. In order to calculate the loss function, each sample picture in the sample dataset has to be traversed to obtain the corresponding actual result for each sample picture >The loss function is then calculated as defined above. If the loss function is relatively large, the neural network is not trained, and the weight value needs to be further adjusted.
The weights are randomly initialized when training of the neural network begins. Clearly, an initialized neural network does not provide good results. Through the training process, even a neural network that starts out very poor can be trained into a network with high accuracy.
The training process of the neural network is divided into two stages, wherein the first stage is the forward processing of signals, and the signals pass through an hidden layer from an input layer and finally reach an output layer. The second stage is to counter-propagate gradients from the output layer to the hidden layer and finally to the input layer, and sequentially adjust weights and biases of each layer in the neural network according to the gradients.
During the forward processing, an input value is input to the input layer of the neural network, and an output of a so-called predicted value is obtained from the output layer of the neural network. When an input value is provided to the input layer of the neural network, it does not perform any operation. In the hidden layers, the second hidden layer acquires the predicted intermediate result value from the first hidden layer, performs calculation operation and activation operation, and then transfers the obtained predicted intermediate result value to the next hidden layer. The same operations are performed in the later layers, and finally the output values are obtained in the output layer of the neural network.
After the forward processing, an output value called a predicted value is obtained. In order to calculate the error, the predicted value is compared with the actual output value to obtain a corresponding error value. The back propagation uses the chain law of differentiation in which the derivative of the error value corresponding to the last layer of weights of the neural network is calculated first. These derivatives are called gradients, which are then used to calculate the gradient of the penultimate layer in the neural network. This process is repeated until a gradient is obtained for each weight in the neural network. And finally, subtracting the corresponding gradient from each weight in the neural network, so that the weight is updated once, and the purpose of reducing the error value is achieved.
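As a rough sketch of the weight-update step described above; the learning-rate scaling and the function name are assumptions added for illustration:

```python
def sgd_update(weights, gradients, learning_rate=0.01):
    """Subtract the corresponding gradient from each weight, scaled by a learning rate."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

print(sgd_update([0.5, -0.2], [1.0, -3.0]))  # [0.49, -0.17]
```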
For a neural network, fine-tuning means loading a trained neural network and adjusting it further. The fine-tuning process, like the training process, is divided into two stages: the first stage is the forward processing of signals, and the second stage is the backward propagation of gradients, in which the weights of the trained neural network are updated. The difference between training and fine-tuning is that training starts from a randomly initialized neural network and trains it from scratch, whereas fine-tuning does not.
In the training or fine tuning process of the neural network, each time the neural network goes through the forward processing of the signal and the back propagation process of the corresponding error, the weight in the neural network is updated once by using the gradient, and the process is called iteration (iteration). In order to obtain a neural network with a precision that meets expectations, a very large sample data set is required during the training process. In this case, it is impossible to input the sample data set into the computer at one time. Therefore, in order to solve this problem, it is necessary to divide the sample data set into a plurality of blocks, each block is transferred to the computer, and the weights of the neural network are updated once after each block of data set is processed forward. When a complete sample data set has been processed forward through the neural network and a weight update has been returned accordingly, this process is referred to as a cycle (epoch). In practice, it is not enough to transfer the complete data set once in the neural network, and the complete data set needs to be transferred multiple times in the same neural network, that is, multiple periods are needed, so as to finally obtain the neural network with the accuracy meeting the expectations.
In the training or fine-tuning of neural networks, it is generally desired that the speed be as fast as possible and the accuracy as high as possible. The data of the neural network are represented in a high-precision data format, such as floating-point numbers, so all data involved in the training or fine-tuning process are in the high-precision data format, and the trained neural network is then quantized. Take as an example the case where the quantized objects are the weights of the whole neural network and the quantized weights are 8-bit fixed-point numbers: since there are often millions of connections in a neural network, almost all of the space is occupied by the weights of the neuron connections. Moreover, these weights are all different floating-point numbers. The weights of each layer tend to be normally distributed within a certain interval, e.g., (-3.0, 3.0). The maximum value and the minimum value corresponding to the weights of each layer in the neural network are stored, and each floating-point value is represented by an 8-bit fixed-point number: the range between the maximum value and the minimum value is divided into 256 quantization intervals, each of which is represented by an 8-bit fixed-point number. For example, within the interval (-3.0, 3.0), byte 0 represents -3.0 and byte 255 represents 3.0. Similarly, byte 128 represents 0.
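A minimal sketch of this per-layer 8-bit mapping, assuming simple linear quantization between the stored minimum and maximum; the function names and default interval are illustrative:

```python
def quantize_uint8(x, w_min=-3.0, w_max=3.0):
    """Map a float in [w_min, w_max] onto one of 256 quantization intervals (bytes 0..255)."""
    step = (w_max - w_min) / 255.0
    q = round((x - w_min) / step)
    return max(0, min(255, q))

def dequantize_uint8(q, w_min=-3.0, w_max=3.0):
    """Recover an approximate float value from its byte code."""
    step = (w_max - w_min) / 255.0
    return w_min + q * step

print(quantize_uint8(-3.0))  # 0
print(quantize_uint8(3.0))   # 255
print(quantize_uint8(0.0))   # 128 (after rounding)
```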
For data represented in a high-precision data format, taking floating-point numbers as an example: according to computer architecture, and based on the arithmetic rules of floating-point numbers and of fixed-point numbers, floating-point computation is more complex than fixed-point computation of the same bit length, and more logic devices are needed to build a floating-point arithmetic unit. Thus, in terms of area, a floating-point arithmetic unit is larger than a fixed-point arithmetic unit. Moreover, a floating-point arithmetic unit requires more resources to operate, so that the power-consumption gap between fixed-point operations and floating-point operations is usually an order of magnitude. In short, a floating-point arithmetic unit occupies many times more chip area and power consumption than a fixed-point arithmetic unit.
However, floating-point operations cannot simply be replaced. First, although fixed-point operation is intuitive, it uses a fixed number of bits for the integer part and the fractional part, which makes it difficult to express particularly large numbers and particularly small numbers at the same time and may cause overflow.
Furthermore, floating-point arithmetic units are often preferred when training or fine-tuning is performed with artificial intelligence processor chips, mainly because in supervised-learning neural networks only floating-point operations can record and capture the small increments that occur in training. Therefore, how to greatly improve the training computation capability of the chip, without increasing the chip area and power consumption of the artificial intelligence processor, is a problem that urgently needs to be solved.
It is known to those skilled in the art, from practical feedback, that training with fixed-point numbers of low bit width requires processing the back-propagated gradients with fixed-point numbers of more than 8 bits, which makes training with fixed-point numbers of low bit width extremely complex. How to replace the floating-point arithmetic unit with a fixed-point arithmetic unit, achieving the speed of fixed-point operation while meeting the floating-point precision required by the computation, and thereby improving the peak computing power of an artificial intelligence processor chip, is the technical problem solved by this specification.
Based on the above description of the technical problem, one characteristic of neural networks is their high tolerance to input noise. When recognizing objects in a photograph, a neural network can ignore the dominant noise and focus attention on important similarities. This means that the neural network can treat low-precision computation as a source of noise and still produce accurate predictions with a numerical format that holds less information. To perform low-precision training or fine-tuning, a universal data representation must be found, so that data overflow is reduced and the data near 0 within the target interval is expressed well. This data representation therefore needs to be adaptive, so that it can be adjusted along with the training or fine-tuning process.
In addition, most neural network models require a large number of operations and memory accesses. Some neural network accelerators may provide higher computational performance. However, the computational power of the currently mainstream neural network accelerators is far beyond the bandwidth of the current external memory. The calculation amount and the memory access amount of each layer in the ResNet-18 neural network are taken as examples for the following description.
In a ResNet-18 neural network, the ratio of the amount of computation to the amount of memory access differs from layer to layer, so different layers place different requirements on bandwidth and computing power. Taking an element-wise layer as an example, if the computing power is 1 GFLOPS (Giga Floating-point Operations Per Second, i.e., billions of floating-point operations per second), the required bandwidth is 12 GB/s. Meanwhile, for a convolutional layer, with the same computing power of 1 GFLOPS, the bandwidth requirement is only 10 MB/s. Although the hardware of neural network accelerators has been designed to balance memory bandwidth and computing power as far as possible, optimal performance has still not been achieved. Under the Caffe framework, the inventors of the present application further measured the ratio of computation to memory access for each layer in the entire ResNet-18 neural network, and found that more than 95% of the data traffic occurs in a few kinds of layers (including the convolutional, BatchNorm, Scale, ReLU, and element-wise layers). However, except for the convolutional layers, the amount of computation in these layers is very small, less than 1% of the entire neural network. Thus, memory access is currently a serious bottleneck when artificial intelligence processors execute neural networks.
The operators in the computational graph to which the neural network is mapped are implemented on the CPU and the artificial intelligence processor through kernel functions, in a mode of "load from off-chip storage, compute on chip, store back off chip": the input data and output data of an operator in the neural network are stored in global memory, and the kernel function reads the input data from the global memory, completes the computation, and stores the result back into the global memory. This causes two problems: first, the access of each operator to its input data and output data cannot be avoided by optimization within the operator; second, each operator incurs a startup overhead, especially on heterogeneous computing devices other than the CPU. To solve these problems, the kernel functions of two or more consecutive operators in the computational graph corresponding to the neural network are combined into a new kernel function, so that the computation tasks corresponding to these operators require only one scheduling overhead. In this way, a large amount of data transfer from external memory (DRAM) to on-chip memory, and from on-chip memory back to external memory, can be eliminated. Through testing, the inventors found that in a ResNet-18 neural network, 99.6% of the data transmission could be eliminated if all operators could be fused together.
However, it is difficult to fuse all operators in an actual neural network together. The reasons include the following: in practice, there is a mismatch between the size of the on-chip memory and the size of the data processed by the neural network, because the area overhead of the artificial intelligence processor cannot be too large, and accordingly the area overhead of the on-chip memory of the artificial intelligence processor is limited. Likewise, the power-consumption overhead required for the on-chip memory of an artificial intelligence processor should be within a reasonable range. These reasons impose limits on the size of the data that can be stored on the chip of the artificial intelligence processor. Thus, if all operators in the neural network were fused together, the data size of the intermediate data of those fused operators would not match the storage capacity actually available in the on-chip memory. To alleviate this contradiction, further analysis shows that the intermediate results between these operators fall within the optimization scope of the fused kernel, so part of the memory access for the intermediate results can be optimized; this optimization of intermediate results is usually based on the local independence of data available in the computation process. Based on this principle, in an operator, each point in the output data set depends only on a defined region within the input data set. Therefore, the input data and the output data can be divided, or split, into a plurality of blocks, each block can be computed independently, and more operators in the computational graph corresponding to the neural network can be fused together.
Based on the above description, fig. 2 shows a flowchart of a method for determining quantization parameters of a neural network. The quantization parameter determined by the technical solution shown in fig. 2 is used for the data representation of the data to be quantized, so as to determine the quantized fixed-point numbers. The quantized fixed-point numbers are used for training, fine-tuning, or inference of the neural network. The method comprises the following steps:
step 201): traversing operators in a computational graph corresponding to the neural network, and selecting a current operator and an operator to be fused from the computational graph.
Taking Caffe as an example, the neural network includes a plurality of processing layers, including but not limited to a convolutional layer, a BatchNorm layer, a Scale layer, a ReLU layer, a Pooling layer, an element-wise layer, a fully-connected layer (InnerProduct layer), a SoftMax layer, and the like.
The layer corresponding to the operator to be fused is called the layer to be fused, and the layer corresponding to the current operator is called the current layer; the layer to be fused is located downstream of the current layer. Those skilled in the art will readily appreciate that the layer corresponding to the operator to be fused may also be located upstream of the layer corresponding to the current operator. Taking a convolutional layer and a BatchNorm layer as an example, if the convolutional layer is taken as the current layer and the BatchNorm layer as the layer to be fused, the BatchNorm layer may be located upstream of the convolutional layer, i.e., the output data of the BatchNorm layer is the input data of the convolutional layer; the BatchNorm layer may also be located downstream of the convolutional layer, i.e., the output data of the convolutional layer is the input data of the BatchNorm layer.
In addition, according to a preferred embodiment of the present application, a first layer of the neural network is selected as a current layer, a next layer closely adjacent to the first layer is selected as a layer to be fused, and fusion judgment is performed layer by layer.
Step 202): the split size is determined based on the available storage capacity of the on-chip memory of the artificial intelligence processor.
Step 203): and splitting the output data of the operator to be fused into a plurality of data blocks according to the splitting size.
Fig. 13 shows the output data OD2 of the layer to be fused, which is, for example, data of dimensions M×N. According to a preset split size, the output data OD2 of the layer to be fused is split into m×n output blocks, where m is less than or equal to M and n is less than or equal to N, denoted OD2(1,1), OD2(1,2), up to OD2(m,n). According to a preferred embodiment of the present application, the split size is selected such that the output data OD2 of the layer to be fused L2 can be split uniformly into m×n shares. However, the present application is not limited thereto, and non-uniform splitting may also be implemented; for example, in fig. 13, the output blocks in the m-th row and in the n-th column may be smaller than the remaining output blocks, and such cases all fall within the protection scope of the present application.
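A minimal sketch of such a split, assuming a 2-D array and a uniform block (split) size; NumPy and the function name are used only for illustration:

```python
import numpy as np

def split_into_blocks(od2: np.ndarray, block_rows: int, block_cols: int):
    """Split output data OD2 (shape M x N) into a dict of blocks keyed by (i, j).

    Edge blocks may be smaller when M or N is not a multiple of the block size,
    which corresponds to the non-uniform splitting mentioned above.
    """
    M, N = od2.shape
    blocks = {}
    for i, r in enumerate(range(0, M, block_rows), start=1):
        for j, c in enumerate(range(0, N, block_cols), start=1):
            blocks[(i, j)] = od2[r:r + block_rows, c:c + block_cols]
    return blocks

blocks = split_into_blocks(np.arange(24).reshape(6, 4), block_rows=2, block_cols=2)
print(len(blocks))           # 6 blocks
print(blocks[(1, 1)].shape)  # (2, 2)
```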
Step 204): and mapping to obtain the size of the data block of the input data of the current operator and the size of the data block of the intermediate data between the current operator and the operator to be fused based on the size of the data block of the output data of the operator to be fused.
Fig. 14 illustrates one embodiment of step 204. As shown in fig. 14, the current layer L1 and the layer to be fused L2 are shown in terms of the data transformations they perform; the physical layer structure is not shown. For the current layer L1, the input data is ID1; after the current layer L1 performs a preset transformation on the input data ID1, the output data OD1 is obtained. The output data OD1 is provided as input data to the layer to be fused L2, and the output data OD1 may also be referred to as the intermediate data between the current layer L1 and the layer to be fused L2. After the layer to be fused L2 performs a preset transformation on the intermediate data OD1, the output data OD2 is obtained.
Since the data transformation performed by each of the current layer L1 and the layer to be fused L2 may be preset, the data block of the input data of a layer can be derived in reverse from the output block of its output data. For example, in fig. 14, taking the output block OD2(m,1) of the output data as an example, the data block size of the data block OD1(m,1) in the intermediate data OD1 can be derived according to the transformation performed by the layer to be fused L2; the data block size of the data block OD1(m,1) may be larger than, smaller than, or equal to the size of the output block OD2(m,1), and all of these cases fall within the scope of the present application. Similarly, the size of the data block ID1(m,1) of the input data ID1 of the current layer L1 can be obtained from the data block size of the data block OD1(m,1) in the intermediate data OD1 and from the transformation performed by the current layer L1. In other words, the above procedure derives, in reverse, the data block size of the input data required by the current layer and the data block size of the intermediate data from the output data block size of the layer to be fused.
Fig. 14 shows that the layer L2 to be fused is located downstream of the current layer L1, and the two layers are closely adjacent, and the output of the current layer L1 is the input of the layer L2 to be fused. The protection scope of the present application is not limited thereto, and the layer to be fused L2 and the current layer L1 may be separated by more layers. In this case, the teachings described above can also be applied to obtain, by reverse derivation, the data block size of the input data required in the current layer, and the data block size of the intermediate data, which, of course, in this case, has multiple layers of intermediate data, which are within the scope of the present application.
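As a hedged illustration of this reverse derivation, assume the current layer and the layer to be fused are both convolution-like layers with known kernel size and stride; these parameters and the helper below are assumptions for illustration, not part of the patent:

```python
def input_tile_size(out_tile: int, kernel: int, stride: int) -> int:
    """Size of the input region that one output tile of a convolution-like layer depends on."""
    return (out_tile - 1) * stride + kernel

# Reverse derivation for one output block OD2(m, 1) of size 32:
od2_block = 32
od1_block = input_tile_size(od2_block, kernel=3, stride=1)  # intermediate data block (input of L2)
id1_block = input_tile_size(od1_block, kernel=3, stride=1)  # input data block of the current layer L1
print(od1_block, id1_block)  # 34 36
```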
Fig. 15 shows a process flow chart for determining the data to be quantized. The following describes it in detail with reference to fig. 15.
Step 1: traversing operators in the computation graph corresponding to the neural network, and selecting a current operator and an operator to be fused from the computation graph.
Step 2: the split size is determined based on the available storage capacity of the on-chip memory of the artificial intelligence processor.
Step 3: and splitting the output data of the operator to be fused into a plurality of data blocks according to the splitting size.
Step 4: perform memory allocation. In practical applications, the on-chip memory of the artificial intelligence processor, or a specified portion of the on-chip memory, is allocated to the output block, the data block of the input data, and the data block of the intermediate data.
Step 5: and judging whether the memory allocation is successful or not. For example, the sum of the size of the output block (i.e., the split size), the data block size of the input data, and the data block size of the intermediate data may be compared to the storage space of the on-chip memory available for allocation, and if the storage space is not exceeded, the allocation is successful. Meanwhile, the data to be quantized is determined; if the storage space is exceeded, the allocation fails and the process proceeds to step 6.
Step 6: judge whether the split size can be reduced. Those skilled in the art will readily understand that the split size may be changed dynamically; for example, at the beginning of determining whether an operator to be fused can be fused with the current operator, the split size may be set to a larger value. If, at this split size, the judgment is that fusion is not possible, an attempt may be made to reduce the split size, as shown in step 7. The amount by which the split size is reduced can be set as needed. Of course, those skilled in the art will readily understand that the split size cannot be reduced without limit, and a lower threshold may be set for it. In step 6, when it is judged that the split size has not yet reached the lower threshold, the process proceeds to step 7, the split size is reduced, and the process returns to step 2: the output data of the operator to be fused is re-split into corresponding output blocks according to the reduced split size, and the subsequent processing and judgment are carried out. When it is judged that the split size has already reached the lower threshold, it is determined that the current operator and the operator to be fused cannot be fused together.
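A minimal sketch of steps 2-7, assuming the block sizes for a given split size can be estimated by a mapping function such as the one sketched earlier; all names and the byte-size arithmetic are assumptions:

```python
def try_fuse(split_size: int, available_bytes: int, min_split: int, block_bytes_for):
    """Shrink the split size until the output, input, and intermediate blocks fit on chip.

    block_bytes_for(split_size) returns (out_bytes, in_bytes, mid_bytes), obtained by the
    reverse mapping of data-block sizes described above. Returns the usable split size,
    or None if the operators cannot be fused.
    """
    while split_size >= min_split:
        out_b, in_b, mid_b = block_bytes_for(split_size)
        if out_b + in_b + mid_b <= available_bytes:  # step 5: allocation succeeds
            return split_size
        split_size //= 2                             # step 7: reduce the split size
    return None                                      # fusion is not possible

# Usage with a toy estimator: each block needs roughly 4 bytes per element
estimate = lambda s: (4 * s * s, 4 * (s + 2) ** 2, 4 * (s + 1) ** 2)
print(try_fuse(64, available_bytes=48 * 1024, min_split=4, block_bytes_for=estimate))  # 32
```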
Step 205): the data blocks of the output data of the operator to be fused, the corresponding data blocks of the input data of the current operator, and the data blocks of the intermediate data between the current operator and the operator to be fused are used as the data to be quantized, and a statistical result of each type of data to be quantized is obtained; the data to be quantized comprises at least one of the neurons, weights, gradients, and biases of the neural network.
As described above, during training or fine-tuning of the neural network, each layer of the neural network includes four types of data, namely neurons, weights, gradients, and biases. During inference, each layer of the neural network includes three types of data, namely neurons, weights, and biases. These data are all represented in a high-precision data format; this specification takes floating-point numbers as the example of high-precision data. It should be clear that floating-point numbers are only an exemplary case, not an exhaustive one. Those skilled in the art, while understanding the spirit of the present technical solution, may make other modifications or transformations based on the technical solution of the present application. For example, the high-precision data may be fixed-point numbers with a high data bit width, which have a large representable range but a low minimum representable precision, and the technical solution can also be used to convert them into fixed-point numbers with a low data bit width. However, as long as the functions and technical effects achieved are similar to those of the present application, they should fall within the protection scope of the present application.
Regardless of the neural network structure, the data to be quantized includes at least one of neurons, weights, gradients, and biases of the neural network during training or fine tuning of the neural network, and includes at least one of neurons, weights, and biases of the neural network during reasoning.
The following takes two types of data, namely the neurons and weights of a target layer in the neural network, as an example of the data to be quantized, and describes the technical solution in detail. In this step, the neurons and weights of each layer in the target layer are counted separately to obtain the maximum value and the minimum value of each type of data to be quantized, and the maximum absolute value of each type of data to be quantized can also be obtained. The target layer, as the layer that needs to be quantized in the neural network, may be one layer or multiple layers. Taking one layer as the unit, the maximum absolute value of each type of data to be quantized can be determined from the maximum value and the minimum value of that type of data to be quantized. Alternatively, the absolute value of each datum to be quantized may first be computed, and the results traversed to obtain the maximum absolute value of each type of data to be quantized.
In practical applications, the reason for obtaining the maximum absolute value of each type of data to be quantized from the maximum value and the minimum value of that type of data is that, during quantization, the maximum value and the minimum value of the data to be quantized in each layer of the target layer are conventionally already saved, so no additional resources need to be consumed to compute the absolute values of the data to be quantized; the maximum absolute value can be obtained directly from the saved maximum and minimum values.
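A minimal sketch of this statistics step; the function name is illustrative and NumPy is used only for brevity:

```python
import numpy as np

def layer_statistics(data: np.ndarray):
    """Return (min, max, absolute max) for one type of data to be quantized in a layer."""
    d_min, d_max = float(data.min()), float(data.max())
    abs_max = max(abs(d_min), abs(d_max))  # derived directly from the saved min/max
    return d_min, d_max, abs_max

weights = np.array([-2.7, 0.3, 1.9, -0.4])
print(layer_statistics(weights))  # (-2.7, 1.9, 2.7)
```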
Step 206): determining corresponding quantization parameters by using the statistical result of each type of data to be quantized and the data bit width; the quantization parameter is used for the artificial intelligence processor to correspondingly quantize the data in the operation process of the neural network.
In this step, the quantization parameter can be divided into the following six cases. First case: the quantization parameter is the point location parameter s. In this case, the data to be quantized can be quantized using the following formula (1) to obtain the quantized data I_x:
I_x = round(F_x / 2^s)   (1)
where s is the point location parameter, I_x is the n-bit binary representation value of the data x after quantization, F_x is the floating-point value of the data x before quantization, and round is the rounding operation. It should be noted that this is not limited to round; other rounding methods may be used, for example, rounding up, rounding down, or rounding toward zero may replace the round operation in formula (1). At this time, the maximum floating-point value A that can be represented by an n-bit fixed-point number is 2^s(2^(n-1) - 1); then an n-bit fixed-point number can represent a maximum value of 2^s(2^(n-1) - 1) in the number domain of the data to be quantized, and a minimum value of -2^s(2^(n-1) - 1). It can be seen from formula (1) that, when the data to be quantized is quantized with the quantization parameter of the first case, the quantization interval is 2^s; the quantization interval is denoted C.
Let Z be the maximum absolute value of all floating-point numbers in the number domain of the data to be quantized. Then A needs to contain Z, and Z needs to be greater than A/2, so there is the following constraint of formula (2):
2^s(2^(n-1) - 1) ≥ Z > 2^(s-1)(2^(n-1) - 1)   (2)
Therefore, s = ceil(log2(Z / (2^(n-1) - 1))), and A = 2^s(2^(n-1) - 1).
According to formula (3), the quantized n-bit binary representation value I_x of the data x is dequantized to obtain the dequantized data F̂_x:
F̂_x = I_x × 2^s   (3)
where the dequantized data F̂_x has the same data format as the corresponding pre-quantization data F_x, both being floating-point values.
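A minimal sketch of the first case (point location parameter only), following formulas (1)-(3) above; the helper names are illustrative:

```python
import math

def point_location(abs_max: float, n: int = 8) -> int:
    """Point location parameter s derived from the statistic Z (formula (2))."""
    return math.ceil(math.log2(abs_max / (2 ** (n - 1) - 1)))

def quantize(fx: float, s: int, n: int = 8) -> int:
    """Formula (1): I_x = round(F_x / 2^s), clamped to the n-bit range."""
    limit = 2 ** (n - 1) - 1
    return max(-limit, min(limit, round(fx / (2 ** s))))

def dequantize(ix: int, s: int) -> float:
    """Formula (3): dequantized value = I_x * 2^s."""
    return ix * (2 ** s)

s = point_location(abs_max=2.7, n=8)  # -> -5, so the quantization interval C = 2^-5
print(s, quantize(1.9, s), dequantize(quantize(1.9, s), s))  # -5 61 1.90625
```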
Second case: the quantization parameter is a first scaling factor f_1. In this case, the data to be quantized can be quantized using the following formula (4) to obtain the quantized data I_x:
I_x = round(F_x / f_1)   (4)
where f_1 is the first scaling factor, I_x is the n-bit binary representation value of the data x after quantization, F_x is the floating-point value of the data x before quantization, and round is the rounding operation. It should be noted that this is not limited to round; other rounding methods may be used, for example, rounding up, rounding down, or rounding toward zero may replace the round operation in formula (4). It can be seen from formula (4) that, when the data to be quantized is quantized with the quantization parameter of the second case, the quantization interval is f_1; the quantization interval is denoted C.
For the first scaling factor f_1 there is one case in which the point location parameter s is a fixed, known value that no longer changes; let 2^s = T, where T is a fixed value. Then the maximum floating-point value A that can be represented by an n-bit fixed-point number is (2^(n-1) - 1) × T. In this case, the maximum value A depends on the data bit width n. Let Z be the maximum absolute value of all numbers in the number domain of the data to be quantized; at this time Z = (2^(n-1) - 1) × f_1. An n-bit fixed-point number can represent a maximum value of (2^(n-1) - 1) × f_1 in the number domain of the data to be quantized, and a minimum value of -(2^(n-1) - 1) × f_1. In another case, in engineering applications, 2^s × f_2 is taken as a whole as the first scaling factor f_1; in that case it can be considered that there is no independent point location parameter s, where f_2 is the second scaling factor. Let Z be the maximum absolute value of all numbers in the number domain of the data to be quantized; at this time Z = (2^(n-1) - 1) × f_1. An n-bit fixed-point number can represent a maximum value of (2^(n-1) - 1) × f_1 in the number domain of the data to be quantized, and a minimum value of -(2^(n-1) - 1) × f_1.
According to formula (5), the quantized n-bit binary representation value I_x of the data x is dequantized to obtain the dequantized data F̂_x:
F̂_x = I_x × f_1   (5)
where the dequantized data F̂_x has the same data format as the corresponding pre-quantization data F_x, both being floating-point values.
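A corresponding sketch for the second case (scaling factor only), following formulas (4)-(5); the names are illustrative:

```python
def quantize_scale(fx: float, f1: float, n: int = 8) -> int:
    """Formula (4): I_x = round(F_x / f_1), clamped to the n-bit range."""
    limit = 2 ** (n - 1) - 1
    return max(-limit, min(limit, round(fx / f1)))

f1 = 2.7 / (2 ** 7 - 1)  # f_1 chosen so that Z = (2^(n-1) - 1) * f_1 with Z = 2.7, n = 8
ix = quantize_scale(1.9, f1)
print(ix, ix * f1)       # formula (5): dequantized value I_x * f_1
```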
Third case: the quantization parameter is a point location parameter s and a second scaling factor f 2 . In this case, the quantized data I can be obtained by quantizing the data to be quantized using the following equation (6) x
Here s is the point location parameter, $f_2$ is the second scaling factor, $I_x$ is the n-bit binary value representing data x after quantization, $F_x$ is the floating-point value of data x before quantization, and round denotes rounding to the nearest integer. It should be noted that the operation is not limited to round; rounding up, rounding down, rounding toward zero, and similar rounding operations may replace the round operation in equation (6). The maximum value A in the number domain of the data to be quantized that the n-bit fixed-point number can represent is $2^s(2^{n-1}-1)$. As can be seen from equation (6), when the data to be quantized are quantized with the quantization parameters of the third case, the quantization interval is $2^s \times f_2$; the quantization interval is denoted C.
Let Z be the maximum of the absolute values of all numbers in the number domain of the data to be quantized. According to equation (2), $1 \ge \frac{Z}{2^s(2^{n-1}-1)} > \frac{1}{2}$, i.e. $1 \ge f_2 > \frac{1}{2}$ when $f_2 = \frac{Z}{2^s(2^{n-1}-1)}$. When $f_2 = \frac{Z}{2^s(2^{n-1}-1)}$, Z can, according to equation (2), be represented without loss of accuracy; when $f_2 = 1$, equation (6) coincides with equation (1). The n-bit fixed-point number can represent a maximum of $(2^{n-1}-1)\times 2^s \times f_2$ and a minimum of $-(2^{n-1}-1)\times 2^s \times f_2$ in the number domain of the data to be quantized.
The quantized n-bit binary value $I_x$ of data x is dequantized according to equation (7), $\hat{F}_x = I_x \times 2^s \times f_2$, to obtain the dequantized data $\hat{F}_x$, where the data format of the dequantized data $\hat{F}_x$ is the same as that of the corresponding pre-quantization data $F_x$, both being floating-point values.
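Similarly, a minimal Python sketch of the third case follows (hypothetical names); the choice of $f_2$ so that Z maps exactly onto the largest representable value is an assumption consistent with the lossless case described above.

```python
import numpy as np

def quantize_point_and_scale(data, n=8):
    """Sketch of the third case: point location parameter s plus second scaling factor f2."""
    qmax = 2 ** (n - 1) - 1
    Z = float(np.max(np.abs(data)))
    s = int(np.ceil(np.log2(Z / qmax)))        # point location parameter, as in the first case
    f2 = Z / (2 ** s * qmax)                   # assumed choice so that Z is represented without loss
    I = np.clip(np.round(data / (2 ** s * f2)), -qmax, qmax).astype(np.int32)   # equation (6)
    F_hat = I * (2 ** s) * f2                  # equation (7): dequantization
    return I, s, f2, F_hat
```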
FIG. 3 is a schematic diagram of a symmetric fixed-point representation. The number domain of the data to be quantized shown in FIG. 3 is distributed with "0" as the center of symmetry. Z is the maximum of the absolute values of all floating-point numbers in the number domain of the data to be quantized; in FIG. 3, A is the maximum floating-point value that the n-bit fixed-point number can represent, and the floating-point number A maps to the fixed-point number $2^{n-1}-1$. To avoid overflow, A must cover Z. In practice, the floating-point data encountered in neural network operations tend to be normally distributed within some determined interval, but they do not necessarily have "0" as the center of symmetry, so representing them with fixed-point numbers easily causes overflow. To improve this, an offset is introduced into the quantization parameter, as shown in FIG. 4. In FIG. 4, the number domain of the data to be quantized is not distributed with "0" as the center of symmetry; $Z_{min}$ is the minimum and $Z_{max}$ is the maximum of all floating-point numbers in the number domain of the data to be quantized. P is the center point of $Z_{min} \sim Z_{max}$. The number domain of the data to be quantized is shifted as a whole so that the shifted number domain is distributed with "0" as the center of symmetry, and the maximum absolute value in the shifted number domain is Z. As can be seen from FIG. 4, the offset is the horizontal distance between point "0" and point "P"; this distance is called the offset O, where $O = \frac{Z_{min}+Z_{max}}{2}$ and $Z = \frac{Z_{max}-Z_{min}}{2}$.
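A small Python sketch of the offset computation follows (hypothetical names), using the relations O = (Zmin + Zmax)/2 and Z = (Zmax - Zmin)/2 described above.

```python
import numpy as np

def compute_offset(data):
    """Sketch: offset O and shifted absolute maximum Z for an asymmetric number domain (FIG. 4)."""
    z_min, z_max = float(np.min(data)), float(np.max(data))
    O = (z_min + z_max) / 2        # horizontal distance between point "0" and center point "P"
    Z = (z_max - z_min) / 2        # absolute maximum after the domain is shifted by O
    return O, Z
```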
based on the above description about the offset O, a case of the fourth quantization parameter occurs. Fourth case: the quantization parameters include point location parameters and offsets. In this case, the quantized data I can be obtained by quantizing the data to be quantized using the following equation (8) x
Here s is the point location parameter, O is the offset, $I_x$ is the n-bit binary value representing data x after quantization, $F_x$ is the floating-point value of data x before quantization, and round denotes rounding to the nearest integer. It should be noted that the operation is not limited to round; rounding up, rounding down, and similar rounding operations may replace the round operation in equation (8). In this case, the maximum floating-point value A that the n-bit fixed-point number can represent is $2^s(2^{n-1}-1)$, so the n-bit fixed-point number can represent a maximum of $2^s(2^{n-1}-1)+O$ and a minimum of $-2^s(2^{n-1}-1)+O$ in the number domain of the data to be quantized. As can be seen from equation (8), when the data to be quantized are quantized with the quantization parameters of the fourth case, the quantization interval is $2^s$; the quantization interval is denoted C.
Let Z be the maximum of the absolute values of all floating-point numbers in the number domain of the data to be quantized, i.e. $Z = \frac{Z_{max}-Z_{min}}{2}$. A must then cover Z, and Z is greater than $2^{s-1}(2^{n-1}-1)$. From equation (2), $s = \left\lceil \log_2\frac{Z}{2^{n-1}-1} \right\rceil$ is obtained, and then $A = 2^{\left\lceil \log_2\frac{Z}{2^{n-1}-1} \right\rceil}(2^{n-1}-1)$.
The quantized n-bit binary value $I_x$ of data x is dequantized according to equation (9), $\hat{F}_x = I_x \times 2^s + O$, to obtain the dequantized data $\hat{F}_x$, where the data format of the dequantized data $\hat{F}_x$ is the same as that of the corresponding pre-quantization data $F_x$, both being floating-point values.
Based on the above description of the offset O, a fifth quantization-parameter case arises. Fifth case: the quantization parameters are the first scaling factor $f_1$ and the offset O. In this case, the data to be quantized may be quantized using the following equation (10) to obtain the quantized data $I_x$:

$I_x = \mathrm{round}\left(\frac{F_x - O}{f_1}\right)$ (10)
Here $f_1$ is the first scaling factor, O is the offset, $I_x$ is the n-bit binary value representing data x after quantization, $F_x$ is the floating-point value of data x before quantization, and round denotes rounding to the nearest integer. It should be noted that the operation is not limited to round; rounding up, rounding down, and similar rounding operations may replace the round operation in equation (10). One situation is the following: the point location parameter s is a fixed, known value that no longer changes. Let $2^s = T$, with T a fixed value; then the maximum floating-point value A that the n-bit fixed-point number can represent is $(2^{n-1}-1)\times T$, and A depends only on the data bit width n. Let Z be the maximum of the absolute values of all numbers in the number domain of the data to be quantized; then $f_1 = \frac{Z}{2^{n-1}-1}$, and at this time $Z = (2^{n-1}-1)\times f_1$. The n-bit fixed-point number can represent a maximum of $(2^{n-1}-1)\times f_1 + O$ and a minimum of $-(2^{n-1}-1)\times f_1 + O$ in the number domain of the data to be quantized. In another situation, encountered in engineering applications, $2^s \times f_2$ is treated as a whole as the first scaling factor $f_1$; in that case the independent point location parameter s can be regarded as absent, where $f_2$ is the second scaling factor. Let Z be the maximum of the absolute values of all numbers in the number domain of the data to be quantized; then $f_1 = \frac{Z}{2^{n-1}-1}$, and at this time $Z = (2^{n-1}-1)\times f_1$. The n-bit fixed-point number can represent a maximum of $(2^{n-1}-1)\times f_1 + O$ and a minimum of $-(2^{n-1}-1)\times f_1 + O$ in the number domain of the data to be quantized.
As can be seen from equation (10), when the data to be quantized are quantized with the quantization parameters of the fifth case, the quantization interval is $f_1$; the quantization interval is denoted C.
The quantized n-bit binary value $I_x$ of data x is dequantized according to equation (11), $\hat{F}_x = I_x \times f_1 + O$, to obtain the dequantized data $\hat{F}_x$, where the data format of the dequantized data $\hat{F}_x$ is the same as that of the corresponding pre-quantization data $F_x$, both being floating-point values.
Based on the above description of the offset O, a sixth quantization-parameter case arises. Sixth case: the quantization parameters are the point location parameter, the second scaling factor $f_2$, and the offset O. In this case, the data to be quantized may be quantized using the following equation (12) to obtain the quantized data $I_x$:

$I_x = \mathrm{round}\left(\frac{F_x - O}{2^s \times f_2}\right)$ (12)
Here s is the point location parameter, O is the offset, $f_2$ is the second scaling factor, $I_x$ is the n-bit binary value representing data x after quantization, $F_x$ is the floating-point value of data x before quantization, and round denotes rounding to the nearest integer. It should be noted that the operation is not limited to round; rounding up, rounding down, rounding toward zero, and similar rounding operations may replace the round operation in equation (12). The maximum value A in the number domain of the data to be quantized that the n-bit fixed-point number can represent is $2^s(2^{n-1}-1)$. As can be seen from equation (12), when the data to be quantized are quantized with the quantization parameters of the sixth case, the quantization interval is $2^s \times f_2$; the quantization interval is denoted C.
Let Z be the maximum of the absolute values of all numbers in the number domain of the data to be quantized, i.e. $Z = \frac{Z_{max}-Z_{min}}{2}$. According to equation (2), $1 \ge \frac{Z}{2^s(2^{n-1}-1)} > \frac{1}{2}$, i.e. $1 \ge f_2 > \frac{1}{2}$ when $f_2 = \frac{Z}{2^s(2^{n-1}-1)}$. When $f_2 = \frac{Z}{2^s(2^{n-1}-1)}$, Z can, according to equation (2), be represented without loss of accuracy; when $f_2 = 1$, equation (12) coincides with equation (8). The n-bit fixed-point number can represent a maximum of $(2^{n-1}-1)\times 2^s \times f_2 + O$ and a minimum of $-(2^{n-1}-1)\times 2^s \times f_2 + O$ in the number domain of the data to be quantized.
The quantized n-bit binary value $I_x$ of data x is dequantized according to equation (13), $\hat{F}_x = I_x \times 2^s \times f_2 + O$, to obtain the dequantized data $\hat{F}_x$, where the data format of the dequantized data $\hat{F}_x$ is the same as that of the corresponding pre-quantization data $F_x$, both being floating-point values.
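As an illustration of the sixth case, which subsumes the offset variants, the following Python sketch (hypothetical names) quantizes according to equation (12) and dequantizes according to equation (13); setting $f_2 = 1$ recovers the fourth case.

```python
import numpy as np

def quantize_with_offset(data, n=8):
    """Sketch of the sixth case: point location parameter s, second scaling factor f2, and offset O."""
    qmax = 2 ** (n - 1) - 1
    z_min, z_max = float(np.min(data)), float(np.max(data))
    O = (z_min + z_max) / 2
    Z = (z_max - z_min) / 2
    s = int(np.ceil(np.log2(Z / qmax)))
    f2 = Z / (2 ** s * qmax)                                   # assumed lossless choice, as in the third case
    I = np.clip(np.round((data - O) / (2 ** s * f2)), -qmax, qmax).astype(np.int32)   # equation (12)
    F_hat = I * (2 ** s) * f2 + O                              # equation (13)
    return I, s, f2, O, F_hat
```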
The above detailed description of the six quantization-parameter cases is merely exemplary; in different embodiments the kinds of quantization parameters may differ from the description above. As can be seen from equations (1) through (13), both the point location parameter and the scaling factors are related to the data bit width: different data bit widths lead to different point location parameters and scaling factors, which in turn affect quantization accuracy. During training or fine-tuning, using the same data bit width for quantization within a certain range of iteration counts has little influence on the overall accuracy of the neural network operation; beyond a certain number of iterations, however, quantizing with the same data bit width no longer meets the accuracy requirements of training or fine-tuning. The data bit width n therefore needs to be adjusted as training or fine-tuning proceeds. One simple option is to set the data bit width n manually and, within different iteration ranges, read out a corresponding data bit width n that has been set in advance. However, as noted above, training with fixed-point numbers of low bit width is exceptionally complex, so this way of pre-setting the data bit width manually generally does not meet the needs of practical applications.
In the present technical solution, the data bit width n is adjusted according to the quantization error $diff_{bit}$. In more detail, the quantization error $diff_{bit}$ is compared with a threshold to obtain a comparison result. The threshold comprises a first threshold and a second threshold, with the first threshold greater than the second threshold, and the comparison result falls into one of three cases. In the first case, the quantization error $diff_{bit}$ is greater than or equal to the first threshold, and the data bit width is increased. In the second case, the quantization error $diff_{bit}$ is less than or equal to the second threshold, and the data bit width is reduced. In the third case, the quantization error $diff_{bit}$ lies between the first threshold and the second threshold, and the data bit width remains unchanged. In practical applications, the first threshold and the second threshold may be empirical values or variable hyperparameters; conventional hyperparameter optimization methods apply to both, and such optimization schemes are not repeated here.
It should be emphasized that the data bit width may be adjusted by a fixed step of bits, or by a variable step determined from the difference between the quantization error and the error threshold, and is ultimately made longer or shorter according to the actual needs of the neural network operation. For example: the data bit width n of the current convolution layer is 16, and according to the quantization error $diff_{bit}$ it is adjusted to 12. That is, in practical applications the data bit width n does not need to be 16 to meet the accuracy requirements of the neural network operation, so within the allowable accuracy range the fixed-point operation speed can be greatly improved, which improves the resource utilization of the artificial intelligence processor chip.
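A minimal Python sketch of the threshold comparison described above follows; the threshold values, step size, and bit-width bounds are assumed values for illustration and are not prescribed by this disclosure.

```python
def adjust_bit_width(diff_bit, n, first_threshold=0.5, second_threshold=0.1,
                     step=4, n_min=4, n_max=32):
    """Sketch: adjust the data bit width n according to the quantization error diff_bit."""
    if diff_bit >= first_threshold:      # first case: error too large, increase the bit width
        return min(n + step, n_max)
    if diff_bit <= second_threshold:     # second case: error very small, reduce the bit width
        return max(n - step, n_min)
    return n                             # third case: keep the bit width unchanged
```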
Regarding the quantization error $diff_{bit}$: the quantization error is determined from the quantized data and the corresponding pre-quantization data. In practical applications there are three ways to determine the quantization error, all of which apply to the present technical solution. The first way: the quantization error is determined according to equation (14) from the quantization interval, the number of quantized data, and the corresponding pre-quantization data.
Here C is the quantization interval used during quantization, m is the number of quantized data obtained after quantization, $F_i$ is a floating-point value of the data to be quantized, and i is the index of the data in the data set to be quantized.
The second way: the quantization error $diff_{bit}$ is determined according to equation (15) from the pre-quantization data and the corresponding dequantized data.
Here $F_i$ is a floating-point value of the data to be quantized, i is the index of the data in the data set to be quantized, and $\hat{F}_i$ is the dequantized data corresponding to that floating-point value.
The third way: the quantization error $diff_{bit}$ is determined according to equation (16) from the pre-quantization data and the corresponding dequantized data.
Here $F_i$ is a floating-point value of the data to be quantized, i is the index of the data in the data set to be quantized, and $\hat{F}_i$ is the dequantized data corresponding to that floating-point value.
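Because the bodies of equations (14) to (16) are not reproduced above, the following Python sketch only illustrates the general idea of the second and third ways, namely measuring the deviation between the pre-quantization data and the dequantized data; the particular relative-error metric used here is an assumption for illustration, not the exact formula of this disclosure.

```python
import numpy as np

def quantization_error(F, F_hat, eps=1e-12):
    """Sketch of an assumed relative-error metric between pre-quantization data F and
    dequantized data F_hat; it stands in for equations (15)/(16), whose exact forms are
    not reproduced in the text above."""
    return float(np.sum(np.abs(F_hat - F)) / (np.sum(np.abs(F)) + eps))
```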
It should be emphasized that the ways of obtaining the quantization error $diff_{bit}$ described above are only exemplary and not exhaustive. Those skilled in the art, upon understanding the essence of the technical solution of the present application, may derive other modifications or variations based on it; any variant formula that determines the quantization error from the quantized data and the corresponding pre-quantization data falls within the protection scope of the present application, as long as its function and technical effect are similar to those of the present application.
Regarding the data bit width: FIG. 5a and FIG. 5b are graphs of the fluctuation range of the weight data of a neural network during training. In FIGS. 5a and 5b, the abscissa is the number of iterations and the ordinate is the logarithm of the maximum absolute value of the weights. The weight-data fluctuation curve in FIG. 5a shows, for any convolution layer of the neural network, the fluctuation of the weight data over different iterations within the same epoch. In FIG. 5b, the conv0 layer corresponds to fluctuation curve A, the conv1 layer to curve B, the conv2 layer to curve C, the conv3 layer to curve D, and the conv4 layer to curve e. As can be seen from FIGS. 5a and 5b, in the early stage of training (epoch) the weights change substantially from iteration to iteration, while in the middle and later stages of training the weights change little between iterations. In the middle and later stages of training, because the weight data change little before and after each iteration, the weight data of the corresponding layers are similar within a certain iteration interval, so when quantizing the data involved in each layer during neural network training, each layer can reuse the data bit width used when quantizing the corresponding layer in the previous iteration. In the early stage of training, however, the weight data change substantially before and after each iteration; therefore, to meet the floating-point precision required of quantization, in each iteration of the early training stage the weight data of the corresponding layer of the current iteration are quantized either with the data bit width used when quantizing the corresponding layer in the previous iteration, or with a preset data bit width n of the current layer, to obtain the quantized fixed-point numbers. The quantization error $diff_{bit}$ is then determined from the quantized weight data and the corresponding pre-quantization weight data; according to the comparison of $diff_{bit}$ with the thresholds, the data bit width used in the previous iteration (or the preset data bit width n of the current layer) is adjusted, and the adjusted data bit width is applied to the quantization of the weight data of the corresponding layer in the current iteration. Furthermore, during training or fine-tuning, the weight data of different layers of the neural network are independent of each other and not similar, and because the weight data are not similar, the neuron data of different layers are not similar either. Therefore, during neural network training or fine-tuning, the data bit width of each layer within each iteration applies only to the corresponding neural network layer.
In the training or fine-tuning process of the neural network, the data bit widths corresponding to the neuron data and to the gradient data are handled analogously to the weight-data example above and are not repeated here.
In the neural network inference process, the weight data between layers of the neural network are independent of each other and not similar, and because the weight data are not similar, the neuron data between layers are not similar either. Therefore, in the neural network inference process, the data bit width of each layer of the neural network applies to the corresponding layer. In practical applications, the input neuron data of each inference may well be different or dissimilar, and because the weight data between layers of the neural network are independent of each other, the input neuron data of the hidden layers of the neural network are dissimilar from layer to layer. When quantizing, the data bit width used for the input neuron data of the previous layer may therefore not be suitable for the input neuron data of the current layer. Based on this, to meet the floating-point precision required of quantization, during inference the input neuron data of the current layer are quantized either with the data bit width used when quantizing the input neuron data of the previous layer, or with a preset data bit width n of the current layer, to obtain the quantized fixed-point numbers. The quantization error $diff_{bit}$ is determined from the pre-quantization input neuron data and the corresponding quantized input neuron data; according to the comparison of $diff_{bit}$ with the thresholds, the data bit width used when quantizing the input neuron data of the previous layer (or the preset data bit width n of the current layer) is adjusted, and the adjusted data bit width is applied to the quantization of the input neuron data of the current layer. The same applies to the data bit width corresponding to the weight data, and this is not repeated here.
As can be seen from FIG. 5a, in the early stage of training (epoch) the weights change substantially from iteration to iteration. In the middle and later stages of training, because the weight data change little before and after each iteration, the weight data of the corresponding layers are similar within a certain iteration interval, so when quantizing, the data of each layer in the current iteration can carry over the quantization parameters of the corresponding data of the corresponding layer in the previous iteration. In that case the quantization parameters need not be re-determined in every iteration in the middle and later stages of training; they are determined in each layer of each iteration only in the early stage of training. This still meets the floating-point precision required of the neural network operation while greatly improving quantization efficiency. Furthermore, during training or fine-tuning, the weight data of different layers of the neural network are independent of each other and not similar, and because the weight data are not similar, the neuron data of different layers are not similar either. Therefore, during neural network training or fine-tuning, the quantization parameters of each layer within each iteration apply to the corresponding data to be quantized of the corresponding layer.
In the training or fine-tuning process of the neural network, the quantization parameters corresponding to the neuron data and to the gradient data are handled analogously to the weight-data example above and are not repeated here.
In the neural network inference process, the weight data between layers of the neural network are independent of each other and not similar, and because the weight data are not similar, the neuron data between layers are not similar either. Therefore, in the neural network inference process, the quantization parameter of each layer of the neural network applies to the data to be quantized of the corresponding layer. For example: the current layer of the neural network is a convolution layer. According to the data to be quantized of this convolution layer, the quantization parameter of the data to be quantized of the current convolution layer is obtained by the technical solution shown in FIG. 2; this quantization parameter can only be applied to the current convolution layer and cannot be applied to other layers of the neural network, even if those other layers are also convolution layers.
In summary, the strategy of carrying over (reusing) the data bit width and the quantization parameters is determined based on the similarity between data: if the data are similar, the data bit width and the quantization parameters may be carried over; if the data are not similar, the data bit width or the quantization parameters need to be adjusted. The similarity between data is usually measured by the KL divergence, but it may also be measured by the following equation (17):
$\mathrm{absmax}(A) \approx \mathrm{absmax}(B)$ and $\mathrm{mean}(A) \approx \mathrm{mean}(B)$ (17)
In some embodiments, if data A and data B satisfy equation (17), then it is determined that there is similarity between data A and data B.
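A Python sketch of the similarity test of equation (17) follows (hypothetical names); the tolerance used for the approximate equality is an assumed hyperparameter.

```python
import numpy as np

def is_similar(a, b, rel_tol=0.1):
    """Sketch of the equation (17) test: A and B are similar if absmax(A) ~ absmax(B)
    and mean(A) ~ mean(B)."""
    def close(x, y):
        return abs(x - y) <= rel_tol * max(abs(x), abs(y), 1e-12)
    return close(float(np.max(np.abs(a))), float(np.max(np.abs(b)))) and \
           close(float(np.mean(a)), float(np.mean(b)))
```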
It should be noted that the methods described above for determining the quantization error and adjusting the data bit width, and the strategies for carrying over the data bit width and quantization parameters, are only exemplary cases, not exhaustive. For example, the method for determining the quantization error, the method for adjusting the data bit width, and the carry-over strategies for the data bit width and the quantization parameters are all applicable to the fine-tuning process of the neural network. Likewise, as to measuring the similarity between data, the KL divergence and the measure of equation (17) listed above are only partial examples, not exhaustive; other options include histogram matching, matrix decomposition, feature-point-based image similarity calculation, and proximity metrics. Those skilled in the art, upon understanding the essence of the technical solution of the present application, may derive other modifications or variations based on it, but as long as the functions and technical effects achieved are similar to those of the present application, they shall fall within the protection scope of the present application.
In summary, in the middle and later stages of training, because the weight data change little before and after each iteration, the weight data of the corresponding layers are similar within a certain iteration interval. In order to make the technical solution more general in training or fine-tuning and to make reasonable use of the resources of the artificial intelligence processor chip, a strategy is needed to determine an iteration interval, such that the data bit width n of the corresponding layer remains unchanged within the iteration interval and, beyond the iteration interval, the data bit width n changes, so that it is unnecessary to decide in every iteration whether to adjust the data bit width n. The same holds for the quantization parameters. In this way the peak computing power of the artificial intelligence processor chip is improved while the floating-point precision required of quantization is still met.
As shown in FIG. 6, one flowchart of a method for determining the target iteration interval is provided. In the technical solution shown in FIG. 6, the target iteration interval includes at least one weight-update iteration, and the same data bit width is used for quantization within the same target iteration interval. The steps of determining the target iteration interval include:
Step 601): determining, at a pre-judgment time point, the trend value of the point location parameter corresponding to the data to be quantized involved in the weight-update iteration process; the pre-judgment time point is used to judge whether the data bit width needs to be adjusted, and corresponds to a time point at which a weight-update iteration is completed.
In this step, the trend value of the point location parameter is determined according to equation (18), either from the moving average of the point location parameter over the weight iterations corresponding to the current pre-judgment time point and the moving average of the point location parameter over the weight iterations corresponding to the previous pre-judgment time point, or from the point location parameter over the weight iterations corresponding to the current pre-judgment time point and the moving average of the point location parameter over the weight iterations corresponding to the previous pre-judgment time point. Equation (18) is:
$diff_{update1} = |M^{(t)} - M^{(t-1)}| = \alpha|s^{(t)} - M^{(t-1)}|$ (18)
In equation (18), M is the moving average of the point location parameter s as the training iterations increase. $M^{(t)}$ is the moving average, as training iterations increase, of the point location parameter s corresponding to the t-th pre-judgment time point, and is obtained according to equation (19). $s^{(t)}$ is the point location parameter s corresponding to the t-th pre-judgment time point, $M^{(t-1)}$ is the moving average of the point location parameter s corresponding to the (t-1)-th pre-judgment time point, and α is a hyperparameter. $diff_{update1}$ measures the trend of the point location parameter s; since a change in the point location parameter s correspondingly reflects a change in the maximum value $Z_{max}$ of the data in the current data to be quantized, the larger $diff_{update1}$ is, the more drastically the numeric range is changing, and the shorter the update interval, i.e. the smaller the target iteration interval, that is required.
$M^{(t)} \leftarrow \alpha \times s^{(t-1)} + (1-\alpha)\times M^{(t-1)}$ (19)
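The update of the moving average and of the trend value can be sketched in Python as follows (hypothetical names), following equations (18) and (19) as printed above.

```python
def update_trend(s_prev, M_prev, alpha=0.9):
    """Sketch: equation (19) updates the moving average M of the point location parameter,
    and equation (18) gives the trend value diff_update1."""
    M_t = alpha * s_prev + (1 - alpha) * M_prev   # equation (19) as printed above
    diff_update1 = abs(M_t - M_prev)              # equation (18)
    return M_t, diff_update1
```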
Step 602): determining the corresponding target iteration interval according to the trend value of the point location parameter.
In the present solution, a target iteration interval is determined according to equation (20). For the target iteration interval, the same data bit width is adopted in the quantization process in the same target iteration interval, and the data bit widths adopted in the quantization process in different target iteration intervals can be the same or different.
In equation (20), I is the target iteration interval and $diff_{update1}$ is the trend value of the point location parameter. β and γ are empirical values and may also be variable hyperparameters; conventional hyperparameter optimization methods apply to both, and such optimization schemes are not repeated here.
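Equation (20) itself is not reproduced above; the following Python sketch therefore assumes, purely for illustration, a form in which the interval shrinks as the trend value grows.

```python
def target_interval(diff_update1, beta=100.0, gamma=2.0):
    """Sketch of an assumed form of equation (20): the larger the trend value,
    the smaller the target iteration interval."""
    return max(int(beta / max(diff_update1, 1e-12) - gamma), 1)
```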
For this technical solution, the pre-judgment time points include a first pre-judgment time point, which is determined according to the target iteration interval. Specifically, at the t-th pre-judgment time point during training or fine-tuning, the weight data of the corresponding layer of the current iteration are quantized with the data bit width used when quantizing the corresponding layer in the previous iteration, the quantized fixed-point numbers are obtained, and the quantization error $diff_{bit}$ is determined from the pre-quantization weight data and the corresponding quantized weight data. The quantization error $diff_{bit}$ is compared with the first threshold and the second threshold respectively, and the comparison result is used to decide whether to adjust the data bit width used when quantizing the corresponding layer in the previous iteration. Suppose the t-th first pre-judgment time point corresponds to the 100th iteration, and the 99th iteration uses data bit width $n_1$. At the 100th iteration, the quantization error $diff_{bit}$ is determined based on data bit width $n_1$ and compared with the first and second thresholds to obtain a comparison result. If the comparison result confirms that $n_1$ need not change and equation (20) gives a target iteration interval of 8 iterations, then when the 100th iteration is taken as the starting iteration of the current target iteration interval, the 100th to 107th iterations form the current target iteration interval; when the 100th iteration is taken as the last iteration of the previous target iteration interval, the 101st to 108th iterations form the current target iteration interval. Within the current target iteration interval, each iteration still carries over the data bit width $n_1$ used in the previous target iteration interval; in this case the data bit widths used for quantization in different target iteration intervals may be the same. If the 100th to 107th iterations form the current target iteration interval, the 108th iteration in the next target iteration interval serves as the (t+1)-th first pre-judgment time point; if the 101st to 108th iterations form the current target iteration interval, the 108th iteration in the current target iteration interval serves as the (t+1)-th first pre-judgment time point. At the (t+1)-th first pre-judgment time point, the quantization error $diff_{bit}$ is determined based on data bit width $n_1$ and compared with the first and second thresholds to obtain a comparison result. Suppose the comparison result shows that $n_1$ needs to be changed to $n_2$, and equation (20) gives a target iteration interval of 55 iterations. Then the 108th to 163rd iterations, or the 109th to 163rd iterations, form the next target iteration interval, and each iteration within that interval uses data bit width $n_2$ when quantizing. In this case the data bit widths used for quantization in different target iteration intervals may be different.
For this technical solution, whether the first pre-judgment time point is the starting iteration or the last iteration within the target iteration interval, equation (18) is applicable for obtaining the trend value of the point location parameter. If the current first pre-judgment time point is the starting iteration of the current target iteration interval, then in equation (18), $M^{(t)}$ is the moving average, as training iterations increase, of the point location parameter s corresponding to the time point of the starting iteration of the current target iteration interval, $s^{(t)}$ is the point location parameter s corresponding to the time point of the starting iteration of the current target iteration interval, and $M^{(t-1)}$ is the moving average, as training iterations increase, of the point location parameter s corresponding to the time point of the starting iteration of the previous target iteration interval. If the current first pre-judgment time point is the last iteration of the current target iteration interval, then in equation (18), $M^{(t)}$ is the moving average, as training iterations increase, of the point location parameter s corresponding to the time point of the last iteration of the current target iteration interval, $s^{(t)}$ is the point location parameter s corresponding to the time point of the last iteration of the current target iteration interval, and $M^{(t-1)}$ is the moving average, as training iterations increase, of the point location parameter s corresponding to the time point of the last iteration of the previous target iteration interval.
For this technical solution, the pre-judgment time points may further include second pre-judgment time points in addition to the first pre-judgment time points. The second pre-judgment time points are determined according to the data fluctuation curve. The data fluctuation curve shown in FIG. 5a is obtained from statistics of the data fluctuation of big data during neural network training.
Taking the weight data as an example, the data fluctuation curve of FIG. 5a shows that, in the iteration interval from the start of training to the T-th iteration, the data fluctuation with each weight update is very large. At the current pre-judgment time point, during quantization the current iteration first quantizes with the data bit width $n_1$ of the previous iteration; the quantization result and the corresponding pre-quantization data determine the quantization error, the quantization error is compared with the first threshold and the second threshold respectively, and according to the comparison result the data bit width $n_1$ is adjusted to obtain data bit width $n_2$. The weight data to be quantized involved in the current iteration are then quantized with data bit width $n_2$. A target iteration interval is then determined according to equation (20), which determines the first pre-judgment time point at which it is judged whether and how to adjust the data bit width, and the next target iteration interval is in turn determined according to equation (20) to obtain the next first pre-judgment time point. Because the weight data change very substantially before and after each iteration in the interval from the start of training to the T-th iteration, the weight data of the corresponding layers of adjacent iterations are not similar. To meet the accuracy requirement, in each of the first T iterations the data of each layer of the current iteration cannot carry over the quantization parameters of the corresponding layer of the previous iteration, and the data bit width must be adjusted in every one of the first T iterations; the data bit width used when quantizing therefore differs from iteration to iteration in the first T iterations, and the target iteration interval is 1 iteration. For optimal use of the resources of the artificial intelligence processor chip, the target iteration interval of the first T iterations may be preset according to the regularity revealed by the data fluctuation curve of FIG. 5a: that is, the target iteration interval of the first T iterations is preset directly from the data fluctuation curve, without computing it via equation (20), and the time point at which the weight-update iteration of each of the first T iterations is completed is taken as a second pre-judgment time point. This makes more reasonable use of the resources of the artificial intelligence processor chip. From the T-th iteration onward, the data fluctuation curve of FIG. 5a fluctuates little, so in the middle and later stages of training the quantization parameters need not be re-determined every iteration; at the T-th or (T+1)-th iteration, the quantization error is determined from the pre-quantization data and the quantized data corresponding to the current iteration, whether the data bit width needs to be adjusted is determined from the quantization error, and the target iteration interval is determined according to equation (20).
If the confirmed target iteration interval is 55 iterations, then from the time point corresponding to the T-th or (T+1)-th iteration, every 55 iterations a first pre-judgment time point is set at which it is judged whether and how to adjust the data bit width, and the next target iteration interval is determined according to equation (20), thereby determining the next first pre-judgment time point, until all iterations within the same epoch are completed. On this basis, after each epoch, the data bit width or the quantization parameter is adaptively adjusted, and finally the quantized data are used to obtain a neural network whose accuracy meets expectations.
In particular, suppose that, based on the weight-data fluctuation curve of FIG. 5a, the value of T is determined to be 130 (this value does not correspond to FIG. 5a; for convenience of description T is merely assumed to be 130, and the assumption is not limiting). Then the 130th iteration of the training process is a second pre-judgment time point, and the current first pre-judgment time point is the 100th iteration of the training process; at the 100th iteration, equation (20) gives a target iteration interval of 35 iterations. Within that target iteration interval, training proceeds to the 130th iteration, which reaches the second pre-judgment time point; at the time point corresponding to the 130th iteration it is determined whether and how the data bit width needs to be adjusted, and the target iteration interval is determined according to equation (20). Suppose the target iteration interval determined in this case is 42 iterations. Then the 130th to 172nd iterations form a target iteration interval, and the 135th iteration, which is the first pre-judgment time point determined when the target iteration interval was 35 iterations, lies inside this 42-iteration interval. At the 135th iteration, it may again be judged according to equation (20) whether and how the data bit width needs to be adjusted; alternatively, the evaluation and pre-judgment may be skipped at the 135th iteration and performed only at the 172nd iteration. In short, whether or not evaluation and pre-judgment are performed at the 135th iteration, both options are suitable for the present technical solution.
In summary, second pre-judgment time points are preset according to the data fluctuation curve, so that in the early stage of training or fine-tuning no resources of the artificial intelligence processor chip need to be spent determining target iteration intervals; at the preset second pre-judgment time points, the data bit width is adjusted directly according to the quantization error, and the adjusted data bit width is used to quantize the data to be quantized involved in the current iteration. In the middle and later stages of training or fine-tuning, the target iteration interval is obtained according to equation (20), thereby determining the corresponding first pre-judgment time points, at each of which it is judged whether and how to adjust the data bit width. In this way the floating-point precision required of the neural network operation is met, the resources of the artificial intelligence processor chip are used reasonably, and quantization efficiency is greatly improved.
In practice, to obtain a more accurate target iteration interval for the data bit width, not only the trend value $diff_{update1}$ of the point location parameter may be used; the trend value $diff_{update1}$ of the point location parameter and the trend value $diff_{update2}$ of the data bit width may also be considered simultaneously. As shown in FIG. 7, a second flowchart of the method for determining the target iteration interval is provided. The steps of determining the target iteration interval include:
Step 701): determining, at a pre-judgment time point, the trend value of the point location parameter and the trend value of the data bit width corresponding to the data to be quantized involved in the weight-update iteration process; the pre-judgment time point is used to judge whether the data bit width needs to be adjusted, and corresponds to a time point at which a weight-update iteration is completed.
It should be emphasized that the content of the technical solution shown in FIG. 6 for determining the target iteration interval of the data bit width based on the trend value of the point location parameter also applies to the technical solution shown in FIG. 7 and is not repeated here.
In this step, a trend value of the data bit width is determined using the quantization error according to equation (21).
In equation (21), δ is a hyperparameter, $diff_{bit}$ is the quantization error, and $diff_{update2}$ is the trend value of the data bit width. $diff_{update2}$ measures the trend of the data bit width n used in quantization: the larger $diff_{update2}$ is, the more likely it is that the fixed-point bit width needs updating, and the shorter the update interval that is required.
The trend value of the point location parameter referred to in FIG. 7 can still be obtained from equation (18), with $M^{(t)}$ in equation (18) obtained according to equation (19). $diff_{update1}$ measures the trend of the point location parameter s; since a change in the point location parameter s correspondingly reflects a change in the maximum value $Z_{max}$ of the data in the current data to be quantized, the larger $diff_{update1}$ is, the more drastically the numeric range is changing, and the shorter the update interval, i.e. the smaller the target iteration interval, that is required.
Step 702): determining the corresponding target iteration interval according to the trend value of the point location parameter and the trend value of the data bit width.
In the present solution, the target iteration interval is determined according to formula (22). For the target iteration interval, the same data bit width is adopted in the quantization process in the same target iteration interval, and the data bit widths adopted in the quantization process in different target iteration intervals can be the same or different.
In equation (22), I is the target iteration interval, β and γ are hyperparameters, $diff_{update1}$ is the trend value of the point location parameter, and $diff_{update2}$ is the trend value of the data bit width. β and γ are empirical values and may also be variable hyperparameters; conventional hyperparameter optimization methods apply to both, and such optimization schemes are not repeated here.
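Equations (21) and (22) are likewise not reproduced above; the following Python sketch assumes, purely for illustration, that $diff_{update2}$ is proportional to the quantization error and that the interval is driven by the larger of the two trend values.

```python
def target_interval_combined(diff_update1, diff_bit, delta=0.5, beta=100.0, gamma=2.0):
    """Sketch under assumed forms of equations (21) and (22): diff_update2 is taken as
    delta * diff_bit, and the interval is driven by the larger of the two trend values."""
    diff_update2 = delta * diff_bit
    trend = max(diff_update1, diff_update2)
    return max(int(beta / max(trend, 1e-12) - gamma), 1)
```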
For the present technical solution, $diff_{update1}$ is used to measure the change of the point location parameter s, but the change of s caused by a change of the data bit width n should be excluded from it, because that change is already reflected in $diff_{update2}$. If that effect were not removed from $diff_{update1}$, the target iteration interval I determined according to equation (22) would be inaccurate, leading to too many first pre-judgment time points; during training or fine-tuning, the question of whether and how to update the data bit width n would then be raised too frequently, resulting in unreasonable use of the resources of the artificial intelligence processor chip.
Based on the above, $diff_{update1}$ is determined from $M^{(t)}$. Assume the data bit width corresponding to the (t-1)-th pre-judgment time point is $n_1$, the corresponding point location parameter is $s_1$, and the moving average of the point location parameter as training iterations increase is $m_1$. The data to be quantized are quantized with data bit width $n_1$ to obtain the quantized fixed-point numbers. The quantization error $diff_{bit}$ is determined from the pre-quantization data and the corresponding quantized data, and according to the comparison of $diff_{bit}$ with the thresholds, the data bit width $n_1$ is adjusted to $n_2$; the data bit width is thus adjusted by $|n_1 - n_2|$ bits, and the data bit width used for quantization at the t-th pre-judgment time point is $n_2$. To exclude the change of the point location parameter caused by the change of the data bit width, one of the following two optimization ways may be chosen when determining $M^{(t)}$. The first way: if the data bit width increases by $|n_1 - n_2|$ bits, take $s^{(t-1)} = s_1 - |n_1 - n_2|$ and $M^{(t-1)} = m_1 - |n_1 - n_2|$, and substitute $s^{(t-1)}$ and $M^{(t-1)}$ into equation (19) to obtain $M^{(t)}$, the moving average of the point location parameter corresponding to the t-th pre-judgment time point as training iterations increase; if the data bit width decreases by $|n_1 - n_2|$ bits, take $s^{(t-1)} = s_1 + |n_1 - n_2|$ and $M^{(t-1)} = m_1 + |n_1 - n_2|$, and substitute them into equation (19) to obtain $M^{(t)}$. The second way: whether the data bit width increases or decreases by $|n_1 - n_2|$ bits, take $s^{(t-1)} = s_1$ and $M^{(t-1)} = m_1$ and substitute them into equation (19) to obtain $M^{(t)}$; then, when the data bit width increases by $|n_1 - n_2|$ bits, subtract $|n_1 - n_2|$ from $M^{(t)}$, and when it decreases by $|n_1 - n_2|$ bits, add $|n_1 - n_2|$ to $M^{(t)}$, and take the result as the moving average of the point location parameter corresponding to the t-th pre-judgment time point as training iterations increase. The two ways are equivalent; both exclude the change of the point location parameter caused by the change of the data bit width, yield a more accurate target iteration interval, and improve the resource utilization of the artificial intelligence processor chip.
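The first optimization way described above can be sketched in Python as follows (hypothetical names); it shifts the point location statistics by the number of bits by which the data bit width changed before applying equation (19), so that the trend value is not polluted by the bit-width change.

```python
def update_trend_bitwidth_aware(s1, m1, n1, n2, alpha=0.9):
    """Sketch of the first optimization way: compensate the point location statistics for a
    data bit width change from n1 to n2 before applying equations (19) and (18)."""
    shift = abs(n1 - n2)
    if n2 > n1:                      # bit width increased: the point location parameter drops by `shift`
        s_prev, M_prev = s1 - shift, m1 - shift
    elif n2 < n1:                    # bit width decreased: the point location parameter rises by `shift`
        s_prev, M_prev = s1 + shift, m1 + shift
    else:
        s_prev, M_prev = s1, m1
    M_t = alpha * s_prev + (1 - alpha) * M_prev   # equation (19)
    diff_update1 = abs(M_t - M_prev)              # equation (18)
    return M_t, diff_update1
```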
In practical applications, the data bit width n and the point location parameter s have a large influence on quantization accuracy, while the second scaling factor $f_2$ and the offset O among the quantization parameters have little influence. As for the first scaling factor $f_1$, as mentioned above, in the second case $2^s \times f_2$ is treated as a whole as the first scaling factor $f_1$; since the point location parameter s has a large influence on quantization accuracy, in that case the first scaling factor $f_1$ has a large influence on quantization. Therefore, in the present technical solution, the point location parameter s is variable regardless of whether the data bit width n changes, and determining a target iteration interval for the point location parameter s is also very meaningful; the idea of the technical solution shown in FIG. 6 can be applied to determining the target iteration interval of the point location parameter s. A method for determining the target iteration interval of the point location parameter s is therefore shown in FIG. 8, comprising:
Step 801): determining, at a pre-judgment time point, the trend value of the point location parameter corresponding to the data to be quantized involved in the weight-update iteration process; the pre-judgment time point is a time point used to judge whether the quantization parameter needs to be adjusted, and corresponds to a time point at which a weight-update iteration is completed.
Step 802): determining the corresponding target iteration interval according to the trend value of the point location parameter.
It should be emphasized that the content of the technical solution of FIG. 6 regarding determining the target iteration interval of the quantization parameter based on the trend value of the point location parameter also applies to the technical solution shown in FIG. 8 and is not repeated here. For the solution shown in FIG. 8, the quantization parameter is preferably the point location parameter.
It should be noted that the above descriptions of determining the target iteration interval of the data bit width and determining the target iteration interval of the quantization parameter are only exemplary and not exhaustive. Those skilled in the art, upon understanding the gist of the present application, may derive other modifications or variations based on the technical solution of the present application, for example: re-determining the target iteration interval of the quantization parameter within the target iteration interval of the data bit width is also applicable to the solutions shown in FIGS. 6, 7 and 8. Such variations shall fall within the protection scope of the present application as long as the functions and technical effects achieved are similar to those of the present application.
According to the above technical solution, the quantization parameters are determined, the data bit width or the quantization parameters are adjusted according to the quantization error, and target iteration intervals are determined for deciding whether to adjust the data bit width or the quantization parameters, so that the data bit width or the quantization parameters are adjusted at appropriate time points during neural network operation and appropriate quantization parameters are used at appropriate iterations. This allows the artificial intelligence processor chip to execute the neural network operation at the speed of fixed-point arithmetic, improving the peak computing power of the chip while meeting the floating-point precision required by the operation.
It should be noted that, for simplicity of description, the foregoing method embodiments are all depicted as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
Further, although the steps in the flowcharts of fig. 2, 6, 7, and 8 are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps of fig. 2, 6, 7, 8 may include multiple sub-steps or phases that are not necessarily performed at the same time, but may be performed at different times, nor does the order in which the sub-steps or phases are performed necessarily occur sequentially, but may be performed alternately or alternately with other steps or at least a portion of the sub-steps or phases of other steps.
As shown in FIG. 9, a block diagram of a hardware configuration of a quantization parameter determination device of a neural network is provided. In FIG. 9, the quantization parameter determination device 10 of the neural network may include a processor 110 and a memory 120. In the quantization parameter determination device 10 of the neural network of FIG. 9, only the constituent elements related to the present embodiment are shown. It will therefore be apparent to those of ordinary skill in the art that the quantization parameter determination device 10 of the neural network may further include common constituent elements other than those shown in FIG. 9, for example a fixed-point arithmetic unit.
The quantization parameter determination apparatus 10 of the neural network may correspond to a computing device having various processing functions, for example, functions for generating the neural network, training or learning the neural network, quantizing the floating point type neural network into the fixed point type neural network, or retraining the neural network. For example, the quantization parameter determination apparatus 10 of the neural network may be implemented as various types of devices, such as a Personal Computer (PC), a server device, a mobile device, and the like.
The processor 110 controls all functions of the quantization parameter determining device 10 of the neural network. For example, the processor 110 controls all functions of the quantization parameter determination apparatus 10 of the neural network by executing a program stored in the memory 120 on the quantization parameter determination apparatus 10 of the neural network. The processor 110 may be implemented by a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application Processor (AP), an artificial intelligence processor chip (IPU), or the like provided in the quantization parameter determination apparatus 10 of the neural network. However, the present disclosure is not limited thereto.
The memory 120 is hardware for storing various data processed in the quantization parameter determination device 10 of the neural network. For example, the memory 120 may store processed data and data to be processed in the quantization parameter determination device 10 of the neural network. The memory 120 may store data sets involved in the neural network operation process that the processor 110 has processed or is to process, e.g., data of an untrained initial neural network, intermediate data of a neural network generated during training, data of a neural network that has completed all training, data of a quantized neural network, etc. Further, the memory 120 may store an application, a driver, or the like to be driven by the quantization parameter determining device 10 of the neural network. For example: the memory 120 may store various programs related to training algorithms, quantization algorithms, etc. of the neural network to be executed by the processor 110. The memory 120 may be a DRAM, but the present disclosure is not limited thereto. The memory 120 may include at least one of volatile memory or nonvolatile memory. The nonvolatile memory may include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), flash memory, phase change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), ferroelectric RAM (FRAM), and the like. Volatile memory can include Dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, ferroelectric RAM (FeRAM), and the like. In an embodiment, the memory 120 may include at least one of a Hard Disk Drive (HDD), a Solid State Drive (SSD), a high density flash memory (CF), a Secure Digital (SD) card, a Micro-secure digital (Micro-SD) card, a Mini-secure digital (Mini-SD) card, an extreme digital (xD) card, a cache (cache), or a memory stick.
The processor 110 may generate a trained neural network by iteratively training (learning) a given initial neural network. In this state, to ensure the processing accuracy of the neural network, the parameters of the initial neural network are in a high-precision data representation format, for example, a data representation format with 32-bit floating-point precision. The parameters may include various types of data input to or output from the neural network, such as input/output neurons, weights, and biases of the neural network. Compared with fixed-point operations, floating-point operations require a relatively large amount of computation and relatively frequent memory access; in particular, most of the operations required for neural network processing are various convolution operations. Thus, in mobile devices with relatively low processing capability (such as smartphones, tablets, wearable devices, embedded devices, and the like), high-precision data operations of the neural network may prevent the resources of the mobile device from being used efficiently. As a result, in order to drive the neural network operation within an allowable accuracy loss range while sufficiently reducing the amount of computation in the above-described devices, the high-precision data involved in the neural network operation can be quantized and converted into low-precision fixed-point numbers.
In consideration of the processing performance of devices such as mobile devices and embedded devices in which the neural network is deployed, the quantization parameter determination device 10 of the neural network performs quantization that converts the parameters of the trained neural network into fixed-point numbers having a specific number of bits, and transmits the corresponding quantization parameters to the device in which the neural network is deployed, so that the artificial intelligence processor chip performs fixed-point number operations when carrying out operations such as training, fine-tuning, and the like. The device in which the neural network is deployed may be an autonomous vehicle, a robot, a smartphone, a tablet device, an Augmented Reality (AR) device, an Internet of Things (IoT) device, or the like that performs voice recognition, image recognition, or the like by using the neural network, but the present disclosure is not limited thereto.
The processor 110 retrieves data from the memory 120 during the neural network operation. The data comprises at least one of neurons, weights, biases, and gradients; the corresponding quantization parameters are determined using the technical scheme shown in fig. 2, and the target data in the neural network operation process is quantized using the quantization parameters. The neural network operation is then executed on the quantized data. The arithmetic operations include, but are not limited to, training, fine-tuning, and inference.
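For concreteness, the following is a minimal sketch of such a quantization step, assuming the common power-of-two (point position) formulation in which a floating-point value is scaled by 2^-s and rounded into an n-bit signed integer range; the function names and the exact rounding and clipping choices are illustrative assumptions rather than the formulas of this disclosure.

```python
import numpy as np

def quantize(x, s, n):
    """Map float data x onto an n-bit signed fixed-point grid with point position s (sketch)."""
    qmax = 2 ** (n - 1) - 1
    qmin = -(2 ** (n - 1))
    return np.clip(np.round(x / (2.0 ** s)), qmin, qmax).astype(np.int32)

def dequantize(q, s):
    """Return quantized values to floating point, e.g. for checking the quantization error."""
    return q.astype(np.float32) * (2.0 ** s)
```

With such helpers, the flow described above amounts to: determine the point position (and bit width n) from the statistics of the data, call quantize on the target data, and hand the integer result to the fixed-point arithmetic unit.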
The processor 110 adjusts the data bit width n based on the quantization error diff_bit, and the processor 110 may execute the program of the target iteration interval methods shown in fig. 6, 7, and 8 to determine the target iteration interval of the data bit width or the target iteration interval of the quantization parameter.
In summary, the specific functions of the memory 120 and the processor 110 of the quantization parameter determination device of the neural network provided in the embodiments of the present disclosure may be understood with reference to the previous embodiments of the present disclosure and can achieve the technical effects of the previous embodiments, which will not be repeated here.
In this embodiment, the processor 110 may be implemented in any suitable manner. For example, the processor 110 may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and the like.
As shown in fig. 10, an application diagram of the quantization parameter determination device of the neural network provided in the present application to an artificial intelligence processor chip is shown. Referring to fig. 10, as described above, in the quantization parameter determination device 10 of the neural network, such as a PC or a server, the processor 110 performs the quantization operation to quantize floating-point data involved in the neural network operation into fixed-point numbers, and the fixed-point arithmetic unit on the artificial intelligence processor chip performs training, fine-tuning, or inference using the fixed-point numbers obtained by the quantization. An artificial intelligence processor chip is dedicated hardware for driving a neural network. Because the artificial intelligence processor chip is implemented with relatively low power or performance, this technical scheme uses low-precision fixed-point numbers to implement the neural network operation; compared with high-precision data, reading low-precision fixed-point numbers requires less memory bandwidth and makes better use of the caches of the artificial intelligence processor chip, thereby avoiding memory access bottlenecks. Meanwhile, when SIMD instructions are executed on the artificial intelligence processor chip, more computation is completed in one clock cycle, so that the neural network operation is executed faster.
Further, comparing fixed-point operations with high-precision data operations of the same bit length, and especially fixed-point operations with floating-point operations, the computation of a floating-point operation is more complex and requires more logic devices to build the floating-point operator. In terms of size, the floating-point operator is therefore larger than the fixed-point operator. Moreover, the floating-point operator requires more resources to run, and the power consumption gap between fixed-point operations and floating-point operations is usually orders of magnitude.
In summary, according to the technical scheme, the floating-point operator on the artificial intelligence processor chip can be replaced by a fixed-point operator, so that the power consumption of the artificial intelligence processor chip is lower. This is particularly important for mobile devices. In other words, the technical scheme opens the door to a large number of embedded systems that cannot efficiently run floating-point computing code, making widespread application of the Internet of Things possible.
In the present technical solution, the artificial intelligence processor chip may correspond to, for example, a Neural Processing Unit (NPU), a Tensor Processing Unit (TPU), a neural engine, etc., which are dedicated chips for driving the neural network, but the present disclosure is not limited thereto.
In this embodiment, the artificial intelligence processor chip may be implemented in a separate device independent of the quantization parameter determining device 10 of the neural network, and the quantization parameter determining device 10 of the neural network may also be implemented as a part of the functional modules of the artificial intelligence processor chip. The present disclosure is not limited thereto.
In the technical scheme, an operating system of a general-purpose processor (such as a CPU) generates an instruction based on the technical scheme and sends the generated instruction to an artificial intelligence processor chip (such as a GPU), and the artificial intelligence processor chip executes the instruction to determine the quantization parameters of the neural network and perform the quantization process. In another application, the general-purpose processor directly determines the corresponding quantization parameters based on the technical scheme and directly quantizes the corresponding target data according to the quantization parameters, and the artificial intelligence processor chip performs fixed-point operations using the quantized data. Furthermore, the general-purpose processor (such as a CPU) and the artificial intelligence processor chip (such as a GPU) may operate in a pipelined manner: the operating system of the general-purpose processor generates instructions based on the technical scheme, and the artificial intelligence processor chip performs the neural network operation while the target data is being copied, so that part of the time consumption can be hidden. The present disclosure is not limited thereto.
In this embodiment, an embodiment of the present application further provides a readable storage medium having stored thereon a computer program which, when executed, implements the method for determining quantization parameters of a neural network described above.
From the above, during the neural network operation, the technical scheme of the present disclosure is used to determine the quantization parameters, and the quantization parameters are used by the artificial intelligence processor to quantize the data involved in the neural network operation, converting high-precision data into low-precision fixed-point numbers, which can reduce the storage space required for all of the data involved in the neural network operation. For example, converting float32 to fix8 can reduce the model parameters by a factor of 4. Because the data storage space is reduced, a smaller footprint is used when the neural network is deployed, the on-chip memory of the artificial intelligence processor chip can accommodate more data, memory accesses by the artificial intelligence processor chip are reduced, and computing performance is improved.
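As a quick, hypothetical illustration of the factor-of-4 saving mentioned above (the layer size is arbitrary; the ratio follows simply from 32-bit versus 8-bit storage):

```python
import numpy as np

weights_fp32 = np.random.randn(1_000_000).astype(np.float32)   # hypothetical layer parameters
weights_int8 = np.empty_like(weights_fp32, dtype=np.int8)      # container for the quantized values

print(weights_fp32.nbytes)   # 4,000,000 bytes
print(weights_int8.nbytes)   # 1,000,000 bytes -> 4x smaller
```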
Those skilled in the art will also appreciate that, in addition to implementing the client and server purely as computer-readable program code, it is entirely possible to implement the same functions by logically programming the method steps so that the client and server take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client and server may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component; or the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
As shown in fig. 11, a functional block diagram of a quantization parameter determination device of a neural network is provided. The device comprises:
a statistical result obtaining unit a, configured to obtain a statistical result of each type of data to be quantized; the data to be quantized comprises at least one data of neurons, weights, gradients and biases of the neural network;
a quantization parameter determining unit b, configured to determine a corresponding quantization parameter using a statistical result of each type of data to be quantized and a data bit width; the quantization parameter is used for the artificial intelligence processor to correspondingly quantize the data in the operation process of the neural network.
In this embodiment, optionally, the quantization parameter determining device of the neural network further includes:
and the first quantization unit is used for quantizing the data to be quantized by utilizing the corresponding quantization parameters.
In this embodiment, optionally, the quantization parameter determining device of the neural network further includes:
a second quantization unit for quantizing the target data using the corresponding quantization parameter; wherein, the characteristics of the target data and the characteristics of the data to be quantized have similarity.
In this embodiment, the neural network operation process includes at least one operation of neural network training, neural network reasoning, and neural network fine tuning.
In this embodiment, the statistical result obtained by the statistical unit is a maximum value and a minimum value in each data to be quantized.
In this embodiment, the statistical result obtained by the statistical unit is the maximum absolute value in each type of data to be quantized.
In this embodiment, the statistics unit determines the absolute value maximum from the maximum and the minimum in each data to be quantized.
In this embodiment, the quantization parameter determining unit determines the quantization parameter according to the maximum value, the minimum value, and the data bit width in each type of data to be quantized.
In this embodiment, the quantization parameter determining unit determines the quantization parameter according to the maximum absolute value in each type of data to be quantized and the data bit width.
In this embodiment, the quantization parameter determined by the quantization parameter determining unit is a point location parameter or a first scaling factor.
In this embodiment, the quantization parameter determining unit determines the first scaling factor based on a point position parameter and a second scaling factor; the point location parameter used in determining the first scaling factor is a known fixed value, or the result of multiplying the point location parameter by the corresponding second scaling factor is used as a whole as the first scaling factor to be applied to data quantization in the neural network operation process.
In this embodiment, the quantization parameter determined by the quantization parameter determining unit includes a point position parameter and a second scaling factor.
In this embodiment, the quantization parameter determining unit determines the second scaling factor according to the point location parameter, the statistical result, and the data bit width.
In this embodiment, the quantization parameter determined by the quantization parameter determining unit further includes an offset.
In this embodiment, the quantization parameter determining unit determines the offset according to the statistical result of each data to be quantized.
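The paragraphs above derive the point position parameter, scaling factors, and offset from the statistics of each type of data to be quantized and the data bit width. A minimal sketch of one common way to compute such parameters is given below; the exact formulas of this disclosure are not reproduced here, so the expressions (the ceiling of a log2, the refinement scaling factor, the mid-range offset) should be read as illustrative assumptions.

```python
import math
import numpy as np

def determine_params(data, n, symmetric=True):
    """Return (point position s, refinement scaling factor f, offset o) for n-bit quantization."""
    if symmetric:
        z = float(np.max(np.abs(data)))          # statistical result: maximum absolute value
        o = 0.0
    else:
        zmax, zmin = float(np.max(data)), float(np.min(data))
        z = (zmax - zmin) / 2.0                  # half the data range
        o = (zmax + zmin) / 2.0                  # offset shifts the data to be symmetric
    qmax = 2 ** (n - 1) - 1
    s = math.ceil(math.log2(z / qmax)) if z > 0 else 0   # point position parameter
    f = z / (qmax * (2.0 ** s)) if z > 0 else 1.0        # refines the power-of-two scale
    return s, f, o
```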
In this embodiment, the data bit width used by the quantization parameter determining unit is a preset value.
In this embodiment, the quantization parameter determining unit includes an adjustment module and a quantization error determining module; wherein,
the adjusting module is used for adjusting the data bit width according to the corresponding quantization error;
the quantization error determining module is configured to determine the quantization error according to quantized data and corresponding pre-quantized data.
In this embodiment, the adjustment module is specifically configured to:
comparing the quantization error with a threshold value, and adjusting the data bit width according to a comparison result; wherein the threshold comprises at least one of a first threshold and a second threshold.
In this embodiment, the adjustment module includes a first adjustment submodule, where the first adjustment submodule is configured to:
and if the quantization error is greater than or equal to the first threshold value, increasing the data bit width.
In this embodiment, the adjusting module includes a second adjusting submodule, where the second adjusting submodule is configured to:
and if the quantization error is smaller than or equal to the second threshold value, reducing the data bit width.
In this embodiment, the adjustment module includes a third adjustment submodule, where the third adjustment submodule is configured to:
and if the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
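A compact sketch of the threshold comparison performed by the three adjustment submodules above; the step size and the threshold values are placeholders, not values fixed by this disclosure:

```python
def adjust_bit_width(n, quant_error, first_threshold, second_threshold, step=1):
    """Increase, decrease, or keep the data bit width n based on the quantization error."""
    if quant_error >= first_threshold:
        return n + step          # error too large: widen the representation
    if quant_error <= second_threshold:
        return n - step          # error comfortably small: narrow it
    return n                     # error between the thresholds: keep n unchanged
```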
In this embodiment, the quantization error determining module includes:
a quantization interval determination submodule for determining a quantization interval according to the data bit width;
and the first quantization error determination submodule is used for determining quantization errors according to the quantization interval, the number of the quantized data and the corresponding data before quantization.
In this embodiment, the quantization error determining module includes:
the inverse quantization data determining submodule is used for carrying out inverse quantization on quantized data to obtain inverse quantization data; wherein, the data format of the inverse quantization data is the same as the data format of the corresponding data before quantization;
And the second quantization error determination submodule is used for determining quantization errors according to the quantized data and the corresponding inverse quantization data.
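A minimal sketch of the second path described above, in which the quantized data is inverse-quantized back into the format of the pre-quantized data and compared with it; the relative L1 error used here is one plausible metric and is not asserted to be the formula of this disclosure:

```python
import numpy as np

def quantization_error(pre_quant, quant, s):
    """Dequantize quant with point position s and compare it against the pre-quantized data."""
    dequant = quant.astype(np.float32) * (2.0 ** s)      # same format as the data before quantization
    denom = np.sum(np.abs(pre_quant)) + 1e-12            # guard against division by zero
    return float(np.sum(np.abs(dequant - pre_quant)) / denom)
```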
In this embodiment, the pre-quantization data used by the quantization error determination module is the data to be quantized.
In this embodiment, the pre-quantization data used by the quantization error determining module is data to be quantized involved in a weight updating iteration process within a target iteration interval; the target iteration interval comprises at least one weight updating iteration, and the same data bit width is adopted in the quantization process in the same target iteration interval.
In this embodiment, the quantization parameter determining apparatus of a neural network further includes a first target iteration interval determining unit; wherein the first target iteration interval determining unit includes:
the first change trend value determining module is used for determining a change trend value of point position parameters of the data to be quantized, which are involved in the weight updating iteration process, at a pre-judging time point; the pre-judging time point is used for judging whether the data bit width needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
And the first target iteration interval module is used for determining the corresponding target iteration interval according to the change trend value of the point location parameter.
In this embodiment, the first target iteration interval determination unit includes:
the second change trend value determining module is used for determining a change trend value of point position parameters and a change trend value of data bit width of the data to be quantized, which are involved in the weight updating iteration process, at a pre-judging time point; the pre-judging time point is used for judging whether the data bit width needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
and the second target iteration interval module is used for determining the corresponding target iteration interval according to the change trend value of the point location parameter and the change trend value of the data bit width.
In this embodiment, the first target iteration interval determining unit further includes a first pre-determination time point determining unit; wherein,
the first pre-judgment time point determining unit is used for determining the first pre-judgment time point according to the target iteration interval.
In this embodiment, the first target iteration interval determining unit further includes a second pre-judgment time point determining unit; the second pre-judging time point determining unit is used for determining a second pre-judging time point according to the data fluctuation range curve; the data fluctuation range curve is obtained by counting the data fluctuation range conditions in the weight updating iterative process.
In this embodiment, the first trend value determining module and the second trend value determining module determine the trend value of the point location parameter according to a sliding average value of the point location parameter corresponding to the current pre-determination time point and a sliding average value of the point location parameter corresponding to the previous pre-determination time point.
In this embodiment, the first trend value determining module and the second trend value determining module determine the trend value of the point location parameter according to the point location parameter corresponding to the current pre-determination time point and the sliding average value of the point location parameter corresponding to the previous pre-determination time point.
In this embodiment, the first trend value determining module and the second trend value determining module each include:
the point position parameter determining sub-module is used for determining the point position parameter corresponding to the current pre-judging time point according to the point position parameter corresponding to the last pre-judging time point and the adjustment value of the data bit width;
the adjustment result determining submodule is used for adjusting the sliding average value of the point position parameters corresponding to the previous pre-judging time point according to the adjustment value of the data bit width to obtain an adjustment result;
And the first sliding average value determining sub-module is used for determining the sliding average value of the point position parameter corresponding to the current pre-judging time point according to the point position parameter corresponding to the current pre-judging time point and the adjustment result.
In this embodiment, the first trend value determining module and the second trend value determining module each include:
the middle result determining sub-module is used for determining a middle result of the sliding average value of the point position parameters corresponding to the current pre-judging time point according to the point position parameters corresponding to the previous pre-judging time point and the sliding average value of the point position parameters corresponding to the previous pre-judging time point;
and the second moving average value determining sub-module is used for determining the moving average value of the point position parameter corresponding to the current pre-judging time point according to the intermediate result of the moving average value of the point position parameter corresponding to the current pre-judging time point and the adjustment value of the data bit width.
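The two submodule variants above both maintain a sliding (moving) average of the point position parameter across pre-judgment time points, corrected by the data bit width adjustment. A minimal sketch under an exponential-moving-average assumption follows; alpha and the sign convention for the bit-width correction are illustrative assumptions rather than values fixed by this disclosure:

```python
def update_point_position_average(s_prev_avg, s_prev, bitwidth_delta, s_now=None, alpha=0.9):
    """Return the moving average of the point position parameter at the current time point.

    bitwidth_delta is the adjustment applied to the data bit width; widening the bit width by
    one bit typically lowers the point position by one, so the history is corrected accordingly.
    """
    if s_now is None:
        s_now = s_prev - bitwidth_delta              # point position implied by the new bit width
    adjusted_avg = s_prev_avg - bitwidth_delta       # adjust the previous moving average
    return alpha * adjusted_avg + (1 - alpha) * s_now
```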
In this embodiment, the second trend value determining module determines the trend value of the data bit width according to the quantization error.
In this embodiment, the first target iteration interval determining unit further includes:
a quantization error determining module for determining a corresponding quantization error; the data before quantization corresponding to the quantization error is data to be quantized involved in the weight updating iterative process corresponding to the pre-judging time point;
And the data bit width determining module is used for determining the data bit width adopted in the quantization process in the target iteration interval according to the corresponding quantization error.
In this embodiment, the data bit width determining module is specifically configured to:
and comparing the quantization error with a threshold value, and adjusting the data bit width adopted in the quantization process in the previous target iteration interval according to the comparison result, wherein the adjustment result is used as the data bit width adopted in the quantization process in the current target iteration interval.
In this embodiment, the pre-quantization data used by the quantization error determining module is data to be quantized involved in a weight update iteration within a target iteration interval; the target iteration interval comprises at least one weight updating iteration, and the same quantization parameter is adopted in the quantization process in the same target iteration interval.
In this embodiment, the quantization parameter determining apparatus of a neural network further includes a second target iteration interval determining unit; wherein the second target iteration interval determining unit includes:
the third change trend value determining module is used for determining the change trend value of the point position parameter of the data to be quantized, which is related in the weight updating iteration process, at the pre-judging time point; the pre-judging time point is used for judging whether the quantization parameter needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
And the third target iteration interval module is used for determining the corresponding target iteration interval according to the change trend value of the point location parameter.
In this embodiment, the quantization parameter determining unit determines the point location parameter based on the statistical result and the data bit width.
It should be understood that the apparatus embodiments described above are illustrative only and that the device of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
The units or modules described as separate components may or may not be physically separate. The components described as units or modules may be physical units, may be located in one apparatus, or may be distributed over a plurality of apparatuses. The embodiments of the present disclosure may be implemented by selecting some or all of the units according to actual needs.
In addition, unless specifically stated, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules described above may be implemented either in hardware or in software program modules.
The integrated units/modules, if implemented in hardware, may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. The artificial intelligence processor may be any suitable hardware processor, unless otherwise specified, such as: CPU, GPU, FPGA, DSP and ASIC, etc. The storage unit may be any suitable magnetic or magneto-optical storage medium, unless otherwise indicated, such as: a resistive Random Access Memory RRAM (Resistive Random Access Memory), a dynamic Random Access Memory DRAM (Dynamic Random Access Memory), a Static Random Access Memory SRAM (Static Random-Access Memory), an enhanced dynamic Random Access Memory EDRAM (Enhanced Dynamic Random Access Memory), a High-Bandwidth Memory HBM (High-Bandwidth Memory), a hybrid Memory cube HMC (Hybrid Memory Cube), and the like.
The integrated units/modules may be stored in a computer-readable memory if implemented in the form of software program modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
In this technical scheme, the present disclosure also discloses an artificial intelligence chip, which includes the above-described quantization parameter determination device of the neural network.
In the technical scheme, the disclosure also discloses a board card, which comprises a storage device, an interface device, a control device and the artificial intelligent chip; wherein the artificial intelligent chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligent chip and external equipment; the control device is used for monitoring the state of the artificial intelligent chip.
Fig. 12 shows a block diagram of a board card according to an embodiment of the present disclosure. Referring to fig. 12, the board card may include, in addition to the chip 389, other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392;
the memory device 390 is connected to the artificial intelligence chip through a bus and is used for storing data. The memory device may include multiple groups of memory cells 393. Each group of memory cells is connected to the artificial intelligence chip through a bus. It is understood that each group of memory cells may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of the memory cells. Each group of the memory cells may include a plurality of DDR4 particles (chips). In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used for data transfer and 8 bits are used for ECC checking. It is understood that when DDR4-3200 particles are used in each group of memory cells, the theoretical bandwidth of data transfer can reach 25600 MB/s.
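As a rough sanity check of the quoted figure (assuming a 64-bit data path per group of memory cells, consistent with the 72-bit controller of which 8 bits are used for ECC):

$$3200\ \text{MT/s} \times \frac{64\ \text{bit}}{8\ \text{bit/byte}} = 25600\ \text{MB/s}$$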
In one embodiment, each set of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.
The interface device is electrically connected with the artificial intelligence chip. The interface device is used for implementing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transferred from the server to the chip through the standard PCIE interface to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface, and the present disclosure is not limited to the specific form of that interface, as long as the interface unit can implement the transfer function. In addition, the calculation results of the artificial intelligence chip are still transmitted back to the external device (e.g., a server) by the interface device.
The control device is electrically connected with the artificial intelligence chip. The control device is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device can be electrically connected through an SPI interface. The control device may comprise a single-chip microcomputer (Micro Controller Unit, MCU). The artificial intelligence chip may comprise multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate and control the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an aircraft, a ship and/or a vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
The foregoing may be better understood in light of the following clauses:
A1. a method of determining quantization parameters for a neural network, the method comprising:
traversing operators in a computational graph corresponding to the neural network, and selecting a current operator and an operator to be fused from the computational graph;
determining a split size according to the available storage capacity of the on-chip memory of the artificial intelligence processor;
splitting the output data of the operator to be fused into a plurality of data blocks according to the splitting size;
mapping to obtain the size of the data block of the input data of the current operator and the size of the data block of the intermediate data between the current operator and the operator to be fused based on the size of the data block of the output data of the operator to be fused;
the data blocks of the output data of the operator to be fused, the corresponding data blocks of the input data of the current operator and the data blocks of the intermediate data between the current operator and the operator to be fused are used as data to be quantized, and a statistical result of each type of data to be quantized is obtained; the data to be quantized comprises at least one data of neurons, weights, gradients and biases of the neural network;
Determining corresponding quantization parameters by using the statistical result of each type of data to be quantized and the data bit width; the quantization parameter is used for correspondingly quantizing the data in the operation process of the neural network by the artificial intelligence processor; the quantization parameter is a point location parameter.
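Clause A1 above combines operator fusion with quantization: the output of the operator to be fused is split into blocks sized to the available on-chip memory, the corresponding input and intermediate blocks are mapped back from each output block, and those blocks become the data to be quantized. The following heavily simplified sketch only illustrates that control flow; the helper names, the block-size budget, and the mapping functions are hypothetical, since the clause does not specify them.

```python
def plan_fused_quantization(current_op, fuse_op, on_chip_capacity, elem_bytes=4):
    """Split fuse_op's output so each block (plus mapped input/intermediate blocks) fits on chip."""
    split_elems = max(1, on_chip_capacity // (3 * elem_bytes))  # crude budget across the three tensors
    blocks = []
    output_size = fuse_op.output_elems                          # hypothetical attribute
    for start in range(0, output_size, split_elems):
        out_block = (start, min(start + split_elems, output_size))
        in_block = current_op.map_input_block(out_block)        # hypothetical mapping helpers
        mid_block = fuse_op.map_intermediate_block(out_block)
        blocks.append((out_block, in_block, mid_block))         # these become the data to be quantized
    return blocks
```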
A2. The method of A1, the method further comprising:
and quantizing the data to be quantized by using the corresponding quantization parameters.
A3. The method of A1 or A2, the method further comprising:
quantizing the target data by using the corresponding quantization parameters; wherein, the characteristics of the target data and the characteristics of the data to be quantized have similarity.
A4. The method of A1, wherein the neural network operation process comprises at least one operation of neural network training, neural network reasoning and neural network fine tuning.
A5. The method of A1, wherein the statistics are maximum and minimum values in each type of data to be quantized.
A6. The method of A1, wherein the statistics are absolute maximum values in each type of data to be quantized.
A7. The method of A6, wherein the absolute maximum is determined based on a maximum and a minimum in each of the data to be quantized.
A8. The method of A5, wherein the quantization parameter is determined according to a maximum value, a minimum value, and the data bit width in each data to be quantized.
A9. The method of A6 or A7, wherein the quantization parameter is determined according to the maximum absolute value in each data to be quantized, and the data bit width.
A10. The method of A1, wherein the data bit width is a predetermined value.
A11. The method of A1, wherein the data bit width is adjusted according to a corresponding quantization error; wherein the quantization error is determined according to the quantized data and the corresponding pre-quantized data.
A12. The method of a11, the step of adjusting the data bit width comprising:
comparing the quantization error with a threshold value, and adjusting the data bit width according to a comparison result; wherein the threshold comprises at least one of a first threshold and a second threshold.
A13. The method of a12, the step of adjusting the data bit width comprising:
and if the quantization error is greater than or equal to the first threshold value, increasing the data bit width.
A14. The method of a12, the step of adjusting the data bit width comprising:
and if the quantization error is smaller than or equal to the second threshold value, reducing the data bit width.
A15. The method of a12, the step of adjusting the data bit width comprising:
and if the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
A16. The method of a11, the method for obtaining a quantization error includes:
determining a quantization interval according to the data bit width;
and determining quantization errors according to the quantization intervals, the number of the quantized data and the corresponding data before quantization.
A17. The method of a11, the method for obtaining a quantization error includes:
performing inverse quantization on the quantized data to obtain inverse quantized data; wherein, the data format of the inverse quantization data is the same as the data format of the corresponding data before quantization;
and determining quantization errors according to the quantized data and the corresponding inverse quantized data.
A18. The method of a11, wherein the pre-quantized data is the data to be quantized.
A19. The method of a11, wherein the pre-quantized data is data to be quantized involved in a weight update iteration process within a target iteration interval; the target iteration interval comprises at least one weight updating iteration, and the same data bit width is adopted in the quantization process in the same target iteration interval.
A20. The method of a19, the determining of the target iteration interval comprising:
determining the change trend value of the point position parameter of the data to be quantized, which is involved in the weight updating iterative process, at a pre-judging time point; the pre-judging time point is used for judging whether the data bit width needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
And determining the corresponding target iteration interval according to the change trend value of the point location parameter.
A21. The method of a19, the determining of the target iteration interval comprising:
determining the change trend value of point position parameters and the change trend value of data bit width of the data to be quantized, which are involved in the weight updating iterative process, at a pre-judging time point; the pre-judging time point is used for judging whether the data bit width needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
and determining the corresponding target iteration interval according to the change trend value of the point location parameter and the change trend value of the data bit width.
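Clauses A20 and A21 determine the target iteration interval from the change trend value of the point position parameter (and, in A21, also from the change trend value of the data bit width). The mapping from trend value to interval length is not spelled out in these clauses, so the sketch below only encodes a plausible assumption: the faster the quantization statistics change, the shorter the interval; all constants are placeholders.

```python
def target_iteration_interval(point_trend, bitwidth_trend=0.0, beta=20.0, gamma=2, max_interval=100):
    """Map the change-trend values to a number of weight-update iterations between adjustments."""
    trend = max(abs(point_trend), abs(bitwidth_trend))
    if trend <= 0:
        return max_interval                      # data is stable: re-check rarely
    return max(1, min(max_interval, int(beta / trend) - gamma))
```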
A22. The method of a20 or a21, the pre-determined time point comprising a first pre-determined time point; wherein the first predetermined point in time is determined based on the target iteration interval.
A23. The method of a22, the pre-determined time point further comprising a second pre-determined time point; wherein the second pre-judging time point is determined according to a data fluctuation range curve; the data fluctuation range curve is obtained by counting the data fluctuation range conditions in the weight updating iterative process.
A24. The method according to any one of a20 to a23, wherein the change trend value of the point location parameter is determined according to a sliding average value of the point location parameter corresponding to the current pre-determination time point and a sliding average value of the point location parameter corresponding to the previous pre-determination time point.
A25. The method according to any one of a20 to a23, wherein the change trend value of the point location parameter is determined according to the point location parameter corresponding to the current pre-determination time point and the sliding average value of the point location parameter corresponding to the previous pre-determination time point.
A26. The method of a24, wherein the step of determining the sliding average value of the point location parameter corresponding to the current pre-determination time point includes:
determining point position parameters corresponding to the current pre-judging time point according to the point position parameters corresponding to the last pre-judging time point and the adjustment value of the data bit width;
according to the adjustment value of the data bit width, adjusting the sliding average value of the point position parameters corresponding to the last pre-judging time point to obtain an adjustment result;
and determining a sliding average value of the point position parameters corresponding to the current pre-judgment time point according to the point position parameters corresponding to the current pre-judgment time point and the adjustment result.
A27. The method of a24, wherein the step of determining the sliding average value of the point location parameter corresponding to the current pre-determination time point includes:
Determining an intermediate result of the sliding average value of the point position parameter corresponding to the current pre-judging time point according to the point position parameter corresponding to the last pre-judging time point and the sliding average value of the point position parameter corresponding to the last pre-judging time point;
and determining the sliding average value of the point position parameters corresponding to the current pre-judging time point according to the intermediate result of the sliding average value of the point position parameters corresponding to the current pre-judging time point and the adjustment value of the data bit width.
A28. The method of a21, wherein the trend value of the data bit width is determined according to the quantization error.
A29. The method according to any one of a20 to a23, wherein the step of determining the data bit width used in the quantization process in the target iteration interval includes:
determining a corresponding quantization error; the data before quantization corresponding to the quantization error is data to be quantized involved in the weight updating iterative process corresponding to the pre-judging time point;
and determining the data bit width adopted in the quantization process in the target iteration interval according to the corresponding quantization error.
A30. The method of a29, the step of determining the data bit width employed in the quantization process within the target iteration interval comprising:
and comparing the quantization error with a threshold value, and adjusting the data bit width adopted in the quantization process in the previous target iteration interval according to the comparison result, wherein the adjustment result is used as the data bit width adopted in the quantization process in the current target iteration interval.
A31. The method of a11, wherein the pre-quantization data is data to be quantized involved in weight update iteration within a target iteration interval; the target iteration interval comprises at least one weight updating iteration, and the same quantization parameter is adopted in the quantization process in the same target iteration interval.
A32. The method of a31, the step of determining the target iteration interval comprising:
determining the change trend value of the point position parameter of the data to be quantized, which is involved in the weight updating iterative process, at a pre-judging time point; the pre-judging time point is used for judging whether the quantization parameter needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
and determining the corresponding target iteration interval according to the change trend value of the point location parameter.
A33. The method of A1, wherein the point location parameter is determined based on statistics and the data bit width.
A34. The method of A1, the method further comprising:
judging whether to fuse the current operator with an operator to be fused or not based on the splitting size, the size of a data block of input data of the current operator and the size of a data block of intermediate data between the current operator and the operator to be fused;
And determining the splitting size according to the judging result.
A35. The method of a34, wherein the step of determining the split size according to the determination result includes:
if the judgment result is that the current operator and the operator to be fused cannot be fused together, adjusting the current split size, and splitting the output data of the operator to be fused into corresponding data blocks according to the adjusted split size;
and mapping to obtain a data block of input data of the current operator and a data block of intermediate data between the current operator and the operator to be fused based on the data block of the operator to be fused.
A36. The method of A1, wherein a data flow between the current operator and the operator to be fused is unidirectional.
A37. A quantization parameter determination apparatus for a neural network, comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the processor implementing the steps of the method of any one of A1 to a36 when the computer program is executed.
A38. A computer readable storage medium having stored therein a computer program which, when executed, implements the steps of the method of any one of A1 to A36.
The embodiments of the present disclosure have been described above, and the above description is exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (38)

1. A method for determining quantization parameters of a neural network, the method comprising:
traversing operators in a computational graph corresponding to the neural network, and selecting a current operator and an operator to be fused from the computational graph;
determining a split size according to the available storage capacity of the on-chip memory of the artificial intelligence processor;
splitting the output data of the operator to be fused into a plurality of data blocks according to the splitting size;
mapping to obtain the size of the data block of the input data of the current operator and the size of the data block of the intermediate data between the current operator and the operator to be fused based on the size of the data block of the output data of the operator to be fused;
The data blocks of the output data of the operator to be fused, the corresponding data blocks of the input data of the current operator and the data blocks of the intermediate data between the current operator and the operator to be fused are used as data to be quantized, and a statistical result of each type of data to be quantized is obtained; the data to be quantized comprises at least one data of neurons, weights, gradients and biases of the neural network;
determining corresponding quantization parameters by using the statistical result of each type of data to be quantized and the data bit width; the quantization parameter is used for correspondingly quantizing the data in the operation process of the neural network by the artificial intelligence processor; the quantization parameter is a point location parameter.
2. The method of claim 1, wherein the method further comprises:
and quantizing the data to be quantized by using the corresponding quantization parameters.
3. The method of claim 1 or 2, wherein the method further comprises:
quantizing the target data by using the corresponding quantization parameters; wherein, the characteristics of the target data and the characteristics of the data to be quantized have similarity.
4. The method of claim 1, wherein the neural network operation process comprises at least one of neural network training, neural network reasoning, neural network fine tuning.
5. The method of claim 1, wherein the statistics are a maximum and a minimum in each type of data to be quantized.
6. The method of claim 1, wherein the statistics are absolute maximum values in each type of data to be quantized.
7. The method of claim 6, wherein the absolute maximum is determined based on a maximum and a minimum in each of the data to be quantized.
8. The method of claim 5, wherein the quantization parameter is determined based on a maximum value, a minimum value, and the data bit width in each type of data to be quantized.
9. The method of claim 6 or 7, wherein the quantization parameter is determined based on the data bit width, the absolute maximum in each data to be quantized.
10. The method of claim 1, wherein the data bit width is a preset value.
11. The method of claim 1, wherein the data bit widths are adjusted according to corresponding quantization errors; wherein the quantization error is determined according to the quantized data and the corresponding pre-quantized data.
12. The method of claim 11, wherein the step of adjusting the data bit width comprises:
Comparing the quantization error with a threshold value, and adjusting the data bit width according to a comparison result; wherein the threshold comprises at least one of a first threshold and a second threshold.
13. The method of claim 12, wherein the step of adjusting the data bit width comprises:
and if the quantization error is greater than or equal to the first threshold value, increasing the data bit width.
14. The method of claim 12, wherein the step of adjusting the data bit width comprises:
and if the quantization error is smaller than or equal to the second threshold value, reducing the data bit width.
15. The method of claim 12, wherein the step of adjusting the data bit width comprises:
and if the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
16. The method of claim 11, wherein the quantization error acquisition method comprises:
determining a quantization interval according to the data bit width;
and determining quantization errors according to the quantization intervals, the number of the quantized data and the corresponding data before quantization.
17. The method of claim 11, wherein the quantization error acquisition method comprises:
Performing inverse quantization on the quantized data to obtain inverse quantized data; wherein, the data format of the inverse quantization data is the same as the data format of the corresponding data before quantization;
and determining quantization errors according to the quantized data and the corresponding inverse quantized data.
18. The method of claim 11, wherein the pre-quantized data is the data to be quantized.
19. The method of claim 11, wherein the pre-quantized data is data to be quantized involved in a weight update iteration process within a target iteration interval; the target iteration interval comprises at least one weight updating iteration, and the same data bit width is adopted in the quantization process in the same target iteration interval.
20. The method of claim 19, wherein the step of determining the target iteration interval comprises:
determining the change trend value of the point position parameter of the data to be quantized, which is involved in the weight updating iterative process, at a pre-judging time point; the pre-judging time point is used for judging whether the data bit width needs to be adjusted or not, and corresponds to the time point when the weight updating iteration is completed;
And determining the corresponding target iteration interval according to the change trend value of the point location parameter.
21. The method of claim 19, wherein the step of determining the target iteration interval comprises:
determining, at a pre-judgment time point, the change trend value of the point location parameter and the change trend value of the data bit width of the data to be quantized involved in the weight update iteration process; wherein the pre-judgment time point is used for judging whether the data bit width needs to be adjusted, and corresponds to a time point at which a weight update iteration is completed;
and determining the corresponding target iteration interval according to the change trend value of the point location parameter and the change trend value of the data bit width.
22. The method of claim 20 or 21, wherein the pre-judgment time point comprises a first pre-judgment time point; wherein the first pre-judgment time point is determined based on the target iteration interval.
23. The method of claim 22, wherein the pre-judgment time point further comprises a second pre-judgment time point; wherein the second pre-judgment time point is determined according to a data fluctuation range curve, and the data fluctuation range curve is obtained by statistics on the data fluctuation range during the weight update iteration process.
24. The method of claim 20 or 21, wherein the change trend value of the point location parameter is determined according to a sliding average value of the point location parameter corresponding to the current pre-judgment time point and a sliding average value of the point location parameter corresponding to the previous pre-judgment time point.
25. The method of claim 20 or 21, wherein the change trend value of the point location parameter is determined according to the point location parameter corresponding to the current pre-judgment time point and the sliding average value of the point location parameter corresponding to the previous pre-judgment time point.
26. The method of claim 24, wherein the step of determining the sliding average value of the point location parameter corresponding to the current pre-judgment time point comprises:
determining the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the adjustment value of the data bit width;
adjusting, according to the adjustment value of the data bit width, the sliding average value of the point location parameter corresponding to the previous pre-judgment time point to obtain an adjustment result;
and determining the sliding average value of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the current pre-judgment time point and the adjustment result.
27. The method of claim 24, wherein the step of determining the sliding average value of the point location parameter corresponding to the current pre-judgment time point comprises:
determining an intermediate result of the sliding average value of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the sliding average value of the point location parameter corresponding to the previous pre-judgment time point;
and determining the sliding average value of the point location parameter corresponding to the current pre-judgment time point according to the intermediate result of the sliding average value of the point location parameter corresponding to the current pre-judgment time point and the adjustment value of the data bit width.
28. The method of claim 21, wherein the change trend value of the data bit width is determined based on the corresponding quantization error.
29. The method of claim 20 or 21, wherein the step of determining the data bit width employed in the quantization process within the target iteration interval comprises:
determining a corresponding quantization error, wherein the pre-quantized data corresponding to the quantization error is the data to be quantized involved in the weight update iteration process corresponding to the pre-judgment time point;
and determining, according to the corresponding quantization error, the data bit width adopted in the quantization process within the target iteration interval.
30. The method of claim 29, wherein the step of determining the data bit width employed in the quantization process within the target iteration interval comprises:
comparing the quantization error with a threshold, and adjusting, according to the comparison result, the data bit width adopted in the quantization process within the previous target iteration interval, the adjustment result being used as the data bit width adopted in the quantization process within the current target iteration interval.
31. The method of claim 11, wherein the pre-quantized data is data to be quantized involved in a weight update iteration process within a target iteration interval; the target iteration interval comprises at least one weight update iteration, and the same quantization parameter is adopted in the quantization process within the same target iteration interval.
32. The method of claim 31, wherein the step of determining the target iteration interval comprises:
determining, at a pre-judgment time point, the change trend value of the point location parameter of the data to be quantized involved in the weight update iteration process; wherein the pre-judgment time point is used for judging whether the quantization parameter needs to be adjusted, and corresponds to a time point at which a weight update iteration is completed;
and determining the corresponding target iteration interval according to the change trend value of the point location parameter.
33. The method of claim 1, wherein the point location parameter is determined based on the statistical result and the data bit width.
34. The method of claim 1, wherein the method further comprises:
judging, based on the splitting size, the size of a data block of input data of the current operator, and the size of a data block of intermediate data between the current operator and the operator to be fused, whether to fuse the current operator with the operator to be fused;
and determining the splitting size according to the judgment result.
35. The method of claim 34, wherein the step of determining the splitting size according to the judgment result comprises:
if the judgment result is that the current operator and the operator to be fused cannot be fused, adjusting the current splitting size, and splitting the output data of the operator to be fused into corresponding data blocks according to the adjusted splitting size;
and obtaining, by mapping based on the data blocks of the operator to be fused, a data block of input data of the current operator and a data block of intermediate data between the current operator and the operator to be fused.
36. The method of claim 1, wherein a data flow between the current operator and the operator to be fused is unidirectional.
37. A quantization parameter determination device for a neural network, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 36 when executing the computer program.
38. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed, implements the steps of the method according to any one of claims 1 to 36.
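
The sketches below are editorial illustrations only: they show, in Python, one way mechanisms of the kind recited in the claims could be realized, and every formula, constant, and helper name in them is an assumption rather than the patent's own definition. This first sketch relates to claims 5-9 and 33, deriving a point location parameter from a statistic of the data to be quantized (here the absolute maximum) together with the data bit width.

import math
import numpy as np

def point_location(data: np.ndarray, bit_width: int) -> int:
    # Illustrative point location parameter: the power-of-two exponent such that
    # the largest absolute value of the data fits the signed bit_width range.
    abs_max = float(np.max(np.abs(data)))   # statistic of the data to be quantized
    q_max = 2 ** (bit_width - 1) - 1        # largest representable quantized magnitude
    if abs_max == 0.0:
        return 0                            # degenerate case: all-zero data
    return math.ceil(math.log2(abs_max / q_max))  # assumed formula, for illustration only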
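
This second sketch relates to claims 11-17: quantize with a given bit width, inverse-quantize back into the original data format, measure a quantization error, and compare the error against a first and a second threshold to increase, reduce, or keep the data bit width. The error metric, the threshold values, and the adjustment step are assumptions chosen for the example.

import numpy as np

def quantization_error(data: np.ndarray, bit_width: int, point_location: int) -> float:
    # Quantize, inverse-quantize back to the pre-quantized (floating-point) format,
    # and report an assumed error metric: mean absolute deviation relative to the data.
    q_max = 2 ** (bit_width - 1) - 1
    scale = 2.0 ** point_location
    quantized = np.clip(np.round(data / scale), -q_max - 1, q_max)  # quantized data
    dequantized = quantized * scale                                 # inverse-quantized data
    return float(np.mean(np.abs(dequantized - data)) / (np.mean(np.abs(data)) + 1e-12))

def adjust_bit_width(bit_width: int, error: float,
                     first_threshold: float = 0.05,
                     second_threshold: float = 0.01,
                     step: int = 2) -> int:
    # Threshold comparison in the spirit of claims 12-15 (all values illustrative).
    if error >= first_threshold:        # error too large: increase the data bit width
        return bit_width + step
    if error <= second_threshold:       # error comfortably small: reduce the data bit width
        return max(2, bit_width - step)
    return bit_width                    # between the thresholds: keep the bit width unchanged

A caller would typically recompute the point location after each bit-width change and re-evaluate the error on the next batch of data to be quantized.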
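
This third sketch relates to claims 19-32: track a sliding average of the point location parameter across weight update iterations, derive a change trend value from it at each pre-judgment time point, and turn that trend into a target iteration interval during which the same data bit width (or quantization parameter) is reused. The exponential form of the sliding average and the interval formula are assumptions for illustration.

def update_sliding_average(previous_average: float, point_location: float,
                           momentum: float = 0.9) -> float:
    # Sliding average of the point location parameter (assumed exponential form).
    return momentum * previous_average + (1.0 - momentum) * point_location

def target_iteration_interval(current_average: float, previous_average: float,
                              beta: float = 1.0, gamma: float = 0.0,
                              max_interval: int = 100) -> int:
    # Change trend value = |current sliding average - previous sliding average|.
    # A larger trend (the point location drifts quickly) yields a shorter interval
    # between pre-judgment time points; beta, gamma and the clamp are illustrative.
    trend = abs(current_average - previous_average)
    if trend == 0.0:
        return max_interval             # stable statistics: check again as late as allowed
    return int(max(1, min(max_interval, beta / trend - gamma)))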
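
This last sketch relates to claims 34-36: for a candidate splitting size, judge whether the current operator can be fused with the operator to be fused from the data block of the current operator's input data and the data block of the intermediate data between the two operators, and fall back to a smaller splitting size when fusion is not possible. The fitness criterion (an on-chip memory budget) and the callback returning block sizes are hypothetical.

from typing import Callable, Iterable, Tuple

def can_fuse(input_block_bytes: int, intermediate_block_bytes: int,
             on_chip_budget_bytes: int = 256 * 1024) -> bool:
    # Hypothetical fusion test: both data blocks must coexist within the assumed budget.
    return input_block_bytes + intermediate_block_bytes <= on_chip_budget_bytes

def choose_splitting_size(candidate_sizes: Iterable[int],
                          block_sizes_for: Callable[[int], Tuple[int, int]]) -> int:
    # Try splitting sizes from largest to smallest and keep the first one for which
    # the operators can be fused; block_sizes_for(size) is assumed to return
    # (input_block_bytes, intermediate_block_bytes) for that splitting size.
    sizes = sorted(candidate_sizes, reverse=True)
    for size in sizes:
        input_bytes, intermediate_bytes = block_sizes_for(size)
        if can_fuse(input_bytes, intermediate_bytes):
            return size
    return sizes[-1]                    # smallest splitting size if none allows fusion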
CN201910888626.3A 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product Active CN112085186B (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN2019105052397 2019-06-12
CN201910505239 2019-06-12
CN2019105153557 2019-06-14
CN201910515355 2019-06-14
CN2019105285378 2019-06-18
CN201910528537 2019-06-18
CN201910570125 2019-06-27
CN2019105701250 2019-06-27

Publications (2)

Publication Number Publication Date
CN112085186A CN112085186A (en) 2020-12-15
CN112085186B CN112085186B (en) 2024-03-05

Family

ID=69185300

Family Applications (14)

Application Number Title Priority Date Filing Date
CN201910959851.1A Active CN112085191B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910888626.3A Active CN112085186B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910886577.XA Active CN112085181B (en) 2019-06-12 2019-09-19 Neural network quantification method and device and related products
CN201910888150.3A Active CN112085185B (en) 2019-06-12 2019-09-19 Quantization parameter adjustment method and device and related product
CN201910960314.9A Active CN112085192B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910959831.4A Active CN112085190B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910959360.7A Active CN112085189B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910887544.7A Active CN112085183B (en) 2019-06-12 2019-09-19 Neural network operation method and device and related products
CN201910889339.4A Active CN112085188B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201980005061.8A Pending CN112400176A (en) 2019-06-12 2019-09-19 Neural network quantitative parameter determination method and related product
CN201910887861.9A Active CN112085184B (en) 2019-06-12 2019-09-19 Quantization parameter adjustment method and device and related product
CN201910960385.9A Active CN112085193B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN202010402271.5A Active CN111652368B (en) 2019-06-12 2020-05-13 Data processing method and related product
CN202010401876.2A Active CN111652367B (en) 2019-06-12 2020-05-13 Data processing method and related product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910959851.1A Active CN112085191B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product

Family Applications After (12)

Application Number Title Priority Date Filing Date
CN201910886577.XA Active CN112085181B (en) 2019-06-12 2019-09-19 Neural network quantification method and device and related products
CN201910888150.3A Active CN112085185B (en) 2019-06-12 2019-09-19 Quantization parameter adjustment method and device and related product
CN201910960314.9A Active CN112085192B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910959831.4A Active CN112085190B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910959360.7A Active CN112085189B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201910887544.7A Active CN112085183B (en) 2019-06-12 2019-09-19 Neural network operation method and device and related products
CN201910889339.4A Active CN112085188B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN201980005061.8A Pending CN112400176A (en) 2019-06-12 2019-09-19 Neural network quantitative parameter determination method and related product
CN201910887861.9A Active CN112085184B (en) 2019-06-12 2019-09-19 Quantization parameter adjustment method and device and related product
CN201910960385.9A Active CN112085193B (en) 2019-06-12 2019-09-19 Method for determining quantization parameter of neural network and related product
CN202010402271.5A Active CN111652368B (en) 2019-06-12 2020-05-13 Data processing method and related product
CN202010401876.2A Active CN111652367B (en) 2019-06-12 2020-05-13 Data processing method and related product

Country Status (6)

Country Link
US (2) US11675676B2 (en)
EP (4) EP3772023A1 (en)
JP (3) JP2021530769A (en)
KR (2) KR20210018352A (en)
CN (14) CN112085191B (en)
WO (2) WO2020248423A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11169803B2 (en) 2018-02-13 2021-11-09 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method
CN116991225A (en) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 Control device, method and equipment of processor
EP3798850A4 (en) 2018-06-27 2022-03-23 Shanghai Cambricon Information Technology Co., Ltd On-chip code breakpoint debugging method, on-chip processor, and chip breakpoint debugging system
US11507823B2 (en) * 2019-01-22 2022-11-22 Black Sesame Technologies Inc. Adaptive quantization and mixed precision in a network
US11676028B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN110490309B (en) * 2019-08-14 2022-06-07 中科寒武纪科技股份有限公司 Operator fusion method for neural network and related product thereof
WO2021056180A1 (en) * 2019-09-24 2021-04-01 Baidu.Com Times Technology (Beijing) Co., Ltd. Cursor-based adaptive quantization for deep neural networks
JP7354736B2 (en) * 2019-09-30 2023-10-03 富士通株式会社 Information processing device, information processing method, information processing program
US11775611B2 (en) * 2019-11-01 2023-10-03 Samsung Electronics Co., Ltd. Piecewise quantization for neural networks
JP2021111081A (en) * 2020-01-09 2021-08-02 富士通株式会社 Information processing unit, operation program for neural network and operation method for neural network
US20210241183A1 (en) * 2020-01-31 2021-08-05 Hewlett Packard Enterprise Development Lp Adaptively synchronizing learning of multiple learning models
CN113741619B (en) * 2020-05-27 2024-03-12 安徽寒武纪信息科技有限公司 Clock control device and related product
CN112686001B (en) * 2021-01-05 2021-12-03 中科三清科技有限公司 Transformation method and transmission method of meteorological data, server and data transmission system
CN113220606B (en) * 2021-05-07 2021-11-26 珠海市芯动力科技有限公司 Neural network weight storage method, neural network weight reading method and related equipment
JP2023069780A (en) * 2021-11-08 2023-05-18 富士通株式会社 Arithmetic program, arithmetic method, and computing machine
WO2023128024A1 (en) * 2021-12-30 2023-07-06 한국전자기술연구원 Method and system for quantizing deep-learning network
KR20230136572A (en) * 2022-03-18 2023-09-26 인텔렉추얼디스커버리 주식회사 Neural network-based feature tensor compression method and apparatus
CN114611697B (en) * 2022-05-11 2022-09-09 上海登临科技有限公司 Neural network quantification and deployment method, system, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229681A (en) * 2017-12-28 2018-06-29 郑州云海信息技术有限公司 A kind of neural network model compression method, system, device and readable storage medium storing program for executing
CN108427990A (en) * 2016-01-20 2018-08-21 北京中科寒武纪科技有限公司 Neural computing system and method
CN109740754A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN109740739A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product

Family Cites Families (244)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63111768A (en) * 1986-10-30 1988-05-17 Nec Corp Image data quantizer
JPH0375860A (en) 1989-08-18 1991-03-29 Hitachi Ltd Personalized terminal
US5052043A (en) 1990-05-07 1991-09-24 Eastman Kodak Company Neural network with back propagation controlled through an output confidence measure
EP0509576B1 (en) * 1991-04-18 1998-01-28 Ampex Systems Corporation Method and apparatus for determining a quantizing factor for processes involving multiple compression/decompression of data
US6144977A (en) 1995-07-10 2000-11-07 Motorola, Inc. Circuit and method of converting a floating point number to a programmable fixed point number
GB9602701D0 (en) 1996-02-09 1996-04-10 Canon Kk Image manipulation
JPH10233691A (en) * 1998-03-30 1998-09-02 Nec Corp Encoding system and decoding system
US7242414B1 (en) 1999-07-30 2007-07-10 Mips Technologies, Inc. Processor having a compare extension of an instruction set architecture
JP2000293371A (en) 1999-04-09 2000-10-20 Hitachi Ltd Method and device for controlling microprogram
US6671796B1 (en) 2000-02-25 2003-12-30 Sun Microsystems, Inc. Converting an arbitrary fixed point value to a floating point value
US6931639B1 (en) 2000-08-24 2005-08-16 International Business Machines Corporation Method for implementing a variable-partitioned queue for simultaneous multithreaded processors
SK286661B6 (en) 2000-09-07 2009-03-05 Nippon Steel Corporation Hexavalent chromium-free surface-treating agent for Sn- or Al-based coated steel sheet, and surface treated steel sheet
US7062445B2 (en) * 2001-01-26 2006-06-13 Microsoft Corporation Quantization loop with heuristic approach
US20020138714A1 (en) 2001-03-22 2002-09-26 Sun Microsystems, Inc. Scoreboard for scheduling of instructions in a microprocessor that provides out of order execution
WO2002086817A1 (en) 2001-04-19 2002-10-31 Telefonaktiebolaget Lm Ericsson (Publ) Adaptive memory allocation
US20030167460A1 (en) 2002-02-26 2003-09-04 Desai Vipul Anil Processor instruction set simulation power estimation method
JP4148356B2 (en) * 2002-11-18 2008-09-10 学校法人東海大学 Quantization step parameter determination device, quantization step parameter determination method, quantization step parameter determination program, and nonlinear quantization method, nonlinear quantization device, and nonlinear quantization program
US7236995B2 (en) 2002-12-27 2007-06-26 Arm Limited Data processing apparatus and method for converting a number between fixed-point and floating-point representations
DE10316381A1 (en) 2003-04-10 2004-10-28 Bayer Technology Services Gmbh Procedure for training neural networks
JP3889738B2 (en) * 2003-09-26 2007-03-07 三洋電機株式会社 Inverse quantization apparatus, audio decoding apparatus, image decoding apparatus, inverse quantization method, and inverse quantization program
JP4202244B2 (en) 2003-12-22 2008-12-24 Necエレクトロニクス株式会社 VLIW DSP and method of operating the same
US20060161375A1 (en) 2004-12-30 2006-07-20 Allen Duberstein Optimizing processing speed based on measured temperatures
KR100762591B1 (en) * 2005-09-29 2007-10-01 엘지전자 주식회사 Quantization parameter decision of video codec
US7721128B2 (en) 2005-11-29 2010-05-18 International Business Machines Corporation Implementation of thermal throttling logic
WO2007116551A1 (en) * 2006-03-30 2007-10-18 Kabushiki Kaisha Toshiba Image coding apparatus and image coding method, and image decoding apparatus and image decoding method
CN1851668A (en) 2006-06-01 2006-10-25 北京天碁科技有限公司 Sheet system chip, sheet system chip tracking debug system and method
JP5224666B2 (en) * 2006-09-08 2013-07-03 株式会社東芝 Audio encoding device
DE102006059156B4 (en) 2006-12-14 2008-11-06 Advanced Micro Devices, Inc., Sunnyvale Method for testing an integrated circuit chip with at least two circuit cores and integrated circuit chip and test system
US20110060587A1 (en) 2007-03-07 2011-03-10 Phillips Michael S Command and control utilizing ancillary information in a mobile voice-to-speech application
US8560591B2 (en) 2007-04-25 2013-10-15 International Business Machines Corporation Detection of potential need to use a larger data format in performing floating point operations
US8051117B2 (en) 2007-04-26 2011-11-01 International Business Machines Corporation Shift significand of decimal floating point data
US8051118B2 (en) 2007-04-26 2011-11-01 International Business Machines Corporation Composition of decimal floating point data
US8190664B2 (en) 2007-04-26 2012-05-29 International Business Machines Corporation Employing a mask field of an instruction to encode a sign of a result of the instruction
JP5184824B2 (en) 2007-06-15 2013-04-17 キヤノン株式会社 Arithmetic processing apparatus and method
JP2009110353A (en) 2007-10-31 2009-05-21 Hitachi Ltd Microcontroller and control system
US7904287B2 (en) 2007-11-13 2011-03-08 International Business Machines Corporation Method and system for real-time prediction of power usage for a change to another performance state
JP4998794B2 (en) 2007-11-29 2012-08-15 Nkワークス株式会社 Image correction method and image correction apparatus
KR101518237B1 (en) * 2008-09-01 2015-05-15 삼성전자주식회사 Method and apparatus for inverse quantization, and method and apparatus for decoding of image
US20100073068A1 (en) 2008-09-22 2010-03-25 Hanwoo Cho Functional block level thermal control
CN101754490B (en) * 2008-12-17 2012-11-07 电信科学技术研究院 Data transmission method, system and device
CN101572829B (en) 2009-06-10 2011-02-02 中国联合网络通信集团有限公司 Method for monitoring IPTV video quality, device thereof and system thereof
EP2336882A1 (en) 2009-12-18 2011-06-22 Telefonaktiebolaget L M Ericsson (PUBL) Technique for run-time provision of executable code using off-device services
WO2011132277A1 (en) 2010-04-21 2011-10-27 トヨタ自動車株式会社 Controller for internal combustion engine
JP2011253374A (en) 2010-06-02 2011-12-15 Sony Corp Information processing device, information processing method and program
US8452463B2 (en) 2010-06-04 2013-05-28 Apple Inc. Adjusting the thermal behavior of a computing system using indirect information about ambient temperature
US8694572B2 (en) 2010-07-06 2014-04-08 Silminds, Llc, Egypt Decimal floating-point fused multiply-add unit
CN102622207B (en) * 2011-01-30 2015-07-22 中兴通讯股份有限公司 Fixed-point processing method and device
US8924455B1 (en) 2011-02-25 2014-12-30 Xilinx, Inc. Multiplication of matrices using systolic arrays
US9031341B2 (en) 2011-02-28 2015-05-12 Megachips Corporation Image coding apparatus
CN102761509B (en) 2011-04-27 2016-01-06 联芯科技有限公司 The receiving system of ofdm system and the method for reduction receiving system internal memory
AU2012253292B2 (en) 2011-05-12 2015-10-29 Apple Inc. Presence sensing
CN102789413B (en) 2011-05-23 2016-02-17 同济大学 A kind of debug system of concurrent program and method
US8594982B2 (en) 2011-06-09 2013-11-26 Pulsar Informatics, Inc. Systems and methods for distributed calculation of fatigue-risk prediction and optimization
CN102291773B (en) * 2011-07-18 2014-12-10 电信科学技术研究院 Data compression method and equipment
CN102404673B (en) 2011-11-24 2013-12-18 苏州上声电子有限公司 Channel balance and sound field control method and device of digitalized speaker system
CN103152673B (en) 2011-12-07 2015-07-08 中国科学院声学研究所 Digital loudspeaker drive method and device based on quaternary code dynamic mismatch reshaping
CN102684701B (en) 2012-04-27 2014-07-09 苏州上声电子有限公司 Method and device for driving digital speaker based on code conversion
DE102012009502A1 (en) 2012-05-14 2013-11-14 Kisters Ag Method for training an artificial neural network
US9417891B2 (en) 2012-06-11 2016-08-16 Vmware, Inc. Unified storage/VDI provisioning methodology
US9224089B2 (en) * 2012-08-07 2015-12-29 Qualcomm Incorporated Method and apparatus for adaptive bit-allocation in neural systems
US9063731B2 (en) 2012-08-27 2015-06-23 Samsung Electronics Co., Ltd. Ultra low power apparatus and method to wake up a main processor
CN102903089B (en) 2012-09-07 2014-12-17 山东大学 Method for generating remote sensing image quick view under Linux environment
US9412366B2 (en) 2012-09-18 2016-08-09 Adobe Systems Incorporated Natural language image spatial and tonal localization
JP5913059B2 (en) 2012-11-13 2016-04-27 日本電信電話株式会社 Distributed wireless communication base station system, signal processing device, wireless device, and operation method of distributed wireless communication base station system
CN102981854A (en) 2012-11-16 2013-03-20 天津市天祥世联网络科技有限公司 Neural network optimization method based on floating number operation inline function library
CN105026445A (en) 2012-11-22 2015-11-04 学校法人庆应义塾 Acrylic copolymer, optical film, polarizing plate and liquid crystal display device
US9851977B2 (en) 2012-12-06 2017-12-26 Kalray Apparatus and method for combining thread warps with compatible execution masks for simultaneous execution and increased lane utilization
US9720732B1 (en) 2013-02-11 2017-08-01 Amazon Technologies, Inc. Parameter selection for optimization of task execution based on execution history for prior tasks
JP2014170295A (en) 2013-03-01 2014-09-18 Honda Motor Co Ltd Object recognition system and object recognition method
US20190138372A1 (en) 2013-04-29 2019-05-09 Moogsoft, Inc. System for managing an instructure with security
US20150063461A1 (en) * 2013-08-27 2015-03-05 Magnum Semiconductor, Inc. Methods and apparatuses for adjusting macroblock quantization parameters to improve visual quality for lossy video encoding
JP6184891B2 (en) 2014-03-12 2017-08-23 東芝メモリ株式会社 Information processing apparatus, semiconductor chip, information processing method, and program
CN105100810B (en) * 2014-05-16 2018-02-13 中国科学院声学研究所 Compression of images decompressing method and system in a kind of imaging sonar real time processing system
US9507405B2 (en) 2014-06-18 2016-11-29 Oracle International Corporation System and method for managing power in a chip multiprocessor using a proportional feedback mechanism
US10318882B2 (en) * 2014-09-11 2019-06-11 Amazon Technologies, Inc. Optimized training of linear machine learning models
US9575537B2 (en) 2014-07-25 2017-02-21 Intel Corporation Adaptive algorithm for thermal throttling of multi-core processors with non-homogeneous performance states
US10282100B2 (en) 2014-08-19 2019-05-07 Samsung Electronics Co., Ltd. Data management scheme in virtualized hyperscale environments
GB2524126B (en) 2014-08-28 2016-07-27 Imagination Tech Ltd Combining paths
US9916130B2 (en) 2014-11-03 2018-03-13 Arm Limited Apparatus and method for vector processing
FR3030077B1 (en) 2014-12-10 2016-12-02 Arnault Ioualalen METHOD OF ADJUSTING THE ACCURACY OF A COMPUTER PROGRAM HANDLING AT LEAST ONE VIRGUL NUMBER
EP3035204B1 (en) 2014-12-19 2018-08-15 Intel Corporation Storage device and method for performing convolution operations
US20170061279A1 (en) 2015-01-14 2017-03-02 Intel Corporation Updating an artificial neural network using flexible fixed point representation
US10262259B2 (en) * 2015-05-08 2019-04-16 Qualcomm Incorporated Bit width selection for fixed point neural networks
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
US20160328645A1 (en) 2015-05-08 2016-11-10 Qualcomm Incorporated Reduced computational complexity for fixed point neural network
US10083395B2 (en) 2015-05-21 2018-09-25 Google Llc Batch processing in a neural network processor
CN104899641B (en) 2015-05-25 2018-07-13 杭州朗和科技有限公司 Deep neural network learning method, processor and deep neural network learning system
CN115100017A (en) 2015-06-10 2022-09-23 无比视视觉技术有限公司 Image processor and method for processing image
CN104978303B (en) 2015-06-19 2019-06-04 上海兆芯集成电路有限公司 The sensor hub and multisensor-multitarget tracking method of single-chip integration
CN106469291A (en) 2015-08-19 2017-03-01 中兴通讯股份有限公司 Image processing method and terminal
US10970617B2 (en) * 2015-08-21 2021-04-06 Institute Of Automation Chinese Academy Of Sciences Deep convolutional neural network acceleration and compression method based on parameter quantification
US10031765B2 (en) 2015-09-24 2018-07-24 Intel Corporation Instruction and logic for programmable fabric hierarchy and cache
US10812831B2 (en) 2015-09-30 2020-10-20 Piksel, Inc. Video stream delivery via adaptive quality enhancement using error correction models
US11061672B2 (en) 2015-10-02 2021-07-13 Via Alliance Semiconductor Co., Ltd. Chained split execution of fused compound arithmetic operations
CN106570559A (en) * 2015-10-09 2017-04-19 阿里巴巴集团控股有限公司 Data processing method and device based on neural network
JP2019505149A (en) 2015-11-17 2019-02-21 バヤニ, エマンBAYANI, Eman Digital image photographing apparatus system and method
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN106814639A (en) 2015-11-27 2017-06-09 富泰华工业(深圳)有限公司 Speech control system and method
CN105893419A (en) 2015-11-30 2016-08-24 乐视致新电子科技(天津)有限公司 Generation device, device and equipment of multimedia photo, and mobile phone
US10699186B2 (en) 2015-12-02 2020-06-30 Google Llc Determining orders of execution of a neural network
CN106991478B (en) 2016-01-20 2020-05-08 中科寒武纪科技股份有限公司 Apparatus and method for performing artificial neural network reverse training
CN106997236B (en) 2016-01-25 2018-07-13 亮风台(上海)信息科技有限公司 Based on the multi-modal method and apparatus for inputting and interacting
US10803401B2 (en) 2016-01-27 2020-10-13 Microsoft Technology Licensing, Llc Artificial intelligence engine having multiple independent processes on a cloud based platform configured to scale
US10497089B2 (en) 2016-01-29 2019-12-03 Fotonation Limited Convolutional neural network
JP2017156511A (en) 2016-03-01 2017-09-07 ソニー株式会社 Information processing device, information processing method, and program
US10103714B2 (en) 2016-03-01 2018-10-16 Qualcomm Incorporated Adjust voltage for thermal mitigation
US10019779B2 (en) 2016-03-08 2018-07-10 Amazon Technologies, Inc. Browsing interface for item counterparts having different scales and lengths
CN109073339B (en) * 2016-03-31 2020-08-25 可利尔Px科技有限公司 Temperature control device and system with static cooling capability
CN109934331B (en) 2016-04-29 2020-06-19 中科寒武纪科技股份有限公司 Apparatus and method for performing artificial neural network forward operations
US10552119B2 (en) 2016-04-29 2020-02-04 Intel Corporation Dynamic management of numerical representation in a distributed matrix processor architecture
US10187568B1 (en) 2016-05-02 2019-01-22 Bao Tran Video smart phone
US11055063B2 (en) 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
GB201607713D0 (en) * 2016-05-03 2016-06-15 Imagination Tech Ltd Convolutional neural network
CN105978611B (en) 2016-05-12 2019-09-17 京信通信系统(中国)有限公司 A kind of frequency-region signal compression method and device
AU2016203619A1 (en) 2016-05-31 2017-12-14 Canon Kabushiki Kaisha Layer-based operations scheduling to optimise memory for CNN applications
EP3252949B1 (en) 2016-06-01 2020-03-18 Intel IP Corporation Methods and devices for predistortion of signals
US20170357910A1 (en) 2016-06-10 2017-12-14 Apple Inc. System for iteratively training an artificial intelligence using cloud-based metrics
CN107545889B (en) 2016-06-23 2020-10-23 华为终端有限公司 Model optimization method and device suitable for pattern recognition and terminal equipment
CN106156310A (en) 2016-06-30 2016-11-23 努比亚技术有限公司 A kind of picture processing apparatus and method
US20180005111A1 (en) * 2016-06-30 2018-01-04 International Business Machines Corporation Generalized Sigmoids and Activation Function Learning
US10372588B2 (en) 2016-07-08 2019-08-06 International Business Machines Corporation Providing debug information on production containers using debug containers
DE102016214786A1 (en) 2016-08-09 2018-02-15 Fujitsu Limited Application profiling job management system, program and method
US20180046903A1 (en) 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
CN107657316B (en) 2016-08-12 2020-04-07 北京深鉴智能科技有限公司 Design of cooperative system of general processor and neural network processor
CN106354568A (en) 2016-08-23 2017-01-25 京信通信技术(广州)有限公司 Method and device for communication between different processes
CN107797913A (en) 2016-09-07 2018-03-13 大陆汽车电子(连云港)有限公司 A kind of software analysis System and method for of real-time system
US20180075347A1 (en) * 2016-09-15 2018-03-15 Microsoft Technology Licensing, Llc Efficient training of neural networks
US11907760B2 (en) 2016-09-23 2024-02-20 Apple Inc. Systems and methods of memory allocation for neural networks
CN106650922B (en) 2016-09-29 2019-05-03 清华大学 Hardware neural network conversion method, computing device, software and hardware cooperative system
US20180096243A1 (en) 2016-09-30 2018-04-05 General Electric Company Deep learning for data driven feature representation and anomaly detection
WO2018071546A1 (en) 2016-10-11 2018-04-19 The Research Foundation For The State University Of New York System, method, and accelerator to process convolutional neural network layers
US11321609B2 (en) * 2016-10-19 2022-05-03 Samsung Electronics Co., Ltd Method and apparatus for neural network quantization
CN106485316B (en) 2016-10-31 2019-04-02 北京百度网讯科技有限公司 Neural network model compression method and device
CN106502626A (en) 2016-11-03 2017-03-15 北京百度网讯科技有限公司 Data processing method and device
US10216479B2 (en) 2016-12-06 2019-02-26 Arm Limited Apparatus and method for performing arithmetic operations to accumulate floating-point numbers
CN106815551B (en) * 2016-12-08 2019-09-10 新疆农业大学 A kind of optimization method of the variation function parameter fitting of forest inventory control
CN106600070A (en) * 2016-12-20 2017-04-26 郭建峰 Short-period share price prediction algorithm based on IPSO-BP neural network
US10997492B2 (en) 2017-01-20 2021-05-04 Nvidia Corporation Automated methods for conversions to a lower precision data format
CN108345939B (en) * 2017-01-25 2022-05-24 微软技术许可有限责任公司 Neural network based on fixed-point operation
JP7004503B2 (en) * 2017-01-27 2022-01-21 ラピスセミコンダクタ株式会社 Automatic gain control circuit (AGC), despreading circuit and method of reproducing received data
CN106951587A (en) 2017-02-15 2017-07-14 芯启源(南京)半导体科技有限公司 FPGA debugging systems and method
CN106951962B (en) 2017-03-22 2020-09-01 南京地平线机器人技术有限公司 Complex arithmetic unit, method and electronic device for neural network
US10402932B2 (en) 2017-04-17 2019-09-03 Intel Corporation Power-based and target-based graphics quality adjustment
US10332302B2 (en) 2017-04-17 2019-06-25 Intel Corporation Scatter gather engine
KR102258414B1 (en) * 2017-04-19 2021-05-28 상하이 캠브리콘 인포메이션 테크놀로지 컴퍼니 리미티드 Processing apparatus and processing method
CN108734287A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and device, terminal, the storage medium of deep neural network model
CN107025629B (en) 2017-04-27 2021-03-26 维沃移动通信有限公司 Image processing method and mobile terminal
KR102034661B1 (en) * 2017-04-28 2019-10-21 서울대학교산학협력단 Method and apparatus for data quantization for neural network
US11842280B2 (en) * 2017-05-05 2023-12-12 Nvidia Corporation Loss-scaling for deep neural network training with reduced precision
US10019668B1 (en) 2017-05-19 2018-07-10 Google Llc Scheduling neural network processing
KR102526650B1 (en) * 2017-05-25 2023-04-27 삼성전자주식회사 Method and apparatus for quantizing data in a neural network
CN115841137A (en) * 2017-06-06 2023-03-24 格兰菲智能科技有限公司 Method and computing device for fixed-point processing of data to be quantized
CN115688877A (en) * 2017-06-06 2023-02-03 格兰菲智能科技有限公司 Method and computing device for fixed-point processing of data to be quantized
US11144828B2 (en) 2017-06-09 2021-10-12 Htc Corporation Training task optimization system, training task optimization method and non-transitory computer readable medium for operating the same
US10944902B2 (en) 2017-06-20 2021-03-09 Adobe Inc. Digital image generation using capture support data
US9916531B1 (en) * 2017-06-22 2018-03-13 Intel Corporation Accumulator constrained quantization of convolutional neural networks
WO2019005088A1 (en) 2017-06-30 2019-01-03 Intel Corporation Heterogeneous multiplier
CN109214509B (en) * 2017-07-05 2021-07-06 中国科学院沈阳自动化研究所 High-speed real-time quantization structure and operation implementation method for deep neural network
CN107451654B (en) 2017-07-05 2021-05-18 深圳市自行科技有限公司 Acceleration operation method of convolutional neural network, server and storage medium
US10427306B1 (en) 2017-07-06 2019-10-01 X Development Llc Multimodal object identification
CN107729990B (en) 2017-07-20 2021-06-08 上海寒武纪信息科技有限公司 Apparatus and method for performing forward operations in support of discrete data representations
CN107451658B (en) 2017-07-24 2020-12-15 杭州菲数科技有限公司 Fixed-point method and system for floating-point operation
CN107480770B (en) * 2017-07-27 2020-07-28 中国科学院自动化研究所 Neural network quantization and compression method and device capable of adjusting quantization bit width
CN107688849B (en) 2017-07-28 2021-04-13 赛灵思电子科技(北京)有限公司 Dynamic strategy fixed-point training method and device
CN107679618B (en) * 2017-07-28 2021-06-11 赛灵思电子科技(北京)有限公司 Static strategy fixed-point training method and device
US11481218B2 (en) 2017-08-02 2022-10-25 Intel Corporation System and method enabling one-hot neural networks on a machine learning compute platform
CN109388779A (en) * 2017-08-03 2019-02-26 珠海全志科技股份有限公司 A kind of neural network weight quantization method and neural network weight quantization device
KR102601604B1 (en) 2017-08-04 2023-11-13 삼성전자주식회사 Method and apparatus for quantizing parameter of neural network
WO2019031858A1 (en) 2017-08-08 2019-02-14 Samsung Electronics Co., Ltd. Method and apparatus for determining memory requirement in a network
US20190050710A1 (en) * 2017-08-14 2019-02-14 Midea Group Co., Ltd. Adaptive bit-width reduction for neural networks
WO2019050771A1 (en) 2017-09-05 2019-03-14 Panasonic Intellectual Property Corporation Of America Execution method, execution device, learning method, learning device, and program for deep neural network
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 A kind of convolutional neural networks weight parameter quantifies training method and system
KR20190034985A (en) * 2017-09-25 2019-04-03 삼성전자주식회사 Method and apparatus of artificial neural network quantization
EP3667487B1 (en) 2017-09-29 2023-11-15 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11450319B2 (en) 2017-09-29 2022-09-20 Cambricon (Xi'an) Semiconductor Co., Ltd. Image processing apparatus and method
US10223114B1 (en) 2017-09-29 2019-03-05 Intel Corporation Fixed point to floating point conversion
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US10224954B1 (en) 2017-09-29 2019-03-05 Intel Corporation Floating point to fixed point conversion
CN107679490B (en) * 2017-09-29 2019-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for detection image quality
JP6540770B2 (en) 2017-10-17 2019-07-10 富士通株式会社 Arithmetic processing circuit, arithmetic processing unit including arithmetic processing circuit, information processing apparatus including arithmetic processing unit, and method
KR102564456B1 (en) * 2017-10-19 2023-08-07 삼성전자주식회사 Method and apparatus for quantizing parameter of neural network
US10410121B2 (en) 2017-10-25 2019-09-10 SparkCognition, Inc. Adjusting automated neural network generation based on evaluation of candidate neural networks
US20210061028A1 (en) 2017-10-26 2021-03-04 Applied Mechatronic Products Apparatus and method for vehicular monitoring, analysis, and control
KR20190054454A (en) * 2017-11-13 2019-05-22 삼성전자주식회사 Method and apparatus of artificial neural network quantization
US10783634B2 (en) 2017-11-22 2020-09-22 General Electric Company Systems and methods to deliver point of care alerts for radiological findings
US10803379B2 (en) 2017-12-12 2020-10-13 Amazon Technologies, Inc. Multi-memory on-chip computational network
CN108053028B (en) * 2017-12-21 2021-09-14 深圳励飞科技有限公司 Data fixed-point processing method and device, electronic equipment and computer storage medium
US11636327B2 (en) 2017-12-29 2023-04-25 Intel Corporation Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism
US11373088B2 (en) 2017-12-30 2022-06-28 Intel Corporation Machine learning accelerator mechanism
CN108288089A (en) * 2018-01-29 2018-07-17 百度在线网络技术(北京)有限公司 Method and apparatus for generating convolutional neural networks
CN108229663A (en) * 2018-01-29 2018-06-29 百度在线网络技术(北京)有限公司 For generating the method and apparatus of convolutional neural networks
US20190251429A1 (en) 2018-02-12 2019-08-15 Kneron, Inc. Convolution operation device and method of scaling convolution input for convolution neural network
US11106598B2 (en) 2018-02-13 2021-08-31 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11169803B2 (en) 2018-02-13 2021-11-09 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method
CN116991225A (en) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 Control device, method and equipment of processor
JP7056225B2 (en) 2018-02-26 2022-04-19 富士通株式会社 Arithmetic processing unit, information processing unit, information processing method, and program
US10628275B2 (en) 2018-03-07 2020-04-21 Nxp B.V. Runtime software-based self-test with mutual inter-core checking
US11475306B2 (en) 2018-03-22 2022-10-18 Amazon Technologies, Inc. Processing for multiple input data sets
CN108631727B (en) * 2018-03-26 2019-08-09 河北工业大学 A kind of solar panel defect identification method based on convolutional neural networks
CN108491928B (en) * 2018-03-29 2019-10-25 腾讯科技(深圳)有限公司 Model parameter sending method, device, server and storage medium
CN108509627B (en) * 2018-04-08 2021-08-31 腾讯科技(深圳)有限公司 Data discretization model training method and device and data discretization method
CN108510067B (en) 2018-04-11 2021-11-09 西安电子科技大学 Convolutional neural network quantification method based on engineering realization
US11562213B2 (en) 2018-04-17 2023-01-24 Intel Corporation Methods and arrangements to manage memory in cascaded neural networks
CN108596328B (en) * 2018-04-26 2021-02-02 北京市商汤科技开发有限公司 Fixed point method and device and computer equipment
US10691413B2 (en) 2018-05-04 2020-06-23 Microsoft Technology Licensing, Llc Block floating point computations using reduced bit-width vectors
WO2019218896A1 (en) 2018-05-18 2019-11-21 上海寒武纪信息科技有限公司 Computing method and related product
CN108717570A (en) 2018-05-23 2018-10-30 电子科技大学 A kind of impulsive neural networks parameter quantification method
CN110554500B (en) 2018-05-31 2022-09-16 中强光电股份有限公司 Head-mounted display device
US10360304B1 (en) 2018-06-04 2019-07-23 Imageous, Inc. Natural language processing interface-enabled building conditions control system
CN109062540B (en) 2018-06-06 2022-11-25 北京理工大学 Reconfigurable floating point operation device based on CORDIC algorithm
CN109063820A (en) 2018-06-07 2018-12-21 中国科学技术大学 Utilize the data processing method of time-frequency combination Recognition with Recurrent Neural Network when long
CN108830331A (en) * 2018-06-22 2018-11-16 西安交通大学 A kind of Ground Penetrating Radar object detection method based on full convolutional network
CN109102064B (en) * 2018-06-26 2020-11-13 杭州雄迈集成电路技术股份有限公司 High-precision neural network quantization compression method
CN109146057B (en) * 2018-06-26 2020-12-08 杭州雄迈集成电路技术股份有限公司 High-precision neural network engineering method based on table lookup calculation
CN110728364A (en) 2018-07-17 2020-01-24 上海寒武纪信息科技有限公司 Arithmetic device and arithmetic method
EP3798850A4 (en) 2018-06-27 2022-03-23 Shanghai Cambricon Information Technology Co., Ltd On-chip code breakpoint debugging method, on-chip processor, and chip breakpoint debugging system
CN109002889B (en) * 2018-07-03 2021-12-17 华南理工大学 Adaptive iterative convolution neural network model compression method
CN109214504B (en) * 2018-08-24 2020-09-04 北京邮电大学深圳研究院 FPGA-based YOLO network forward reasoning accelerator design method
WO2020042739A1 (en) 2018-08-28 2020-03-05 中科寒武纪科技股份有限公司 Data preprocessing method and apparatus, computer device, and storage medium
WO2020062392A1 (en) 2018-09-28 2020-04-02 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109472353B (en) 2018-11-22 2020-11-03 浪潮集团有限公司 Convolutional neural network quantization circuit and method
CN109598331A (en) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 A kind of fraud identification model training method, fraud recognition methods and device
CN109685202B (en) 2018-12-17 2023-03-21 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN109754074A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 A kind of neural network quantization method, device and Related product
GB2580171B (en) * 2018-12-21 2021-02-17 Imagination Tech Ltd Methods and systems for selecting quantisation parameters for deep neural networks using back-propagation
CN111383638A (en) 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109800865B (en) * 2019-01-24 2021-03-23 北京市商汤科技开发有限公司 Neural network generation and image processing method and device, platform and electronic equipment
US20190164057A1 (en) * 2019-01-30 2019-05-30 Intel Corporation Mapping and quantification of influence of neural network features for explainable artificial intelligence
CN109859135B (en) * 2019-01-31 2021-05-07 北京邮电大学 Image enhancement processing method applied to associated imaging
CN109800877B (en) 2019-02-20 2022-12-30 腾讯科技(深圳)有限公司 Parameter adjustment method, device and equipment of neural network
CN109902745A (en) 2019-03-01 2019-06-18 成都康乔电子有限责任公司 A kind of low precision training based on CNN and 8 integers quantization inference methods
CN109993296B (en) 2019-04-01 2020-12-29 安徽寒武纪信息科技有限公司 Quantitative implementation method and related product
CN110059733A (en) 2019-04-01 2019-07-26 苏州科达科技股份有限公司 The optimization and fast target detection method, device of convolutional neural networks
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
CN111832738B (en) 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
US20200364552A1 (en) * 2019-05-13 2020-11-19 Baidu Usa Llc Quantization method of improving the model inference accuracy
US11531893B2 (en) * 2019-06-03 2022-12-20 Samsung Electronics Co., Ltd. Method and apparatus with neural network parameter quantization
US11676028B2 (en) * 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
JP7146954B2 (en) 2019-08-23 2022-10-04 安徽寒武紀信息科技有限公司 DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
WO2021036904A1 (en) 2019-08-23 2021-03-04 安徽寒武纪信息科技有限公司 Data processing method, apparatus, computer device, and storage medium
JP7146955B2 (en) 2019-08-23 2022-10-04 安徽寒武紀信息科技有限公司 DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
WO2021036905A1 (en) 2019-08-27 2021-03-04 安徽寒武纪信息科技有限公司 Data processing method and apparatus, computer equipment, and storage medium
CN110780845B (en) 2019-10-17 2021-11-30 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427990A (en) * 2016-01-20 2018-08-21 北京中科寒武纪科技有限公司 Neural computing system and method
CN108229681A (en) * 2017-12-28 2018-06-29 郑州云海信息技术有限公司 A kind of neural network model compression method, system, device and readable storage medium storing program for executing
CN109740754A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN109740739A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product

Also Published As

Publication number Publication date
JP7167405B2 (en) 2022-11-09
CN112085183B (en) 2024-04-02
JP2021177369A (en) 2021-11-11
CN112085181B (en) 2024-03-29
EP3998554A1 (en) 2022-05-18
CN112085193A (en) 2020-12-15
CN112085186A (en) 2020-12-15
JP7166704B2 (en) 2022-11-08
KR20210011462A (en) 2021-02-01
CN112085190A (en) 2020-12-15
US11675676B2 (en) 2023-06-13
CN112085188A (en) 2020-12-15
WO2020248424A1 (en) 2020-12-17
CN112085191B (en) 2024-04-02
CN112085189A (en) 2020-12-15
US20220261634A1 (en) 2022-08-18
EP3998554A4 (en) 2023-11-15
CN112085185B (en) 2024-04-02
KR20210018352A (en) 2021-02-17
KR20210011461A (en) 2021-02-01
JP2021530769A (en) 2021-11-11
EP3770823A4 (en) 2021-01-27
CN112085184B (en) 2024-03-29
US20210286688A1 (en) 2021-09-16
CN112085193B (en) 2024-03-29
EP3772022A1 (en) 2021-02-03
CN112085183A (en) 2020-12-15
EP3770823A1 (en) 2021-01-27
CN112085185A (en) 2020-12-15
EP3772023A1 (en) 2021-02-03
WO2020248423A1 (en) 2020-12-17
KR102609719B1 (en) 2023-12-04
CN112085181A (en) 2020-12-15
CN112085192B (en) 2024-03-29
CN111652367A (en) 2020-09-11
CN111652368A (en) 2020-09-11
CN112085188B (en) 2024-04-02
CN112085189B (en) 2024-03-29
CN112085184A (en) 2020-12-15
CN112085192A (en) 2020-12-15
CN111652367B (en) 2024-04-09
CN111652368B (en) 2024-03-29
CN112400176A (en) 2021-02-23
CN112085191A (en) 2020-12-15
JP2021179966A (en) 2021-11-18
CN112085190B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN112085186B (en) Method for determining quantization parameter of neural network and related product
US11676028B2 (en) Neural network quantization parameter determination method and related products
JP7146954B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
JP7146953B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
JP7146955B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
JP7146952B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
CN110717585B (en) Training method of neural network model, data processing method and related product
JPWO2020248424A5 (en)
CN113947177A (en) Quantization calibration method, calculation device and computer readable storage medium
CN112085176B (en) Data processing method, device, computer equipment and storage medium
US20220222041A1 (en) Method and apparatus for processing data, and related product
WO2023201424A1 (en) System and method for adaptation of containers for floating-point data for training of a machine learning model
CN112085177A (en) Data processing method, data processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant